Research on a Personalized Campus Calendar System Based on Information Extraction

Published: 2018-08-14 14:03

[Abstract]: With the rapid growth of the Internet, information has become increasingly diverse and complex, which makes it harder for users to find what they need. Extracting the required information from the large volume of data that appears every day has therefore become a focus of natural language processing research. Information extraction, the technology studied in this thesis, addresses this problem: it pulls information out of large amounts of unordered, irregular text and stores it in structured form, and it plays an important role in advancing information technology. This thesis studies information extraction centered on events and time, and designs and implements a personalized campus calendar system. The main contributions and results are as follows.

First, a Chinese entity relation extraction algorithm that combines rules with statistical models is designed and implemented. The method uses regular expressions to extract high-precision results, and a machine learning approach that combines a conditional random field model with a maximum entropy model to supply supplementary results, improving both precision and recall. The method achieved good results in the Slot Filling task of the TAC-KBP evaluation.

Second, a personalized campus calendar system is proposed, designed, and implemented. While extracting event information, the system also organizes the time information within each event, giving users cues for understanding the event as a whole. The system uses a rule-based method to extract time expressions from text and normalize them. On this basis, a method based on the word activation force (WAF) model is proposed for identifying the expressions that mark an event's start and end times, which provide additional information about how the event unfolds. The system has been deployed and is online in COSE, a campus entity search engine.

Third, a WAF-based method for expanding a sentiment lexicon and a machine-learning-based method for judging the sentiment orientation of text are proposed. The approach achieved good results in Task 1 of the COAE 2011 evaluation, opinion word extraction and polarity judgment. The model adds sentiment orientation judgment to the campus calendar system, and this function can be further applied to monitoring campus public opinion.
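The rule-based extraction and normalization of time expressions mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual rules, so everything here is an assumption made for illustration: the two patterns (`ABSOLUTE`, `RELATIVE`), the function name `normalize_time_expressions`, and the choice of ISO 8601 strings as the normalized form.

```python
import re
from datetime import datetime, timedelta

# Hypothetical rule 1: absolute dates such as "2013年5月20日", optionally
# followed by a clock time such as "下午3点30分". Illustrative only; this is
# not the rule set used in the thesis.
ABSOLUTE = re.compile(
    r"(?P<year>\d{4})年(?P<month>\d{1,2})月(?P<day>\d{1,2})日"
    r"(?:(?P<ampm>上午|下午)?(?P<hour>\d{1,2})[点點]"
    r"(?:(?P<minute>\d{1,2})分)?)?"
)

# Hypothetical rule 2: relative day words, resolved against a reference date
# (for example, the publication date of the notice being processed).
RELATIVE = {"今天": 0, "明天": 1}


def normalize_time_expressions(text, reference):
    """Return (surface_text, normalized_string) pairs in order of appearance."""
    found = []
    for m in ABSOLUTE.finditer(text):
        hour = int(m.group("hour") or 0)
        if m.group("ampm") == "下午" and hour < 12:
            hour += 12  # convert afternoon times to a 24-hour clock
        try:
            value = datetime(
                int(m.group("year")), int(m.group("month")),
                int(m.group("day")), hour, int(m.group("minute") or 0),
            )
        except ValueError:
            continue  # skip impossible dates such as month 13
        # Keep the clock time only when the text actually stated one.
        fmt = "%Y-%m-%dT%H:%M" if m.group("hour") else "%Y-%m-%d"
        found.append((m.start(), m.group(0), value.strftime(fmt)))
    for word, offset in RELATIVE.items():
        for m in re.finditer(word, text):
            value = reference + timedelta(days=offset)
            found.append((m.start(), word, value.strftime("%Y-%m-%d")))
    found.sort()  # order by position in the text
    return [(surface, iso) for _, surface, iso in found]
```

Calling `normalize_time_expressions(text, datetime(2013, 5, 18))` on a campus notice returns each matched expression alongside its normalized value. A real system would need many more patterns (durations, weekdays, ranges) and a separate step for deciding which expressions mark an event's start and end.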
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1

Article ID: 2183091

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2183091.html
