Research on Chinese-Vietnamese Bilingual Corpus Construction and Event Graph Extraction Methods
Source: Kunming University of Science and Technology, 2017 master's thesis. Document type: degree thesis
Keywords: Vietnamese; event extraction; event element extraction; coreference relation extraction; event graph
[Abstract]: Event extraction from news is one of the important research tasks in information extraction; its main goal is to extract the events contained in a text. Information extraction from Vietnamese news is of particular interest, because handling international relations with Vietnam well plays an important role in regional economic development and political stability. In general, a news article is made up of multiple events. When people read news, they need not only the individual sub-events the article describes but also the relations between those events, which are equally important information. How to use event extraction to obtain both the events and the relations between them is therefore a key question. This thesis addresses Chinese-Vietnamese bilingual news event extraction and studies Chinese-Vietnamese bilingual news corpus construction, Chinese-Vietnamese event extraction, and Chinese-Vietnamese bilingual event graph construction. The main contributions are as follows.
(1) A Chinese-Vietnamese bilingual news corpus was constructed. Based on the needs of Chinese-Vietnamese news analysis and event extraction, the annotation content was defined, covering event descriptions, event elements, temporal relations between events, event coreference relations, and cross-lingual event alignment relations. A total of 508 Chinese-Vietnamese bilingual news articles were collected and annotated in XML. This corpus supports the subsequent bilingual event extraction and bilingual event graph construction.
(2) An event extraction method combining machine learning and rules was implemented. First, features such as the word and its part of speech, the context words and their parts of speech, and semantic features are selected, and the results of Chinese event recognition are incorporated as guiding features for Vietnamese event recognition; a support vector machine (SVM) is trained as the event recognition model to identify event trigger words. Then, based on the grammatical and syntactic regularities of Chinese and Vietnamese, event element extraction rules are defined for different grammatical structures, and event elements are extracted by rule matching. Finally, event element type resolution rules are defined, and element types are resolved by rule matching; for event elements that match none of these rules, the type is resolved by computing similarity against the word-sense set of the event type. Experimental results show that the proposed method improves Vietnamese event extraction.
(3) A bilingual event graph construction method based on events and the relations between them was proposed. First, an SVM model is used to extract coreference relations and temporal relations between events. Then, with events as nodes and the relations between events as edges, a Chinese-Vietnamese bilingual event graph is built that combines event coreference relations and temporal relations. Finally, drawing on the idea of the PageRank algorithm, the weights of the nodes in the directed graph are computed and used to rank the Chinese-Vietnamese bilingual events. The resulting bilingual event graph serves as a representation of Chinese-Vietnamese news.
(4) Building on the results above, a prototype system for Chinese-Vietnamese bilingual news event graph extraction was designed and used to carry out bilingual event graph extraction.
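The ranking step in the event graph contribution can be illustrated with a small sketch. The abstract only says that the idea of PageRank is borrowed to weight nodes in a directed graph, so everything concrete here is an assumption for illustration: the function name `rank_events`, the event identifiers, the damping factor of 0.85, and the choice to represent a coreference relation as a pair of edges in both directions while a temporal relation becomes a single directed edge. It is not the thesis's actual implementation.

```python
from collections import defaultdict


def rank_events(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed event graph.

    nodes: list of event identifiers (hypothetical names).
    edges: list of (source, target) pairs.
    damping: assumed value; the abstract does not state one.
    """
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: two Chinese and two Vietnamese events.
events = ["zh:e1", "zh:e2", "vi:e1", "vi:e2"]
temporal = [("zh:e1", "zh:e2"), ("vi:e1", "vi:e2")]  # earlier -> later (assumed direction)
coref = [("zh:e2", "vi:e2")]                         # cross-lingual coreference

# Assumption: coreference is symmetric, so it contributes edges both ways.
edges = temporal + coref + [(b, a) for a, b in coref]

scores = rank_events(events, edges)
ranking = sorted(events, key=scores.get, reverse=True)
```

In this toy graph the two coreferent events receive the highest scores because each gains rank from its temporal predecessor and from its cross-lingual counterpart. How the thesis actually weights or directs the different relation types is not stated in the abstract.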
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
Article ID: 1430378
Article link: http://sikaile.net/jingjilunwen/quyujingjilunwen/1430378.html