Research on Wikipedia-Based Person-Relation Extraction
[Abstract]: Person-relation extraction is an important topic in information extraction research. It originated with the evaluation tasks of the MUC (Message Understanding Conference) series, which were later succeeded by the ACE (Automatic Content Extraction) evaluations. At present, most studies of Chinese person-relation extraction use the structured evaluation corpora released for ACE or relatively well-edited news text such as the People's Daily. In practice, however, and especially in the Internet era, people increasingly look up information about persons, events and the like online, and Wikipedia is one of the most commonly used resources for such searches. Wikipedia is an open knowledge base that contains a wealth of person-relation information, and its text has the semi-structured character typical of web content. Person-relation extraction on Wikipedia is therefore closer to real-world person-relation extraction.

The main idea is to recast person-relation extraction as person-relation classification. Traditional extraction methods are based on knowledge bases, machine learning, or pattern matching, and machine-learning methods are further divided into kernel-based and feature-vector-based classifiers. The two main difficulties in person-relation extraction are person-name recognition and relation recognition. This thesis proposes solutions to both, with the following contributions: (1) Existing word-segmentation tools have a low recognition rate for transliterated foreign names. To address this, we extract infobox data from Wikipedia to build a Wikipedia-based person database, and we use Wikipedia link data to build a dictionary of transliterated foreign names drawn from Chinese Wikipedia. (2) We propose a hierarchical method that combines pattern matching with a feature-vector approach to classify person relations, using DAG-SVMs to handle multi-class classification. The hierarchical design is intended to improve the speed and performance of the classification model, and incorporating person information into the partitioning of relation types alleviates a problem common in Wikipedia: the same person appearing under different names. Experiments verify the feasibility of the method. Using it, we built a large Wikipedia-based person database and a name dictionary, and the experimental results show that the method performs better at person-relation recognition, particularly in classifying interpersonal and family relationships.
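The abstract names DAG-SVMs as its multi-class strategy but does not describe the implementation. The Python sketch below illustrates the general Decision DAG technique of Platt, Cristianini and Shawe-Taylor under stated assumptions: each pairwise classifier is a linear SVM trained with a simple deterministic Pegasos-style loop, and the feature vectors, the relation labels `family`, `colleague` and `teacher_student`, and the names `DagSvm`, `train_linear_svm` and `TOY_X` are all hypothetical. It is not the thesis's own code, and it leaves out the pattern-matching layer of the hierarchical method.

```python
"""Hedged sketch: DAG-SVM multi-class classification over one-vs-one linear SVMs.

Assumptions (not taken from the thesis): features are small numeric vectors,
each binary SVM is a linear SVM trained with a deterministic Pegasos-style
sub-gradient loop, and the relation labels are hypothetical examples.
"""
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def with_bias(x: Vector) -> List[float]:
    # Append a constant feature so the bias is learned as an ordinary weight
    # (this simplification also regularizes the bias).
    return list(x) + [1.0]


def train_linear_svm(xs: List[Vector], ys: List[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Minimize L2-regularized hinge loss with Pegasos-style steps; ys are +1/-1."""
    data = [(with_bias(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed visiting order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # margin check uses the current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # hinge step
    return w


class DagSvm:
    """Decision DAG over k(k-1)/2 pairwise SVMs; prediction uses k-1 of them."""

    def __init__(self) -> None:
        self.classes: List[str] = []
        # models[(a, b)] scores >= 0 for class a and < 0 for class b (a < b).
        self.models: Dict[Tuple[str, str], List[float]] = {}

    def fit(self, xs: List[Vector], labels: List[str]) -> "DagSvm":
        self.classes = sorted(set(labels))
        for a, b in combinations(self.classes, 2):
            pair = [(x, 1 if lab == a else -1)
                    for x, lab in zip(xs, labels) if lab in (a, b)]
            self.models[(a, b)] = train_linear_svm([p[0] for p in pair],
                                                   [p[1] for p in pair])
        return self

    def predict(self, x: Vector) -> str:
        # The candidate list stays sorted, so (first, last) is always a key.
        remaining = list(self.classes)
        while len(remaining) > 1:
            first, last = remaining[0], remaining[-1]
            if dot(self.models[(first, last)], with_bias(x)) >= 0:
                remaining.pop()   # the pairwise SVM rules out `last`
            else:
                remaining.pop(0)  # the pairwise SVM rules out `first`
        return remaining[0]


# Hypothetical toy data: 2-D feature vectors for three made-up relation types.
TOY_X = [(5, 0), (6, 1), (5, -1),      # family
         (-5, 0), (-6, 1), (-5, -1),   # colleague
         (0, 5), (1, 6), (-1, 5)]      # teacher_student
TOY_Y = ["family"] * 3 + ["colleague"] * 3 + ["teacher_student"] * 3

if __name__ == "__main__":
    model = DagSvm().fit(TOY_X, TOY_Y)
    for point in [(4.5, 0.2), (-4.0, 0.5), (0.3, 4.0)]:
        print(point, model.predict(point))
```

For k relation types, training builds k(k-1)/2 pairwise SVMs, but each prediction evaluates only k-1 of them, because every pairwise test rules out one candidate class. This is one way a DAG-SVM can keep classification fast as the number of relation types grows.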
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP391.1
[Author]: 劉博佳 (Liu Bojia)