Research on Wikipedia-Based Entity Linking Algorithms and System Implementation
[Abstract]: The Internet has entered an age of information explosion: the volume of information is huge, its forms are diverse, and its content is complex. How to extract the information users need from this mass of data is an urgent problem. Natural language, however, is pervasively ambiguous. Entity ambiguity is the linguistic phenomenon in which the same entity mention refers to different real-world entities in different contexts. Entity disambiguation helps in understanding text, and entity linking is the task of correctly linking the names of people, places, and organizations that appear in web pages, Weibo posts, or dialogue to the corresponding entities in a knowledge base. Resolving the entity ambiguity caused by synonymy and polysemy is of great significance for information retrieval, automatic question answering, and knowledge base completion. This thesis studies the core problem of entity linking: ranking the candidate entities of an entity mention. Its main work and contributions are as follows. 1. Two candidate entity ranking algorithms are proposed to improve the accuracy of entity linking: one combining LDA with random walk with restart, and one combining Word2Vec with PageRank. Traditional candidate entity ranking algorithms often stop at feature extraction: they extract a large number of features and then train a supervised model, which is cumbersome, and the features are often shallow ones, such as string similarity, that ignore the semantic similarity between entities. This thesis instead exploits the link structure of Wikipedia, on the observation that entities under the same topic tend to link to one another, as do entities that are more semantically related.
To address this problem, the thesis proposes a candidate entity ranking algorithm that combines LDA with random walk with restart, and another that combines Word2Vec with PageRank. Both algorithms use the Wikipedia graph structure in which the entities are embedded. Random walk with restart yields a vector for each candidate entity, while PageRank yields a PR value for each candidate entity. The former incorporates each entity's topic feature vector, and the latter incorporates the semantic similarity between entities; both thereby add semantic features to the graph model. Experimental results show that the proposed algorithms improve entity linking accuracy compared with mainstream candidate entity ranking algorithms. 2. Combining the two candidate entity ranking algorithms, an entity linking system (LEL) is developed. The system links entities in text to the Wikipedia knowledge base and is highly interactive.
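The graph-based ranking idea described in the abstract can be illustrated with a minimal sketch of random walk with restart over a small entity graph. This is not the thesis's implementation: the toy graph, the edge weights, the restart probability `alpha`, and the function names `build_graph`, `random_walk_with_restart`, and `rank_candidates` are all assumptions made for illustration. The edge weights merely stand in for the kind of semantic signal (topic vectors or embedding similarity) that the thesis adds to the graph model; no LDA or Word2Vec model is trained here.

```python
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    adj: Graph = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def random_walk_with_restart(
    adj: Graph,
    restart: Dict[str, float],
    alpha: float = 0.15,      # hypothetical restart probability
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p = alpha * r + (1 - alpha) * P^T p until convergence.

    `restart` must only contain nodes present in `adj`; it is normalised
    to the restart distribution r. Every node is assumed to have at least
    one positively weighted edge, which build_graph guarantees.
    """
    nodes = list(adj)
    total = sum(restart.values())
    r = {n: restart.get(n, 0.0) / total for n in nodes}
    p = dict(r)
    for _ in range(max_iter):
        nxt = {n: alpha * r[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adj[u].values())
            for v, w in adj[u].items():
                # Spread the mass at u over its neighbours by edge weight.
                nxt[v] += (1.0 - alpha) * p[u] * w / out_weight
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_candidates(scores: Dict[str, float], candidates: List[str]) -> List[str]:
    """Order candidate entities by descending walk score."""
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Toy graph with made-up weights; not data from the thesis.
GRAPH = build_graph([
    ("Chicago Bulls", "Michael Jordan (basketball)", 1.0),
    ("NBA", "Michael Jordan (basketball)", 1.0),
    ("NBA", "Chicago Bulls", 1.0),
    ("Machine learning", "Michael I. Jordan", 1.0),
    ("Michael Jordan (basketball)", "Michael I. Jordan", 0.1),
])

# The walk restarts at the unambiguous context entities of the mention "Jordan".
scores = random_walk_with_restart(GRAPH, {"Chicago Bulls": 1.0, "NBA": 1.0})
ranking = rank_candidates(
    scores, ["Michael I. Jordan", "Michael Jordan (basketball)"]
)
print(ranking)
```

Restarting at the context entities biases the walk toward candidates that are close to the surrounding text in the link graph. With a uniform restart distribution over all nodes, the same iteration reduces to a weighted PageRank, so one routine can illustrate both graph models named in the abstract.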
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1