Research on a Named Entity Linking Method Based on the BTM Topic Model
[Abstract]: As network resources expand, the growing volume of information makes it increasingly difficult for people to find what is valuable. With the spread of short texts such as tweets and Weibo posts, readers struggle to extract interesting content from them, and resolving the ambiguity of named entity mentions has become a key and difficult research problem. Named entity linking is an important way to address this problem. It is the process of linking a given named entity mention in a document to an unambiguous entity in a knowledge base, which includes merging synonymous entities and disambiguating ambiguous ones. The technique can improve the information filtering ability of practical applications such as online recommendation systems and Internet search engines. This paper proposes a named entity linking method based on the BTM topic model for short texts, which are brief in content and informal in language.
First, offline Wikipedia data is used to build a named entity knowledge base, a synonym table, and an ambiguous-word table. Named entities in short text are identified with a combination of rule-based and statistical methods. Because named entities take diverse forms in short text, mentions are normalized against the synonyms in the knowledge base; candidate entity sets are then retrieved from the ambiguous-word table and pruned according to the contextual features of the named entity, which reduces the size of the candidate set and makes candidate ranking more efficient. The existing MPM word co-occurrence measure computes the co-occurrence coefficient from co-occurrence frequency alone and ignores the frequency of individual words; this paper improves it by considering both the co-occurrence frequency and the individual occurrence frequency of words.
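To make the idea of a co-occurrence coefficient that weighs both joint and individual word frequencies concrete, here is a minimal sketch. The abstract does not give the improved formula, so the Dice-style normalization, the function name `cooccurrence_coefficients`, and the toy documents are all assumptions made for illustration, not the thesis's actual measure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_coefficients(docs):
    """Return a Dice-style coefficient for every word pair that co-occurs.

    docs: list of token lists, one per short text.
    The coefficient 2 * n(a, b) / (n(a) + n(b)) uses document-level counts,
    so a pair of frequent words needs many co-occurrences to score highly.
    (Assumed formula; the thesis's improved measure is not stated here.)
    """
    word_docs = Counter()  # number of documents containing each word
    pair_docs = Counter()  # number of documents containing each unordered pair
    for tokens in docs:
        vocab = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_docs.update(vocab)
        pair_docs.update(combinations(vocab, 2))
    return {
        pair: 2 * n_ab / (word_docs[pair[0]] + word_docs[pair[1]])
        for pair, n_ab in pair_docs.items()
    }


# Hypothetical toy corpus of tokenized short texts.
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["apple", "fruit", "juice"],
]
coeff = cooccurrence_coefficients(docs)
print(coeff[("apple", "iphone")])  # 2 * 2 / (3 + 2) = 0.8
print(coeff[("apple", "fruit")])   # 2 * 1 / (3 + 1) = 0.5
```

A measure based on raw co-occurrence counts alone would rank pairs only by how often they appear together; dividing by the individual frequencies discounts pairs that co-occur mainly because one word is common everywhere.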
Second, on the assumption that the words in a document share a similar topic distribution with the named entities it contains, the paper models and disambiguates documents at the semantic level and proposes a named entity linking method based on the BTM topic model. The method uses a BTM model built on the co-occurrence coefficient to model named entity semantics and uses Gibbs sampling to estimate the parameters, which makes the model simpler and more accurate and provides a theoretical basis for subsequent data processing. Finally, according to the cosine similarity between the location vector of the named entity and that of each candidate entity, the named entity in the given text is linked to an unambiguous named entity in the knowledge base.
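The final linking step, choosing the candidate whose vector is closest to the mention's vector by cosine similarity, can be sketched as follows. This is an illustration under stated assumptions: the three-dimensional vectors stand in for whatever representation the trained BTM model produces, and the names `link_entity`, `mention`, and `candidates` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_entity(mention_vector, candidate_vectors):
    """Return (entity_name, score) for the candidate most similar to the mention."""
    scored = {
        name: cosine_similarity(mention_vector, vec)
        for name, vec in candidate_vectors.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical vectors; in practice these would come from the trained model.
mention = [0.7, 0.2, 0.1]
candidates = {
    "Apple Inc.": [0.8, 0.1, 0.1],
    "Apple (fruit)": [0.1, 0.1, 0.8],
}
best, score = link_entity(mention, candidates)
print(best)  # Apple Inc.
```

Because cosine similarity depends only on the direction of the vectors, candidates whose knowledge-base descriptions differ in length are compared on the shape of their distributions and not on their magnitude.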
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1