Research on a Named Entity Linking Method Based on the BTM Topic Model

Published: 2019-01-02 18:36

[Abstract]: As online resources keep expanding, the growing volume of information makes it increasingly hard for people to find what is valuable. The rise of short texts such as tweets and Weibo posts makes it harder still to pick out content of interest, and resolving the ambiguity of named entity mentions has become a key and difficult research problem. Named entity linking is an important way to address it. Named entity linking is the process of linking a given named entity in a document to an unambiguous entity in a knowledge base, which includes merging synonymous entities and disambiguating ambiguous ones. The technique can improve the information-filtering ability of practical applications such as online recommender systems and web search engines. Targeting the characteristics of short texts (brief content, casual and nonstandard language), this thesis proposes a named entity linking method based on the BTM topic model. First, an offline copy of Wikipedia is used to build the named entity knowledge base, together with a synonym table and an ambiguity table. Named entities in short texts are recognized with a method that combines rules and statistics. Because named entities appear in many surface forms in short texts, mentions are normalized using the synonym table in the knowledge base. A candidate entity set is then retrieved from the ambiguity table and pruned according to the contextual features of the mention, which shrinks the candidate set and makes candidate ranking more efficient. To compute a word co-occurrence coefficient, the thesis takes into account both the co-occurrence frequency of a word pair and the frequency of each individual word, improving on the MPM word co-occurrence measure, which considers only co-occurrence frequency.
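The normalization, candidate lookup, and pruning steps described in this part of the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the table contents, the names `SYNONYMS`, `AMBIGUOUS`, `ENTITY_KEYWORDS`, `normalize`, and `candidates`, and the keyword-overlap pruning rule are all assumptions made for the example. The abstract only says that pruning uses the contextual features of the mention.

```python
from typing import Dict, List, Set

# Hypothetical toy tables; the thesis builds its tables from offline Wikipedia.
SYNONYMS: Dict[str, str] = {
    "big apple": "new york",
    "nyc": "new york",
}
AMBIGUOUS: Dict[str, List[str]] = {
    "new york": ["New York City", "New York (state)", "New York (magazine)"],
}
# Hypothetical context keywords attached to each knowledge-base entity.
ENTITY_KEYWORDS: Dict[str, Set[str]] = {
    "New York City": {"city", "manhattan", "subway", "mayor"},
    "New York (state)": {"state", "albany", "governor"},
    "New York (magazine)": {"magazine", "issue", "editor"},
}


def normalize(mention: str) -> str:
    """Map a surface form to its canonical name via the synonym table."""
    key = mention.strip().lower()
    return SYNONYMS.get(key, key)


def candidates(mention: str, context: Set[str], min_overlap: int = 1) -> List[str]:
    """Look up candidate entities, then prune those sharing too few context words."""
    pool = AMBIGUOUS.get(normalize(mention), [])
    kept = [
        entity
        for entity in pool
        if len(ENTITY_KEYWORDS.get(entity, set()) & context) >= min_overlap
    ]
    # Assumed fallback: keep the unpruned pool if pruning would empty the set.
    return kept or pool


print(candidates("NYC", {"subway", "delayed", "manhattan"}))
```

Pruning before ranking means the more expensive topic-based comparison only has to score the few candidates that survive.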
Second, assuming that the words and the named entities in the same document have similar topic distributions, the thesis models documents and disambiguates entities at the semantic level, and proposes a named entity linking method based on the BTM topic model. The method uses a BTM model based on the word co-occurrence coefficient to model the semantics of named entities and estimates the parameters with Gibbs sampling, which makes the model simpler and more accurate and provides a theoretical basis for subsequent data processing. Finally, using the cosine similarity between the vector that locates the named entity in topic space and the vector of each candidate entity, the named entity in the given text is linked to an unambiguous named entity in the knowledge base.
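The final linking step can be illustrated with a short sketch. It assumes that topic-space vectors for the mention and for each candidate entity are already available, for example as topic distributions inferred by a trained topic model. The function names `cosine` and `link` and the example vectors are hypothetical and do not come from the thesis.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link(
    mention_topics: Sequence[float],
    candidate_topics: Dict[str, Sequence[float]],
) -> Optional[str]:
    """Return the candidate whose topic vector is closest to the mention's."""
    if not candidate_topics:
        return None
    return max(
        candidate_topics,
        key=lambda name: cosine(mention_topics, candidate_topics[name]),
    )


# Made-up three-topic vectors, purely for illustration.
mention = [0.7, 0.2, 0.1]
pool = {
    "New York City": [0.6, 0.3, 0.1],
    "New York (magazine)": [0.1, 0.1, 0.8],
}
print(link(mention, pool))
```

Cosine similarity compares only the direction of the two vectors, so the comparison does not depend on how the topic vectors are scaled.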
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1


Article ID: 2398838

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2398838.html
