
Research on a Named Entity Linking Method Based on the BTM Topic Model

Published: 2019-01-02 18:36
[Abstract]: As online resources keep expanding, the growing volume of information makes it increasingly difficult for people to find what is valuable. The rise and popularity of short texts such as Tweets and Weibo posts make it even harder to extract content of interest from them. The ambiguity of named entity mentions has become a key difficulty in this research area, and named entity linking is an important way to address it. Named entity linking is the process of linking a given named entity in a document to an unambiguous entity in a knowledge base, which includes merging synonymous entities and disambiguating ambiguous ones. The technique can improve the information filtering capability of practical applications such as online recommender systems and Internet search engines. Targeting the characteristics of short texts (brief content, informal and non-standard language), this thesis proposes a named entity linking method based on the BTM topic model.

First, the thesis uses an offline copy of Wikipedia to build a named entity knowledge base, together with a synonym table and an ambiguity table. Named entities in short texts are identified with a method that combines rules and statistics. Because named entities appear in many surface forms in short texts, mentions are normalized using the synonym table in the knowledge base. The candidate entity set is then obtained from the ambiguity table and pruned according to the context of the mention, which reduces the size of the candidate set and makes candidate ranking more efficient. The MPM word co-occurrence measure considers only how often two words occur together and ignores how often each word occurs on its own; the thesis improves on it by taking both frequencies into account when computing a word co-occurrence degree coefficient.
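The normalization, candidate lookup, and pruning steps described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the table contents, the function names `normalize` and `candidates`, and the pruning rule (keep candidates that share at least `min_overlap` context words with the mention) are all assumptions made for the example.

```python
# Toy tables; every entry is invented for illustration only.
SYNONYMS = {"big apple": "new york", "nyc": "new york"}
AMBIGUOUS = {
    "new york": ["New York City", "New York (state)", "New York (magazine)"],
}
# Hypothetical context keywords attached to each knowledge-base entity.
ENTITY_CONTEXT = {
    "New York City": {"city", "manhattan", "subway", "mayor"},
    "New York (state)": {"state", "albany", "governor"},
    "New York (magazine)": {"magazine", "issue", "editor"},
}


def normalize(mention):
    """Map a surface form to its canonical name via the synonym table."""
    key = mention.strip().lower()
    return SYNONYMS.get(key, key)


def candidates(mention, context_words, min_overlap=1):
    """Look up candidates in the ambiguity table, then prune by context.

    Assumed pruning rule: keep a candidate only if it shares at least
    `min_overlap` keywords with the words around the mention. If pruning
    would remove everything, fall back to the unpruned list.
    """
    cands = AMBIGUOUS.get(normalize(mention), [])
    ctx = {w.lower() for w in context_words}
    kept = [
        c for c in cands
        if len(ENTITY_CONTEXT.get(c, set()) & ctx) >= min_overlap
    ]
    return kept or cands


if __name__ == "__main__":
    print(candidates("NYC", ["took", "the", "subway", "in", "Manhattan"]))
```

Pruning before ranking means the more expensive topic-model comparison only has to run on the surviving candidates, which is the efficiency gain the abstract refers to.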
Second, based on the assumption that words and named entities in the same document have similar topic distributions, the thesis models documents and disambiguates entities at the semantic level, proposing a named entity linking method based on the BTM topic model. The method uses a BTM model built on the word co-occurrence degree coefficient to model the semantics of named entities and uses Gibbs sampling to estimate the parameters, which makes the model simpler and more accurate and provides a theoretical basis for subsequent data processing. Finally, using the cosine similarity between the position vector of the named entity in topic space and each candidate entity, the named entity in the given text is linked to an unambiguous named entity in the knowledge base.
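The final linking step, choosing the candidate whose topic vector is closest to the mention's, can be sketched as below. This is only an illustration under stated assumptions: the topic vectors are supplied by hand rather than inferred by a BTM model, and the names `cosine` and `link` are hypothetical, not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def link(mention_vec, candidate_vecs):
    """Return the candidate name with the highest cosine similarity.

    `candidate_vecs` maps entity name -> topic vector. Returns None when
    there are no candidates to choose from.
    """
    if not candidate_vecs:
        return None
    return max(
        candidate_vecs,
        key=lambda name: cosine(mention_vec, candidate_vecs[name]),
    )


if __name__ == "__main__":
    # Hand-written 3-topic distributions, purely illustrative.
    mention = [0.7, 0.2, 0.1]
    cands = {
        "Entity A": [0.6, 0.3, 0.1],
        "Entity B": [0.1, 0.1, 0.8],
    }
    print(link(mention, cands))
```

In a full pipeline the vectors would come from the trained topic model, with each dimension holding the probability of one topic.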
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.1


Article ID: 2398838


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2398838.html

