Research on Entity Information Query Expansion Algorithms in Object Retrieval
Published: 2019-07-10 08:34
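【Abstract】: This thesis studies entity information expansion algorithms for object retrieval. The demand for information has gradually shifted from relatively coarse web page retrieval to object retrieval, which has made entity information extraction one of the core technologies; entity information expansion is an important part of entity information extraction. The goal of entity information extraction is to automatically build an entity knowledge base that contains the attribute information related to each entity. The entity information query expansion studied here has two goals. The first is to enrich entity query terms: when a query term carries incomplete information, it is expanded to remove ambiguity and clarify the query intent. The second is to expand coreference information such as entity aliases, so that the information attached to different entities that refer to the same thing can be merged and shared.

The main work of this thesis is as follows.

First, object retrieval is compared with traditional information retrieval, with a focus on the differences and connections between entity information expansion and traditional query expansion in preprocessing, term selection, relevance computation, and matching methods. On this basis, the two main research topics are defined: entity information expansion based on statistical learning, and entity information expansion based on grammatical rules.

Second, to expand terms that are highly related to an entity, the thesis proposes an entity information expansion method based on probability and statistics. It uses relevance feedback together with a hierarchical clustering algorithm to mine the co-occurrence relevance between entities and terms within the set of relevant documents, and thereby expands the entity's descriptive information. Based on this model, related terms were expanded for more than two thousand entities and applied in the TREC 2012 Microblog evaluation task; the results verify the effectiveness of the model.

Finally, for information such as entity aliases, synonyms, and identity descriptions, the thesis presents an entity information expansion method based on grammatical rules. After lexical-analysis preprocessing, coreference resolution is performed on entity mentions according to the grammatical features of coreferent expressions, which yields the expansion of entity aliases and similar information. This model achieved good results in two subtasks of TAC 2012 KBP, which verifies its effectiveness.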
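In-text figure:
Figure caption: Agglomerative hierarchical clustering partitioning strategy. All documents in the resulting cluster serve as supporting information for the entity; in later steps, information about this entity is extracted from these documents to form the entity's information expansion.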
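The statistical method and the figure caption can be illustrated with a small sketch. It is not the thesis's implementation: the whitespace tokenizer, the average-link merge criterion, the similarity threshold, the rule that the supporting cluster is the one with the most entity mentions, and the function names `agglomerative_clusters` and `expand_entity` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerative_clusters(docs, threshold=0.3):
    """Average-link agglomerative clustering: repeatedly merge the two most
    similar clusters until no pair reaches `threshold` (an assumed cutoff)."""
    vectors = [Counter(tokenize(d)) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        return sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), avg_sim(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


def expand_entity(entity, docs, top_k=3, threshold=0.3):
    """Return the terms that co-occur most often with `entity` inside the
    cluster of feedback documents that mentions the entity most."""
    entity = entity.lower()
    clusters = agglomerative_clusters(docs, threshold)
    support = max(clusters, key=lambda c: sum(entity in tokenize(docs[i]) for i in c))
    doc_freq = Counter()
    for i in support:
        tokens = set(tokenize(docs[i]))
        if entity in tokens:
            doc_freq.update(tokens - {entity})
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

With a handful of feedback documents for an ambiguous query such as "jaguar", the car-related and animal-related documents fall into separate clusters, and the expansion terms come only from the larger cluster. That is the disambiguation effect the abstract describes.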
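The grammar-rule-based expansion of aliases can be sketched in the same spirit. The source does not list the thesis's actual rules, so the two English surface patterns below ("X, also known as Y" and "X (Y)"), the capitalized-token definition of a name, and the helper names `extract_alias_pairs` and `merge_aliases` are hypothetical stand-ins. Regular expressions are used here in place of the lexical analysis step.

```python
import re
from collections import defaultdict

# Assumption: an entity name is a run of capitalized word tokens.
NAME = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)*"

# Hypothetical coreference rules; a real rule set would be larger.
ALIAS_PATTERNS = [
    re.compile(rf"({NAME}),? (?:also known as|better known as|aka) ({NAME})"),
    re.compile(rf"({NAME}) \(({NAME})\)"),
]


def extract_alias_pairs(text):
    """Return (name, alias) pairs matched by any rule, in rule order."""
    pairs = []
    for pattern in ALIAS_PATTERNS:
        pairs.extend(m.groups() for m in pattern.finditer(text))
    return pairs


def merge_aliases(pairs):
    """Group names that are linked by alias pairs (union-find), so that
    information attached to any mention can be shared by the whole group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for name, alias in pairs:
        parent[find(alias)] = find(name)

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return list(groups.values())
```

Applied to a text such as "International Business Machines (IBM) was founded in 1911. IBM, also known as Big Blue, is based in Armonk.", the two rules link all three names into one group. This is the merging of coreferent entity information that the abstract gives as its second goal.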
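【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year degree conferred】: 2014
【Classification number】: TP393.092; TP391.3
Article ID: 2512481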
Article link: http://sikaile.net/guanlilunwen/ydhl/2512481.html