Chinese Entity Linking Using a Context-Based Multi-Feature Graph Model
[Abstract]: With the growth of online information and the rising demand for semantic search, knowledge base population has become an active topic in natural language processing. Entity linking, the task of correctly linking an entity mention in text to the corresponding entity in a knowledge base, is a key technology for knowledge base population and has both theoretical and practical value. Most current entity linking work targets English, and research on Chinese is still in its infancy. The main reasons are: (1) there is no unified, authoritative open-source Chinese knowledge base or corpus; (2) Chinese entity extraction is constrained by Chinese word segmentation, and Chinese has richer semantics, more flexible grammar, and greater disambiguation difficulty than English. As a result, existing work remains at the surface-form level of named entities and does not capture entity semantics well. To address these problems, this thesis builds on current mainstream English entity linking techniques, takes the state of Chinese research into account, and proposes a context-based multi-feature graph model. (1) Chinese Wikipedia is chosen as the knowledge base for the entity linking task, and Chinese corpus information is extracted from the official evaluation data of the KBP (Knowledge Base Population) track of TAC (Text Analysis Conference), provided by NIST (National Institute of Standards and Technology), to construct the corpus and experimental data set. (2) A variety of features between entities are extracted from the context of each entity mention and from the Wikipedia database, and are quantified as semantic similarity.
The semantic similarity is then fused into the constructed graph model. Using the topic consistency property of the graph model, candidate entities are ranked and entity linking is completed, with the aim of improving Chinese word segmentation accuracy and enriching the semantic information of entities. To evaluate the method, recent Chinese entity linking methods are reproduced for comparison. The experimental results show that the proposed method effectively improves both the accuracy and the efficiency of entity linking and achieves good overall performance.
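The ranking step described above can be illustrated with a small sketch. It is not the thesis's implementation: the function name `rank_candidates`, the similarity values, and the PageRank-style propagation are all assumptions chosen for illustration. Each candidate entity is a graph node, context similarity serves as a prior, and edges between candidates of different mentions carry an entity-to-entity relatedness weight that stands in for topic consistency.

```python
# Illustrative sketch only. All names and numbers are hypothetical, and the
# PageRank-style scoring is an assumed stand-in for the thesis's graph model.

def rank_candidates(candidates, local_sim, coherence, damping=0.85, iters=50):
    """Pick one candidate entity per mention by propagating scores on a graph.

    candidates: dict mention -> list of candidate entity names
    local_sim:  dict (mention, entity) -> context similarity in [0, 1]
    coherence:  dict frozenset({e1, e2}) -> entity relatedness in [0, 1]
    """
    nodes = [(m, e) for m, ents in candidates.items() for e in ents]

    # Prior: normalised context similarity of each candidate.
    total = sum(local_sim[n] for n in nodes) or 1.0
    prior = {n: local_sim[n] / total for n in nodes}

    # Edges link candidates of different mentions, weighted by relatedness.
    def weight(a, b):
        if a[0] == b[0]:
            return 0.0
        return coherence.get(frozenset((a[1], b[1])), 0.0)

    out_sum = {a: sum(weight(a, b) for b in nodes) for a in nodes}

    score = dict(prior)
    for _ in range(iters):
        new_score = {}
        for b in nodes:
            incoming = sum(score[a] * weight(a, b) / out_sum[a]
                           for a in nodes if out_sum[a] > 0)
            new_score[b] = (1 - damping) * prior[b] + damping * incoming
        score = new_score

    # Keep the highest-scoring candidate for each mention.
    return {m: max(ents, key=lambda e: score[(m, e)])
            for m, ents in candidates.items()}


# Hypothetical toy input: context alone cannot separate the two "Apple" senses.
candidates = {"Apple": ["Apple Inc.", "Apple (fruit)"], "Jobs": ["Steve Jobs"]}
local_sim = {("Apple", "Apple Inc."): 0.5,
             ("Apple", "Apple (fruit)"): 0.5,
             ("Jobs", "Steve Jobs"): 1.0}
coherence = {frozenset(("Apple Inc.", "Steve Jobs")): 0.9,
             frozenset(("Apple (fruit)", "Steve Jobs")): 0.1}

print(rank_candidates(candidates, local_sim, coherence))
```

In this toy input the two senses of "Apple" have equal context similarity, so the choice is decided by relatedness to the other mention's candidate, which is the intuition behind using topic consistency for ranking.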
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1