An Entity Alignment Algorithm for Multi-Source Knowledge Bases Based on Web Semantic Labels
Published: 2018-07-21 14:00
[Abstract]: Knowledge bases are an important data resource for many natural language processing tasks, but any single knowledge base has low coverage, and different knowledge bases are highly heterogeneous, which hinders data sharing and integration. Research on multi-source knowledge base fusion is therefore of great significance, and entity alignment across multi-source knowledge bases is an important component of it. Driven by the development of the Semantic Web, much related work has been carried out abroad, but most of it targets English knowledge bases, and Chinese knowledge bases have received little study. To support Chinese knowledge base fusion, this paper proposes a multi-source knowledge base entity alignment algorithm based on web semantic labels. The algorithm combines attribute labels, category labels, and keywords from unstructured text to align Chinese encyclopedia entities. Experiments show that the algorithm handles multi-source knowledge base entity alignment well: at a precision of nearly 95% it still maintains a recall of nearly 55%. It has been applied in a real system, where it meets the needs of practical multi-source knowledge base entity alignment.
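The abstract names three signals (attribute labels, category labels, and text keywords) but does not say how they are combined. As an illustration only, the following is a minimal sketch that assumes each signal is compared with Jaccard overlap, the three scores are mixed with fixed weights, and a pair is accepted above a threshold. The `Entity` schema, the weights, the threshold, and every function name are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An encyclopedia entry reduced to three label sets (hypothetical schema)."""
    name: str
    attributes: frozenset = frozenset()
    categories: frozenset = frozenset()
    keywords: frozenset = frozenset()


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative weights only; the paper's actual weighting is not given here.
WEIGHTS = {"attributes": 0.4, "categories": 0.3, "keywords": 0.3}


def similarity(x, y, weights=WEIGHTS):
    """Weighted sum of per-signal Jaccard scores, in the range [0, 1]."""
    return sum(w * jaccard(getattr(x, k), getattr(y, k)) for k, w in weights.items())


def align(source, target, threshold=0.5):
    """Pair each source entity with its best-scoring target if the score passes the threshold."""
    pairs = []
    for s in source:
        best = max(target, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            pairs.append((s.name, best.name))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold-standard pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

In a scheme of this kind, raising `threshold` typically accepts fewer but more reliable pairs, which is one way to read the reported operating point of high precision at moderate recall.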
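[Author affiliation]: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[Funding]: National Natural Science Foundation of China (61533018); National Basic Research Program of China ("973" Program) (2014CB340503); CCF-Tencent Rhino-Bird Fund
[Classification number]: TP391.1
Article ID: 2135758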
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2135758.html