An Entity Alignment Algorithm for Multi-Source Knowledge Bases Based on Web Semantic Labels
Published: 2018-07-21 14:00
[Abstract]: Knowledge bases are an important data resource for many natural language processing tasks. However, any single knowledge base has limited coverage, and different knowledge bases are highly heterogeneous, which hinders data sharing and integration. Research on multi-source knowledge base fusion is therefore of great importance, and entity alignment across knowledge bases is a key component of that fusion. Driven by the development of the Semantic Web, much related work has been done outside China, but most of it targets English knowledge bases, and comparatively little addresses Chinese ones. Aiming at the fusion of Chinese knowledge bases, this paper proposes an entity alignment algorithm for multi-source knowledge bases based on web semantic labels. The algorithm combines attribute labels, category labels, and keywords extracted from unstructured text to align entities across Chinese online encyclopedias. Experiments show that the algorithm handles multi-source entity alignment well: at a precision of nearly 95%, it still maintains a recall of nearly 55%. The algorithm has been deployed in a real system, where it meets the practical requirements of multi-source knowledge base entity alignment.
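The abstract does not give the scoring function, feature weights, or decision threshold. What follows is therefore only a minimal Python sketch of the general idea, built on stated assumptions. Each entry is reduced to three label sets: attribute name-value pairs, category labels, and keywords from its free text. Candidates are limited to entries with the same surface name. The three Jaccard similarities are combined with hypothetical weights (0.5, 0.3, 0.2), and a pair is accepted only above a hypothetical threshold of 0.6. `Entity`, `similarity`, `align`, and `precision_recall` are illustrative names, not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """One encyclopedia entry reduced to web semantic labels (hypothetical schema)."""
    uid: str                                                  # identifier within its source KB
    name: str                                                 # surface name (entry title)
    attributes: dict[str, str] = field(default_factory=dict)  # infobox attribute -> value
    categories: set[str] = field(default_factory=set)         # category labels
    keywords: set[str] = field(default_factory=set)           # keywords from the free-text body


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(e1: Entity, e2: Entity, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of attribute, category, and keyword overlap (weights are assumptions)."""
    w_attr, w_cat, w_kw = weights
    attr_sim = jaccard(set(e1.attributes.items()), set(e2.attributes.items()))
    cat_sim = jaccard(e1.categories, e2.categories)
    kw_sim = jaccard(e1.keywords, e2.keywords)
    return w_attr * attr_sim + w_cat * cat_sim + w_kw * kw_sim


def align(source: list[Entity], target: list[Entity], threshold: float = 0.6) -> set[tuple[str, str]]:
    """For each source entity, keep its best same-name target if the score clears the threshold."""
    matches = set()
    for s in source:
        # Candidate generation (assumption): only compare entries with identical names.
        candidates = [t for t in target if t.name == s.name]
        if not candidates:
            continue
        best = max(candidates, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            matches.add((s.uid, best.uid))
    return matches


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision = correct / predicted pairs; recall = correct / gold pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example: two KBs that both contain an ambiguous entry named "Apple".
source = [
    Entity("kb1:apple_inc", "Apple", {"founded": "1976", "hq": "Cupertino"},
           {"company", "technology"}, {"iPhone", "Mac"}),
    Entity("kb1:apple_fruit", "Apple", {"genus": "Malus"}, {"fruit"}, {"tree"}),
]
target = [
    Entity("kb2:apple_company", "Apple", {"founded": "1976", "hq": "Cupertino"},
           {"company"}, {"iPhone", "Jobs"}),
    Entity("kb2:apple_plant", "Apple", {"genus": "Malus"}, {"fruit", "plant"}, {"tree", "fruit"}),
]
matches = align(source, target)
print(sorted(matches))
# → [('kb1:apple_fruit', 'kb2:apple_plant'), ('kb1:apple_inc', 'kb2:apple_company')]
```

In a sketch like this, raising `threshold` accepts fewer and more confident pairs. That generally trades recall for precision. The numbers reported above come from the authors' own method and data and are not reproduced here.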
[Author affiliation]: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[Funding]: National Natural Science Foundation of China (61533018); National Basic Research Program of China ("973" Program) (2014CB340503); CCF-Tencent Rhino-Bird Fund
[Classification number (CLC)]: TP391.1
Article No.: 2135758
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2135758.html