Design and Implementation of a Chinese Knowledge Engineering and Knowledge Service Platform

Published: 2018-12-12 13:27

[Abstract]: The rapid growth of the Internet has brought a sharp increase in the number of users, and with it a sharp increase in demand for online resources. Retrieving the information a user needs from this vast pool of resources is one of the major challenges the Internet now faces. Today, information retrieval is provided mainly by search engines and some question-answering communities, such as Baidu and Baidu Zhidao, which have made great progress in the breadth of the data they offer. Their precision is less satisfactory: when a user needs knowledge that demands high accuracy, both search engines and Q&A communities fall short. To address the mismatch between the rapid expansion of online information and users' demand for knowledge, this thesis proposes building a Chinese knowledge base with Chinese knowledge engineering techniques and establishing a platform that provides Chinese knowledge services, with the goal of giving users high-quality, efficient access to shared knowledge. For knowledge base construction, the thesis extracts attribute pairs from the infoboxes of encyclopedia pages and trains a classification model on those pairs. That model, combined with automatic word segmentation, part-of-speech tagging, and named entity tagging for modern Chinese, is then used to extract attribute pairs from encyclopedia pages that have no infobox. The extracted pairs populate an attribute-value database that lets users pinpoint the knowledge they are searching for. Because users who look up one piece of knowledge also care about related knowledge, the thesis further proposes a Wikipedia-based method for computing entity relatedness, which uses the co-occurring link information in Wikipedia pages to compute the relatedness of two named entities. For knowledge service, the thesis ranks retrieval results with the link-analysis-based HITS algorithm, then computes the similarity between each HITS-ranked page and the question to determine the final ranking of answer pages.
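The HITS ranking step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's implementation, so everything below is an assumption made for illustration: the function name `hits`, the dictionary-based link graph, the fixed iteration count, and the sample page names. It shows only the textbook hub/authority iteration, not the later re-ranking by page-question similarity.

```python
from math import sqrt


def hits(links, iterations=50):
    """Textbook HITS iteration (illustrative sketch, not the thesis's code).

    links: dict mapping a page to the pages it links to (hypothetical input format).
    Returns (hub, authority) score dicts, each L2-normalised.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking to a page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in set(targets):  # ignore duplicate links
                new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of all pages a page links to.
        new_hub = {p: sum(auth[d] for d in set(links.get(p, ()))) for p in pages}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


if __name__ == "__main__":
    # Hypothetical four-page link graph.
    sample = {"a": ["c"], "b": ["c"], "c": ["d"]}
    hub_scores, auth_scores = hits(sample)
    # Pages sorted by authority score, highest first.
    for page in sorted(auth_scores, key=auth_scores.get, reverse=True):
        print(page, round(auth_scores[page], 3))
```

In a pipeline like the one the abstract describes, the authority scores would order the candidate pages first, and a separate similarity score between each page and the question would then decide the final order.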
[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP393.09;TP391.3


Article number: 2374655


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2374655.html
