Research on Methods for Constructing a Person Relationship Graph from Weibo

Published: 2019-06-12 03:17

[Abstract]: With the rapid development of the Internet, more and more users take part in it, and a large volume of data containing useful information is generated online every day. The focus of this thesis is how to extract useful structured data from such unstructured text. These natural-language documents describe a great many social relations between people, and extracting those relations automatically is valuable for the analysis of personal social relations. A bootstrapping relation extraction system adapts well to the Weibo environment, and this thesis proposes four improvements on top of that model.

First, a graph-based ranking algorithm is proposed. The bootstrapping relation extraction model extracts pairs of person entities that stand in a given relation. To improve its performance, the graph-based ranking algorithm is applied to the model's output and takes into account the similarity between each result and the seed set.

Second, a model for constructing the seed set of a target relation is proposed. Traditional seed set construction in relation extraction requires a great deal of manual intervention, which lowers experimental efficiency. The proposed method builds a Chinese semantic knowledge base from Baidu Baike (Baidu Encyclopedia), classifies the relations in that knowledge base (this thesis considers relation extraction for only three categories), and finally combines the knowledge base with a search engine to construct the seed set.

Third, the computation of entity-pair similarity is improved. The graph-based ranking algorithm requires an entity-pair graph, so the method used to compute the similarity between entity pairs is very important. This thesis replaces the original similarity measure between two entity pairs in that graph with Latent Relational Analysis (LRA), which handles dimensionality reduction and denoising and can improve the accuracy of the computation.

Fourth, the computation of content-pattern similarity is improved. The graph-based ranking algorithm also requires a content-pattern graph, so the method used to compute the similarity between content patterns is equally important. This thesis represents each content pattern as a path-enclosed tree and computes the similarity between content patterns with a convolution tree kernel, which can improve the accuracy of the similarity.

Finally, a visualized graph of relationships between people is constructed. Experiments demonstrate the applicability and feasibility of the work. According to the thesis, the proposed method can be used for any type of relation extraction and is highly extensible.
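The abstract does not say which variant of the convolution tree kernel is used, so the following is only a minimal sketch of the classic subset-tree recursion (Collins and Duffy) applied to toy trees. The nested-tuple tree encoding, the `decay` value of 0.5, the function names, and the three example trees (with `PER1` and `PER2` as placeholders for the two person entities) are all assumptions made for illustration; they are not taken from the thesis.

```python
from itertools import product

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf word is a plain string.


def nodes(tree):
    """Return every internal (non-leaf) node of the tree."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(nodes(child))
    return found


def production(node):
    """Label of the node plus the kind and label of each child."""
    children = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in node[1:]
    )
    return (node[0],) + children


def common_subtrees(n1, n2, decay):
    """Decay-weighted count of common subtrees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    # Equal productions guarantee the children line up position by position.
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            score *= 1.0 + common_subtrees(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=0.5):
    """Sum the common-subtree scores over all pairs of internal nodes."""
    return sum(
        common_subtrees(a, b, decay) for a, b in product(nodes(t1), nodes(t2))
    )


def normalized_tree_kernel(t1, t2, decay=0.5):
    """Scale the kernel into [0, 1] so trees of different size are comparable."""
    denom = (tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay)) ** 0.5
    return tree_kernel(t1, t2, decay) / denom if denom else 0.0


# Hypothetical content patterns linking two person placeholders.
tree_a = ("S", ("NP", ("NN", "PER1")),
          ("VP", ("VB", "married"), ("NP", ("NN", "PER2"))))
tree_b = ("S", ("NP", ("NN", "PER1")),
          ("VP", ("VB", "wed"), ("NP", ("NN", "PER2"))))
tree_c = ("NP", ("NP", ("NN", "PER1")), ("CC", "and"), ("NP", ("NN", "PER2")))

print(normalized_tree_kernel(tree_a, tree_b))
print(normalized_tree_kernel(tree_a, tree_c))
```

In this toy setup, `tree_a` and `tree_b` differ only in the verb, so they score higher against each other than `tree_a` does against the coordination pattern `tree_c`. Normalized scores of this kind could serve as edge weights in a content-pattern graph, although how the thesis actually builds that graph is not stated in the abstract.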
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1;TP393.092


