Research on Machine-Learning-Based Extraction and Analysis of Person Relationship Information on Weibo
Topic: Weibo + person relationship extraction; Source: master's thesis, Beijing University of Posts and Telecommunications, 2017
[Abstract]: With the rapid development of Internet technology, social network research has become increasingly important for tasks such as public-opinion monitoring and business analysis, and has therefore become a research hotspot. Weibo is the largest social network community in China, and its text differs greatly from that of traditional media. Based on these characteristics, this thesis studies how to improve person relationship extraction in the setting of social interaction among Weibo users, and how to improve the prediction of person relationship strength. The main work and results are as follows.

(1) The thesis examines the characteristics of the Weibo corpus and analyzes the strengths and weaknesses of traditional person relationship extraction algorithms. Because traditional methods recognize ambiguous (fuzzy) samples poorly, it proposes the SVMDT_RFC algorithm model, which improves the random forest algorithm by introducing SVM decision trees: tree nodes are split by an SVM that maximizes the classification margin, and the forest's votes are weighted by classification margin. Experiments comparing SVMDT_RFC with SVM and with random forest show that the method improves extraction accuracy on ambiguous samples, and that the gain is more pronounced for medium-length and long texts.
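The margin-weighted voting idea in (1) can be illustrated with a short Python sketch. The abstract does not give the exact split criterion or weighting formula, so the following are assumptions: each tree is reduced to a hypothetical pre-trained depth-1 "SVM stump" (`SVMStump`), the weight of a tree's vote is the geometric distance |w.x + b| / ||w|| of the sample from that tree's hyperplane, and SVM training is omitted. The relation labels `friend` and `colleague` are illustrative only.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SVMStump:
    """Hypothetical depth-1 'SVM decision tree': one pre-trained linear SVM
    hyperplane w.x + b = 0 separating two relation labels."""
    w: list
    b: float
    pos_label: str
    neg_label: str

    def decide(self, x):
        """Return (label, margin) for feature vector x."""
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        norm = math.sqrt(sum(wi * wi for wi in self.w))
        margin = abs(score) / norm  # geometric distance of x from the hyperplane
        label = self.pos_label if score >= 0 else self.neg_label
        return label, margin


def majority_vote(trees, x):
    """Standard random-forest voting: one vote per tree."""
    counts = defaultdict(int)
    for tree in trees:
        label, _ = tree.decide(x)
        counts[label] += 1
    return max(counts, key=counts.get)


def margin_weighted_vote(trees, x):
    """Each tree's vote is weighted by how far x lies from that tree's
    hyperplane, so trees that separate x confidently count more."""
    totals = defaultdict(float)
    for tree in trees:
        label, margin = tree.decide(x)
        totals[label] += margin
    return max(totals, key=totals.get)


if __name__ == "__main__":
    trees = [
        SVMStump([1.0, 0.0], -0.8, "friend", "colleague"),  # margin 0.2 -> friend
        SVMStump([0.0, 1.0], -0.7, "friend", "colleague"),  # margin 0.3 -> friend
        SVMStump([-1.0, 0.0], 0.1, "friend", "colleague"),  # margin 0.9 -> colleague
    ]
    x = [1.0, 1.0]
    print(majority_vote(trees, x))         # friend
    print(margin_weighted_vote(trees, x))  # colleague
```

The demo shows where the two schemes diverge: plain majority voting follows the two trees that separate the sample only narrowly, while margin weighting follows the single tree that separates it with a wide margin (0.9 against 0.2 + 0.3).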
(2) The thesis studies traditional person relationship modeling methods. Because traditional models reproduce real-life relationships poorly, and because Weibo text is rich in emotional information, it designs a person relationship modeling scheme that introduces emotional-intensity features. The scheme combines user attribute features with behavioral features and analyzes users' emotional intensity by, among other means, constructing a sentiment lexicon and an emoticon lexicon. Adding these emotional features to the model yields a multi-dimensional model of person relationships that simulates real relationships more accurately and improves the model's realism and validity.
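The emotional-intensity analysis in (2) can be sketched as a lexicon lookup in Python. This is a minimal illustration under stated assumptions, not the thesis's dictionaries or scoring rule: the two lexicons and their scores are hypothetical toy entries, word segmentation is assumed to have been done already, emoticons are assumed to appear in raw Weibo text as bracketed codes such as `[哈哈]`, and intensity is taken as the sum of absolute lexicon scores, averaged over a user pair's interactions by the hypothetical helper `pair_emotion_feature`.

```python
import re

# Hypothetical toy lexicons (word -> signed intensity in [-1, 1]). The thesis builds
# its own sentiment and emoticon dictionaries; these entries only illustrate the idea.
SENTIMENT_LEXICON = {"喜欢": 0.8, "开心": 0.7, "讨厌": -0.8, "失望": -0.6}
EMOTICON_LEXICON = {"哈哈": 0.6, "爱你": 0.9, "泪": -0.5, "怒": -0.9}

# Assumes Weibo emoticons appear in raw text as bracketed codes such as "[哈哈]".
EMOTICON_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def emotion_scores(text, tokens):
    """Return (polarity, intensity) for one post or comment.

    tokens: the text already split into words (segmentation is assumed done elsewhere).
    polarity is the signed sum of matched scores; intensity is the sum of their
    absolute values, so strong emotion counts whether it is positive or negative.
    """
    scores = [SENTIMENT_LEXICON[t] for t in tokens if t in SENTIMENT_LEXICON]
    scores += [EMOTICON_LEXICON[e] for e in EMOTICON_PATTERN.findall(text)
               if e in EMOTICON_LEXICON]
    return sum(scores), sum(abs(s) for s in scores)


def pair_emotion_feature(interactions):
    """Mean emotional intensity over the (text, tokens) interactions between two users."""
    if not interactions:
        return 0.0
    return sum(emotion_scores(text, toks)[1] for text, toks in interactions) / len(interactions)
```

For example, `emotion_scores("今天见到你好开心[哈哈][爱你]", ["今天", "见到", "你", "好", "开心"])` matches one lexicon word and two emoticons, giving a polarity and intensity of about 2.2. The pair-level value would then be one column of the relationship feature vector, next to the attribute and behavioral features.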
(3) Building on this relationship model, the thesis proposes a person relationship strength prediction scheme based on a multilayer perceptron (MLP). In ten-fold cross-validation experiments against a decision tree model and a maximum entropy model, the proposed scheme predicts relationship strength more accurately. Second, comparing the traditional relationship model with the proposed one shows that adding emotional features improves prediction accuracy, which supports the validity of the proposed model. Finally, traditional strength prediction schemes can output only two results, strong or weak, which makes their predictions imprecise; the proposed scheme instead quantifies relationship strength at multiple levels, giving finer-grained and more accurate predictions. Comparing the predictions for relationships at different strength levels with and without emotional features shows that multi-level quantified strength prediction supports deeper analysis and study of person relationships.
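The multi-level MLP prediction and ten-fold evaluation in (3) can be sketched in plain Python. The abstract does not give the network's architecture, activation, loss, or how strength levels are labeled, so this sketch assumes one tanh hidden layer, a softmax over K ordered levels (0 = weakest, K-1 = strongest) trained with per-sample SGD on cross-entropy, and hypothetical feature vectors that concatenate attribute, behavioral, and emotional features. The decision-tree and maximum-entropy baselines are not reproduced, and in practice a library implementation such as scikit-learn's `MLPClassifier` with `cross_val_score` would replace this hand-written loop.

```python
import math
import random


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n samples."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


class TinyMLP:
    """One-hidden-layer perceptron with a softmax over K ordered strength levels
    instead of a binary strong/weak output."""

    def __init__(self, n_in, n_hidden, n_levels, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_levels)]
        self.b2 = [0.0] * n_levels

    def forward(self, x):
        """Return (hidden activations, level probabilities)."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]
        m = max(z)  # subtract the max logit for numerical stability
        e = [math.exp(zk - m) for zk in z]
        s = sum(e)
        return h, [ek / s for ek in e]

    def predict(self, x):
        """Most probable strength level."""
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def expected_level(self, x):
        """Probability-weighted level: a finer-grained strength score in [0, K-1]."""
        _, p = self.forward(x)
        return sum(k * pk for k, pk in enumerate(p))

    def fit(self, X, y, epochs=200, lr=0.1):
        """Per-sample SGD on softmax cross-entropy."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self.forward(x)
                dz = [pk - (1.0 if k == t else 0.0) for k, pk in enumerate(p)]
                # Hidden-layer error uses W2 before it is updated; tanh'(a) = 1 - h^2.
                dh = [(1 - h[j] ** 2) * sum(dz[k] * self.W2[k][j] for k in range(len(dz)))
                      for j in range(len(h))]
                for k, dzk in enumerate(dz):
                    for j in range(len(h)):
                        self.W2[k][j] -= lr * dzk * h[j]
                    self.b2[k] -= lr * dzk
                for j, dhj in enumerate(dh):
                    for i, xi in enumerate(x):
                        self.W1[j][i] -= lr * dhj * xi
                    self.b1[j] -= lr * dhj
        return self


def cross_validate(X, y, n_levels, k=10, n_hidden=8, epochs=200):
    """Mean accuracy of TinyMLP over k folds."""
    accs = []
    for train, test in k_fold_splits(len(X), k):
        model = TinyMLP(len(X[0]), n_hidden, n_levels)
        model.fit([X[i] for i in train], [y[i] for i in train], epochs=epochs)
        accs.append(sum(model.predict(X[i]) == y[i] for i in test) / len(test))
    return sum(accs) / len(accs)
```

Reading `expected_level` alongside `predict` is one way to use the multi-level output: two user pairs predicted at the same level can still be ranked by their probability-weighted level, which a binary strong/weak classifier cannot provide.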
The thesis is organized as follows. Chapter 1 presents the background of the topic and the significance of research on the Weibo network, and reviews the state of research on person relationship extraction and on relationship strength prediction. Chapter 2 first describes the workflow of a person relationship extraction system and the issues it involves, then analyzes the problems in current extraction schemes. Chapter 3 analyzes how to model person relationships and the shortcomings of existing models, and briefly introduces the related algorithms used. Chapter 4 first identifies the main problem of current extraction schemes, namely their weak extraction ability on ambiguous samples, and then proposes an SVMDT_RFC-based scheme for Weibo person relationship extraction that improves the random forest algorithm with SVM decision trees. Chapter 5 addresses the fact that relationship extraction yields the relationship type but not its strength: it first presents a relationship model that adds emotional features to attribute and behavioral features, then proposes an MLP-based strength prediction scheme with multi-level quantified output. Chapter 6 summarizes the thesis, points out limitations of the current research, and suggests directions for future improvement.
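[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1; TP393.092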
[Author]: Zhou Ge (周舸)