Research and Implementation of Key Technologies for Sentiment Analysis of Weibo Short Texts

Published: 2018-08-27 18:12

[Abstract]: With the rise of social networks and the arrival of the Weibo self-media era, hundreds of millions of posts are published on the Internet every day. This massive body of Weibo text carries rich, multi-dimensional, multi-level, and diverse information about individuals, society, enterprises, and government. Analyzing post content, monitoring online public opinion, and determining the sentiment orientation expressed in posts therefore have significant theoretical and practical value. This thesis collects large volumes of Weibo data through simulated user login, applies natural language processing techniques such as word segmentation, part-of-speech tagging, and topic word extraction, and combines a sentiment lexicon with a Weibo corpus. It then performs sentiment analysis by building a vector space model and dynamically adjusting parameters such as the weights of sentiment influence factors. The work comprises four parts. First, large volumes of Weibo data are collected using simulated-browser techniques together with packet capture analysis in HttpWatch 8.5. Second, a Chinese word segmenter, SkyLightAnalyzer, is designed and implemented on the basis of a hidden Markov model and an N-gram language model; its main functions include word segmentation, part-of-speech tagging, word sense disambiguation, and out-of-vocabulary word recognition. Third, building on this segmenter, an algorithm combining statistics and rules is used to extract topic words and sentiment units from posts. Fourth, a weighting algorithm based on the vector space model and dynamically adjusted sentiment influence factors is proposed, and a sentiment orientation analysis method based on personalized modeling of bloggers and content analysis is designed and implemented. Experiments and practical use indicate that the proposed algorithms are effective. The thesis also discusses remaining shortcomings and plans for future work.
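The abstract does not give the details of the vector space model or of the weight adjustment, so the following Python sketch is only an illustration of the general idea, not the thesis's algorithm. It assumes posts are already segmented into tokens, represents each post as a sparse vector of term counts scaled by per-term weights, and scores it by cosine similarity against class prototype vectors. The prototype lexicon `PROTOTYPES`, the function names, and the multiplicative update rule in `adjust_weights` are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Hypothetical class prototypes: sentiment terms with illustrative strengths.
# A real system would derive these from a sentiment lexicon and a corpus.
PROTOTYPES = {
    "positive": {"happy": 1.0, "great": 1.0, "love": 1.0},
    "negative": {"sad": 1.0, "terrible": 0.8, "hate": 1.0},
}


def to_vector(tokens, weights):
    """Map a segmented post to a sparse vector of count * term weight."""
    counts = Counter(tokens)
    return {term: n * weights.get(term, 1.0) for term, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(term, 0.0) for term, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(tokens, weights, prototypes=PROTOTYPES):
    """Return the best-matching label and the similarity score per label."""
    vector = to_vector(tokens, weights)
    scores = {label: cosine(vector, proto) for label, proto in prototypes.items()}
    return max(scores, key=scores.get), scores


def adjust_weights(tokens, gold, weights, prototypes=PROTOTYPES, rate=0.5):
    """Assumed update rule: after a misclassification, boost terms from the
    gold class and damp terms from the wrongly predicted class.
    Returns a new weight dict; the input is left unchanged."""
    predicted, _ = classify(tokens, weights, prototypes)
    updated = dict(weights)
    if predicted == gold:
        return updated
    for term in set(tokens):
        if term in prototypes[gold]:
            updated[term] = updated.get(term, 1.0) * (1.0 + rate)
        elif term in prototypes[predicted]:
            updated[term] = updated.get(term, 1.0) * (1.0 - rate)
    return updated


if __name__ == "__main__":
    post = ["great", "delay", "terrible"]  # e.g. a sarcastic complaint
    weights = {}
    print(classify(post, weights))
    weights = adjust_weights(post, "negative", weights)
    print(classify(post, weights))
```

Keeping one weight dictionary per blogger would be one simple way to approximate the personalized modeling the abstract mentions, although the thesis may do this quite differently.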
[Degree-granting institution]: Hebei University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092


Article ID: 2208046


Article link: http://sikaile.net/guanlilunwen/ydhl/2208046.html
