Research and Implementation of Hadoop-Based Sentiment Classification of Weibo Users
Published: 2018-10-20 08:00
[Abstract]: With the development and spread of new social networking services such as Weibo, people can express opinions and emotions through these media more flexibly, freely, and quickly. Sentiment classification of Weibo posts has therefore become increasingly important: it reveals how users react to policies, products, and trending public issues, and so provides better decision support for users themselves, enterprises, and governments. When sentiment classification is run on massive Weibo datasets, the limited scalability of traditional sentiment classification algorithms becomes the system bottleneck. This thesis therefore first studies the main technologies of the Hadoop cloud computing platform and analyzes the feasibility of performing sentiment classification on Hadoop. On this basis, and in view of the sentiment characteristics of Weibo text, it builds a sentiment corpus by combining automatic and manual construction, improves a feature extraction algorithm based on Weibo sentiment elements and semantics, and uses Hadoop to design a distributed, scalable, autonomous Weibo sentiment classification model. For the classification problem within this model, a Hadoop-based naive Bayes sentiment classification algorithm is designed and implemented. Test results show that the Hadoop-based naive Bayes sentiment classification model achieves good execution efficiency and high scalability when classifying massive Weibo data.
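The abstract gives no implementation details, so the following is only a minimal sketch of how naive Bayes training can be split into map and reduce steps. It simulates the two phases locally in plain Python rather than calling any Hadoop API. The function names (`map_counts`, `reduce_counts`, `train`, `classify`), the toy corpus, and the use of Laplace smoothing are all assumptions made for illustration, not the thesis's method.

```python
import math
from collections import defaultdict
from itertools import groupby


def map_counts(label, tokens):
    """Mapper (hypothetical): emit one count per document and one per (label, token)."""
    yield ("doc", label, ""), 1
    for token in tokens:
        yield ("tok", label, token), 1


def reduce_counts(pairs):
    """Reducer (hypothetical): sum the values of each key.

    The local sort stands in for the shuffle phase of a real MapReduce job.
    """
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    }


def train(corpus):
    """Run the map step over every labeled post, then reduce to a count table."""
    pairs = [kv for label, tokens in corpus for kv in map_counts(label, tokens)]
    return reduce_counts(pairs)


def classify(counts, tokens):
    """Return the label with the highest log posterior (Laplace smoothing assumed)."""
    doc_counts = {key[1]: n for key, n in counts.items() if key[0] == "doc"}
    vocab = {key[2] for key in counts if key[0] == "tok"}
    tok_totals = defaultdict(int)
    for (kind, label, _), n in counts.items():
        if kind == "tok":
            tok_totals[label] += n
    total_docs = sum(doc_counts.values())

    best_label, best_score = None, -math.inf
    for label, n_docs in sorted(doc_counts.items()):
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            hits = counts.get(("tok", label, token), 0)
            score += math.log((hits + 1) / (tok_totals[label] + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, already-tokenized corpus (invented for this sketch).
corpus = [
    ("positive", ["great", "phone", "love"]),
    ("positive", ["love", "this", "policy"]),
    ("negative", ["terrible", "service", "hate"]),
    ("negative", ["hate", "this", "phone"]),
]
model = train(corpus)
```

Because the training statistics are plain sums over keys, the counting step can in principle be distributed across many mappers and reducers without changing the resulting table, which is what makes this classifier a natural fit for MapReduce-style processing.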
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092;TP391.1
Article ID: 2282534
Article link: http://sikaile.net/guanlilunwen/ydhl/2282534.html