Research on Content-Based Public Opinion Prediction for Sina Weibo
Published: 2018-08-18 09:33
[Abstract]: With the rapid development of the Internet, the web has become an important medium through which people obtain information and express opinions. Sina Weibo's short posts and simple mode of expression have attracted a large number of users. The platform now has more than 200 million monthly active users and daily active users in the tens of millions. These users publish a large volume of posts at every moment and actively repost, comment on, and like one another's content. Weibo makes it easier to spread information and discuss trending topics, but it also creates conditions in which false information can flourish. The spread of negative and false information disrupts a harmonious online environment and also harms society more broadly. The volume of data on the platform is enormous, however. Relying solely on manual operation and management limits how much information can be gathered and consumes considerable manpower and resources. A public opinion monitoring system can detect trending events promptly and turn the whole monitoring process into an automated platform that runs efficiently.

This paper applies text mining techniques to classify and cluster large volumes of Weibo posts. In the text vectorization stage, a distributed chi-square feature selection method reduces dimensionality, and TF-IDF values serve as term weights. Posts are classified with a support vector machine (SVM) and clustered with k-means, and events are formed from the classification and clustering results. The heat of each event is calculated from the total reposts, comments, and likes of its posts. The system produces monitoring data for trending events and also supports analysis and display of historical event data.
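The vectorization step described above can be sketched in Python as follows. This is a minimal single-machine illustration that assumes posts have already been segmented into words. For Chinese text, a segmenter such as jieba would normally handle that step. The thesis uses a distributed chi-square method, which this sketch does not attempt to reproduce. The function names `chi_square`, `select_features` and `tfidf_vectors` are hypothetical. Scoring each term by its maximum chi-square value across classes, and using TF = count / post length with IDF = log(N / df), are common conventions and not necessarily the author's exact choices.

```python
import math
from collections import Counter

def chi_square(doc_sets, labels, term, cls):
    """Chi-square statistic of the 2x2 table (term present?) x (in class cls?)."""
    a = b = c = d = 0
    for tokens, label in zip(doc_sets, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1  # in class, contains term
        elif has_term:
            b += 1  # other class, contains term
        elif in_class:
            c += 1  # in class, lacks term
        else:
            d += 1  # other class, lacks term
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all classes."""
    doc_sets = [set(tokens) for tokens in docs]
    vocab = sorted(set().union(*doc_sets))
    classes = sorted(set(labels))
    score = {t: max(chi_square(doc_sets, labels, t, c) for c in classes) for t in vocab}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

def tfidf_vectors(docs, features):
    """TF-IDF weights for each document over the selected feature terms only."""
    n_docs = len(docs)
    df = {t: sum(1 for tokens in docs if t in tokens) for t in features}
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero on empty posts
        vectors.append([
            counts[t] / length * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in features
        ])
    return vectors

if __name__ == "__main__":
    # Hypothetical toy posts, already tokenized, with their category labels.
    posts = [["earthquake", "rescue"], ["earthquake", "casualties"],
             ["match", "goal"], ["match", "champion"]]
    labels = ["disaster", "disaster", "sports", "sports"]
    features = select_features(posts, labels, 2)
    print(features)  # ['earthquake', 'match']
    print(tfidf_vectors(posts, features))
```

Chi-square selection runs before weighting, so TF-IDF only needs to be computed for the retained terms. This is where the dimensionality reduction pays off.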
Building on previous public opinion research, this paper implements a content-based public opinion monitoring system. It divides posts into categories before event clustering, which broadens the range of events monitored and makes their content richer.
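The classify-before-cluster idea can be sketched as follows. Posts are grouped by category, and k-means runs separately inside each category. Every resulting cluster becomes an event, and its heat is the total reposts, comments and likes of its posts. This is a minimal sketch built on several assumptions. Category labels are taken as given, since the thesis obtains them with an SVM that is not reproduced here. The post dictionary keys and the `k_per_category` parameter are hypothetical. Centroids are seeded from the first k posts for determinism rather than with k-means++. Heat is an unweighted sum because the abstract does not specify any weights.

```python
from collections import defaultdict

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, max_iter=100):
    """Plain k-means returning one cluster index per point.

    Centroids are seeded from the first k points so the sketch is deterministic;
    k-means++ seeding is the usual choice in practice.
    """
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def detect_events(posts, k_per_category):
    """Cluster posts separately within each category and score each event.

    Each post is a dict with hypothetical keys: 'category' (classifier output),
    'vector' (e.g. a TF-IDF vector), 'reposts', 'comments' and 'likes'.
    """
    by_category = defaultdict(list)
    for post in posts:
        by_category[post["category"]].append(post)

    events = []
    for category in sorted(by_category):
        members = by_category[category]
        labels = kmeans([p["vector"] for p in members], k_per_category)
        clusters = defaultdict(list)
        for post, label in zip(members, labels):
            clusters[label].append(post)
        for label in sorted(clusters):
            group = clusters[label]
            # Assumed heat: unweighted sum of interactions over the event's posts.
            heat = sum(p["reposts"] + p["comments"] + p["likes"] for p in group)
            events.append({"category": category, "posts": group, "heat": heat})
    return sorted(events, key=lambda e: e["heat"], reverse=True)  # hottest first

if __name__ == "__main__":
    posts = [
        {"category": "disaster", "vector": [1.0, 0.0], "reposts": 10, "comments": 5, "likes": 20},
        {"category": "disaster", "vector": [0.9, 0.1], "reposts": 3, "comments": 2, "likes": 5},
        {"category": "disaster", "vector": [0.0, 1.0], "reposts": 1, "comments": 1, "likes": 1},
        {"category": "sports", "vector": [0.5, 0.5], "reposts": 0, "comments": 0, "likes": 2},
    ]
    print([e["heat"] for e in detect_events(posts, 2)])  # [45, 3, 2]
```

Because clustering happens inside each category, posts from different categories can never be merged into the same event, however close their vectors are.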
[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: G206; C912.63
Article no.: 2189067
Article link: http://sikaile.net/shekelunwen/shgj/2189067.html