Design and Implementation of a Weibo Data Mining and Visualization System

Published: 2019-05-23 14:51

[Abstract]: With the steady improvement of mobile communication networks and the growing popularity of smartphones, China's Internet has fully entered the Web 2.0 era. As a typical Web 2.0 service, Weibo has a large number of active users, covers a wide range of topics, and carries considerable social influence. It has become an important channel through which people obtain information and share opinions, and the massive data behind it holds great value for academic research. This thesis therefore takes Weibo as its research object, studies the collection, mining, sentiment analysis, and visualization of Weibo data, and designs and implements a Weibo-based data mining and visualization system.

The main work of the thesis is as follows. (1) Data collection: a Weibo crawler system is designed and implemented. It uses simulated login to handle identity authentication, borrows the idea of breadth-first search, and uses a hot-post monitoring module to discover high-quality users automatically. It combines web crawling, BeautifulSoup, regular expressions, multithreaded concurrency, and database techniques to collect many kinds of user information and post information. The crawler solves the problems of incomplete information collection and overly frequent requests to the Weibo servers, and obtains Weibo data comprehensively and efficiently. (2) Data mining: a user analysis module and a Weibo analysis module are designed and implemented, providing the basic functions of Weibo analysis. The thesis focuses on machine-learning-based sentiment analysis of Weibo text and designs and carries out classifier training experiments. Three feature extraction models (unigram, bigram, and combined unigram and bigram) are used, the chi-square statistic is used for feature selection, and six classification algorithms, including naive Bayes, logistic regression, and support vector machines, are compared. Repeated experiments yield the best-performing classification model, which classifies both Weibo posts and shorter comment texts well. (3) Data visualization: the analysis results are presented in the browser using bar charts, line charts, maps, tag clouds, pie charts, gauges, and other chart types.

The system uses a B/S (browser/server) architecture: the front end displays the analysis results in the browser, while the back end combines the Weibo crawler, a MySQL relational database, and the data mining module to collect, process, and analyze data. Together they deliver data mining and visual analysis of Weibo.

The main contributions and innovations of the thesis are: (1) A Weibo analysis system covering data collection, data mining, and data visualization is designed and implemented. It provides user analysis and Weibo analysis functions and serves as a base platform for follow-up research. (2) The system implements sentiment analysis of Weibo text. A sentiment analysis model is trained with machine learning algorithms; it reaches an accuracy of 85% and an AUC of 0.94. The system can call this classifier directly to perform sentiment analysis on Weibo text.
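
The abstract names single-word (unigram), two-word (bigram), and combined feature models, with chi-square feature selection, but gives no code. The following is a minimal sketch of that step, not the thesis's implementation. It assumes the text has already been segmented into tokens and that labels are binary (1 = positive, 0 = negative). The function names `extract_features`, `chi_square_scores`, and `select_top_k`, and the toy corpus, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def extract_features(tokens: list[str], mode: str = "both") -> set[str]:
    """Return the unigram and/or bigram features present in one tokenized text."""
    unigrams = set(tokens)
    # A bigram is a pair of adjacent tokens, joined with "_" (an arbitrary choice).
    bigrams = {f"{a}_{b}" for a, b in zip(tokens, tokens[1:])}
    if mode == "unigram":
        return unigrams
    if mode == "bigram":
        return bigrams
    return unigrams | bigrams


def chi_square_scores(docs: list[set[str]], labels: list[int]) -> dict[str, float]:
    """Chi-square statistic of each binary feature against a binary label."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df: Counter = Counter()  # number of positive texts containing each feature
    neg_df: Counter = Counter()  # number of negative texts containing each feature
    for feats, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(feats)

    scores = {}
    for f in set(pos_df) | set(neg_df):
        a = pos_df[f]      # positive, feature present
        b = neg_df[f]      # negative, feature present
        c = n_pos - a      # positive, feature absent
        d = n_neg - b      # negative, feature absent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, _ in ranked[:k]]


# Toy, hand-made corpus (hypothetical data, not from the thesis).
corpus = [
    (["great", "movie"], 1),
    (["great", "day"], 1),
    (["bad", "movie"], 0),
    (["bad", "day"], 0),
]
docs = [extract_features(tokens, mode="both") for tokens, _ in corpus]
labels = [label for _, label in corpus]
scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 2))  # ['bad', 'great']
```

In this toy corpus "great" and "bad" separate the two classes perfectly and score highest, while "movie" and "day" appear equally in both classes and score 0. The selected features would then be fed to a classifier such as naive Bayes, logistic regression, or a support vector machine, as the abstract describes.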
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.092


Article ID: 2483992


Article link: http://sikaile.net/guanlilunwen/ydhl/2483992.html
