Research and Implementation of Near-Real-Time Public Opinion Monitoring Technology for BBS
Topics: university forums; information collection. Source: master's thesis, Huazhong University of Science and Technology, 2012.
[Abstract]: Public opinion monitoring is currently an important task for departments at every level, and tracking the direction of public opinion accurately has become indispensable. BBS forums are a major channel for disseminating information and play a significant role in how public opinion spreads, so research on public opinion monitoring aimed specifically at BBS is worthwhile.
Existing research on BBS public opinion concentrates on two areas: strategies for guiding public opinion on campus networks, and BBS public opinion monitoring systems. Most existing monitoring systems collect and analyze information with general-purpose search engine technology. This active collection approach achieves fairly complete coverage, but its collection cycle is long and much of what it collects is duplicated, so it cannot meet current requirements for real-time, efficient, and accurate monitoring. This thesis therefore proposes a technique for monitoring BBS public opinion in near real time, with a system architecture organized around three stages: data collection, data preprocessing, and data analysis. First, the architectural features of current BBS sites in the Wuhan area are extracted, and a near-real-time data collection scheme is designed around those features. Next, the collected pages are preprocessed with the HtmlParser tool, and Solr is used to build a search engine over the stored data. Finally, data mining is carried out in three areas, namely hot-topic extraction, public opinion information extraction, and early-warning analysis, yielding a complete monitoring system. The system monitors BBS in near real time, retrieves specified content accurately, searches the collected data efficiently, and mines hot events from it.
The research applies not only to public opinion monitoring at universities but also extends readily to mainstream forum systems. The system has been used by the relevant departments for some time, and their feedback has been positive.
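The abstract does not spell out the near-real-time collection scheme. A minimal sketch of one common way to exploit BBS structure, assuming (not stated in the thesis) that each board has a listing page showing thread IDs with their latest reply time: poll the listing and schedule only threads that are new or have new replies, instead of re-crawling the whole site as a general-purpose crawler would. `ThreadEntry`, `IncrementalCollector`, and the listing format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ThreadEntry:
    """One row of a board listing page (hypothetical structure)."""
    thread_id: str
    last_reply_ts: int  # Unix time of the newest reply shown in the listing


@dataclass
class IncrementalCollector:
    """Remembers the newest reply time seen for each thread, so every poll
    returns only threads that are new or have received replies since."""
    seen: Dict[str, int] = field(default_factory=dict)

    def poll(self, listing: Iterable[ThreadEntry]) -> List[str]:
        to_fetch = []
        for entry in listing:
            previous = self.seen.get(entry.thread_id)
            if previous is None or entry.last_reply_ts > previous:
                to_fetch.append(entry.thread_id)
                self.seen[entry.thread_id] = entry.last_reply_ts
        return to_fetch


# Usage: poll each board's listing page on a short, fixed interval.
collector = IncrementalCollector()
print(collector.poll([ThreadEntry("t1", 100), ThreadEntry("t2", 120)]))
# prints ['t1', 't2']
print(collector.poll([ThreadEntry("t1", 100), ThreadEntry("t2", 150),
                      ThreadEntry("t3", 160)]))
# prints ['t2', 't3']
```

Only the returned thread pages would then be downloaded, which avoids the long cycles and duplicate content the abstract attributes to general-purpose crawling; the polling interval trades freshness against load on the forum server.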
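The thesis preprocesses collected pages with HtmlParser, a Java library. The sketch below swaps in Python's standard-library `html.parser` to illustrate the same basic step: dropping markup, scripts, and styles while keeping the visible text. `VisibleTextExtractor` and `page_text` are hypothetical names, and a real BBS preprocessor would also target the post-title and post-body elements of each forum's page template, which the abstract does not describe.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that lies outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only when not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Returns the visible text of an HTML page, space-separated."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(page_text("<h1>Title</h1><script>var x=1;</script><p>Hello <b>world</b></p>"))
# prints Title Hello world
```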
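The abstract says Solr is used to build a search engine over the collected data but gives no schema. A minimal sketch, assuming Solr 4 or later, whose `/update` handler accepts a JSON array of documents when the request carries `Content-Type: application/json`. The base URL, the core name `bbs_posts`, and the field names are hypothetical and would have to match the target core's schema. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_solr_update(base_url, core, posts):
    """Builds (without sending) a Solr JSON update request that adds
    `posts` to `core` and commits them."""
    url = f"{base_url.rstrip('/')}/{core}/update?commit=true"
    body = json.dumps(posts, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; field names must exist in the core's schema.
req = build_solr_update(
    "http://localhost:8983/solr",
    "bbs_posts",
    [{"id": "t1-1", "board": "campus", "title": "Dorm power cut", "body": "..."}],
)
print(req.full_url)
# prints http://localhost:8983/solr/bbs_posts/update?commit=true
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

Committing on every batch keeps newly collected posts searchable quickly, which suits a near-real-time system; with high posting volume, Solr's autocommit settings may be the better choice.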
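The abstract does not say how hot topics are identified. One simple substitute technique, not necessarily the thesis's method, flags terms whose document frequency in the latest time window rises sharply against a baseline window. The sketch assumes posts are already tokenized (Chinese text would first need a word segmenter); the function name `hot_terms` and the defaults for `top_k` and `min_count` are illustrative.

```python
from collections import Counter


def hot_terms(current_docs, baseline_docs, top_k=3, min_count=2):
    """Ranks terms by the ratio of their document frequency in the current
    window to their add-one-smoothed frequency in the baseline window."""
    cur = Counter(t for doc in current_docs for t in set(doc))
    base = Counter(t for doc in baseline_docs for t in set(doc))
    n_cur, n_base = len(current_docs), len(baseline_docs)
    scores = {}
    for term, count in cur.items():
        if count < min_count:  # ignore terms too rare to be a trend
            continue
        cur_rate = count / n_cur
        base_rate = (base[term] + 1) / (n_base + 1)
        scores[term] = cur_rate / base_rate
    # Highest burst score first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


current = [["exam", "canteen"], ["canteen", "price"], ["canteen", "exam"], ["library"]]
baseline = [["exam", "library"], ["exam"], ["library"], ["library", "exam"]]
print(hot_terms(current, baseline))
# prints ['canteen', 'exam']
```

Here "canteen" scores 0.75 / 0.2 = 3.75 because it is absent from the baseline, while "exam" scores only 0.5 / 0.8 = 0.625 because it was already common.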
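The abstract likewise leaves the early-warning rules unspecified. A minimal rule-based sketch, with a hypothetical keyword list, hypothetical `none`/`watch`/`alert` levels, and an arbitrary reply threshold: a post that mentions a monitored keyword is flagged for watching, and it is escalated when it also draws heavy discussion.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    body: str
    replies: int


def warning_level(post, keywords, reply_threshold=50):
    """Returns 'none', 'watch', or 'alert' (hypothetical levels)."""
    text = post.title + " " + post.body
    if not any(k in text for k in keywords):
        return "none"
    # A keyword match plus heavy discussion escalates the warning.
    return "alert" if post.replies >= reply_threshold else "watch"


watch_list = ["power cut"]  # hypothetical monitored keywords
print(warning_level(Post("Dorm power cut", "no power since noon", 80), watch_list))
# prints alert
```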
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP393.09; TP391.3