Research on Key Technologies of an Internet Public Opinion Information Collection and Analysis System
Published: 2018-04-14 20:04
Topics: public opinion + web crawler; Source: 天津大学 (Tianjin University), master's thesis, 2012
[Abstract]: As the Internet environment grows more complex, online public opinion has come to have a major impact on social stability and on the many people who go online. Online public opinion arises across a wide range and spreads quickly, and its outbreak points are hard to detect and control, which makes collecting and analyzing public opinion information on the Internet very important.
This thesis first analyzes in depth the requirements of an Internet public opinion information collection system. It then combines web topology, keyword-based page content filtering, and breadth-first search to design and implement a vertical search engine crawler for collecting public opinion information. Word segmentation and topic word extraction are used to identify hot public opinion topics, and sudden public opinion events and sensitive topics involving content security are detected and flagged in time, with the machine automatically identifying sudden public opinion in the local region. An algorithm for a semi-automatic public opinion report generation system is also designed and implemented: retrieved results are sorted by keyword frequency, weight, page category, page content warning, and page popularity, and a public opinion briefing is generated semi-automatically.
The system effectively collects public opinion information from news sites, forums, blogs, Tieba (post bar) boards, and similar sites, supports statistical analysis and topic analysis of the collected results, and outputs public opinion reports semi-automatically.
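The abstract names breadth-first search and keyword-based page content filtering as the core of the crawler but gives no implementation details. The following is only a minimal sketch of how those two ideas might fit together, not the thesis's actual design. All names (`bfs_crawl`, `keyword_score`, `TOY_WEB`) and the parameters `min_score` and `max_pages` are hypothetical, the "web" is an in-memory dictionary rather than real HTTP fetching, and the choice to follow links even from pages that fail the filter is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical page representation: (page text, outgoing links).
Page = Tuple[str, List[str]]


def keyword_score(text: str, keywords: Iterable[str]) -> int:
    """Count case-insensitive keyword occurrences in the page text."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def bfs_crawl(
    seeds: List[str],
    fetch: Callable[[str], Optional[Page]],
    keywords: List[str],
    min_score: int = 1,
    max_pages: int = 100,
) -> List[Tuple[str, int]]:
    """Visit pages breadth-first and keep those whose keyword score
    reaches min_score. Returns (url, score) pairs in visit order."""
    queue = deque(seeds)   # FIFO queue gives breadth-first order
    seen = set(seeds)      # avoid enqueueing the same URL twice
    kept: List[Tuple[str, int]] = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:   # unreachable or unknown page
            continue
        visited += 1
        text, links = page
        score = keyword_score(text, keywords)
        if score >= min_score:
            kept.append((url, score))
        # Assumption: links are followed even from filtered-out pages.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


# Toy stand-in for the web: URL -> (text, links). Purely illustrative.
TOY_WEB = {
    "a": ("flood warning issued for the city", ["b", "c"]),
    "b": ("football results", ["d"]),
    "c": ("flood relief and flood shelters", ["a", "d"]),
    "d": ("weather outlook: no flood expected", []),
}

results = bfs_crawl(["a"], TOY_WEB.get, ["flood"])
# Rank kept pages by keyword frequency, highest first (stable sort).
ranked = sorted(results, key=lambda r: r[1], reverse=True)
print(ranked)
```

Sorting by raw keyword count stands in for only one of the ranking indicators the abstract lists (keyword frequency); weight, page category, content warning, and popularity would each need their own scoring, which the abstract does not specify.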
[Degree-granting institution]: 天津大学 (Tianjin University)
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP393.09
Article ID: 1750812
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1750812.html