Design and Implementation of an Internet Public Opinion Monitoring and Analysis System
Posted: 2018-04-10 02:35
Topic: data collection   Entry point: Heritrix   Source: master's thesis, Beijing Jiaotong University, 2017
【Abstract】: China's Internet industry has grown extremely rapidly over the past decade or more. Because information spreads quickly online, reaches a large audience, covers a wide range of content, and has a strong capacity for social mobilization, the Internet has become a new gathering place for public opinion. However, some lawbreakers use this channel to spread false information and reactionary statements to the general public, and have already caused serious harm. Problems like these have arisen as information dissemination shifts from traditional media to new media. Strengthening the supervision of online public opinion, and improving the government's ability to guide public opinion correctly, is therefore vital to the country's healthy development under the new circumstances.
This thesis describes the design and implementation of an Internet public opinion monitoring and analysis system. For a user-defined sensitive event, users can obtain multi-dimensional analysis results, such as the share of each sentiment polarity and how the event develops over time, and can also issue early warnings about the event and generate reports.
Monitoring public opinion as comprehensively as possible requires large-scale data collection. The system's data sources are publicly available texts from Internet media, including news websites, mobile news apps, and forums. The data collection module is built as an extension of the Heritrix crawler framework; it can collect from nearly 1,000 domestic and international sites and produces files in a standard format for the data analysis programs. The large volume of detailed data is stored in HBase, a non-relational database.
A high-performance system must accurately return the data users want in as short a time as possible, which depends on an efficient search engine. The thesis therefore also covers the use of the Solr search engine for full-text search and large-scale data statistics; as an efficient retrieval tool, Solr carries a central part of the workload of the whole system.
Building on domestic and international research in data collection and search engines, and drawing on the features of mature text sentiment analysis products, the thesis applies the basic ideas of modern software engineering management to distill user stories of various kinds. From these it derives a core business processing model and a general solution that can be adapted to similar products. The system has been put online in commercial operation. It gives government departments at all levels a convenient and efficient tool for monitoring Internet public opinion, has helped combat online crime that disrupts social stability, has promoted the spread of positive information, and has contributed to cleaning up the online environment and curbing adverse incidents.
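The abstract lists sentiment-polarity shares, an event's trend over time, and early warnings as outputs, but does not say how they are computed. The following is a minimal Python sketch, assuming each collected document has already been labeled with a polarity and a publication day. The function names, the polarity labels, and the warning threshold are hypothetical illustrations, not the thesis's implementation.

```python
from collections import Counter


def polarity_share(polarities):
    """Return each polarity label's share of all documents (0.0 to 1.0)."""
    counts = Counter(polarities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def daily_trend(records):
    """Count documents per day; records are (iso_day, polarity) pairs.

    ISO-8601 day strings such as "2017-03-01" sort chronologically,
    so a plain sort yields the time series in order.
    """
    per_day = Counter(day for day, _ in records)
    return sorted(per_day.items())


def should_warn(share, label="negative", threshold=0.6):
    """Hypothetical early-warning rule: flag the event when one
    polarity's share reaches the threshold."""
    return share.get(label, 0.0) >= threshold


if __name__ == "__main__":
    # Illustrative records only, not data from the thesis.
    records = [
        ("2017-03-01", "negative"),
        ("2017-03-01", "positive"),
        ("2017-03-02", "negative"),
        ("2017-03-02", "neutral"),
    ]
    share = polarity_share(p for _, p in records)
    print(share)  # {'negative': 0.5, 'positive': 0.25, 'neutral': 0.25}
    print(daily_trend(records))  # [('2017-03-01', 2), ('2017-03-02', 2)]
    print(should_warn(share, threshold=0.4))  # True
```

A fixed share threshold is the simplest possible warning rule; a production system would more likely combine volume spikes with sentiment, but the abstract gives no detail on the rule actually used.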
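The abstract says Solr is used for large-scale data statistics as well as text search. Solr's faceting is one common way to obtain such counts without fetching the documents themselves. The sketch below only builds a request URL for Solr's standard `select` handler using real faceting parameters (`facet.field`, `facet.range` and its `start`/`end`/`gap` companions); the core name `opinion` and the fields `content`, `sentiment`, and `publish_time` are assumptions, since the thesis does not give its schema.

```python
from urllib.parse import urlencode


def escape_phrase(text):
    """Escape backslashes and double quotes for a quoted Solr phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_facet_query(base_url, keyword, days=30):
    """Build a Solr select URL that returns only facet counts for documents
    matching `keyword`: totals per sentiment label plus daily volumes.

    Field names (content, sentiment, publish_time) are hypothetical.
    """
    params = [
        ("q", f'content:"{escape_phrase(keyword)}"'),
        ("rows", "0"),                         # counts only, no documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", "sentiment"),          # term counts per polarity label
        ("facet.range", "publish_time"),       # must be a date field
        ("facet.range.start", f"NOW/DAY-{days}DAYS"),
        ("facet.range.end", "NOW/DAY+1DAY"),
        ("facet.range.gap", "+1DAY"),          # one bucket per day
    ]
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_facet_query("http://localhost:8983/solr/opinion", "example event"))
```

With `rows=0`, Solr returns no documents, only `facet_counts`: its `facet_fields` entry gives per-label totals (the input to a polarity share) and `facet_ranges` gives daily volumes. Whether the system described here queries Solr in exactly this way is not stated in the abstract.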
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year of degree】: 2017
【Classification number】: TP274; TP391.3