Design and Implementation of the Shandong Academy of Sciences Public Opinion System
Published: 2018-06-07 08:13
Topics: network public opinion + web crawler. Source: University of Jinan, master's thesis, 2014
[Abstract]: With the rapid development of modern technology, the Internet has reached the general public. As people go online, more and more of them search the web for the people and events they care about and post their own opinions on them. Network public opinion has therefore become an important new form of public opinion, and building systems that monitor it has become an essential research topic. To this end, the Shandong Academy of Sciences public opinion system crawls and processes web content related to the Academy, so that unfavorable factors can be avoided as far as possible, the Academy's room for development can grow, and the relevant departments can better understand how the Academy is currently perceived.

The main work of this thesis covers building the system framework together with its runtime and development environments, describing how Nutch crawls data, the data collection techniques, and the preprocessing of text content:

(1) Describe the background and significance of this research on network public opinion, introduce the importance of network public opinion to social development, explain why designing a public opinion system is now indispensable, and analyze the current state of research.
(2) Study the Nutch collection technology for web data mining and the working principle of web crawlers.
(3) Build the system framework. First, the runtime environment is set up by importing the compiled Nutch files into the Cygwin emulation environment. Second, the development environment is set up in Eclipse by importing and compiling the Nutch source code and configuring its files.
(4) Process the data of the public opinion system: analyze the logical structure of the collected web pages, then apply information purification, Chinese word segmentation, and text clustering to the crawled data.

Finally, an overall analysis of the Shandong Academy of Sciences public opinion system determines the system architecture, and the system is implemented. The system analyzes the retrieved content; purification of web page data, Chinese word segmentation, and text clustering are the processing techniques behind its key functions.
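The processing chain named in the abstract (information purification, Chinese word segmentation, text clustering) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the abstract does not say which purification rules, segmenter, or clustering algorithm were used. Tag stripping with the standard-library HTMLParser, forward maximum matching over a toy dictionary (VOCAB), and single-pass cosine clustering are stand-ins chosen for brevity, and every function name and sample page here is hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def purify(html):
    """Information purification (assumed form): keep only visible page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def segment(text, vocab, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                break  # fall back to a single character when nothing matches
        if piece.strip():
            tokens.append(piece)
        i += size
    return tokens


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each entry: (seed term vector, list of document indices)
    for idx, tokens in enumerate(docs):
        vec = Counter(tokens)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]


# Toy dictionary and sample pages, invented for this illustration.
VOCAB = {"山东省", "科学院", "舆情", "系统", "网络", "爬虫"}
pages = [
    "<html><script>var x = 1;</script><p>山东省科学院舆情系统</p></html>",
    "<div>科学院舆情系统</div>",
    "<p>网络爬虫</p>",
]
docs = [segment(purify(page), VOCAB) for page in pages]
groups = cluster(docs)
print(groups)  # [[0, 1], [2]]
```

The first two pages share most of their terms and fall into one cluster, while the crawler page starts its own. A production system would more likely use a full segmentation dictionary and a weighted term model such as TF-IDF; those choices are outside what the abstract specifies.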
[Degree-granting institution]: University of Jinan
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP311.52; TP393.09
Article ID: 1990490
Article link: http://sikaile.net/guanlilunwen/ydhl/1990490.html