
Design and Implementation of an Automated Web Public Opinion Information Collection System

Published: 2018-04-20 15:39

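  Topic: public opinion collection + information extraction; Source: University of Electronic Science and Technology of China, master's thesis, 2014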


[Abstract]: Public opinion is the aggregate of views and attitudes that people hold toward events in society. For governments, it helps maintain social stability, reveal social problems, and improve public credibility. For companies, it offers a timely and accurate picture of customers' views and suggestions about products and services, which helps improve quality and strengthen overall competitiveness. The rise of Web 2.0 has brought major opportunities for the automated collection of Web public opinion information, and it has also posed new challenges for collection technology. Because Web content is the main carrier of public opinion information, solving the problem of collecting it has become more urgent. Existing research indicates that Web public opinion collection must address the mining of massive data, real-time data analysis, and the accuracy of that analysis. This thesis first summarizes the state of research on Web information extraction in China and abroad, then analyzes existing results in detail, and, driven by the needs of an actual project, proposes its own method for collecting Web public opinion information. The main research contents are as follows:
1. Existing information collection models and algorithms are studied, and their functions, strengths, and weaknesses are compared and analyzed. The collection models mainly include the understanding model, the object model, and the visual model; the collection algorithms include ontology-based algorithms, Markov algorithms, and others. The survey is fairly comprehensive.
2. A technique for generating visual information collection templates is studied and proposed. It converts user actions (clicking a next-page hyperlink or button, clicking a page element, operating a drop-down list, and so on) into a collection template, which lowers the difficulty of making templates and improves the efficiency of making them.
3. A web page body text extraction subsystem based on the DOM tree and the line-block distribution function is implemented, applying XPath, regular expressions, and related techniques. The system combines statistical and rule-based methods to address the problem of generality.
4. Data processing such as cluster analysis of the collected Web information is implemented, ultimately providing users with integrated services such as public opinion browsing and hot topic discovery.
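
The abstract names the body text extraction step but gives no implementation details. The following is a minimal sketch of the general line-block idea only, not the thesis's subsystem: strip markup, count the characters on each line, sum the counts over a sliding window of lines, and keep the densest contiguous run. The function names, the window width of 3, and the threshold of 40 characters are assumptions chosen for illustration, and the DOM tree and XPath parts mentioned in the abstract are replaced here by plain regular expressions.

```python
import re


def strip_markup(page: str) -> list[str]:
    """Drop scripts, styles, comments, and tags; return the remaining text lines."""
    page = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", page)
    page = re.sub(r"(?s)<!--.*?-->", "", page)
    text = re.sub(r"<[^>]+>", "", page)
    return [line.strip() for line in text.split("\n")]


def extract_body(page: str, block_width: int = 3, threshold: int = 40) -> str:
    """Return the densest run of text lines (hypothetical parameters)."""
    lines = strip_markup(page)
    # Character count per line, ignoring whitespace.
    lengths = [len(re.sub(r"\s+", "", line)) for line in lines]
    n = len(lines)
    # Block value at i: total characters in lines i .. i + block_width - 1.
    blocks = [sum(lengths[i:i + block_width]) for i in range(n)]

    best_total, best_start, best_end = 0, 0, 0
    i = 0
    while i < n:
        if blocks[i] > threshold:
            # A dense block starts a candidate run; extend it until a
            # window of block_width empty lines is reached.
            j = i
            while j < n and blocks[j] > 0:
                j += 1
            total = sum(lengths[i:j])
            if total > best_total:
                best_total, best_start, best_end = total, i, j
            i = j
        else:
            i += 1
    return "\n".join(line for line in lines[best_start:best_end] if line)


# Made-up sample page: short navigation and footer lines around two paragraphs.
SAMPLE = """<html><head><title>Demo</title><style>p{color:red}</style></head>
<body>
<div><a href="/">Home</a></div>
<div><a href="/news">News</a></div>



<p>Public opinion monitoring systems collect posts and articles from many sites.</p>
<p>The extracted body text is then passed on to clustering and topic detection.</p>



<div>Copyright notice</div>
</body></html>"""

if __name__ == "__main__":
    print(extract_body(SAMPLE))
```

Because the method relies only on text density, it needs no per-site template, which is the kind of generality the abstract attributes to the statistical part. Short navigation lines that sit directly next to the body text can still leak into the result, which is one reason to combine statistics with rules.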
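[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092; TP391.1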


Article ID: 1778378


Article link: http://sikaile.net/guanlilunwen/ydhl/1778378.html

