
Research on Web Page Structured Information Extraction Techniques for Microblog Public Opinion Analysis

Published: 2018-05-26 16:56

Topic: microblog + public opinion; Source: Master's thesis, Beijing University of Posts and Telecommunications, 2014

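[Abstract]: Microblogging (Weibo) is a platform for acquiring, sharing, and spreading information through user relationships. As one of the most popular social tools on today's Internet, microblogging brings people convenience but is also becoming a breeding ground for the spread of false information. Public opinion monitoring of microblogs is therefore essential for national government and Internet regulatory authorities. To analyze this important source of public opinion comprehensively and effectively, we need to collect posts from several currently popular microblog sites at the same time and obtain structured information for each post, such as its author, body text, comment count, and repost count. To this end, this thesis proposes a unified method, based on hierarchical clustering, for extracting structured information from microblog web pages. Without relying on the service providers' APIs, the method collects the structured information of each post, one by one, from the microblog pages of any provider as fetched by a web crawler, laying the groundwork for cross-site, global microblog public opinion analysis. The main contributions are as follows: 1) We study the public opinion indicators analyzed by typical microblog public opinion analysis systems, as well as their system architecture, and set out the requirements such systems place on a module that extracts structured information from microblog pages. 2) Building on this work, we propose a unified hierarchical-clustering-based method for extracting structured information from microblog pages. The method takes full account of the DOM tree structure unique to microblog pages and overcomes two problems of some current general-purpose Web information extraction methods, namely high computational cost and inaccurate extraction of the main body of microblog pages, so it can extract structured information from microblog pages efficiently and accurately. 3) We apply the proposed method to pages from several microblog sites and trial it in an experimental microblog public opinion analysis system. These experiments show that the method achieves high accuracy and meets the requirements that microblog public opinion analysis systems place on the structured information extraction module.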
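The abstract describes the extraction method only at a high level. The following is a minimal sketch of the general idea of clustering DOM subtrees by structure so that repeated post records stand out; it is not the thesis's algorithm. Assumptions (all hypothetical, not from the source): each element is represented by the set of tag paths rooted at it, similarity is Jaccard distance, clustering is naive average-linkage agglomeration with a cut-off of 0.3, and the repeated cluster with the richest structure is taken to be the list of posts. Function names such as `extract_records` are illustrative, and only the Python standard library is used.

```python
# Illustrative sketch (hypothetical names and parameters); not the method from the thesis.
from html.parser import HTMLParser
from itertools import combinations


class Node:
    """Minimal DOM element: tag name, parent, child elements, direct text."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], []


class TreeBuilder(HTMLParser):
    """Builds a Node tree; void tags such as <br> are never left open."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in self.VOID:
            self.cur = node

    def handle_endtag(self, tag):
        n = self.cur  # walk up to the matching open tag (tolerates sloppy HTML)
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.text.append(data.strip())


def preorder(node):
    yield node
    for c in node.children:
        yield from preorder(c)


def text_of(node):
    """All text in the subtree (a node's own text is listed before its children's)."""
    return node.text + [t for c in node.children for t in text_of(c)]


def signature(node):
    """Structural fingerprint: the set of tag paths rooted at this node."""
    paths, stack = {node.tag}, [(node, node.tag)]
    while stack:
        n, path = stack.pop()
        for c in n.children:
            paths.add(path + "/" + c.tag)
            stack.append((c, path + "/" + c.tag))
    return paths


def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)


def agglomerative(items, dist, threshold):
    """Average-linkage agglomerative clustering; stop when the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist(items[i], items[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a].extend(clusters.pop(b))  # a < b, so index a stays valid
    return clusters


def extract_records(html, threshold=0.3):
    """Return the repeated subtrees most likely to be individual posts."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    nodes = [n for n in preorder(builder.root) if n is not builder.root and n.children]
    sigs = [signature(n) for n in nodes]
    repeated = [c for c in agglomerative(sigs, jaccard_distance, threshold) if len(c) >= 2]
    if not repeated:
        return []
    # Heuristic: prefer the structurally richest repeated cluster, then the largest.
    best = max(repeated, key=lambda c: (sum(len(sigs[i]) for i in c) / len(c), len(c)))
    return [nodes[i] for i in sorted(best)]


SAMPLE = """<html><body>
<div id="nav"><a>Home</a><a>Hot</a></div>
<div id="feed">
  <div class="wb"><a>alice</a><p>First post</p><div><span>Comments 3</span><span>Reposts 1</span></div></div>
  <div class="wb"><a>bob</a><p>Hello <b>world</b></p><div><span>Comments 0</span><span>Reposts 5</span></div></div>
  <div class="wb"><a>carol</a><p>Third</p><div><span>Comments 7</span><span>Reposts 2</span></div></div>
</div>
</body></html>"""

if __name__ == "__main__":
    for record in extract_records(SAMPLE):
        print(text_of(record))
```

Running the script prints one list of text fragments per simulated post (author, body text, comment and repost labels); the navigation links and the nested footer `<div>`s are not selected. A real extractor would still need to map these fragments to named fields, and this naive agglomeration is cubic in the number of candidate nodes, so the sketch only makes the clustering idea concrete rather than addressing the efficiency concerns the abstract raises.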
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP393.092

Article number: 1938093

Article link: http://sikaile.net/guanlilunwen/ydhl/1938093.html