Internet Service Reassembly and Content Extraction

Published: 2019-01-15 07:25

[Abstract]: The rapid development of the Internet has driven fast growth in network applications. The Internet offers users a wide variety of network services and continually meets their diverse needs. Massive amounts of data are generated every day, so filtering out harmful information and selecting useful information has significant research value and engineering importance. This thesis studies and implements service reassembly and content extraction for network applications. The work has three parts: the design and implementation of network service reassembly; regular-expression-based content extraction and security auditing for forum community applications; and DOM-tree-based web page content extraction and analysis. The thesis first introduces the key technologies involved, including HTML, the DOM model, packet capture, and packet reassembly. Second, it designs and implements the network service reassembly process: it describes packet reassembly, uses the open-source libnids library to reassemble TCP sessions, and applies decompression and chunked decoding to the HTTP data to recover the web pages. Third, it collects traffic from dozens of popular forums, identifies through analysis several commonly used general-purpose forum systems, extracts features that identify each forum, proposes the concept of a forum fingerprint, and improves on the traditional forum auditing method. Finally, based on the characteristics of web pages and of the information to be extracted, it proposes a DOM-based web page extraction method: pages are preprocessed, tags are chosen as the extraction features, and a DOM tree is built to extract page content quickly. This method is used to build the software version tracking module of a network office management service system, and the thesis analyzes the web page feature extraction method and the characteristics of web pages.
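The abstract mentions decompressing and chunk-decoding the reassembled HTTP data to recover the web page. The thesis's own code is not shown here, so the following is only a minimal Python sketch of that one step, assuming the TCP stream has already been reassembled and the response split into headers and body. The function names (`decode_chunked`, `decode_content`, `decode_http_body`) and the lower-cased header dictionary are hypothetical and chosen for illustration.

```python
import gzip
import zlib


def decode_chunked(body: bytes) -> bytes:
    """Undo HTTP/1.1 chunked transfer coding (any trailer fields are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing chunk-size line")
        # The chunk size is hexadecimal; text after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        chunk = body[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        out += chunk
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def decode_content(body: bytes, encoding: str) -> bytes:
    """Undo the Content-Encoding; unknown encodings are returned unchanged."""
    enc = encoding.strip().lower()
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        # Assumes the zlib-wrapped form; some servers send raw deflate instead.
        return zlib.decompress(body)
    return body


def decode_http_body(headers: dict, body: bytes) -> bytes:
    """headers is assumed to map lower-cased header names to their values."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = decode_chunked(body)
    return decode_content(body, headers.get("content-encoding", "identity"))
```

Transfer coding is undone before content coding because chunking is applied last on the wire. The resulting bytes are the page that the later extraction stages would work on.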
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092

Article ID: 2408982

Article link: http://sikaile.net/guanlilunwen/ydhl/2408982.html
