Research and Application of Website Anti-Crawling Mechanisms

Published: 2019-06-26 17:12

[Abstract]: As Web technology develops and its applications diversify, more and more people rely on the Internet for study, work, and daily life. With the arrival of Web 2.0, the World Wide Web has become the carrier of a vast amount of information, and the number of crawlers running on the Internet keeps growing. These crawlers consume website resources and can do serious harm to the sites they visit. Detecting and guarding against crawlers by establishing an anti-crawling mechanism is the appropriate way to avoid that harm. Such a mechanism matters for keeping a website's service available and secure, for protecting website content and users' private information, and for data mining based on user access data. After explaining how crawlers work and analyzing existing anti-crawling mechanisms, this thesis designs a real-time anti-crawling mechanism based on the access characteristics of crawlers. It adopts a service-oriented architecture (RPC) that separates anti-crawling detection from the original Web server. This makes full use of the respective environments of the original Web server and the anti-crawling server, and reduces the impact of the anti-crawling mechanism on the original Web server. To raise the accuracy of crawler identification, the mechanism increases the number of dimensions along which each Web request is inspected and makes the verification logic more complex. Experiments show that the mechanism performs well in both anti-crawling and crawler identification, and that it has a fairly clear advantage over other anti-crawling mechanisms in real-time performance, accuracy, coverage, and the comprehensive evaluation index.
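
The abstract describes checking each Web request along several dimensions in a detection service kept apart from the Web server. The following is a minimal sketch of that idea, not the thesis's implementation: the `Request` fields, the three signals (request rate in a sliding window, User-Agent keywords, missing Referer), and all thresholds are assumptions chosen for illustration, and the detector is called in-process here as a stand-in for an RPC call.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Request:
    # All fields are hypothetical request metadata a web server might forward.
    client_id: str    # e.g. an IP address or session key
    timestamp: float  # seconds
    user_agent: str
    referer: str
    path: str


class CrawlerDetector:
    """Stand-in for a detection service separate from the web server.

    In an RPC deployment the web server would send request metadata to
    this service; here it is invoked directly for illustration only.
    """

    def __init__(self, window_seconds=10.0, max_requests=20, threshold=2):
        self.window_seconds = window_seconds  # assumed sliding-window length
        self.max_requests = max_requests      # assumed per-window rate limit
        self.threshold = threshold            # signals needed to flag a client
        self._history = defaultdict(deque)    # client_id -> recent timestamps

    def score(self, req):
        """Record the request and return its count of suspicious signals."""
        window = self._history[req.client_id]
        window.append(req.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and req.timestamp - window[0] > self.window_seconds:
            window.popleft()

        signals = 0
        # Dimension 1: too many requests inside the window.
        if len(window) > self.max_requests:
            signals += 1
        # Dimension 2: empty or self-declared crawler User-Agent.
        ua = req.user_agent.lower()
        if not ua or any(word in ua for word in ("bot", "spider", "crawler")):
            signals += 1
        # Dimension 3: a deep page requested with no Referer.
        if not req.referer and req.path != "/":
            signals += 1
        return signals

    def is_crawler(self, req):
        return self.score(req) >= self.threshold
```

A web server would call `is_crawler` once per incoming request and decide how to respond; requiring several signals before flagging a client is one simple way to trade off false positives against coverage.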
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP393.092



Article ID: 2506337


Article link: http://sikaile.net/guanlilunwen/ydhl/2506337.html
