
Design and Implementation of a Web Anti-Crawling System

Published: 2018-04-03 06:39

  Topic: anti-crawling | Focus: web crawlers | Source: Harbin Institute of Technology, master's thesis, 2015


[Abstract]: The company studied (not named in the thesis) is a leading online travel platform in China, and its air-ticket search and transaction platform is one of its key foundational platforms. The platform's searches cover more than 180,000 routes worldwide and can query over 4,000 travel-agency websites in real time; in 2014 its ticket transactions exceeded 80 million. As business volume has grown, however, the ticket search and transaction platform and its related business systems have come under pressure from data scraping by all kinds of external sources. The large volume of scraping requests causes a series of serious problems: (1) Data security: abnormal scraping traffic puts key data at risk of being obtained by competitors. (2) System performance: the flood of scraping requests exhausts server resources and severely degrades users' search and transaction experience. (3) Duplicated effort: different business systems each implement anti-crawling on their own, with uneven quality, which wastes resources.

Based on an in-depth study of web crawlers and anti-crawling techniques, this thesis designs and implements a Web Anti-Crawling System (ACS). ACS provides a unified, high-quality anti-crawling service for the company's ticket search and transaction platform and the business projects built on it. It implements anti-crawling strategies based on HTTP protocol headers, JS-encrypted strings, an IP blacklist, and access-frequency control. Building on a detailed understanding of the ticket search and transaction business, it also implements a behavior-pattern strategy tied to business logic, further raising the cost of scraping. In addition, ACS defines a strategy interface and an anti-crawling service interface that separate the API from its implementation; this gives the system good extensibility, lowers its coupling with business systems, and makes the anti-crawling service easy to integrate. ACS thus offers a solution to the scraping-related problems described above.

Functional and performance testing confirmed that the five anti-crawling strategies described in the thesis work correctly and meet the system's expected functional requirements; that ACS is loosely coupled with other business systems and very easy to integrate with; and that, under performance testing, the system serves requests stably and meets its expected performance targets. ACS has since been put into production use.
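The abstract names the strategies and the interface design but gives no implementation detail. As a minimal sketch, assuming a hypothetical `Strategy` interface and `AntiCrawlService` facade (these names are illustrative, not taken from the thesis), the Python below shows how a common strategy interface can keep business systems decoupled from individual rules. Three of the five strategies are filled in: an HTTP-header check, an IP blacklist, and sliding-window access-frequency control. The required header set, the thresholds, and the in-memory storage are all assumptions; the JS-encrypted-string and behavior-pattern strategies are omitted because the abstract does not describe how they work.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Request:
    """Hypothetical minimal view of an incoming HTTP request."""
    ip: str
    headers: dict[str, str] = field(default_factory=dict)


class Strategy(ABC):
    """Hypothetical common interface that every anti-crawling rule implements."""

    @abstractmethod
    def check(self, req: Request) -> bool:
        """Return True if the request passes this rule."""


class HeaderStrategy(Strategy):
    """Reject requests missing headers an ordinary browser would send (assumed set)."""

    def __init__(self, required: Iterable[str] = ("User-Agent", "Accept")) -> None:
        self.required = tuple(required)

    def check(self, req: Request) -> bool:
        # Exact-case lookup keeps the sketch short; real HTTP header names are case-insensitive.
        return all(name in req.headers for name in self.required)


class BlacklistStrategy(Strategy):
    """Reject requests whose source IP is on a static blacklist."""

    def __init__(self, blocked: Iterable[str] = ()) -> None:
        self.blocked = set(blocked)

    def check(self, req: Request) -> bool:
        return req.ip not in self.blocked


class RateLimitStrategy(Strategy):
    """Sliding-window frequency control: at most max_requests per IP per window seconds."""

    def __init__(self, max_requests: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.history: dict[str, deque[float]] = defaultdict(deque)

    def check(self, req: Request) -> bool:
        now = self.clock()
        stamps = self.history[req.ip]
        # Drop timestamps that have slid out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False  # over the limit; the rejected hit is not recorded
        stamps.append(now)
        return True


class AntiCrawlService:
    """Facade that business systems call; they never depend on individual strategies."""

    def __init__(self, strategies: Iterable[Strategy]) -> None:
        self.strategies = list(strategies)

    def allow(self, req: Request) -> bool:
        # all() stops at the first failing rule, so later rules never see that request.
        return all(s.check(req) for s in self.strategies)


if __name__ == "__main__":
    service = AntiCrawlService([
        HeaderStrategy(),
        BlacklistStrategy({"10.0.0.66"}),
        # Frozen clock so the demo output is repeatable.
        RateLimitStrategy(max_requests=3, window=1.0, clock=lambda: 0.0),
    ])
    browser = {"User-Agent": "Mozilla/5.0", "Accept": "text/html"}
    print([service.allow(Request("10.0.0.1", browser)) for _ in range(4)])
    # prints [True, True, True, False]
    print(service.allow(Request("10.0.0.66", browser)))  # prints False (blacklisted)
    print(service.allow(Request("10.0.0.2")))  # prints False (no browser headers)
```

Because every rule sits behind the same `check()` method, adding a strategy means adding a class rather than changing callers, which is the kind of API/implementation separation the abstract credits for low coupling. A multi-server deployment would probably keep rate-limit counters in shared storage rather than per-process memory; that, too, is an assumption of this sketch.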
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year awarded]: 2015
[CLC classification number]: TP393.092

