Research and Implementation of a Nutch-Based Web Crawler System Supporting Ajax Dynamic Web Pages
[Abstract]: With the rapid development of Web 2.0, Ajax technology is used more and more widely. Ajax refreshes parts of a page through asynchronous calls, which greatly improves the user experience, reduces network traffic, and speeds up access to websites. While Ajax changes how users interact with the Internet, it also brings a series of problems to users and developers: JavaScript code is used and written in non-standard ways, browsers are incompatible with one another, pages issue too many requests, and abuse of Ajax places an excessive load on servers. The crawler system is an essential data collection subsystem of a search engine. The search engine builds an index from the data the crawler collects and then provides search services to users. The widespread use of Ajax therefore has an important impact on search engines. Traditional search engines provide search only over static page data, not over the dynamic data generated by Ajax, and the volume of that dynamic data keeps growing as Ajax spreads. This dynamic data is of great significance for data analysis and data mining. For example, some of the comments on Sina News are generated dynamically through Ajax, and collecting such data is of great significance to national security. In this thesis, we improve Nutch by adding several modules and build a web crawler system that can crawl Ajax dynamic data, build an index from that data, and provide search services to users.
[Degree-granting institution]: Inner Mongolia Normal University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3