Research on Ajax-Oriented Search Engine Technology

Published: 2018-11-07 06:40

[Abstract]: The Web is undergoing a major transformation, and the Web 2.0 era has arrived. Within Web 2.0, one technology that has been highly successful and has taken on an important role is Ajax, which combines JavaScript with dynamic DOM manipulation and uses asynchronous communication with the server to deliver rich interactivity and responsiveness. However, Ajax breaks the traditional concept of a "web page", which is the foundation of many existing web technologies, so it brings serious challenges along with innovation, chiefly for the searchability and testability of pages. Focusing on searchability, this thesis analyzes the technical bottlenecks that traditional web search engines face since the emergence of Ajax and surveys the current state of research on search engine technology that supports Ajax applications, with emphasis on Ajax crawlers. Although this research has produced some results, many problems remain open. Because a single Ajax page contains multiple states, the thesis adopts the classical state transition graph model to describe an Ajax application and presents a single-threaded Ajax crawling algorithm based on that graph. Building on this, it proposes a parallel crawling algorithm, and experiments show that crawling performance improves substantially. On top of the parallel crawler, the thesis further proposes a prototype Ajax search engine built on Nutch, a lightweight search engine, using the Nutch plug-in mechanism to extend it so that it supports crawling, indexing, and retrieval of Ajax pages, which supports the correctness and effectiveness of the proposed approach.
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3
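
The state-transition-graph crawl described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a toy in-memory application (`TOY_APP`) in place of a real browser, identifies a state by a hash of its DOM string, and explores breadth-first on a single thread. The names `TOY_APP`, `state_id`, `candidate_events`, `fire_event`, and `crawl` are all hypothetical.

```python
from collections import deque
import hashlib

# Hypothetical toy "Ajax application": each key is a DOM snapshot (one state
# of a single page), and each event maps to the DOM it produces. The URL
# never changes, so a URL-based crawler would only ever see the first state.
TOY_APP = {
    "<div>home</div>": {
        "click #news": "<div>news</div>",
        "click #about": "<div>about</div>",
    },
    "<div>news</div>": {
        "click #home": "<div>home</div>",
        "click #more": "<div>more news</div>",
    },
    "<div>about</div>": {"click #home": "<div>home</div>"},
    "<div>more news</div>": {},
}


def state_id(dom):
    """Identify a state by a short hash of its DOM (an assumed choice)."""
    return hashlib.sha1(dom.encode("utf-8")).hexdigest()[:8]


def candidate_events(dom):
    """Events that can be fired in this state (stand-in for DOM inspection)."""
    return sorted(TOY_APP[dom])


def fire_event(dom, event):
    """Resulting DOM after firing an event (stand-in for a real browser)."""
    return TOY_APP[dom][event]


def crawl(initial_dom):
    """Breadth-first exploration that builds a state transition graph.

    Returns (states, transitions): states maps state id to DOM, and
    transitions is a list of (source id, event, target id) edges.
    """
    states = {state_id(initial_dom): initial_dom}
    transitions = []
    queue = deque([initial_dom])
    while queue:
        dom = queue.popleft()
        src = state_id(dom)
        for event in candidate_events(dom):
            new_dom = fire_event(dom, event)
            dst = state_id(new_dom)
            transitions.append((src, event, dst))
            if dst not in states:  # only enqueue states not seen before
                states[dst] = new_dom
                queue.append(new_dom)
    return states, transitions


if __name__ == "__main__":
    found_states, found_edges = crawl("<div>home</div>")
    for src, event, dst in found_edges:
        print(f"{src} --[{event}]--> {dst}")
```

Each discovered state could then be indexed as if it were a separate page. A parallel variant, as the abstract proposes, would presumably distribute unexplored states among several workers; how the thesis partitions that work is not stated here.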


