Research on Ajax-Oriented Search Engine Technology
Published: 2018-11-07 06:40
[Abstract]: The Web is undergoing a major transformation, and the Web 2.0 era has arrived. In this context, one technology has achieved great success and now holds an important position: Ajax. Ajax combines JavaScript with dynamic DOM manipulation and communicates asynchronously with the server, which gives web applications rich interactivity and responsiveness. However, Ajax breaks the traditional concept of a "web page", and that concept is the foundation on which many existing web technologies are built. So while Ajax brings innovation, it also brings serious challenges, chiefly to the searchability and testability of "web pages". Starting from searchability, this thesis analyzes the technical bottlenecks that traditional web search engines have faced since Ajax emerged. It also presents a comprehensive survey of current research on search engine technology that supports Ajax applications, with particular emphasis on Ajax crawlers. That research has produced some results, but many problems remain unsolved. A single Ajax page contains multiple states, so the thesis models Ajax applications with the classic state transition graph and describes a single-threaded Ajax crawling algorithm based on this graph. Building on that algorithm, it proposes a parallel crawling algorithm, and experiments show that this algorithm substantially improves crawling performance. Building further on the parallel crawler, the thesis proposes a prototype Ajax search engine based on the lightweight search engine Nutch. The prototype uses Nutch's plug-in mechanism to extend its functionality so that it can crawl, index, and retrieve Ajax pages. This prototype verifies the correctness and effectiveness of the thesis's approach.
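The state-transition-graph crawling described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, and all of its names are hypothetical: the toy application `TOY_APP`, the helpers `state_id`, `events_of`, `fire` and `expand`, and the functions `crawl` and `crawl_parallel`. The sketch makes two simplifying assumptions. First, firing an event on a DOM state deterministically produces a new DOM. Second, a state is identified by a SHA-256 hash of its DOM text. `crawl` is a single-threaded breadth-first traversal that records every discovered state and every (source, event, target) transition. `crawl_parallel` expands each BFS frontier level with a thread pool and merges the results in the main thread.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import hashlib

# Hypothetical stand-in for a browser: maps DOM -> {event: resulting DOM}.
# A real Ajax crawler would drive an embedded browser and fire JavaScript events.
TOY_APP = {
    "<home>": {"click#news": "<news>", "click#about": "<about>"},
    "<news>": {"click#home": "<home>", "click#more": "<news-page2>"},
    "<news-page2>": {"click#home": "<home>"},
    "<about>": {"click#home": "<home>"},
}

def state_id(dom: str) -> str:
    """Identify a state by hashing its DOM text (a simplifying assumption)."""
    return hashlib.sha256(dom.encode("utf-8")).hexdigest()

def events_of(dom: str) -> list:
    """Clickable events available in this state, in a deterministic order."""
    return sorted(TOY_APP.get(dom, {}))

def fire(dom: str, event: str) -> str:
    """Simulate firing an event; returns the DOM of the resulting state."""
    return TOY_APP[dom][event]

def crawl(initial_dom: str):
    """Single-threaded BFS that builds a state transition graph."""
    states = {state_id(initial_dom): initial_dom}
    edges = []  # (source_id, event, target_id)
    frontier = deque([initial_dom])
    while frontier:
        dom = frontier.popleft()
        src = state_id(dom)
        for ev in events_of(dom):
            new_dom = fire(dom, ev)
            dst = state_id(new_dom)
            edges.append((src, ev, dst))
            if dst not in states:  # only unseen states are explored further
                states[dst] = new_dom
                frontier.append(new_dom)
    return states, edges

def expand(dom: str):
    """Fire every event of one state; safe to run in a worker thread."""
    src = state_id(dom)
    return [(src, ev, fire(dom, ev)) for ev in events_of(dom)]

def crawl_parallel(initial_dom: str, workers: int = 4):
    """Level-synchronous BFS: frontier states are expanded concurrently,
    then results are merged sequentially so the state map needs no locks."""
    states = {state_id(initial_dom): initial_dom}
    edges = []
    frontier = [initial_dom]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # pool.map returns results in input order, keeping the merge deterministic
            for results in pool.map(expand, frontier):
                for src, ev, new_dom in results:
                    dst = state_id(new_dom)
                    edges.append((src, ev, dst))
                    if dst not in states:
                        states[dst] = new_dom
                        next_frontier.append(new_dom)
            frontier = next_frontier
    return states, edges
```

For example, `crawl("<home>")` discovers 4 states and 6 transitions, and `crawl_parallel("<home>")` produces the same graph. Merging in one thread avoids locking the shared state map, at the cost of one synchronization point per level. This pure-Python toy gains no real speed from threads. A practical parallel crawler would presumably benefit because each worker drives its own browser instance and spends much of its time waiting on script execution and server responses.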
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3
Article ID: 2315509
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2315509.html