Research on Crawling Strategies for Deep Web Crawlers
Published: 2018-06-29 22:43
Topic: Deep Web + Deep Web crawler. Source: Computer Engineering and Design (計算機工程與設計), 2006, Issue 17
[Abstract]: A growing share of the information on the Web can be obtained only through query interfaces: to retrieve pages from a Deep Web site, a user has to submit a series of keywords. Because no static links point directly to Deep Web pages, most current search engines cannot discover or index them. Recent studies, however, show that the high-quality information provided by Deep Web sites is very valuable to many users. This paper studies how to build an effective Deep Web crawler that automatically discovers and downloads Deep Web pages. Because the query interface is the only "entry point" to the Deep Web, the main challenge in designing such a crawler is how to automatically generate meaningful queries for that interface. The paper proposes a theoretical framework for this query-generation problem, and experiments on real Deep Web sites show that the method is very effective.
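The abstract does not spell out the proposed framework, so the following is only a minimal sketch of the general idea of automatic query generation, not the authors' method. It assumes a greedy heuristic (issue the most frequent not-yet-used term found in pages downloaded so far) and a tiny in-memory stand-in for a Deep Web site; `SITE`, `search`, and `crawl` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Hypothetical stand-in for a Deep Web site: pages are reachable only via search().
SITE = {
    1: "deep web crawler query interface",
    2: "query generation for hidden databases",
    3: "search engine index static links",
    4: "hidden databases behind search forms",
}


def search(keyword):
    """Simulated query interface: return ids of pages containing the keyword."""
    return {pid for pid, text in SITE.items() if keyword in text.split()}


def crawl(seed, max_queries=10):
    """Greedy crawl (an assumed heuristic): repeatedly issue the most
    frequent term seen in downloaded pages that has not been issued yet."""
    downloaded = {}          # page id -> page text
    issued = set()           # keywords already sent to the interface
    term_counts = Counter()  # term frequencies over downloaded pages
    keyword = seed
    for _ in range(max_queries):
        issued.add(keyword)
        for pid in search(keyword):
            if pid not in downloaded:
                downloaded[pid] = SITE[pid]
                term_counts.update(SITE[pid].split())
        candidates = [(count, term) for term, count in term_counts.items()
                      if term not in issued]
        if not candidates:
            break  # every known term has been tried
        # Highest count first; ties broken alphabetically for determinism.
        keyword = min(candidates, key=lambda ct: (-ct[0], ct[1]))[1]
    return downloaded, issued


if __name__ == "__main__":
    pages, queries = crawl("query", max_queries=20)
    print(sorted(pages), len(queries))
```

In this toy setup the page with no term in common with the seed's results is reached only after a later page contributes the keyword `search`, which illustrates why the choice of the next query dominates the cost of such a crawl.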
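[Author affiliation]: Institute of Intelligent Information Processing and Applications, Soochow University (both authors)
[Funding]: Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040285016); Jiangsu Province High-Tech Research Fund Project (BG2005019).
[Classification number]: TP393.092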
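[Related literature]
Related journal articles (top 1)
1. Zheng Dongdong (鄭冬冬); Cui Zhiming (崔志明). Research on Crawling Strategies for Deep Web Crawlers [J]. Computer Engineering and Design, 2006, Issue 17.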
Article ID: 2083666
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2083666.html