Research on Crawling Strategies for Deep Web Crawlers

Published: 2018-06-29 22:43

Topics: Deep Web; Deep Web crawler. Source: Computer Engineering and Design (《計算機工程與設計》), 2006, No. 17

[Abstract]: A growing share of the information on the Web can be reached only through query interfaces: to obtain the pages of a Deep Web site, a user has to type in a series of keywords. Because no static links point directly to Deep Web pages, most current search engines cannot discover or index them. Recent studies, however, show that the high-quality information provided by Deep Web sites is of great value to many users. This paper studies how to build an effective Deep Web crawler that automatically discovers and downloads Deep Web pages. Because the query interface is the only "entry point" to the Deep Web, the main challenge in designing such a crawler is how to automatically generate meaningful queries for that interface. The paper proposes a theoretical framework for this automatic query-generation problem. Experiments on real Deep Web sites show that the method is very useful.
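The abstract does not spell out the framework itself, so the following is only a minimal sketch of the general loop it describes: send a keyword to the query interface, download the returned pages, then pick the next keyword from what has been downloaded so far. The in-memory `SITE`, the `search` function, and the "most frequent unused term" rule are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter

# Hypothetical stand-in for a Deep Web site: page id -> page text.
SITE = {
    1: "deep web crawler query",
    2: "query interface keyword",
    3: "keyword search engine index",
    4: "hidden database index",
}


def search(keyword):
    """Simulated query interface: ids of pages whose text contains the keyword."""
    return {pid for pid, text in SITE.items() if keyword in text.split()}


def crawl(seed, max_queries=10):
    """Issue queries until the budget is spent or no unused term remains."""
    downloaded = {}  # page id -> text of every page fetched so far
    issued = []      # keywords already sent to the interface
    keyword = seed
    while keyword is not None and len(issued) < max_queries:
        issued.append(keyword)
        for pid in search(keyword):
            downloaded[pid] = SITE[pid]
        # For each unused term, count how many downloaded pages contain it.
        counts = Counter(
            term
            for text in downloaded.values()
            for term in set(text.split())
            if term not in issued
        )
        # Assumed heuristic: most frequent unused term, ties broken alphabetically.
        keyword = min(counts, key=lambda t: (-counts[t], t)) if counts else None
    return downloaded, issued


pages, queries = crawl("query")
print(sorted(pages), queries)
```

The selection rule is deliberately naive: a term that is frequent among already downloaded pages often returns pages the crawler has seen before, which is exactly why choosing queries well is the hard part of the problem the abstract names.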
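[Affiliation]: Institute of Intelligent Information Processing and Application, Soochow University
[Funding]: Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040285016); Jiangsu Province High-Tech Research Fund (BG2005019).
[Classification number]: TP393.092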

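[Similar literature]

Related journal articles (top 1)

1 Zheng Dongdong (鄭冬冬); Cui Zhiming (崔志明). Research on Crawling Strategies for Deep Web Crawlers [J]. Computer Engineering and Design, 2006, No. 17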

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2083666.html
