A Multi-Domain Deep Web Crawler Based on Optimal Queries
Posted: 2018-03-17 17:36
Topic: deep Web. Source: Application Research of Computers (《计算机应用研究》), 2009, No. 09. Paper type: journal article
[Abstract]: Deep Web content is obtained by submitting query terms through a web page's search interface. General-purpose search engines crawl pages by following hyperlinks and therefore cannot index deep Web data. To address this problem, this paper presents a deep Web crawler based on optimal queries: it generates optimal queries from clustered web pages, submits those queries automatically, and finally indexes the query results. Experiments show that the system can crawl multi-domain deep Web data automatically and efficiently.
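The abstract does not say how an "optimal" query is derived from the clustered pages. As a rough illustration only, the sketch below assumes a greedy coverage heuristic: for each cluster (domain), repeatedly pick the term that appears in the most pages not yet covered by an earlier term. The function name `select_queries`, the input format, and the `max_queries` limit are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def select_queries(clusters, max_queries=3):
    """Greedily pick, per cluster, terms that cover the most uncovered pages.

    clusters: dict mapping a cluster (domain) label to a list of pages,
              where each page is a set of terms. This input format is an
              assumption made for illustration, not the paper's design.
    """
    queries = {}
    for label, pages in clusters.items():
        uncovered = set(range(len(pages)))
        chosen = []
        while uncovered and len(chosen) < max_queries:
            # Count how many still-uncovered pages contain each term.
            counts = defaultdict(int)
            for i in uncovered:
                for term in pages[i]:
                    counts[term] += 1
            if not counts:
                break  # remaining pages contain no terms at all
            # Highest count wins; ties are broken alphabetically.
            best = min(counts, key=lambda t: (-counts[t], t))
            chosen.append(best)
            uncovered = {i for i in uncovered if best not in pages[i]}
        queries[label] = chosen
    return queries


example = {
    "books": [{"novel", "author"}, {"author", "isbn"}, {"isbn"}],
    "cars": [{"sedan"}, {"sedan", "engine"}],
}
result = select_queries(example)
```

In a full crawler, each selected term would then be submitted to the corresponding search interface and the returned result pages indexed; those steps depend on the target sites and are left out here.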
[Author affiliation]: College of Computer Science and Technology, Zhejiang University
[Funding]: Zhejiang Province Science and Technology Plan Project (2007C23086)
[Classification number]: TP393.092
Article ID: 1625775
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1625775.html