Keyword-Based Database Selection for the Deep Web
Published: 2018-07-29 09:20
【Abstract】: This paper proposes a keyword-based method for querying the deep Web. Users submit queries as keywords, and the method selects, online, the Web databases that reflect the query intent and can provide high-quality results. The approach avoids crawling deep-Web data, which is costly and difficult. It also supports keyword queries over databases from multiple domains, so it can be integrated seamlessly with existing search engines. The paper focuses on keyword-based database selection and addresses its challenges in two ways: (1) it proposes a relevance model that measures the association between keywords and domain attributes, and designs a random-walk-based algorithm that discovers latent keyword-attribute associations from query logs; (2) it presents a new data sampling method and applies it in a sampling-based database-query relevance model, which together solve the deep-Web database selection problem. Experiments on a real Chinese deep-Web data set show that the proposed method effectively selects the databases relevant to a keyword query and provides high-quality results.
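The abstract names its two components but gives no formulas. The following is a minimal Python sketch built on our own assumptions and is not the authors' model.

It makes these assumptions:
- The query log is a list of (keyword, attribute) pairs, each recording which form field a keyword was typed into.
- Latent keyword-attribute associations come from a random walk with restart on the resulting keyword-attribute graph. The restart probability of 0.15 and the 50 iterations are arbitrary choices.
- Each database is scored by how often the query keywords occur in the associated attributes of records already sampled from it.

The paper's own sampling method is not reproduced here. All function names (`build_graph`, `random_walk_with_restart`, `keyword_attribute_association`, `score_database`, `select_databases`) and the toy data are hypothetical.

```python
from collections import defaultdict


def build_graph(query_log):
    """Build a weighted, undirected keyword-attribute graph.

    Hypothetical log format: each entry is a (keyword, attribute) pair meaning
    a user typed `keyword` into the form field `attribute`.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for keyword, attribute in query_log:
        k, a = ("kw", keyword), ("attr", attribute)
        graph[k][a] += 1.0
        graph[a][k] += 1.0
    return graph


def random_walk_with_restart(graph, start, restart=0.15, iters=50):
    """Power iteration: at each step, jump back to `start` with probability
    `restart`, otherwise move to a neighbor in proportion to edge weight."""
    scores = {node: 0.0 for node in graph}
    scores[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        nxt[start] += restart
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            total = sum(graph[node].values())
            for nbr, weight in graph[node].items():
                nxt[nbr] += (1.0 - restart) * mass * weight / total
        scores = nxt
    return scores


def keyword_attribute_association(graph, keyword, restart=0.15, iters=50):
    """Normalized visit probabilities of attribute nodes, starting at `keyword`."""
    start = ("kw", keyword)
    if start not in graph:  # unseen keyword: no evidence in the log
        return {}
    scores = random_walk_with_restart(graph, start, restart, iters)
    attrs = {node[1]: s for node, s in scores.items() if node[0] == "attr"}
    total = sum(attrs.values())
    return {a: s / total for a, s in attrs.items()} if total > 0 else {}


def score_database(samples, query_keywords, graph):
    """Hypothetical sampling-based relevance: for each keyword, the fraction of
    sampled records whose attribute contains it, weighted by association."""
    if not samples:
        return 0.0
    score = 0.0
    for kw in query_keywords:
        for attr, weight in keyword_attribute_association(graph, kw).items():
            hits = sum(1 for rec in samples
                       if kw.lower() in str(rec.get(attr, "")).lower())
            score += weight * hits / len(samples)
    return score


def select_databases(samples_by_db, query_keywords, graph, k=1):
    """Return the names of the k databases with the highest score."""
    ranked = sorted(samples_by_db,
                    key=lambda db: score_database(samples_by_db[db], query_keywords, graph),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    log = [("ruby", "language"), ("python", "language"),
           ("python", "title"), ("honda", "make")]
    samples = {
        "books": [{"title": "Python Cookbook", "language": "English"},
                  {"title": "Learning Ruby", "language": "English"}],
        "cars": [{"make": "Honda", "model": "Civic"}],
    }
    graph = build_graph(log)
    print(select_databases(samples, ["ruby"], graph))  # ['books']
```

In this toy log, `ruby` is never typed into a `title` field. The walk from `ruby` to `language`, then to `python`, then to `title` still gives it a small positive weight on `title`. That weight is what lets the "books" sample "Learning Ruby" match, which illustrates the kind of latent keyword-attribute association the abstract describes.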
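【Author affiliation】: Department of Computer Science and Technology, Tsinghua University
【Funding】: Supported by the Key Program of the National Natural Science Foundation of China, "Basic Methods and Key Technologies in Building and Applying Infrastructure to Support Chinese Web Research" (60833003)
【CLC number】: TP311.13
Article ID: 2152227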
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2152227.html