

Algorithms for Estimating Node Index Size in Distributed Search

Published: 2018-12-24 10:01
[Abstract]: Distributed search is an effective approach to searching the deep web. The index size of each node is an important parameter that a distributed search engine uses to describe and select nodes. This work addresses estimating node index size in a non-cooperative environment, where nodes do not report their own statistics. Two algorithms are proposed and implemented for this purpose:

- A high-frequency resampling algorithm, which first samples the node at random and then resamples it using the high-frequency terms in the sample set.
- A heterogeneous-probability capture algorithm, which assumes that documents have different capture probabilities. It estimates the node's index size with a logistic function and the conditional likelihood method.

Experiments on real web data show that both algorithms outperform the existing sample-resample and capture-recapture algorithms.
[Author affiliation]: Department of Electronic Engineering, Tsinghua University
[CLC number]: TP391.3
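The abstract does not give the high-frequency resampling formula. The sketch below is a minimal illustration under two assumptions:

- It uses the classic sample-resample estimate N ≈ H(t)·|S|/df_S(t). Here H(t) is the hit count the node reports for a one-term query t, |S| is the sample size, and df_S(t) is the number of sampled documents containing t.
- It reads the "high-frequency" step as picking the sample's most frequent terms and averaging their per-term estimates.

`sample_resample_estimate`, the `hit_count` callback and `num_terms` are hypothetical names, not the authors' code.

```python
from collections import Counter
from statistics import mean
from typing import Callable


def sample_resample_estimate(
    sample_docs: list[set[str]],
    hit_count: Callable[[str], int],
    num_terms: int = 5,
) -> float:
    """Estimate a node's index size from a random document sample (sketch).

    sample_docs: documents obtained by random sampling, each as a set of terms.
    hit_count:   hypothetical callback that sends a one-term query to the node
                 and returns the total number of matches the node reports.
    """
    # Document frequency df_S(t) of every term within the sample set S.
    df_sample = Counter(term for doc in sample_docs for term in doc)
    # Resample with the sample's highest-frequency terms (assumed selection rule).
    top_terms = [term for term, _ in df_sample.most_common(num_terms)]
    sample_size = len(sample_docs)
    # Each term gives N ~ hits(t) * |S| / df_S(t); average the per-term estimates.
    return mean(hit_count(t) * sample_size / df_sample[t] for t in top_terms)
```

A plausible motivation for preferring frequent terms is that a large df_S(t) gives a lower-variance estimate of the ratio |S|/df_S(t) than a rare term does.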
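The heterogeneous-probability capture algorithm is described only as combining a logistic function with conditional likelihood. One standard way to do this is a Huggins-style conditional-likelihood capture-recapture model, sketched below. This is a swapped-in textbook technique, not necessarily the paper's exact model. It assumes the following:

- The node is queried in k sampling rounds.
- counts[i] records how many rounds returned document i; only documents seen at least once are observed.
- Each document's per-round capture probability is logistic(b0 + b1·x_i) for one hypothetical covariate x_i, such as normalized document length.

The model is fit by Newton's method on the likelihood conditioned on capture. The index size then comes from the Horvitz-Thompson sum of 1/P(captured at least once). `huggins_estimate`, `_capture_prob` and `xs` are illustrative names.

```python
import math
from typing import Iterator, Sequence


def _capture_prob(b0: float, b1: float, x: float) -> float:
    # Logistic model for a document's per-round capture probability.
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def huggins_estimate(counts: Sequence[int], xs: Sequence[float], k: int,
                     iters: int = 100) -> float:
    """Conditional-likelihood capture-recapture estimate of index size (sketch).

    counts[i]: number of the k sampling rounds (1..k) that returned document i.
    xs[i]:     hypothetical per-document covariate driving its capture probability.
    """

    def per_doc(b0: float, b1: float) -> Iterator[tuple[float, float, float, float]]:
        for y, x in zip(counts, xs):
            p = _capture_prob(b0, b1, x)
            s = (1.0 - p) ** k  # P(never captured in k rounds)
            # Binomial log-likelihood conditioned on y >= 1.
            ll = y * math.log(p) + (k - y) * math.log1p(-p) - math.log1p(-s)
            mu = k * p / (1.0 - s)  # E[y | y >= 1]
            var = k * p * (1.0 - p) / (1.0 - s) - mu * mu * s  # Var[y | y >= 1]
            yield ll, y - mu, var, x

    b0 = b1 = 0.0
    for _ in range(iters):
        ll = g0 = g1 = h00 = h01 = h11 = 0.0
        for l_i, r, w, x in per_doc(b0, b1):
            ll += l_i
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton direction (X^T W X)^{-1} X^T (y - mu); the log-likelihood is concave.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        # Step halving: shrink until the conditional log-likelihood does not drop.
        while step > 1e-8 and sum(t[0] for t in per_doc(b0 + step * d0, b1 + step * d1)) < ll:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < 1e-10:
            break
    # Horvitz-Thompson: each captured document represents 1 / P(captured) documents.
    return sum(1.0 / (1.0 - (1.0 - _capture_prob(b0, b1, x)) ** k) for x in xs)
```

Conditioning on y ≥ 1 is what makes the model fittable at all. Uncaptured documents are never seen, so the likelihood can only use documents that appear at least once.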

