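Research on SLCS-Based Deduplication Techniques for Meta-Search
Published: 2018-01-20 04:51
Keywords: web page deduplication; meta-search engine; LCS; feature code. Source: 《圖書情報工作》 (Library and Information Service), 2010, No. 15. Paper type: journal article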
[Abstract]: To address duplicate web pages in meta-search results, this paper applies a web page deduplication method based on the longest common subsequence (LCS) to deduplication in meta-search engines and proposes an SLCS-based meta-search deduplication method (the leading S stands for Summary). After the summaries of the result pages are obtained, a weight is computed for each sentence in the summary sentence set from the number of times the query terms occur in the sentence and from the sentence length. The sentence with the largest weight is extracted as the summary feature sentence of the page. The similarity of result pages is then computed by comparing the LCS of their summary feature sentences, with the aim of improving the retrieval quality of the meta-search engine. Experiments show that the method achieves relatively high accuracy.
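The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the weight formula, the similarity normalization, or the duplicate threshold, so `sentence_weight`, the division by the longer sentence length in `similarity`, the `threshold=0.8` default, and the `{"url", "summary"}` result shape are all assumptions made here for the example.

```python
import re


def split_sentences(summary):
    """Split a summary into sentences on common Chinese/English terminators."""
    parts = re.split(r"[。！？!?.;；]+", summary)
    return [p.strip() for p in parts if p.strip()]


def sentence_weight(sentence, query_terms):
    """Assumed weight: query-term occurrences plus a small length bonus.

    The paper's actual formula is not given in the abstract.
    """
    hits = sum(sentence.count(term) for term in query_terms)
    return hits + len(sentence) / 100.0


def feature_sentence(summary, query_terms):
    """Return the highest-weight sentence of a summary ('' if it has none)."""
    sentences = split_sentences(summary)
    if not sentences:
        return ""
    return max(sentences, key=lambda s: sentence_weight(s, query_terms))


def lcs_length(a, b):
    """Length of the longest common subsequence, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(s1, s2):
    """Assumed normalization: LCS length over the longer sentence length."""
    if not s1 or not s2:
        return 0.0
    return lcs_length(s1, s2) / max(len(s1), len(s2))


def deduplicate(results, query_terms, threshold=0.8):
    """Keep a result only if its feature sentence differs from all kept ones."""
    kept = []  # list of (result, feature sentence) pairs
    for result in results:
        feat = feature_sentence(result["summary"], query_terms)
        if all(similarity(feat, k) < threshold for _, k in kept):
            kept.append((result, feat))
    return [result for result, _ in kept]
```

Comparing one feature sentence per page, rather than full page text, keeps each LCS computation short, which matters because a meta-search engine typically has only the snippets returned by its member engines at merge time.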
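[Author affiliation]: School of Information Science and Engineering, Henan University of Technology
[Classification number]: TP391.3
[Text snapshot]: A meta-search engine distributes the user's query to multiple independent member search engines and merges their results, which allows it to serve the user's information needs well [1]. The merged results, however, contain a certain amount of duplication, and this duplication seriously degrades result quality. How to efficiently remove duplicate web pages from meta-search engine results is therefore a search engi…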
Article ID: 1446916
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1446916.html