Research on Result Composition Algorithms for Meta-Search Engines
[Abstract]: Search engines make information retrieval much more convenient, but studies show that their resource coverage still falls short of users' needs and that their accuracy needs to improve. A meta-search engine integrates several independent search engines. It passes each user query to its member search engines, then processes the returned result sets in a unified way, which partly overcomes the limitations of individual search engines. Meta-search engines are now widely used. Their core technologies currently include the analysis and transformation of retrieval requests, member-engine scheduling algorithms, and retrieval-result composition algorithms. This thesis studies the result composition mechanism of meta-search engines, concentrating on its two parts: web page de-duplication and fused result ranking. Both are crucial to meta-search engine performance, yet existing meta-search engines still have many shortcomings in both. The main work of this thesis is as follows:
(1) It systematically studies the architecture and working principles of search engines and meta-search engines, reviews the state of research on both in China and abroad, and introduces the key technologies of meta-search engines in detail.
(2) It compares and analyzes the web page de-duplication algorithms commonly used in search engines and meta-search engines and examines their strengths and weaknesses. Drawing on the characteristics of the results that member engines return to a meta-search engine, it proposes a de-duplication algorithm based on each result's URL, title, and summary, with a separate discrimination method for each field suited to that field's characteristics, which makes de-duplication more accurate.
(3) It studies the classic result-ranking algorithms used in meta-search engines, analyzes and summarizes the strengths and weaknesses of each, and examines the Borda voting ranking method in particular. To address the shortcomings of Borda ranking, it proposes an improved algorithm that combines result position with query similarity, improving both the normalization of result positions and the similarity calculation.
(4) It designs a prototype meta-search engine, uses it to run experiments on the proposed algorithms, analyzes the experimental results, and verifies the algorithms' performance.
Finally, the thesis summarizes its main work, innovations, and experimental process, and discusses future directions for meta-search engines and open research problems.
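Item (2) describes duplicate detection that uses the URL, title, and summary of each returned result, with a separate test for each field, but the abstract does not give the tests themselves. The Python sketch below illustrates the idea under assumed rules and is not the thesis's algorithm: URLs are reduced to a canonical key (scheme, "www.", default ports, fragment, and trailing slash ignored), and two results with different URLs count as duplicates only when their normalized titles match and their summaries reach a character-bigram Jaccard similarity of at least a hypothetical threshold of 0.8. The names `Result`, `normalize_url`, `is_duplicate`, and `deduplicate` are illustrative.

```python
"""Sketch: duplicate detection for merged meta-search results.

Assumed rules (not taken from the thesis): canonical-URL equality, or an
identical normalized title plus a summary bigram-Jaccard similarity >= 0.8.
"""
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Result:
    url: str
    title: str
    summary: str


def normalize_url(url: str) -> str:
    """Canonical key: scheme, 'www.', default ports, fragment and trailing '/' ignored."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    if parts.port not in (None, 80, 443):
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    return host + path + (f"?{parts.query}" if parts.query else "")


def normalize_title(title: str) -> str:
    """Ignore case and runs of whitespace."""
    return " ".join(title.lower().split())


def bigrams(text: str) -> set[str]:
    """Character bigrams with whitespace removed; needs no word segmentation."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(x: Result, y: Result, threshold: float = 0.8) -> bool:
    # Rule 1: the same canonical URL means the same page.
    if normalize_url(x.url) == normalize_url(y.url):
        return True
    # Rule 2: a different URL but the same title and a near-identical summary
    # is treated as a mirrored or reposted copy.
    if normalize_title(x.title) != normalize_title(y.title):
        return False
    return jaccard(bigrams(x.summary), bigrams(y.summary)) >= threshold


def deduplicate(results: list[Result]) -> list[Result]:
    """Keep the first occurrence of each page (simple O(n^2) pairwise scan)."""
    kept: list[Result] = []
    for r in results:
        if not any(is_duplicate(r, k) for k in kept):
            kept.append(r)
    return kept


if __name__ == "__main__":
    merged = [
        Result("http://www.example.com/a/", "Meta Search", "Borda count merges ranked lists"),
        Result("https://example.com/a", "Other title", "unrelated text"),
        Result("http://mirror.example.org/x", "Meta  search", "Borda count merges ranked lists."),
        Result("http://example.net/y", "Different", "Something else entirely"),
    ]
    print([r.url for r in deduplicate(merged)])
    # ['http://www.example.com/a/', 'http://example.net/y']
```

In the demo, the second result is dropped by the URL rule and the third by the title-plus-summary rule. Keeping the first occurrence preserves the position that the earliest member engine assigned to the page, which the later ranking step can use.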
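Item (3) builds on Borda voting ranking. As background, here is a minimal sketch of the classic Borda count as commonly defined for rank fusion. In a list of n results, the result at rank r earns n - r + 1 points, results missing from a list earn nothing from it, and results are sorted by total points. The zero-points-for-missing convention and the function name `borda_fuse` are assumptions of this sketch. The abstract does not say which variant the thesis starts from.

```python
"""Classic Borda-count fusion of ranked result lists from member engines."""
from collections import defaultdict


def borda_fuse(ranked_lists: list[list[str]]) -> list[str]:
    """Merge ranked lists of (already deduplicated) result IDs.

    In a list of length n, the result at 1-based rank r earns n - r + 1
    points; a result missing from a list earns 0 points from that list.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranked in ranked_lists:
        n = len(ranked)
        for r, doc in enumerate(ranked, start=1):
            scores[doc] += n - r + 1
    # sorted() is stable, so ties keep first-appearance order.
    return sorted(scores, key=lambda doc: -scores[doc])


if __name__ == "__main__":
    engine_a = ["a", "b", "c"]
    engine_b = ["b", "a", "d"]
    engine_c = ["b", "c"]
    print(borda_fuse([engine_a, engine_b, engine_c]))  # ['b', 'a', 'c', 'd']
```

In this plain variant the points depend only on rank and list length. A longer list from one engine therefore contributes more points, and the content of each result plays no role in its score.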
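Item (3) also proposes improving Borda ranking by combining result position with query similarity and by changing how positions are normalized, but the abstract gives no formula. The sketch below shows one hypothetical way to combine the two signals and is not the thesis's method. Each rank is normalized to 1 - (r - 1)/n, so every engine's top result scores 1 regardless of list length. These normalized scores are summed across engines. The sketch then adds `lam` times the cosine similarity between the query and the result's title plus summary, using term-frequency vectors over whitespace tokens. The additive form, the default weight `lam = 0.5`, and all function names are assumptions.

```python
"""Hypothetical hybrid ranking: normalized Borda position score + query similarity."""
import math
from collections import Counter, defaultdict


def position_score(rank: int, n: int) -> float:
    """Map 1-based rank r in a list of length n to 1 - (r - 1) / n, so top = 1."""
    return 1.0 - (rank - 1) / n


def tokens(text: str) -> Counter:
    """Term frequencies over lowercase whitespace tokens (Chinese would need a segmenter)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_fuse(ranked_lists: list[list[str]], texts: dict[str, str],
                query: str, lam: float = 0.5) -> list[tuple[str, float]]:
    """score(d) = sum over engines of position_score + lam * cosine(query, text of d).

    `texts` maps each result ID to its title plus summary; the additive form
    and the default weight lam = 0.5 are illustrative assumptions.
    """
    pos: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for r, doc in enumerate(ranked, start=1):
            pos[doc] += position_score(r, len(ranked))
    q = tokens(query)
    scored = [(doc, p + lam * cosine(q, tokens(texts[doc]))) for doc, p in pos.items()]
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    lists = [["a", "b", "c"], ["b", "a"]]
    texts = {
        "a": "borda count rank fusion",
        "b": "weather forecast today",
        "c": "meta search borda fusion",
    }
    print([doc for doc, _ in hybrid_fuse(lists, texts, "borda fusion")])           # ['a', 'b', 'c']
    print([doc for doc, _ in hybrid_fuse(lists, texts, "borda fusion", lam=0.0)])  # ['b', 'a', 'c']
```

In the demo, position scores alone rank `b` first. Because `a` shares two terms with the query, the similarity term moves `a` ahead. With `lam=0.0` the function reduces to a length-normalized Borda sum.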
[Degree-Granting Institution]: Harbin Engineering University
[Degree Level]: Master's
[Year Conferred]: 2016
[CLC Number]: TP391.3