Research and Implementation of Result Fusion Methods for Distributed Search
Published: 2018-06-28 03:47
Topic: distributed search engine + federated retrieval. Source: South China University of Technology, master's thesis, 2013
[Abstract]: With the rapid growth of the Internet, both the number of web pages and the richness of their content are increasing quickly, and information resources are increasingly distributed in where they are stored and how they are presented. This poses many challenges for traditional centralized search engines, especially in system scalability, in retrieving the "deep web", and in diversifying search results. To match the structure and likely evolution of next-generation network information, a distributed search engine system is a more suitable solution. Built on a scalable distributed architecture, a distributed search engine can make effective use of distributed resources, integrate diverse information sources, and give users more comprehensive and accurate retrieval.
This work comes from the national China Next Generation Internet (CNGI) project "Next-Generation Internet Distributed Search Engine". The thesis studies the federated retrieval system of the distributed search engine platform. The system automatically distributes each query to independent search engines (unit search engines) and fuses the results they return into a single, optimized ranking for the user. Its core techniques are query distribution and result fusion. The main research content is choosing a suitable query distribution strategy and using the outcome of query distribution to produce an optimized fused ranking of the retrieved results.
Based on the characteristics of a real data set collected from a campus network, the thesis mines the static and dynamic resource features of each unit search engine and uses a resource score to measure how relevant a unit search engine is to the query terms.
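The abstract does not give the resource-score formula. As a minimal sketch only, the code below assumes a broker that keeps per-engine term statistics and substitutes the published CORI belief (Callan et al.) as the static feature, blended with a hypothetical historical-precision value as the dynamic feature. `EngineStats`, `hist_precision`, `alpha` and `select_engines` are illustrative names, not the thesis's; top-k selection is one plausible reading of "distribute the query to highly relevant engines".

```python
import math
from dataclasses import dataclass


@dataclass
class EngineStats:
    """Hypothetical statistics a broker keeps for one unit search engine."""
    name: str
    doc_freq: dict         # static: term -> number of documents in this engine containing it
    word_count: int        # static: total number of words indexed by this engine
    hist_precision: float  # dynamic (assumed): precision observed on past queries, in [0, 1]


def cori_belief(term, engine, engines, b=0.4):
    """CORI belief p(term | engine), used here as a stand-in static feature."""
    n = len(engines)
    avg_cw = sum(e.word_count for e in engines) / n
    df = engine.doc_freq.get(term, 0)
    t = df / (df + 50 + 150 * engine.word_count / avg_cw)
    cf = sum(1 for e in engines if e.doc_freq.get(term, 0) > 0)  # engines containing the term
    if cf == 0:  # no engine has the term, so only the default belief remains
        return b
    i = math.log((n + 0.5) / cf) / math.log(n + 1.0)
    return b + (1 - b) * t * i


def resource_score(query_terms, engine, engines, alpha=0.7):
    """Assumed blend: alpha * mean static belief + (1 - alpha) * dynamic precision."""
    static = sum(cori_belief(t, engine, engines) for t in query_terms) / len(query_terms)
    return alpha * static + (1 - alpha) * engine.hist_precision


def select_engines(query_terms, engines, k):
    """Query distribution: forward the query only to the k highest-scoring engines."""
    ranked = sorted(engines, key=lambda e: resource_score(query_terms, e, engines), reverse=True)
    return ranked[:k]
```

With `k` smaller than the number of engines, only the best-matching engines receive the query; a score threshold would be an equally reasonable cut-off.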
A query distribution strategy based on resource scores is proposed: it forwards each query to the unit search engines most relevant to the query terms, which safeguards the quality of the returned results. Building on this strategy, the thesis proposes a comprehensively optimized result fusion and ranking algorithm with three parts: normalizing the scores of result documents, a fusion algorithm designed around the resource scores obtained during query distribution, and a fusion mechanism that reinforces result diversity. Experiments show that the proposed query distribution strategy and result fusion algorithm improve the system's precision while keeping the presented results diverse, meeting users' need to explore a query from multiple angles.
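The abstract names the three fusion steps but not their formulas. The sketch below is one possible instance under stated assumptions: min-max normalization of each engine's raw scores, the CORI merging heuristic D'' = (D' + 0.4·C'·D') / 1.4 (with C' the min-max-normalized resource score) as the resource-weighted fusion step, and a hypothetical per-engine `decay` that discounts an engine each time it contributes, as a simple way to favour diversity. `fuse`, `min_max` and `decay` are illustrative names.

```python
from collections import defaultdict


def min_max(scores):
    """Min-max normalise raw scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(results, resource_scores, decay=0.7):
    """results: engine -> [(doc_id, raw_score), ...]; resource_scores: engine -> score."""
    if not results:
        return []
    names = list(results)
    c_norm = dict(zip(names, min_max([resource_scores[n] for n in names])))
    best = {}  # doc_id -> (merged score, engine); duplicate docs keep their best copy
    for name, docs in results.items():
        if not docs:
            continue
        d_norm = min_max([s for _, s in docs])
        for (doc_id, _), d in zip(docs, d_norm):
            merged = (d + 0.4 * c_norm[name] * d) / 1.4  # CORI-style merge
            if doc_id not in best or merged > best[doc_id][0]:
                best[doc_id] = (merged, name)
    # Diversification (assumed): greedy pick, discounting engines already used.
    remaining = dict(best)
    used = defaultdict(int)
    ranking = []
    while remaining:
        doc_id = max(remaining, key=lambda d: remaining[d][0] * decay ** used[remaining[d][1]])
        ranking.append(doc_id)
        used[remaining.pop(doc_id)[1]] += 1
    return ranking
```

Setting `decay=1.0` turns the diversity step off and returns the plain merged order, which makes the effect of the diversification easy to compare.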
[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year of degree]: 2013
[Classification number (CLC)]: TP391.3