Research on Optimization Techniques for Max-Score Query Processing
Published: 2018-08-07 07:43
[Abstract]: With the rapid growth of the Internet, the volume of information available online has increased sharply. Search engines face massive data, massive query loads, and real-time response requirements, so answering user queries efficiently and in real time has become a central problem. One important approach is to improve the retrieval efficiency of the whole system by optimizing single-machine query processing performance. This thesis first introduces the relevant theory of query processing over inverted indexes, including inverted index structure, query processing strategies, and dynamic index pruning. The document-at-a-time (DAAT) Max-Score algorithm is one of the classic top-k query processing algorithms. To address the "slow start" problem caused by the initial threshold of 0 in existing Max-Score algorithms, this thesis proposes two DAAT Max-Score algorithms: one based on query term partitioning and one based on a two-layer index structure. The query-term-partitioning algorithm exploits the characteristics of the terms in the user's query: it uses term-at-a-time (TAAT) processing to evaluate a set of short queries quickly, which selects candidate documents and raises the initial threshold. The two-layer-index algorithm exploits the structure of a two-layer index: building the index greatly lowers each query term's global maximum score in the lower-layer index, and fast TAAT processing over the upper-layer index again selects candidate documents and raises the initial threshold. Both improved algorithms effectively reduce the number of documents that enter the candidate set without belonging to the final top-k, and thereby improve query processing performance. Finally, the thesis combines the two improved algorithms and designs and implements an index retrieval system on the Terrier platform.
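The core idea above, seeding DAAT Max-Score with a nonzero initial threshold obtained from a fast TAAT pass, can be illustrated with a minimal Python sketch. It assumes an in-memory index that maps each term to a doc-id-sorted list of `(doc_id, score)` pairs with precomputed non-negative scores. The names `max_score_daat`, `build_upper_layer`, and `taat_seed_threshold` are hypothetical, and the "upper layer" here is simply each term's `m` highest-scoring postings. This is a simplification, not the thesis's implementation: the DAAT pass still runs over the complete lists rather than over a separate lower layer with reduced maximum scores, and query term partitioning is not shown.

```python
# Hypothetical sketch of seeded DAAT Max-Score; not the thesis's Terrier code.
import heapq
from typing import Dict, List, Tuple

Posting = Tuple[int, float]  # (doc_id, precomputed non-negative term score)


def max_score_daat(postings: Dict[str, List[Posting]], k: int,
                   init_threshold: float = 0.0) -> Tuple[List[Tuple[float, int]], int]:
    """Return (top-k as (score, doc_id) sorted descending, number of candidates examined).

    init_threshold must be a safe lower bound on the true k-th best score;
    0.0 reproduces the "slow start" of the plain algorithm.
    """
    if k <= 0:
        return [], 0
    # Order terms by their list-wide maximum score, ascending.
    terms = sorted((t for t in postings if postings[t]),
                   key=lambda t: max(s for _, s in postings[t]))
    lists = [postings[t] for t in terms]
    prefix: List[float] = []  # prefix[i] = sum of max scores of lists[0..i]
    acc = 0.0
    for plist in lists:
        acc += max(s for _, s in plist)
        prefix.append(acc)
    ptr = [0] * len(lists)
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k
    candidates = 0

    def can_enter(bound: float) -> bool:
        # While the heap is not full, the seeded threshold is the entry bar.
        if len(heap) < k:
            return bound >= init_threshold
        return bound > heap[0][0]

    while True:
        # Lists 0..e-1 are non-essential: a doc found only in them cannot enter.
        e = 0
        while e < len(lists) and not can_enter(prefix[e]):
            e += 1
        if e == len(lists):
            break
        # Only essential lists generate candidates, in doc-id order.
        heads = [lists[i][ptr[i]][0] for i in range(e, len(lists)) if ptr[i] < len(lists[i])]
        if not heads:
            break
        doc = min(heads)
        candidates += 1
        score = 0.0
        for i in range(e, len(lists)):
            if ptr[i] < len(lists[i]) and lists[i][ptr[i]][0] == doc:
                score += lists[i][ptr[i]][1]
                ptr[i] += 1
        # Probe non-essential lists, highest max score first; stop once hopeless.
        for i in range(e - 1, -1, -1):
            if not can_enter(score + prefix[i]):
                break
            while ptr[i] < len(lists[i]) and lists[i][ptr[i]][0] < doc:
                ptr[i] += 1
            if ptr[i] < len(lists[i]) and lists[i][ptr[i]][0] == doc:
                score += lists[i][ptr[i]][1]
        if can_enter(score):
            heapq.heappush(heap, (score, doc))
            if len(heap) > k:
                heapq.heappop(heap)
    return sorted(heap, reverse=True), candidates


def build_upper_layer(postings: Dict[str, List[Posting]], m: int) -> Dict[str, List[Posting]]:
    # Hypothetical split: each term's m highest-scoring postings, re-sorted by doc id.
    return {t: sorted(heapq.nlargest(m, pl, key=lambda p: p[1])) for t, pl in postings.items()}


def taat_seed_threshold(upper: Dict[str, List[Posting]], k: int) -> float:
    # Term-at-a-time accumulation over the small upper layer only.
    acc: Dict[int, float] = {}
    for plist in upper.values():
        for doc, s in plist:
            acc[doc] = acc.get(doc, 0.0) + s
    if len(acc) < k:
        return 0.0
    # With non-negative scores a partial score never exceeds the full score, so the
    # k-th largest partial score is a safe lower bound on the true k-th score.
    return heapq.nlargest(k, acc.values())[-1]
```

For example, `max_score_daat(index, k)` and `max_score_daat(index, k, taat_seed_threshold(build_upper_layer(index, m), k))` should return the same top-k scores, and the seeded call will usually examine fewer candidates because fewer lists remain essential from the start. With general floating-point scores, summation order can differ between the TAAT and DAAT passes, so a small tolerance on the seed may be needed in practice.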
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[CLC classification]: TP391.3
Article ID: 2169339
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2169339.html