Optimization of the Document Collection and Scoring Strategy for Complex Boolean Queries
Published: 2019-06-22 16:42
[Abstract]: Boolean queries are among the earliest concepts in information retrieval, yet most research on them targets queries that use a single, uniform Boolean operator. Complex Boolean queries have received comparatively little study, even though they are used increasingly often, for example in text recommendation. To execute such queries more efficiently, the paper proposes DCQ (DAAT for complex query), a document collection and scoring strategy built on the DAAT (document-at-a-time) framework, and compares it experimentally with the well-known open-source search engine Lucene; query performance improves significantly. The paper also proposes a regression-based mechanism for predicting query performance, which decides fairly accurately when DCQ should be used. Experiments show that the composite algorithm, which combines DCQ with this performance predictor, substantially outperforms Lucene's current document collection and scoring algorithm.
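For readers unfamiliar with DAAT, the following is a minimal sketch of generic document-at-a-time evaluation over a nested Boolean query. It is not the DCQ algorithm, whose details the abstract does not give, and it is not Lucene's implementation. The class names (`Term`, `And`, `Or`), the `daat` driver, the additive scoring rule, and the toy posting lists are all assumptions made for illustration.

```python
INF = float("inf")  # sentinel doc ID meaning "iterator exhausted"


class Term:
    """Leaf iterator over a posting list sorted by doc ID (hypothetical layout)."""

    def __init__(self, postings):
        self.postings = postings  # list of (doc_id, score) pairs
        self.i = 0

    def doc(self):
        return self.postings[self.i][0] if self.i < len(self.postings) else INF

    def advance(self, target):
        # Move to the first posting whose doc ID is >= target.
        while self.doc() < target:
            self.i += 1
        return self.doc()

    def score(self):
        return self.postings[self.i][1]


class And:
    """Conjunction: matches only documents on which every child is positioned."""

    def __init__(self, children):
        self.children = children
        self.cur = -1

    def doc(self):
        return self.cur

    def advance(self, target):
        while True:
            top = max(c.advance(target) for c in self.children)
            if top == target or top == INF:
                self.cur = top
                return top
            target = top  # some child skipped ahead; realign the others

    def score(self):
        return sum(c.score() for c in self.children)


class Or:
    """Disjunction: matches the smallest doc ID among its children."""

    def __init__(self, children):
        self.children = children
        self.cur = -1

    def doc(self):
        return self.cur

    def advance(self, target):
        self.cur = min(c.advance(target) for c in self.children)
        return self.cur

    def score(self):
        # Only children positioned on the current document contribute.
        return sum(c.score() for c in self.children if c.doc() == self.cur)


def daat(root):
    """Collect (doc_id, score) for every match, one document at a time."""
    results = []
    doc = root.advance(0)
    while doc != INF:
        results.append((doc, root.score()))
        doc = root.advance(doc + 1)
    return results


def example_query():
    # Toy index for the mixed-operator query: a AND (b OR c)
    a = Term([(1, 1.0), (3, 1.0), (5, 1.0)])
    b = Term([(3, 2.0), (4, 2.0), (5, 2.0)])
    c = Term([(1, 0.5), (5, 0.5), (7, 0.5)])
    return And([a, Or([b, c])])


print(daat(example_query()))  # [(1, 1.5), (3, 3.0), (5, 3.5)]
```

Each document is fully matched and scored before the iterators move on, which is the defining property of DAAT. In this naive form every nested operator is re-advanced for each candidate document, which hints at why queries mixing AND and OR might benefit from a specialized strategy.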
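[Affiliation]: School of Electronics Engineering and Computer Science, Peking University
[Funding]: National Basic Research Program of China (973 Program); National Natural Science Foundation of China
[Classification No.]: TP391.3
Article ID: 2504778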
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2504778.html