Verbose Query Processing Based on Dependency and Learning-to-Rank Techniques
Published: 2018-07-08 19:48
Topics: verbose queries + query expansion. Source: Dalian University of Technology, master's thesis, 2013
[Abstract]: Users typically look for information through search engines, but a search engine returns many results and not every one of them matches what the user needs. Because users differ in educational and cultural background, the keywords they submit vary widely even when their query intent is the same. Consequently, returning identical results for identical keywords cannot satisfy every user. Information retrieval systems generally rely on query expansion to improve retrieval performance. Users sometimes enter their entire detailed information need into the system, that is, they submit verbose queries. This puts pressure on retrieval systems and forces them to keep improving in order to handle such queries. Existing search engines perform worse on verbose queries: the results fail to focus on the query topic, and the returned information does not meet the user's needs.

This thesis proposes two methods for processing verbose queries: a verbose query reformulation model based on dependency relations, and a verbose query processing method based on semantics and learning to rank.

The dependency-based reformulation model differs from keyword-based processing in that it exploits a characteristic of verbose queries themselves: the terms in a verbose query stand in well-formed grammatical relations to one another. The method performs dependency analysis on the documents. Because there are many dependency relation types and some of them introduce noise, the relation types are filtered and only effective relation pairs are extracted, which shortens the query. Each retained relation pair is then assigned a weight according to its importance in the reformulation model, which reweights the query. Experiments show that the method improves retrieval performance, particularly in low-recall cases, with large gains on both MAP and P@N.

The method based on semantics and learning to rank uses the distribution of documents over different topic spaces, computes their Shannon distance, and re-ranks the initial retrieval results with a learning-to-rank method. The results indicate that learning to rank can substantially help verbose query processing. The experiments also show that the terms of a verbose query cannot be treated as mutually independent, as traditional query expansion does; retrieval performance improves only when the dependency and semantic information specific to verbose queries is fully exploited.

All experimental data come from the public TREC standard corpora, and the results were evaluated in several ways. They show that both proposed techniques considerably improve retrieval system performance.
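The abstract reports results in terms of MAP and P@N. As a reminder of what these two measures compute, here is a minimal sketch using their standard definitions. The function names, document identifiers, and relevance judgments below are made up for illustration and are not taken from the thesis or its TREC experiments.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """P@N: fraction of the top-n ranked documents that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP: mean of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of AP over all queries, one (ranking, judgments) pair each."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings and relevance judgments for two queries.
ranked_q1 = ["d3", "d1", "d7", "d2", "d5"]
relevant_q1 = {"d1", "d2", "d9"}
ranked_q2 = ["a", "b"]
relevant_q2 = {"a"}

p_at_3 = precision_at_n(ranked_q1, relevant_q1, 3)
ap_q1 = average_precision(ranked_q1, relevant_q1)
map_score = mean_average_precision(
    [(ranked_q1, relevant_q1), (ranked_q2, relevant_q2)]
)
```

Because AP divides by the number of relevant documents rather than by the number retrieved, a relevant document that is never returned (`d9` here) lowers the score, which is why MAP is sensitive to recall as well as to ranking quality.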
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
Article ID: 2108558
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2108558.html