Research on the Risk of Information Retrieval Models and Its Evaluation Methods

Published: 2018-10-13 11:35

[Abstract]: As information retrieval technology has developed, the risks that arise at different stages of an information retrieval model, such as risk in relevance estimation, risk in document ranking, and risk in query expansion, have drawn increasing attention. The key to studying these risks is to design a method that evaluates mean performance and model risk at the same time, quantifies the size of each risk, and then identifies strategies for reducing it. This thesis has two research focuses. The first is risk evaluation metrics for information retrieval: the risk metric based on bias-variance decomposition is generalized from average precision (AP) to other evaluation metrics, and the target model used in the metric is set to be fairer and unbiased. The second is how to reduce the risk of query expansion failure in retrieval models. To address it, the thesis proposes a knowledge-graph-based query expansion method that lowers risk by adding query-related information to the query expansion model. Specifically, several entities and entity attributes related to the query are extracted from the knowledge graph and used as expansion terms to reformulate the query, so that it better expresses the user's information need. When computing the weights of the expansion terms, the method draws on return-risk analysis from portfolio theory: it maximizes the relevance return between the expansion terms and the original query while minimizing the risk of query drift that the expansion terms may introduce, which further controls the risk in query expansion. To test the soundness of the proposed risk evaluation method based on bias-variance decomposition, the experiments first use it to re-evaluate the retrieval results of the runs submitted to two tasks, TREC Ad Hoc (1993-1999) and Web Track (2010-2013). This shows that the method is reasonable for measuring overall model performance, and bias and variance are used to analyze quantitatively the trade-off between model effectiveness and stability. The experiments then verify the knowledge-graph-based strategy for reducing query expansion risk on two web collections and compare it with RM3, a query expansion model based on pseudo-relevance feedback. The results show that the proposed expansion model outperforms RM3 in both effectiveness and stability.
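As a rough illustration of the bias-variance idea described in the abstract, the sketch below splits the mean squared error of a system's per-query scores, measured against a target model, into a squared-bias term (effectiveness) and a variance term (stability across queries). The abstract does not give the thesis's exact formulas or its choice of target model, so the function name `bias_variance`, the constant target of 1.0, and the sample scores are all assumptions made for illustration only.

```python
from statistics import fmean


def bias_variance(scores, targets):
    """Split the mean squared error of per-query scores into bias^2 and variance.

    scores  -- per-query metric values of one system (e.g. AP or nDCG)
    targets -- per-query values of the target (reference) model, same order
    Returns (bias_squared, variance, mse); mse equals bias_squared + variance
    up to floating-point rounding.
    """
    if not scores or len(scores) != len(targets):
        raise ValueError("scores and targets must be non-empty and equally long")
    errors = [s - t for s, t in zip(scores, targets)]
    bias = fmean(errors)                                  # mean shortfall: effectiveness
    variance = fmean([(e - bias) ** 2 for e in errors])   # spread over queries: stability
    mse = fmean([e * e for e in errors])                  # overall error
    return bias ** 2, variance, mse


if __name__ == "__main__":
    # Hypothetical per-query scores for two systems; not data from the thesis.
    target = [1.0, 1.0, 1.0, 1.0]          # assumed target: a perfect score on every query
    steady = [0.50, 0.55, 0.45, 0.50]
    erratic = [0.95, 0.10, 0.90, 0.05]
    for name, run in (("steady", steady), ("erratic", erratic)):
        b2, var, mse = bias_variance(run, target)
        print(f"{name}: bias^2={b2:.4f} variance={var:.4f} mse={mse:.4f}")
```

Both hypothetical systems have the same mean score, so they share the same squared bias; only the variance term separates the steady system from the erratic one, which is the kind of effectiveness-stability trade-off a mean-only metric cannot show.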
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.3

Hao Linxue. Research on the Risk of Information Retrieval Models and Its Evaluation Methods [D]. Tianjin University, 2016.

Article ID: 2268471

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2268471.html
