
Research on Relevance Feedback Algorithms in Information Retrieval

Published: 2018-10-14 12:25
[Abstract]: Information retrieval is the field concerned with the structure, analysis, organization, storage, searching, and retrieval of information. In short, information retrieval finds the information relevant to a user's need within an unstructured collection. A core issue in information retrieval is its focus on users and their information needs, because search is evaluated from the user's point of view. This view has prompted a great deal of research on how people interact with search engines, and in particular on techniques that help users express their information needs.

In a retrieval process that involves the user, the user submits a short query and the system returns an initial set of results. The user marks some of these results as relevant or non-relevant. Based on this feedback, the system computes a better query to represent the information need and returns a new batch of results that is more likely to satisfy the user. This process is called relevance feedback. Using relevance feedback during retrieval can improve the query results and the efficiency of querying.

Starting from the current state of relevance feedback techniques, this thesis presents the related algorithms, including relevance feedback in the vector space model, the probabilistic model, and the Boolean model. It concentrates on the Rocchio relevance feedback algorithm, which is based on the vector space model, and describes in detail the idea behind the algorithm, how it executes, and the specific situations in which it performs poorly. Two such situations are considered: queries whose answer set must itself be composed of documents from different classes, and words that usually appear as an OR relation among several specific concepts. The thesis improves the Rocchio relevance feedback algorithm in both respects so that it also returns good results in these two special cases.

The thesis makes the following contributions:

(1) For queries that contain multiple conditions, and following the idea of the Rocchio relevance feedback algorithm, the set of documents that contain information on both conditions is treated as a new intersection class. Within this intersection class, the search starts at the centroid nearest to the initial query and moves step by step toward the other centroid, finding the desired result along the way. The improved Rocchio relevance feedback algorithm can effectively address unsatisfactory results for multi-condition queries.

(2) For queries on polysemous words, the results returned by the system are often disordered. The thesis designs an algorithm that clusters the attributes of the results, called the hierarchical shrinkage algorithm. The algorithm first obtains the keywords of the returned results and represents them as a Boolean matrix. It then computes the similarity between documents, using the number of keywords between documents as the measure, merges documents hierarchically by conjunction according to that similarity, and extracts the resulting labels once clustering ends. When recall is not considered, the final result of the algorithm converges to keywords that are highly representative of the documents in each cluster, and it achieves a high accuracy.
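The abstract names the Rocchio relevance feedback algorithm without stating its update rule. As a reference point, the following is a minimal sketch of the standard textbook form of that rule, not the improved variant proposed in the thesis. It moves the query vector toward the centroid of the documents marked relevant and away from the centroid of those marked non-relevant. The function names, the plain-list vector representation, and the default weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` are assumptions chosen for illustration; the source does not specify them.

```python
from typing import List, Sequence


def centroid(docs: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Return the mean of the document vectors, or the zero vector if docs is empty."""
    if not docs:
        return [0.0] * dim
    return [sum(doc[i] for doc in docs) / len(docs) for i in range(dim)]


def rocchio(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed default)
    beta: float = 0.75,   # weight of the relevant centroid (assumed default)
    gamma: float = 0.15,  # weight of the non-relevant centroid (assumed default)
) -> List[float]:
    """Standard Rocchio update: alpha*q + beta*centroid(rel) - gamma*centroid(nonrel)."""
    dim = len(query)
    rel_centroid = centroid(relevant, dim)
    non_centroid = centroid(non_relevant, dim)
    updated = [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_centroid, non_centroid)
    ]
    # Negative term weights are conventionally clipped to zero.
    return [max(0.0, w) for w in updated]


if __name__ == "__main__":
    # Three-term vocabulary; the user marked two results relevant and one non-relevant.
    new_query = rocchio(
        query=[1.0, 0.0, 0.0],
        relevant=[[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
        non_relevant=[[0.0, 0.0, 1.0]],
        alpha=1.0,
        beta=0.5,
        gamma=0.25,
    )
    print(new_query)  # [1.5, 0.5, 0.0]
```

In the usage example, the second term gains weight because both relevant documents contain it, while the third term, which occurs only in the non-relevant document, would become negative and is clipped to zero. Because the update uses a single relevant centroid, it illustrates why a query whose answers fall into several distinct document classes can be poorly served, which is the situation the thesis sets out to improve.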
[Degree-granting institution]: Henan University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3




Article ID: 2270449


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2270449.html

