Research on Private Information Retrieval Algorithms
Published: 2019-02-27 08:33
[Abstract]: With the widespread use of information technology, publicly accessible databases and search engines have become important resources for users seeking up-to-date information. However, shortcomings inherent in the traditional private information retrieval model make it difficult to apply to real large-scale databases and search engines. Studying new, practical private information retrieval models and algorithms is therefore of considerable importance.
By analyzing the functional requirements of existing private information retrieval systems and of a private information retrieval system based on word semantic similarity, this thesis proposes a private information retrieval model based on word semantic similarity. It analyzes the model's word semantic similarity calculation, its strategy for selecting forged keywords, its hiding of query information, and its filtering of query results, and it designs the overall architecture of the private information retrieval system. The architecture comprises a word semantic similarity calculation module, a query processing module, and a page crawling and filtering module.
An implementation of word semantic similarity calculation based on WordNet and HowNet is given. Node depth is introduced as an influencing factor into an existing WordNet-based word semantic similarity algorithm, and the resulting algorithm is then applied to the calculation of sememe similarity in HowNet. Experiments show that the improved algorithm yields more accurate similarity values that agree better with everyday semantic intuition.
A private information retrieval algorithm based on word semantic similarity is presented. The criterion for selecting forged keywords is the key to the algorithm: it uses word semantic similarity as that criterion and requires the semantic similarity between each forged keyword and the target keyword to satisfy certain conditions. The time complexity of the algorithm is O(k), where k is the number of forged keywords. Experiments show that, compared with the GooPir model, the private information retrieval model based on word semantic similarity improves the quality of query results to some extent, while information entropy decreases, though only slightly.
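The abstract does not give the similarity formula or the exact selection conditions, so the following is only a minimal sketch of the general idea. It assumes a hypothetical toy taxonomy in place of WordNet or HowNet, uses the well-known Wu-Palmer measure as a stand-in for a depth-aware similarity (it is not the thesis's improved algorithm), and treats the "certain conditions" as a simple similarity band `[low, high]`. The names `PARENT`, `similarity`, `select_forged`, and `query_entropy` are illustrative only.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet or HowNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "wolf": "animal",
    "tree": "plant",
    "oak": "tree",
}


def path_to_root(word):
    """Return the chain [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def depth(word):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(word))


def similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b shared with a is the lowest common subsumer.
    lcs = next(node for node in path_to_root(b) if node in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def select_forged(target, candidates, low, high, k):
    """Pick up to k candidates whose similarity to target lies in [low, high]."""
    forged = []
    for word in candidates:
        if word != target and low <= similarity(target, word) <= high:
            forged.append(word)
            if len(forged) == k:
                break
    return forged


def query_entropy(num_queries):
    """Shannon entropy in bits, assuming each submitted keyword is
    equally likely to be the real one (an assumption of this sketch)."""
    p = 1.0 / num_queries
    return -sum(p * math.log2(p) for _ in range(num_queries))


if __name__ == "__main__":
    forged = select_forged("dog", ["cat", "oak", "wolf", "tree", "animal"],
                           low=0.5, high=0.7, k=2)
    print(forged)
    print(query_entropy(len(forged) + 1))
```

The band excludes both near-synonyms and unrelated words. That is one plausible reading of why results might become more relevant while entropy drops slightly, but the thesis's actual conditions and entropy measure may differ.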
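[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3
Article ID: 2431286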
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2431286.html