
Research and Design of a Personalized Search Engine System Based on Automatic Summarization and User Feedback

Published: 2018-12-30 16:14
[Abstract]: In the age of information explosion, search engines have become an effective tool for discovering and inferring knowledge from large volumes of data. Traditional search engine systems, however, return the same results to different users who issue the same query, and users increasingly expect more accurate results. This thesis therefore introduces automatic summarization and user feedback into a traditional search engine system to improve its precision.

By analyzing the system model of the traditional search engine MG (Managing Gigabytes), this thesis studies and designs a relatively complete personalized search engine system. Based on a requirements analysis, the system is divided into nine modules: document processing, clustering, user query processing, user classification, system feedback, similarity calculation, ranking, result display, and system evaluation. The system first clusters users to extract their interest models. Then, when computing the similarity between the query vector and a document vector, it adjusts personalization parameters according to user feedback so that the query results are more accurate. The feature-term reduction algorithm is also improved: each document is first summarized automatically, a feature-term set is extracted from the summary, the feature terms are ranked by their contribution to the document category, and finally completeness is sacrificed in exchange for fast convergence of the feature terms, while precision is preserved. The system also combines a minimal perfect hash function with large in-memory storage, which reduces the storage space of the inverted-file dictionary and speeds up reading the inverted index. Finally, building a min-heap reduces the space needed to rank massive document collections.
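The abstract states that users are clustered to extract interest models but does not name the clustering algorithm. The sketch below assumes k-means over user term-weight vectors that share one vocabulary, and treats each cluster centroid as that user group's interest model. The function name `kmeans_users`, the farthest-point initialization and the iteration cap are choices made here for a deterministic illustration, not details from the thesis.

```python
import math
from typing import List, Tuple


def kmeans_users(points: List[List[float]], k: int,
                 max_iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Cluster user vectors; each centroid serves as a group interest model.

    Assumption (not from the thesis): users are dense term-weight vectors
    over a shared vocabulary, and plain k-means is the clustering method.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of users")
    # Deterministic farthest-point initialization (a choice for this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each user joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `labels, models = kmeans_users(user_vectors, k=5)` gives each user a group index and each group a centroid that later stages can use as the interest vector.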
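The abstract says personalization parameters are adjusted from user feedback when the query-document similarity is computed, but it gives no formula. One plausible reading, assumed here, is a linear blend sim = (1 - λ)·cos(q, d) + λ·cos(u, d), where u is the user's interest vector, combined with a Rocchio-style update of u from documents the user marked relevant or non-relevant. The blend, λ (`lam`) and the Rocchio weights α, β, γ are illustrative assumptions, not the thesis's parameters.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalized_score(query: Vector, doc: Vector, interest: Vector,
                       lam: float = 0.3) -> float:
    """Assumed blend of query relevance and user-interest relevance."""
    return (1 - lam) * cosine(query, doc) + lam * cosine(interest, doc)


def rocchio_update(interest: Vector, relevant: List[Vector],
                   nonrelevant: List[Vector], alpha: float = 1.0,
                   beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Rocchio-style feedback: move the interest vector toward liked documents."""
    updated = {t: alpha * w for t, w in interest.items()}
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for d in docs:
            for t, w in d.items():
                updated[t] = updated.get(t, 0.0) + coef * w / len(docs)
    # Keep only positive weights, as is common for Rocchio term vectors.
    return {t: w for t, w in updated.items() if w > 0}
```

Two users with the same query then receive different orderings as soon as their interest vectors differ, which is the behavior the abstract motivates.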
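The abstract does not say which automatic summarization method produces the document summaries. As a stand-in, the sketch below uses a simple frequency-based extractive summarizer: each sentence is scored by the average frequency of its non-stopword terms across the document, and the top `n_sentences` are returned in their original order. The stopword list, tokenizer and scoring rule are assumptions; Chinese text would need word segmentation instead of the regex tokenizer.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for",
             "each", "is", "was", "are", "on", "with"}


def content_words(sentence: str) -> List[str]:
    """Lower-cased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, n_sentences: int = 2) -> List[str]:
    """Frequency-based extractive summary (a stand-in, not the thesis method)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                  reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(best)]  # restore original order
```

Feature terms can then be taken from `content_words` applied to the summary sentences rather than to the full document, which is the point of summarizing first.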
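The reduction step ranks feature terms by their contribution to the document category and then keeps only a prefix of that ranking. The abstract does not define "contribution", so the sketch assumes the chi-square statistic between a term and a category, taking each term's maximum over categories. The hypothetical `coverage` parameter stops once the kept terms account for a given fraction of the total score, which is one way to give up completeness for faster convergence.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for a 2x2 term/category table.

    a: in category, has term    b: not in category, has term
    c: in category, lacks term  d: not in category, lacks term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def reduce_features(docs: List[Tuple[Set[str], str]],
                    coverage: float = 0.9) -> List[str]:
    """Rank terms by assumed contribution (max chi-square) and keep a prefix."""
    n = len(docs)
    cat_sizes = Counter(cat for _, cat in docs)
    df = Counter(t for terms, _ in docs for t in terms)
    df_cat = Counter((t, cat) for terms, cat in docs for t in terms)

    scores: Dict[str, float] = {}
    for t in df:
        best = 0.0
        for cat, n_cat in cat_sizes.items():
            a = df_cat[(t, cat)]
            b = df[t] - a
            c = n_cat - a
            d = n - n_cat - b
            best = max(best, chi_square(a, b, c, d))
        scores[t] = best

    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken by term
    total = sum(scores.values())
    kept: List[str] = []
    acc = 0.0
    for t in ranked:
        if total and acc >= coverage * total:
            break  # remaining terms are dropped: completeness traded for size
        kept.append(t)
        acc += scores[t]
    return kept
```

Each input document is the set of feature terms extracted from its summary plus its category label, for example `({"query", "index"}, "ir")`.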
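The abstract does not specify which minimal perfect hash construction the system uses. As an illustration, the sketch implements one classic construction, the acyclic random-graph method of Czech, Havas and Majewski: each term becomes an edge between two vertices, and vertex values `g` are assigned so that `(g[h1(t)] + g[h2(t)]) mod n` gives every one of the n terms a distinct slot in 0..n-1. The seeded SHA-256 hash functions, the ratio c = 2.1 and the names `build_mphf` and `MPHF` are assumptions made for this sketch.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


def _h(key: str, seed: int, m: int) -> int:
    """Seeded hash into [0, m); an assumed choice of hash function."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % m


@dataclass
class MPHF:
    seed: int
    m: int  # number of graph vertices
    n: int  # number of keys
    g: List[int]

    def index(self, key: str) -> int:
        u, v = _h(key, 2 * self.seed, self.m), _h(key, 2 * self.seed + 1, self.m)
        return (self.g[u] + self.g[v]) % self.n


def _assign(adj: Dict[int, List[Tuple[int, int]]], m: int, n: int) -> Optional[List[int]]:
    """Assign g by DFS; return None if the graph has a cycle."""
    g, visited = [0] * m, [False] * m
    for root in range(m):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, -1)]  # (vertex, index of the key/edge we arrived by)
        while stack:
            u, via = stack.pop()
            for v, i in adj[u]:
                if i == via:
                    continue
                if visited[v]:
                    return None  # a second path to v means a cycle
                visited[v] = True
                g[v] = (i - g[u]) % n  # makes g[u] + g[v] == i (mod n)
                stack.append((v, i))
    return g


def build_mphf(keys: List[str], c: float = 2.1, max_tries: int = 1000) -> MPHF:
    n = len(keys)
    m = int(c * n) + 1  # c > 2 keeps a random graph acyclic with decent odds
    for seed in range(max_tries):
        adj: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
        edges: Set[Tuple[int, int]] = set()
        ok = True
        for i, key in enumerate(keys):
            u, v = _h(key, 2 * seed, m), _h(key, 2 * seed + 1, m)
            edge = (min(u, v), max(u, v))
            if u == v or edge in edges:  # self-loop or repeated edge is a cycle
                ok = False
                break
            edges.add(edge)
            adj[u].append((v, i))
            adj[v].append((u, i))
        g = _assign(adj, m, n) if ok else None
        if g is not None:
            return MPHF(seed, m, n, g)
    raise RuntimeError("no acyclic graph found; try a larger c")
```

The inverted-file dictionary can then be a plain array of n entries indexed by `index(term)`, with no empty slots and no collision chains. Because unknown terms also hash to some slot, each entry should store the term (or a fingerprint) so a lookup can confirm the match.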
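A common way a min-heap saves space when ranking a huge collection is to keep only the k best-scoring documents in a size-k min-heap: the heap's root is the weakest of the current top k, so each new score is compared against it and either discarded or swapped in. Memory is O(k) instead of O(N), and time is O(N log k). The `(doc_id, score)` input format and the parameter k are assumptions for this sketch.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k(scored_docs: Iterable[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap keyed on score
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Replace the weakest of the current top k in one O(log k) step.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
```

Because `scored_docs` can be a generator, scores can be streamed from the similarity stage without ever materializing the full result list.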
Theoretical analysis and experiments show that, compared with the MG search engine system, the improved feature-term reduction algorithm is somewhat more time-efficient; the storage space of the inverted-file dictionary is reduced by nearly half; the improved document-ranking algorithm has lower space complexity; and the improved similarity calculation raises the personalized precision of queries, with respect to individual users' interests, to some extent.
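The evaluation compares personalized precision against MG, but the abstract does not define the metric exactly. A minimal sketch, assuming standard precision at cutoff k against the documents a user judged relevant, averaged over queries, looks like this; the function names and the choice of k are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that the user judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k  # divides by k even if fewer than k results were returned


def mean_precision_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average precision@k over (ranking, relevant set) pairs, one per query."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)
```

Running the same queries and relevance judgments through both systems and comparing `mean_precision_at_k` gives a like-for-like measure of the personalization gain.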
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree conferred]: 2012
[CLC classification number]: TP391.3

