Research and Design of a Personalized Retrieval System Based on Classification Technology
[Abstract]: With the rapid development of the Internet and network information technology, online resources are growing exponentially. The results returned by a traditional general-purpose search engine depend only on the query keywords, yet different users who submit the same query words may have different purposes and expect different results. Users therefore need a search tool that returns more accurate results tailored to their individual characteristics. This thesis proposes a user-centered, classification-based personalized search engine. After a thorough analysis of the technologies underlying personalized information retrieval, it studies the techniques commonly used in personalized search engines and the main techniques a search engine uses to infer the purpose behind a user's query. A model of the retrieval system is built from users' browsing and query logs. The thesis introduces automatic text classification, presents several common text representation models, and uses WEKA and LibSVM to classify text automatically. On top of text classification, it proposes a ranking algorithm that shows as many categories as possible in the result list, so that users interested in different categories can each find information on their own subject. In addition, the user's behavior, namely the click rate on each topic category and the average time spent on web pages of each topic category, is used to modify Lucene's scoring field and thereby adjust the ranking score Lucene assigns to each document. Experiments show that, once user behavior is taken into account, users with different interests who query the same words are shown different result pages.
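The abstract mentions text representation models and automatic classification with WEKA and LibSVM. As a rough illustration of the most common representation, the vector space model with TF-IDF weights, the following Python sketch builds TF-IDF vectors and assigns a new document to the category whose centroid is most similar by cosine similarity. The nearest-centroid classifier is a deliberately simple stand-in, not the SVM classifier the thesis trains with LibSVM; the smoothed IDF formula and all function names (`tf_idf_vectors`, `centroids`, `classify`) are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return sparse TF-IDF vectors ({term: weight}) and the IDF table for tokenized docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    # Smoothed IDF (an assumption; many variants exist).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: cnt / len(doc) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(vectors, labels):
    """Average the TF-IDF vectors of each category into one centroid vector."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, w in vec.items():
            acc[t] += w
    return {lab: {t: w / counts[lab] for t, w in acc.items()} for lab, acc in sums.items()}


def classify(tokens, idf, cents):
    """Nearest-centroid classification by cosine similarity (stand-in for an SVM)."""
    tf = Counter(tokens)
    # Terms never seen in training have no IDF and are ignored.
    vec = {t: c / len(tokens) * idf[t] for t, c in tf.items() if t in idf}
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))
```

In a real system the category labels would come from the trained classifier and be stored with each indexed page, so that ranking can group results by category.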
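The two ranking ideas in the abstract, boosting categories the user favors and keeping as many categories as possible near the top, can be sketched as follows. This is a hypothetical Python illustration rather than the thesis's Lucene code: the weight formula `1 + alpha * click_share + beta * normalized_dwell`, the default `alpha`/`beta` values, and the round-robin interleaving of categories are assumptions chosen to make the idea concrete, and `base_score` stands in for Lucene's own relevance score.

```python
from collections import defaultdict, deque


def interest_weights(click_counts, dwell_seconds, alpha=0.5, beta=0.5):
    """Per-category boost from click share and average dwell time.

    Hypothetical formula (an assumption, not the thesis's exact one):
        weight = 1 + alpha * clicks / total_clicks + beta * avg_dwell / max_avg_dwell
    click_counts:  {category: number of clicks on results of that category}
    dwell_seconds: {category: list of page-visit durations in seconds}
    """
    total_clicks = sum(click_counts.values()) or 1
    avg_dwell = {c: sum(v) / len(v) for c, v in dwell_seconds.items() if v}
    max_dwell = max(avg_dwell.values(), default=0.0) or 1.0
    categories = set(click_counts) | set(avg_dwell)
    return {
        c: 1.0
        + alpha * click_counts.get(c, 0) / total_clicks
        + beta * avg_dwell.get(c, 0.0) / max_dwell
        for c in categories
    }


def personalized_rank(hits, weights):
    """hits: list of (doc_id, category, base_score) from the search engine.

    Multiplies each base score by its category weight (1.0 if unknown), then
    interleaves categories round-robin, strongest category first, so the top
    of the list covers as many categories as possible.
    """
    buckets = defaultdict(list)
    for doc_id, cat, base_score in hits:
        buckets[cat].append((base_score * weights.get(cat, 1.0), doc_id))
    for lst in buckets.values():
        lst.sort(reverse=True)  # best adjusted score first within a category
    order = sorted(buckets, key=lambda c: buckets[c][0][0], reverse=True)
    queues = deque(deque(buckets[c]) for c in order)
    ranked = []
    while queues:
        q = queues.popleft()
        _, doc_id = q.popleft()
        ranked.append(doc_id)
        if q:  # category still has hits: send it to the back of the rotation
            queues.append(q)
    return ranked
```

For example, with 8 clicks on sports and 2 on finance, and longer average visits to sports pages, a sports hit leads the list; with no behavior data (all weights 1.0) the same hits come back in a different order. The round-robin step trades some raw relevance for category coverage, which is the trade-off the abstract describes.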
Because many search queries are repeated (by the 80/20 rule, roughly 20% of search terms account for 80% of all searches), the system caches query results. When a user submits a query consisting of a set of keywords, the system checks whether a record for that query exists in the cache. If it does not, the query is submitted to the searcher; the combined sequence of document numbers returned by the searcher is stored in a file, and the offset of that sequence within the file is saved in the cache. If the record already exists, its offset is read from the cache. The thesis then presents the design and implementation of a prototype system: it first describes the overall system architecture, then details the main modules, such as the retrieval module, the result-ranking module, and the query-cache module, and analyzes the system's main data structures. Finally, the system is tested and analyzed, and its feasibility is verified. The thesis concludes by summarizing the work, pointing out some shortcomings of the system, proposing ways to improve it, and outlining plans for future work.
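The cache lookup described in the abstract can be sketched in Python as follows, assuming the searcher is a callable that returns a list of integer document numbers. The on-disk layout (4-byte little-endian unsigned integers), storing a count alongside the offset, and normalizing a query by sorting its lower-cased keywords are all assumptions; the abstract only states that the offset of the stored sequence is kept in the cache.

```python
import os
import struct
import tempfile


class QueryCache:
    """Maps a normalized query to the (offset, count) of its doc-number run in a result file."""

    def __init__(self, searcher, path=None):
        # searcher: hypothetical callable, list of keywords -> list of int document numbers
        self.searcher = searcher
        if path is None:
            fd, path = tempfile.mkstemp(suffix=".bin")
            os.close(fd)
        self.path = path
        self.index = {}  # in-memory cache: query key -> (offset, count)
        open(self.path, "wb").close()  # start from an empty result file

    @staticmethod
    def _key(keywords):
        # Assumption: keyword order and case do not change the query.
        return " ".join(sorted(k.lower() for k in keywords))

    def lookup(self, keywords):
        key = self._key(keywords)
        if key not in self.index:
            # Cache miss: run the real search and append its doc numbers to the file.
            doc_ids = self.searcher(keywords)
            with open(self.path, "ab") as f:
                offset = f.seek(0, os.SEEK_END)  # position where this run starts
                f.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))
            self.index[key] = (offset, len(doc_ids))
        # Cache hit (or freshly stored run): read the sequence back from its offset.
        offset, count = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return list(struct.unpack(f"<{count}I", f.read(4 * count)))
```

Keeping only small (offset, count) pairs in memory and the document-number sequences on disk keeps the cache compact, which matters when the same popular queries dominate the traffic.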
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year of degree conferral]: 2013
[Classification number]: TP391.3