Design and Implementation of a Search Engine Ranking Algorithm Based on User Log Analysis
[Abstract]: With the rapid growth of the Internet, finding useful data in massive amounts of information has become increasingly important. Search engines crawl and organize information on the network and offer users a high-quality query interface, which makes target information much easier to obtain. They have become an indispensable tool for Internet users accessing network resources. Because of the sheer volume of information online, however, search engines cannot return satisfactory results every time. First, a query returns a large number of related results, and the results the user cares about most often do not appear at the top or in the most prominent positions. Second, users differ in how well they understand search engines, and most cannot express what they are looking for accurately in a query, which makes the results imprecise. Understanding user intent from search behavior is therefore important for improving the accuracy of result ranking. Based on a statistical analysis of search engine query logs, this thesis derives general patterns of user access from the behavior of a large number of users, and uses them to optimize the web page ranking algorithm and thus improve the accuracy of the final result ranking. The work has two main parts. (1) Analyzing search engine user query logs. The thesis studies the characteristics of search logs and the relationships among them, summarizes basic behavior patterns of Chinese search engine users, and identifies trends in their search behavior by comparing logs from different periods. This lays a foundation for future analysis of search engine user behavior. (2) Optimizing Lucene's original ranking algorithm.
The original algorithm is a TF-IDF algorithm based on the vector space model. It considers only keyword frequency and how well documents match the query, and ignores the characteristics of the web pages themselves. The thesis therefore designs a web page ranking algorithm that combines term-frequency matching with web page characteristics. Trends in user search behavior are studied from a large number of user query logs, and a user recognition factor is added to the original ranking algorithm. The weight coefficient of this factor can be adjusted to suit the needs of the search engine. This preserves the relevance and matching degree of the search results while bringing the order of the returned results closer to what users need. The search engine system designed in this thesis implements the improved ranking through a boost factor, and compares the results of the original and optimized ranking algorithms using user feedback. The results show that the optimized algorithm improves the order of the returned results and provides a reference for future research on the query intent of search engine users.
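The idea of scaling a TF-IDF score by an adjustable user recognition factor can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so everything here is an assumption: the TF-IDF is a simplified textbook form rather than Lucene's exact scoring, the recognition factor is taken to be a URL's click count in the log divided by the largest click count, and the names `tf_idf`, `recognition_factors`, `rank`, `weight`, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(query_terms, doc_tokens, doc_freq, num_docs):
    """Simplified TF-IDF: sum of sqrt(term count) * idf over the query terms."""
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        tf = math.sqrt(counts[term])
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))
        score += tf * idf
    return score


def recognition_factors(click_log):
    """Assumed definition: a URL's clicks divided by the largest click count."""
    clicks = Counter(url for _query, url in click_log)
    top = max(clicks.values(), default=1)
    return {url: n / top for url, n in clicks.items()}


def rank(query_terms, docs, click_log, weight=0.5):
    """Order URLs by TF-IDF multiplied by (1 + weight * recognition factor)."""
    num_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    recog = recognition_factors(click_log)
    scored = {
        url: tf_idf(query_terms, tokens, doc_freq, num_docs)
        * (1.0 + weight * recog.get(url, 0.0))
        for url, tokens in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical toy corpus and click log, for illustration only.
docs = {
    "a.example/page": ["search", "engine", "ranking", "search"],
    "b.example/page": ["search", "engine", "log"],
    "c.example/page": ["cooking", "recipes"],
}
click_log = [("search engine", "b.example/page")] * 3 + [
    ("search engine", "a.example/page")
]

baseline = rank(["search", "engine"], docs, click_log, weight=0.0)
boosted = rank(["search", "engine"], docs, click_log, weight=0.5)
```

With `weight=0.0` the order depends on term matching alone. Raising `weight` lets frequently clicked pages move ahead of pages that match the query slightly better, which is the kind of adjustment the weight coefficient is described as providing.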
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3