Design and Implementation of a Search Engine Ranking Algorithm Based on User Log Analysis
[Abstract]: With the rapid growth of the Internet, finding useful information in a massive volume of data has become increasingly important. Search engines crawl and organize information on the Web and give users a high-quality query interface, which makes target information much easier to obtain. They have become an indispensable tool for accessing network resources. However, the sheer volume of information online means they cannot always return satisfactory results, for two reasons.

First, a query typically returns a large number of related results, and the results the user cares about most are often not shown at the top or in the most prominent position. Second, users understand search engines differently, and most cannot express their information need accurately in a query, so the results are inaccurate. Understanding user intent from search behavior is therefore important for improving the accuracy of result ranking.

This thesis performs a statistical analysis of search engine query logs and uses it to derive general patterns of user access from the behavior of a large number of users. It then applies these patterns to optimize the web page ranking algorithm that determines the final result order, improving the accuracy of search result ranking. The work covers two aspects.

(1) Analysis of search engine query logs. The thesis studies the characteristics of search logs and the relationships among them, and summarizes basic behavior patterns of Chinese search engine users. By comparing logs from different periods, it also identifies how their search behavior has changed over time. This provides a foundation for future research on search engine user behavior.

(2) Optimization of Lucene's original ranking algorithm. The original algorithm is a TF-IDF algorithm based on the vector space model. It considers only term frequency and how well documents match the query, and it ignores characteristics of the web pages themselves. The thesis designs a web page ranking algorithm that combines term-frequency matching with page characteristics.

Using a large volume of user query logs, the thesis studies trends in user search behavior and adds a ranking factor reflecting users' recognition of each page to the original algorithm. The weight coefficient of this factor can be adjusted to suit the needs of a given search engine. This preserves the relevance and matching quality of the results while ordering them more closely to what users need.

The search engine system designed in this thesis implements the improvement through a boost factor. It compares the results of the original and the optimized ranking algorithms using user feedback. The results show that the optimized algorithm improves the ordering of returned results. The work also provides a reference for future research on the query intent of search engine users.
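The change described in item (2) can be illustrated with a minimal sketch. The TF-IDF part is a simplified form of Lucene's classic scoring. For each matching query term it takes sqrt(tf) · idf² · lengthNorm, with idf = 1 + ln(N / (df + 1)), and it omits coord and queryNorm. The user-recognition factor is modeled here, as an assumption, as each page's share of all logged clicks. It is applied as a multiplicative boost of 1 + w · share, where w is the adjustable weight coefficient. The corpus, the click log, and the names `classic_tfidf`, `click_factor` and `rank` are hypothetical. This is not the thesis's actual implementation.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> already-segmented terms.
DOCS = {
    "d1": ["lucene", "search", "engine", "ranking"],
    "d2": ["search", "engine", "user", "log", "search"],
    "d3": ["user", "click", "log", "analysis"],
}

# Hypothetical query log: one (query, clicked document id) pair per click.
CLICK_LOG = [
    ("search engine", "d1"),
    ("search engine", "d1"),
    ("search engine", "d1"),
    ("user log", "d3"),
]


def classic_tfidf(query_terms, doc_terms, docs):
    """Simplified Lucene-classic TF-IDF: sum of sqrt(tf) * idf^2 * lengthNorm
    over matching query terms, with idf = 1 + ln(N / (df + 1)).
    coord() and queryNorm() are omitted for brevity."""
    n_docs = len(docs)
    tf = Counter(doc_terms)
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue  # a term absent from this document contributes nothing
        df = sum(1 for terms in docs.values() if term in terms)
        idf = 1.0 + math.log(n_docs / (df + 1))
        score += math.sqrt(tf[term]) * idf * idf * length_norm
    return score


def click_factor(log):
    """Assumed user-recognition factor: each document's share of all clicks."""
    counts = Counter(doc_id for _, doc_id in log)
    total = sum(counts.values())
    return {doc_id: c / total for doc_id, c in counts.items()}


def rank(query, docs, log, weight=1.0):
    """Order matching documents by TF-IDF score times a boost of
    (1 + weight * click share); weight=0 reproduces plain TF-IDF."""
    terms = query.split()
    clicks = click_factor(log)
    results = []
    for doc_id, doc_terms in docs.items():
        base = classic_tfidf(terms, doc_terms, docs)
        if base <= 0.0:
            continue  # the boost only reorders documents that match the query
        boost = 1.0 + weight * clicks.get(doc_id, 0.0)
        results.append((doc_id, base * boost))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for w in (0.0, 1.0):
        order = [doc_id for doc_id, _ in rank("search engine", DOCS, CLICK_LOG, weight=w)]
        print(f"weight={w}: {order}")
    # weight=0.0: ['d2', 'd1']
    # weight=1.0: ['d1', 'd2']
```

With `weight=0.0` the order is plain TF-IDF, and `d2` ranks first because it repeats "search". Raising `weight` lets the frequently clicked `d1` overtake it. This is the tunable trade-off between textual relevance and user recognition that the abstract describes. A per-query click share would be a natural refinement, because a global share favors popular pages regardless of the query.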
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year of degree]: 2013
[Classification number (CLC)]: TP391.3