Keyword Weight Calculation for Web Page Ranking
[Abstract]: With the development of information technology and the growing popularity of the Internet, search engines have attracted increasing attention. In recent years, the mainstream search engines have been those based on keyword search. In such engines, the accuracy with which the weight of each word in a user query is calculated directly affects the subsequent ranking of web pages, so correctly calculating word weights in the retrieval condition is essential. This thesis seeks a method for calculating the keyword weights of user queries that brings keyword-based search engines to a higher level and lays a solid foundation for subsequent retrieval processing. To this end, the work consists of three parts:
(1) Characteristics of user queries. The thesis analyzes how the characteristics of 5,000 queries annotated with core words relate to the weights of their words, and compares the stop words found in the queries with those in a modern Chinese corpus. It also analyzes, with examples, the stop words found in queries of different categories.
(2) Keyword weight calculation for web page ranking. The user query log is segmented and part-of-speech tagged, and keyword extraction is treated as a classification task. Based on the characteristics of queries, eight contextual features of each word are chosen as the features of a decision-tree forest classifier, and the calculation of each feature is described. An error analysis of the experimental results is then carried out, and rules are added to post-process the model's classification output.
(3) Analysis of experimental results. The decision-tree classification method is compared with traditional keyword extraction and weight calculation methods. About 1,000 queries are randomly sampled from the user query log for manual evaluation, and the model's precision and recall are evaluated by cross-validation. The win rate of the model against the traditional weight calculation method in web page ranking is also compared. Finally, several queries are searched on Baidu to compare how the keyword sequence determined by the model and the unprocessed original query affect the ranking of the returned web pages. The experimental results show that the keyword extraction and weight calculation method used in this thesis is feasible for calculating term weights for web page ranking.
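Part (1) compares the stop words found in queries with those in a modern Chinese corpus. A minimal sketch of that comparison in Python follows, assuming the text is already segmented into words and using a small hypothetical stop-word list (the thesis's own list is not given in the abstract); the function names are illustrative only.

```python
from collections import Counter

# Hypothetical stop-word list for illustration; NOT the list used in the thesis.
STOP_WORDS = {"的", "了", "吗", "怎么", "什么", "是"}


def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words from a segmented query, keeping the remaining order."""
    return [t for t in tokens if t not in stop_words]


def stop_word_rates(sentences, stop_words=STOP_WORDS):
    """Share of all tokens taken by each stop word that occurs in the sentences.

    Running this once on segmented queries and once on a segmented corpus
    gives two distributions that can be compared side by side.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: counts[w] / total for w in stop_words if counts[w] > 0}


queries = [["感冒", "了", "吃", "什么", "药"], ["北京", "的", "天气"]]
print(filter_stop_words(queries[0]))  # ['感冒', '吃', '药']
print(stop_word_rates(queries))
```

Comparing the two rate dictionaries word by word shows which stop words are over- or under-represented in queries relative to general text.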
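Part (2) treats keyword extraction as a classification task over contextual features of each word. The Python sketch below is only an illustration under stated assumptions: the four features, the toy training data, the stop-word list and the post-processing rule are all hypothetical (the abstract mentions eight features but does not name them); a single CART-style tree with Gini splits stands in for the decision-tree forest; and a word's weight is assumed to be its predicted keyword probability normalized over the query.

```python
from collections import Counter

# Hypothetical stop-word list; the thesis's own list is not given in the abstract.
STOP_WORDS = {"的", "了", "吗", "什么", "怎么"}


def features(tagged_query, i):
    """Hypothetical contextual features for word i (not the thesis's eight features)."""
    word, pos = tagged_query[i]
    return [
        1 if word in STOP_WORDS else 0,     # stop-word flag
        1 if pos.startswith("n") else 0,    # noun-like POS tag (tag set assumed)
        len(word),                          # word length in characters
        i / max(len(tagged_query) - 1, 1),  # relative position in the query
    ]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Grow a small CART-style tree; each leaf stores P(label == 1)."""
    leaf = {"p": sum(labels) / len(labels)}
    if depth >= max_depth or len(rows) < min_size or len(set(labels)) == 1:
        return leaf
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return leaf
    _, f, t = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, min_size)

    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {"f": f, "t": t, "left": grow(left_idx), "right": grow(right_idx)}


def predict_proba(tree, row):
    """Walk the tree and return the leaf probability that the word is a keyword."""
    while "p" not in tree:
        tree = tree["left"] if row[tree["f"]] <= tree["t"] else tree["right"]
    return tree["p"]


def keyword_weights(tree, tagged_query):
    """Turn per-word keyword probabilities into weights that sum to 1."""
    if not tagged_query:
        return []
    scores = [predict_proba(tree, features(tagged_query, i)) for i in range(len(tagged_query))]
    # Hypothetical post-processing rule: stop words never carry weight.
    scores = [0.0 if w in STOP_WORDS else s for s, (w, _) in zip(scores, tagged_query)]
    total = sum(scores)
    if total == 0:  # fall back to uniform weights rather than an empty query
        return [(w, 1 / len(tagged_query)) for w, _ in tagged_query]
    return [(w, s / total) for s, (w, _) in zip(scores, tagged_query)]


# Toy training data: (word, POS) pairs with 1 = annotated core word.
train = [
    ([("感冒", "n"), ("了", "u"), ("吃", "v"), ("什么", "r"), ("药", "n")], [1, 0, 0, 0, 1]),
    ([("北京", "ns"), ("的", "u"), ("天气", "n")], [1, 0, 1]),
]
rows = [features(q, i) for q, _ in train for i in range(len(q))]
labels = [y for _, ys in train for y in ys]
tree = build_tree(rows, labels)
print(keyword_weights(tree, [("上海", "ns"), ("的", "u"), ("美食", "n")]))
# [('上海', 0.5), ('的', 0.0), ('美食', 0.5)]
```

With this toy data the tree splits on the noun flag alone, so "上海" and "美食" share the weight equally. A forest would instead train many such trees on resampled data and average their probabilities.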
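Part (3) evaluates precision and recall by cross-validation and compares win rates in web page ranking. A hedged Python sketch of these measures follows; micro-averaging, contiguous folds and the win-rate definition (wins divided by wins plus losses, ties excluded) are assumptions rather than details from the thesis, and `train_fn`/`predict_fn` are hypothetical hooks for whatever model is being evaluated.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall of predicted vs. annotated keyword sets."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        tp += len(pred & ref)  # keywords found and annotated
        fp += len(pred - ref)  # keywords found but not annotated
        fn += len(ref - pred)  # annotated keywords that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(queries, gold, k, train_fn, predict_fn):
    """Average precision and recall over k folds; train_fn/predict_fn are caller-supplied."""
    if not 2 <= k <= len(queries):
        raise ValueError("need 2 <= k <= number of queries")
    total_p = total_r = 0.0
    for train, test in k_fold_indices(len(queries), k):
        model = train_fn([queries[i] for i in train], [gold[i] for i in train])
        preds = [predict_fn(model, queries[i]) for i in test]
        p, r = precision_recall(preds, [gold[i] for i in test])
        total_p += p
        total_r += r
    return total_p / k, total_r / k


def win_rate(judgments):
    """Assumed definition: wins over non-tied judgments ('win', 'loss' or 'tie')."""
    wins, losses = judgments.count("win"), judgments.count("loss")
    return wins / (wins + losses) if wins + losses else 0.0


print(precision_recall([["感冒", "药"], ["北京"]], [["感冒", "药"], ["北京", "天气"]]))  # (1.0, 0.75)
print(win_rate(["win", "win", "loss", "tie"]))
```

Because the folds are contiguous, the sampled queries should be shuffled before calling `cross_validate` so that each fold mixes query categories.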
[Degree-granting institution]: Graduate School of Chinese Academy of Social Sciences
[Degree level]: Master's
[Year of degree]: 2013
[Classification number]: TP391.3