

Keyword Weight Calculation for Web Page Ranking

Published: 2018-11-01 16:11
[Abstract]: With the development of information technology and the growing popularity of the Internet, search engines have attracted wide attention. In recent years the mainstream search engines have been keyword-based. In such engines, the accuracy with which the weight of each word in a user query is computed directly affects the quality of the subsequent web page ranking, so computing the weights of the words in a query correctly is essential.

This thesis seeks a method, oriented toward web page ranking, for computing the weights of keywords in user queries, so that keyword-based search engines can rank pages better and later retrieval steps have a sound basis. The work consists of three parts.

Analysis of the characteristics of user queries. For 5,000 queries annotated with their core words, the thesis analyzes the relationship between query characteristics and word weights, compares the stop words found in queries with those in a modern Chinese corpus, and analyzes stop words in queries of different categories, with examples.

Keyword weight calculation for web page ranking. The user query log is word-segmented and part-of-speech tagged, and keyword extraction is treated as a classification task. Based on the characteristics of queries, eight contextual features of each word are selected as the features for decision tree forest classification, and the computation of each feature is described. An error analysis of the experimental results is carried out, and rules are added to post-process the classifier's output.

Analysis of experimental results. The decision tree classification method is compared with traditional keyword extraction and weight calculation methods. About 1,000 queries are randomly sampled from the user query log for manual evaluation, and the model's precision and recall are evaluated with cross-validation. The win rate of the model against the traditional weight calculation method for web page ranking is also measured. Finally, several queries are searched on "Baidu" to compare the ranking obtained with the keyword sequence determined by the model against the ranking obtained with the unprocessed query. The results show that the keyword extraction and weight calculation method used in this thesis is feasible for weight calculation in web page ranking.
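
As an illustration of the evaluation step described in the abstract, the sketch below computes precision and recall for per-word keyword labels. It assumes the usual binary definitions (1 = the word is a keyword, 0 = it is not). The function name `precision_recall` and the sample labels are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary keyword labels.

    gold      -- manually annotated labels, 1 = keyword, 0 = not a keyword
    predicted -- labels produced by a classifier, same encoding
    """
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == 1 and p == 1)
    false_pos = sum(1 for g, p in pairs if g == 0 and p == 1)
    false_neg = sum(1 for g, p in pairs if g == 1 and p == 0)

    # Guard against division by zero when nothing is predicted or annotated.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labels for the five words of one segmented query.
gold_labels = [1, 0, 1, 0, 1]
model_labels = [1, 0, 0, 1, 1]
p, r = precision_recall(gold_labels, model_labels)
print(f"precision={p:.3f} recall={r:.3f}")
```

In a cross-validation setting, the same function would be applied to the held-out fold of each split and the results averaged.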
[Degree-granting institution]: Graduate School of Chinese Academy of Social Sciences
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3

Article ID: 2304434


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2304434.html

