Query Intention Classification Based on Baidu Baike

Published: 2018-11-24 11:14

[Abstract]: Most pages on the World Wide Web are written in HTML, and as the number of pages grows sharply, search becomes harder for search engines. If a search engine could automatically identify the intent behind a query and group the returned results by intent, users could find what they need within the matching intent category, which would substantially improve user satisfaction. In practice, a single query may carry several intents, and a search engine can try to predict the intended one by analyzing the user's browsing behavior. A search engine that identifies the user's query intent and ranks the results accordingly returns results that are far more useful. Actively predicting query intent is therefore key to the future of search. When a query is short and expresses too little of the user's information need, most results returned by general-purpose search engines do not match that need. To address this inaccuracy, can a search engine classify its results by query intent? Query intent classification faces substantial challenges in three areas: intent representation, intent scope, and sentence representation. This thesis classifies query intent using Baidu Baike (Baidu Encyclopedia). The encyclopedia contains many concepts and categories; most concepts carry domain-specific keywords, and each concept corresponds to one article. A new user query is compared with the encyclopedia's concepts by sentence similarity, a random walk is then performed within the most similar category, and this yields query results that satisfy the user. Experimental results show that the proposed method performs well.
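The abstract states only that a query is matched to encyclopedia concepts by sentence similarity and that a random walk is then run within the most similar category; it does not specify the similarity measure or the form of the random walk. The following is a minimal Python sketch of that two-stage pipeline under stated assumptions: bag-of-words cosine similarity stands in for the sentence-similarity step, a random walk with restart (personalized PageRank) over links between concepts stands in for the random walk, and the toy concepts, categories, links and function names are all hypothetical. Real Chinese queries would first need word segmentation; the tokens here are assumed to be already segmented.

```python
import math
from collections import Counter

# Hypothetical toy "encyclopedia": each concept (one article) has a category,
# pre-segmented tokens, and links to related concepts. Not data from the thesis.
CONCEPTS = {
    "apple_fruit": {"category": "food", "tokens": ["apple", "fruit", "sweet", "tree"], "links": ["pear"]},
    "pear": {"category": "food", "tokens": ["pear", "fruit", "tree"], "links": ["apple_fruit"]},
    "apple_inc": {"category": "tech", "tokens": ["apple", "company", "iphone", "computer"], "links": ["iphone"]},
    "iphone": {"category": "tech", "tokens": ["iphone", "phone", "apple"], "links": ["apple_inc"]},
}


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists (assumed measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def best_category(query_tokens, concepts):
    """Return (category, concept) for the concept most similar to the query."""
    best = max(concepts, key=lambda c: cosine(query_tokens, concepts[c]["tokens"]))
    return concepts[best]["category"], best


def random_walk_with_restart(seed, nodes, concepts, restart=0.15, iters=50):
    """Power iteration of a random walk with restart on the link graph restricted to `nodes`."""
    p = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            out = [m for m in concepts[n]["links"] if m in nodes]
            if not out:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(out)
            for m in out:
                nxt[m] += share
        # With probability `restart`, jump back to the seed concept.
        nxt[seed] += restart
        p = nxt
    return p


def classify(query_tokens, concepts):
    """Pick the most similar category, then rank its concepts by walk probability."""
    category, seed = best_category(query_tokens, concepts)
    nodes = {c for c, d in concepts.items() if d["category"] == category}
    scores = random_walk_with_restart(seed, nodes, concepts)
    return category, sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(classify(["apple", "fruit"], CONCEPTS))
```

With this toy data, `classify(["apple", "fruit"], CONCEPTS)` selects the `food` category and ranks `apple_fruit` above `pear`, while a query such as `["iphone", "phone"]` lands in `tech`. Restricting the walk to the winning category is what lets an ambiguous token like "apple" resolve to different intents depending on the rest of the query.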
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.3

Article ID: 2353481

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2353481.html
