Research on Query Expansion Techniques in Personalized Intelligent Search Engines

Published: 2019-03-25 12:09

[Abstract]: As the Internet grows and the volume of online information increases, users demand ever higher recall, precision, and personalization from search engines. Query expansion is a key technique in personalized intelligent search engines: it expands the user's query before retrieval, which improves both recall and precision. First, we expand the user's query keywords at the word-sense level. Using the Tongyici Cilin synonym thesaurus and the HowNet knowledge base, we compute word similarity, find the words most similar to each query keyword, and add them as synonym and near-synonym expansions. Second, we expand the user's query questions at the semantic level, in two parts. In the first part, we extract and expand the question's keywords: the question goes through redundancy removal, Chinese word segmentation, part-of-speech tagging, and stop-word removal to obtain the keyword or keyword set that carries the user's core meaning, and those keywords are then expanded in the same way as query keywords. In the second part, we expand the question with words that commonly appear in answers: we build a question classification scheme, classify the user's question, use a question-answer corpus to count the words that frequently occur in answers to each question type, generate a table of common answer words, and expand the question with the common answer words for its category. The words obtained from both parts are then combined to expand the user's question.
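The two expansion steps described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the `SIMILAR_WORDS` table stands in for similarity scores derived from the synonym thesaurus and HowNet, the stop-word list, the question categories, and the `ANSWER_WORDS` table are invented for the example, and whitespace tokenization of an English question replaces Chinese word segmentation and part-of-speech tagging.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity table; a real system would compute these scores
# from the synonym thesaurus and HowNet. Words and values are made up.
SIMILAR_WORDS: Dict[str, List[Tuple[str, float]]] = {
    "computer": [("pc", 0.92), ("laptop", 0.81), ("calculator", 0.40)],
    "price": [("cost", 0.90), ("fee", 0.65)],
}

# Hypothetical stop-word list and per-category table of common answer words.
STOP_WORDS = {"what", "is", "the", "of", "a", "an", "how", "much", "where", "who"}
ANSWER_WORDS: Dict[str, List[str]] = {
    "quantity": ["yuan", "total"],
    "location": ["located", "address"],
    "other": [],
}


def expand_keyword(word: str, top_k: int = 1, min_sim: float = 0.6) -> List[str]:
    """Return the word followed by its top_k most similar words scoring >= min_sim."""
    candidates = sorted(SIMILAR_WORDS.get(word, []), key=lambda p: p[1], reverse=True)
    return [word] + [w for w, sim in candidates[:top_k] if sim >= min_sim]


def classify_question(question: str) -> str:
    """Toy rule-based classifier standing in for a real question classification scheme."""
    q = question.lower()
    if q.startswith("how much") or q.startswith("how many"):
        return "quantity"
    if q.startswith("where"):
        return "location"
    return "other"


def extract_keywords(question: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    tokens = question.lower().strip("?!. ").split()
    return [t for t in tokens if t not in STOP_WORDS]


def expand_question(question: str) -> List[str]:
    """Combine expanded keywords with the common answer words of the question's category."""
    terms: List[str] = []
    for keyword in extract_keywords(question):
        for term in expand_keyword(keyword):
            if term not in terms:
                terms.append(term)
    for term in ANSWER_WORDS[classify_question(question)]:
        if term not in terms:
            terms.append(term)
    return terms
```

With these toy tables, `expand_question("How much is the price of a computer?")` yields `['price', 'cost', 'computer', 'pc', 'yuan', 'total']`: each keyword brings in its single most similar word, and the "quantity" category contributes its answer words.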
Next, we analyze the user's browsing behavior to mine user interests. We collect the URLs in the user's Internet Explorer favorites and browsing history, fetch the corresponding web pages, extract their main text, and segment it into Chinese words to build a document set. A TF-IDF-based vector space model turns the document set into a set of vectors, which we cluster; analyzing the clusters yields representative words for the user's interests. Finally, we apply query expansion and user-interest extraction in a personalized intelligent search engine: the user's query is first expanded, the expanded query is passed to the engine's retrieval module, and the returned results are ranked and displayed by how well they match the user's interests.
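The interest-mining and ranking steps can be illustrated with a small standard-library Python sketch. Several choices here are assumptions made for the example and are not stated in the abstract: documents arrive already tokenized, the IDF is smoothed, clustering is a greedy single-pass scheme with a cosine threshold, interest words are the highest-weighted terms of each cluster, and results are ranked by how many interest words they contain. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Build one TF-IDF vector per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so terms found in every document keep a positive weight.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors: List[Vector], threshold: float = 0.2) -> List[Vector]:
    """Greedy single-pass clustering; returns one summed vector per cluster."""
    centroids: List[Vector] = []
    for vec in vectors:
        best = max(centroids, key=lambda c: cosine(vec, c), default=None)
        if best is not None and cosine(vec, best) >= threshold:
            for t, w in vec.items():
                best[t] = best.get(t, 0.0) + w
        else:
            centroids.append(dict(vec))
    return centroids


def interest_words(centroids: List[Vector], per_cluster: int = 2) -> List[str]:
    """Take the highest-weighted terms of each cluster as interest words."""
    words: List[str] = []
    for c in centroids:
        words.extend(sorted(c, key=lambda t: (-c[t], t))[:per_cluster])
    return words


def rerank(results: List[List[str]], interests: List[str]) -> List[List[str]]:
    """Order results by the number of interest words they contain (ties keep input order)."""
    wanted = set(interests)
    return sorted(results, key=lambda r: -len(set(r) & wanted))


# Toy browsing history: two pages about football, two about Python.
pages = [
    ["football", "match", "goal", "football"],
    ["football", "league", "goal"],
    ["python", "code", "python", "library"],
    ["python", "code", "tutorial"],
]
interests = interest_words(cluster(tfidf_vectors(pages)))
```

A production system would more likely use a hierarchical or k-means style algorithm and a weighted similarity score for ranking; the single-pass scheme is used here only because it keeps the sketch short.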
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3


Article ID: 2446966

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2446966.html
