Research on Query Expansion Technology in Personalized Intelligent Search Engines
[Abstract]: As the Internet continues to develop, the amount of information on the network grows daily. Faced with this mass of information, users demand ever higher recall, precision, and personalization from search engines. Query expansion is a key technology in personalized intelligent search engines: it expands the user's query before retrieval and thereby improves the engine's recall and precision. First, we expand the word sense of the query keywords entered by the user. Word similarity is computed using the Tongyici Cilin synonym thesaurus and the HowNet knowledge base; the synonyms most similar to each query keyword are identified and added to the query to improve recall and precision. Second, we expand the semantics of query questions entered by the user. This function has two parts. The first part is keyword extraction and expansion: through a series of operations on the question (removing redundancy, Chinese word segmentation, part-of-speech tagging, and stop-word removal), the keyword or set of keywords carrying the user's core meaning is extracted from the question and then expanded. The second part expands the question with words that commonly occur in answers: we construct a question classification system, classify the user's query question, and use a question-answer corpus to count the words that frequently appear in the answers to each type of question. This yields a vocabulary of common answer words, from which expansion terms are selected according to the category of the user's question. Finally, both kinds of words are used together to expand the user's query question.
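The synonym-based keyword expansion described above can be illustrated with a minimal sketch. It assumes a precomputed table of similarity scores; the table `TOY_SIMILARITY`, its values, and the parameters `top_k` and `min_score` are hypothetical stand-ins for scores that the thesis derives from Tongyici Cilin and HowNet, and the abstract does not specify the actual similarity formula.

```python
# Hypothetical toy table: word -> {candidate synonym: similarity in [0, 1]}.
# The scores are made up for illustration; they are not taken from
# Tongyici Cilin or HowNet.
TOY_SIMILARITY = {
    "computer": {"pc": 0.9, "laptop": 0.7, "calculator": 0.4},
    "price": {"cost": 0.85, "value": 0.5},
}


def expand_keyword(word, similarity=TOY_SIMILARITY, top_k=1, min_score=0.6):
    """Return up to top_k synonyms of `word`, most similar first."""
    candidates = similarity.get(word, {})
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, score in ranked if score >= min_score][:top_k]


def expand_query(keywords, **kwargs):
    """Append the best synonyms after each original keyword, without duplicates."""
    expanded = []
    for word in keywords:
        for term in [word] + expand_keyword(word, **kwargs):
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query(["computer", "price"]))
```

With `top_k=1` only the single most similar synonym is added per keyword, which mirrors the abstract's "most similar" wording; raising `top_k` trades precision for recall.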
Next, we analyze the user's browsing behavior to mine the user's interests. We collect the URLs in the user's IE favorites and browsing history, fetch the corresponding web pages, extract their text, segment the Chinese text into words, and generate a document set. A TF-IDF-based vector space model is then used to generate the vector set corresponding to the document set, and this vector set is clustered. The clustering results are analyzed and words representative of the user's interests are extracted. Finally, query expansion and user interest extraction are applied to a personalized intelligent search engine. The user's query is first expanded; the expanded query is then passed to the search module as the retrieval content, and the results returned by the search module are ranked and displayed according to how well they match the user's interests.
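The interest-modelling and result-ranking steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation: the documents are already tokenized, the cluster membership is assumed to be given rather than computed (the abstract does not name the clustering algorithm used), and the function names `tfidf_vectors`, `centroid`, `cosine`, and `rerank` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def centroid(vectors):
    """Average a group of sparse vectors (stand-in for one interest cluster)."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, interest):
    """results: list of (title, tokens). Most interest-relevant titles first."""
    scored = [(cosine(Counter(tokens), interest), title) for title, tokens in results]
    # sorted() is stable, so equally scored results keep the engine's order.
    return [title for _, title in sorted(scored, key=lambda pair: -pair[0])]


# Toy browsing history; the first two pages are assumed to form one cluster.
history = [
    ["football", "match", "goal"],
    ["football", "league", "goal"],
    ["stock", "market", "price"],
]
interest = centroid(tfidf_vectors(history)[:2])
results = [
    ("Stock news", ["stock", "price"]),
    ("Football scores", ["football", "goal", "match"]),
    ("Cooking", ["recipe"]),
]
print(rerank(results, interest))
```

Using a cluster centroid as the interest profile is one simple choice; the thesis instead extracts representative interest words from the clustering results, which could be approximated by keeping only the highest-weighted terms of each centroid.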
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3