

Research on Query Expansion Techniques in Personalized Intelligent Search Engines

Published: 2019-03-25 12:09
[Abstract]: As the Internet continues to grow, the volume of online information increases daily, and users facing this mass of information place ever higher demands on search engines for recall, precision, and personalization. Query expansion is a key technique in personalized intelligent search engines: it expands the user's query before the search engine retrieves against it, which effectively improves the engine's recall and precision.
First, we expand the user's query keywords at the word-sense level. Word similarity is computed using the Tongyici Cilin synonym thesaurus and the HowNet knowledge base, and the words most similar to each query keyword are used to expand it with synonyms and near-synonyms, improving recall and precision.
Second, we expand the user's query questions at the semantic level. This consists of two parts. The first is keyword extraction and expansion: the question goes through redundancy removal, Chinese word segmentation, part-of-speech tagging, and stop-word removal to extract the keyword or keyword set that carries the user's core meaning, and the extracted keywords are then expanded. The second is expansion with words that are common in answers: we build a question classification scheme and classify the user's question; in parallel, we use a question-answer corpus to count the words that frequently appear in answers to each question type and generate a table of common answer words; the question is then expanded with the common answer words for its category. Finally, the words obtained from both parts are used to expand the user's question.
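The two expansion steps described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `SIMILARITY` table is toy data standing in for similarity scores that would come from Tongyici Cilin or HowNet, the `QUESTION_CUES` rules stand in for the thesis's question classifier, and whitespace splitting stands in for Chinese word segmentation and part-of-speech tagging. It is not the author's implementation.

```python
# Hypothetical similarity scores; a real system would compute these
# from Tongyici Cilin or HowNet rather than hard-code them.
SIMILARITY = {
    "computer": {"pc": 0.9, "laptop": 0.8, "machine": 0.5},
    "price": {"cost": 0.85, "fee": 0.6},
}
# Hypothetical cue words that map a question to a category.
QUESTION_CUES = {"location": {"where"}, "person": {"who"}}
# Hypothetical table of words that often appear in answers, per category.
ANSWER_WORDS = {
    "location": ["located", "address"],
    "person": ["named", "founder"],
}
STOPWORDS = {"where", "who", "what", "how", "is", "the", "a", "of"}


def expand_keyword(word, threshold=0.7, top_k=2):
    """Return up to top_k similar words whose score meets the threshold."""
    candidates = SIMILARITY.get(word, {})
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, score in ranked if score >= threshold][:top_k]


def classify_question(tokens):
    """Toy rule-based classifier: the first category whose cue word appears."""
    for category, cues in QUESTION_CUES.items():
        if any(cue in tokens for cue in cues):
            return category
    return None


def expand_question(question):
    """Expand a question with similar words and category answer words."""
    tokens = question.lower().rstrip("?").split()
    category = classify_question(tokens)
    keywords = [t for t in tokens if t not in STOPWORDS]
    expanded = list(keywords)
    for kw in keywords:
        expanded.extend(expand_keyword(kw))
    expanded.extend(ANSWER_WORDS.get(category, []))
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(expanded))


print(expand_question("Where is the cheapest computer?"))
```

The similarity threshold and `top_k` cap are illustrative knobs: a lower threshold pulls in more loosely related terms, which tends to raise recall at the cost of precision.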
Next, we analyze the user's browsing behavior to mine user interests. We collect the URLs in the user's IE Favorites and browsing history, fetch the corresponding web pages, extract the main text, and segment it into Chinese words to produce a document set. A TF-IDF-based vector space model then generates the vector set for the document set; the vectors are clustered, and the clustering results are analyzed to extract words that represent the user's interests. Finally, query expansion and user interest extraction are applied in a personalized intelligent search engine: the user's query is first expanded, the expanded query is passed as the retrieval input to the engine's retrieval module, and the results the module returns are ranked and displayed by how closely they match the user's interests.
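The interest-mining and ranking pipeline above can be sketched in plain Python under stated assumptions. The abstract does not name the clustering algorithm or the ranking score, so this sketch substitutes a simple single-pass "leader" clustering and a count of interest words per result; the function names, the threshold, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(vectors, threshold=0.1):
    """Single-pass clustering: join the first cluster whose leader is close."""
    clusters = []  # each cluster is a list of doc indices; index 0 is the leader
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if cosine(v, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def interest_words(vectors, clusters, per_cluster=1):
    """Pick the highest-weighted terms of each cluster as interest words."""
    words = []
    for cluster in clusters:
        totals = Counter()
        for i in cluster:
            totals.update(vectors[i])  # adds the TF-IDF weights term by term
        words.extend(t for t, _ in totals.most_common(per_cluster))
    return words


def rerank(results, interests):
    """Order results by how many interest words each one contains."""
    wanted = set(interests)
    return sorted(
        results,
        key=lambda r: sum(tok in wanted for tok in r.lower().split()),
        reverse=True,
    )


# Hypothetical browsing history, already tokenized.
docs = [
    ["python", "code", "python", "tutorial"],
    ["python", "code", "library"],
    ["football", "match", "football", "score"],
    ["football", "league", "match"],
]
vectors = tfidf_vectors(docs)
clusters = leader_cluster(vectors)
interests = interest_words(vectors, clusters)
print(rerank(["Cooking pasta at home", "Python tutorial for beginners"], interests))
```

Leader clustering is used here only because it is short; any clustering method that groups the TF-IDF vectors by topic would fit the same pipeline.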
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3




Article ID: 2446966


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2446966.html

