A Method for Mining User Search Intent from Search Logs and Recommending Related Search Terms
Published: 2018-07-23 19:17
[Abstract]: With the rapid growth of the Internet, users face ever more data, and search engines are currently the only practical way to find the data that meets a given need within it. In practice, however, most users do not know where to start when a search engine returns thousands of results, many of which are noise irrelevant to the user's search intent. In addition, traditional search engines return results as a one-dimensional linear list, which further reduces query efficiency. Research on improving user search efficiency is receiving growing attention, and many researchers have proposed methods that start from either the search result documents or the search logs. This thesis studies how to improve user search efficiency on top of existing search engine resources, and implements a method that compensates for shortcomings of existing systems. The method starts from the search logs, which are processed and filtered to obtain the relevant data sets. Seed search terms are then constructed to extract, from these data sets, candidate terms that satisfy search intents at different levels, and effective features are extracted and used to train a binary classification model. For a user query, the classification model first yields the related search terms to recommend, and similar texts are then merged using short-text similarity computation and related techniques. Finally, the user receives related search terms covering different intents, together with more reasonably structured search documents. Experiments show that the method extracts related search terms that match expectations and thereby improves search efficiency.
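The merging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which short-text similarity measure the thesis uses, so the Jaccard similarity over character bigrams below is an assumption, as are the function names (`char_bigrams`, `jaccard`, `merge_similar`), the greedy grouping strategy, and the 0.5 threshold.

```python
from typing import List, Set


def char_bigrams(text: str) -> Set[str]:
    """Return the character bigrams of a lowercased string with whitespace removed."""
    s = "".join(text.lower().split())
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' character-bigram sets (assumed measure)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def merge_similar(terms: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedily group terms: each term joins the first group whose
    representative (its first member) is at least `threshold` similar;
    otherwise it starts a new group."""
    groups: List[List[str]] = []
    for term in terms:
        for group in groups:
            if jaccard(term, group[0]) >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


if __name__ == "__main__":
    # Hypothetical candidate terms that a classifier might have accepted.
    candidates = [
        "python tutorial",
        "python tutorials",
        "python download",
        "java tutorial",
    ]
    for group in merge_similar(candidates):
        print(group)
```

Character bigrams are used here because they need no word segmentation, which makes the same sketch applicable to Chinese queries; a real system might instead segment the text first and compare word sets or weighted vectors.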
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
Article ID: 2140393
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2140393.html