Research on Query Expansion Based on Query Logs

Published: 2018-06-17 22:56

  Topics: query expansion + query log analysis. Source: Beijing University of Posts and Telecommunications, master's thesis, 2013


[Abstract]: The Internet now reaches every corner of daily life. The volume of information online keeps growing, and it grows ever faster. Retrieving the information people actually need from this mass of data has become a central problem in information retrieval. Mainstream search engines still answer queries by keyword matching, and with so much information, keyword matching alone rarely returns results that satisfy users. Query expansion emerged in response and has since made some progress. After analyzing the shortcomings of earlier algorithms, this thesis combines the idea of crowdsourcing with user query logs and proposes a crowdsourcing-based query expansion algorithm. Experiments show that the new algorithm clearly improves retrieval effectiveness. The main work of the thesis is as follows.

First, the thesis introduces the research background and development of query expansion and briefly outlines its own research and contributions. Second, it introduces the theory behind information retrieval and query expansion, studies the current mainstream query expansion algorithms in detail, and analyzes their strengths and weaknesses. Third, it briefly introduces crowdsourcing and the principle of the algorithm used to implement it, Expectation Maximization (EM), and improves on that algorithm in preparation for combining crowdsourcing with user query logs.

The thesis presents a detailed statistical analysis of user query logs, covering the characteristics of users' query terms, session characteristics during the query process, and user click behavior. These analyses are both the motivation for query expansion and the basis on which it is built.

Using a dataset provided by Sogou, the thesis preprocesses the data and then uses the Indri search engine to build a simple search platform matched to the user query logs, which serves as the experimental environment.

The thesis proposes a crowdsourcing-based query expansion algorithm. It brings crowdsourcing into query expansion by treating the users' query process, as recorded in the query logs, as a crowdsourcing process. It then uses the improved EM algorithm to re-rank the relevant documents and selects expansion terms from the re-ranked documents. Experiments on the self-built search platform show that, measured by P@20, the proposed algorithm clearly improves retrieval effectiveness compared with several traditional query expansion algorithms.
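The last two steps described in the abstract, selecting expansion terms from a re-ranked document list and scoring a result list with P@k, can be illustrated with a minimal sketch. It does not reproduce the thesis's method: the abstract does not say how candidate terms are scored, so ranking them by document frequency is an assumption, and the function names `select_expansion_terms` and `precision_at_k`, their parameters, and the toy data are all hypothetical. The EM-based re-ranking itself is not shown; the input is assumed to be already re-ranked.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def select_expansion_terms(
    query_terms: Sequence[str],
    ranked_docs: Sequence[Sequence[str]],
    top_docs: int = 10,
    num_terms: int = 5,
) -> List[str]:
    """Pick candidate expansion terms from the top of a re-ranked list.

    Scoring by document frequency is an assumption made for illustration;
    the abstract does not state the thesis's own term-scoring rule.
    """
    original = set(query_terms)
    doc_freq = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count each term once per document and skip the original query terms.
        doc_freq.update(set(doc) - original)
    # Sort by descending frequency, then alphabetically so ties are stable.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:num_terms]]


def precision_at_k(
    ranked_doc_ids: Sequence[str],
    relevant_ids: Iterable[str],
    k: int = 20,
) -> float:
    """P@k: the fraction of the first k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant)
    return hits / k


if __name__ == "__main__":
    # Toy tokenized documents, assumed to be in re-ranked order.
    docs = [
        ["query", "log", "click", "session"],
        ["query", "click", "expansion"],
        ["log", "click"],
    ]
    print(select_expansion_terms(["query"], docs, top_docs=3, num_terms=2))
    print(precision_at_k(["d1", "d2", "d3", "d4"], {"d1", "d4", "d9"}, k=4))
```

With the toy data, the first call returns `['click', 'log']` and the second returns `0.5`. Note that `precision_at_k` always divides by `k`, so a result list shorter than `k` is penalized for the missing positions, which is the usual convention for P@20.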
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3

