Optimizing Microblog Search Results Based on Relevance Feedback
Published: 2018-08-09 20:07
[Abstract]: Microblogging is a highly popular form of social media that provides massive amounts of real-time text, such as news events, popular comments, and trending topics. Microblog search differs substantially from traditional web search. Because of its real-time and social nature, user demand for microblog search keeps growing, and optimizing its search results has become a research focus. Relevance feedback, a key technique for improving the performance of query expansion, has an important effect on result optimization. This thesis studies relevance feedback on microblog corpora and proposes a re-ranking algorithm based on relevance feedback. The main work is as follows. First, it proposes a feedback algorithm based on an improved relevance model, which refines the traditional relevance model to raise the retrieval performance of topic expansion; it also designs a feedback algorithm based on word activation force, which builds a word network and mines the expansion terms activated by the topic terms. Second, taking into account the characteristics of microblogs and of the corpus, it incorporates the expansion-term results as features in a learning-to-rank model instead of directly running a second retrieval; it analyzes separately how expansion-term features and URL-content features affect the ranking, and proposes a re-ranking algorithm that fuses multiple features to optimize the search results. The method is validated on the Twitter corpora of the TREC 2011-2013 Microblog evaluations, and the experiments show substantial improvements in retrieval metrics such as P@30 and MAP. Finally, a microblog search system based on relevance feedback is designed and implemented.
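The two metrics named in the abstract, P@30 and MAP, can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function names, the document IDs, and the toy relevance judgments are all assumptions made for illustration, and official TREC scoring is normally done with a dedicated evaluation tool.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even if fewer than k documents were returned,
    so missing results count as non-relevant.
    """
    top_k = ranked[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalize by all relevant documents, retrieved or not.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Average of per-query average precision over all queries in `runs`."""
    scores = [average_precision(ranked, qrels.get(qid, set()))
              for qid, ranked in runs.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: two queries with made-up tweet IDs.
toy_runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["a", "b"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"b"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

For P@30 as reported in the abstract, `precision_at_k` would be called with `k=30` on each query's ranking and the results averaged over queries.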
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.1; TP391.3
Article ID: 2175174
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2175174.html