Research on Microblog Search Based on the Idea of Federated Retrieval

Published: 2018-08-02 08:24

[Abstract]: With the arrival of the Web 2.0 era, Internet applications have multiplied, user participation has grown steadily, and the network is evolving into a social network. Microblogging (Weibo) is one of the most typical social-network applications; its short content and easy posting attract a growing number of users. As the number of users increases, the content they generate on microblog platforms grows exponentially. Search over microblog content, however, still uses the traditional centralized retrieval model, which causes several problems. First, because the volume of microblog data is huge, searching all posts directly is time-consuming and degrades the user's search experience. Second, microblogs cover so many topics that centralized retrieval may yield low accuracy. Finally, centralized retrieval can use only one retrieval model, whereas federated retrieval can apply a different retrieval model to each data set and is therefore more flexible.

Federated retrieval is an important branch of information retrieval. It searches different data sets in a distributed manner, which addresses the low efficiency and low accuracy of centralized retrieval. Federated retrieval first estimates the relevance of each data set to the query, then sends the query to the most relevant data sets, and finally merges the retrieved results and returns them to the user. Because the data sets that are searched are all relevant to the query, the accuracy of the results is generally higher than with centralized retrieval, and the problem of a data set being too large to search effectively is avoided.

Building on these advantages, this thesis proposes a microblog search technique based on the idea of federated retrieval. The technique applies federated retrieval to microblog search and, to account for the particular nature of microblog text, incorporates an authority factor for the microblog author so that the document ranking score is computed more accurately. Experimental results on a real microblog data set show that the proposed method improves the accuracy of microblog search. The main work of this thesis is as follows:

(1) A microblog search framework based on federated retrieval is developed. The focus of this thesis is searching microblog data with federated retrieval. To that end, federated data sets suited to microblog search are first built, and a description is generated for each data set. A data set selection method then uses these descriptions to compute a matching score between the query and each data set, ranks the data sets by relevance, and selects the several most relevant ones. The query is then sent to the selected data sets. Finally, the results returned by the different data sets are merged into a single result list and returned to the user.

(2) A result merging algorithm that incorporates microblog author authority is proposed. Taking the characteristics of microblogs into account and building on previous research, this thesis proposes a result merging method that incorporates the authority of the microblog author. Experimental results show that, compared with earlier result merging methods, the proposed method effectively improves the accuracy of the search results.

(3) A microblog search system based on federated retrieval is designed. Building on the preceding two chapters, a prototype microblog search system based on federated retrieval is designed and implemented. The system has three main functional modules: microblog index building, ordinary search, and federated retrieval. The thesis closes with a demonstration of the system.
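The pipeline described in the abstract (describe each data set, select the most relevant ones, search them, merge the results with an author authority factor) can be sketched roughly as follows. The abstract gives no scoring formulas, so everything concrete here is an illustrative assumption rather than the thesis's method: the `Post` type, the frequency-based `collection_score`, the term-count `search`, the max-normalisation in the merge step, and the mixing weight `lam` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    author_authority: float  # hypothetical authority score in [0, 1]


def describe(posts):
    """Data set description: term frequencies over all posts (assumed form)."""
    return Counter(term for p in posts for term in p.text.lower().split())


def collection_score(description, query_terms):
    """Assumed relevance measure: share of the data set's terms that match the query."""
    total = sum(description.values())
    if total == 0:
        return 0.0
    return sum(description[t] for t in query_terms) / total


def search(posts, query_terms):
    """Per-data-set retrieval: count query-term occurrences in each post."""
    results = []
    for p in posts:
        tf = Counter(p.text.lower().split())
        score = sum(tf[t] for t in query_terms)
        if score > 0:
            results.append((p, float(score)))
    return results


def federated_search(collections, query, k=2, lam=0.3):
    """Select up to k data sets, search them, and merge into one ranked list."""
    query_terms = query.lower().split()
    # Step 1: rank data sets by the match between query and description.
    # (A real system would precompute and store the descriptions.)
    ranked = sorted(
        ((collection_score(describe(posts), query_terms), name)
         for name, posts in collections.items()),
        reverse=True,
    )
    selected = [(s, name) for s, name in ranked[:k] if s > 0]
    if not selected:
        return []
    best = selected[0][0]
    # Step 2: search each selected data set. Step 3: merge the results.
    merged = []
    for c_score, name in selected:
        hits = search(collections[name], query_terms)
        if not hits:
            continue
        top = max(score for _, score in hits)
        for post, score in hits:
            # Normalise the per-data-set score, weight it by the data set's
            # relative relevance, then mix in the author authority factor.
            content = (score / top) * (c_score / best)
            final = (1 - lam) * content + lam * post.author_authority
            merged.append((final, name, post.text))
    merged.sort(key=lambda r: r[0], reverse=True)
    return merged
```

Calling `federated_search(collections, "football")` on a dict that maps data set names to lists of `Post` objects returns `(score, data_set_name, text)` tuples in descending score order. Raising `lam` gives author authority more influence relative to textual match.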
[Degree-granting institution]: Hunan University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.3;TP393.092


Article ID: 2158806


Article link: http://sikaile.net/guanlilunwen/ydhl/2158806.html
