Research on Weibo Search Based on the Idea of Federated Retrieval
[Abstract]: With the arrival of the Web 2.0 era, Internet applications have multiplied, user participation has grown steadily, and online activity has shifted toward social networks. Weibo is the most typical social network application; it attracts more and more users because its content is short and easy to publish. As the number of Weibo users grows, the content they generate on the platform grows exponentially as well. However, Weibo search still uses the traditional centralized retrieval mode, which causes several problems. First, because the volume of Weibo data is so large, searching all posts directly is time-consuming and degrades the user's search experience. Second, Weibo covers so many topics that centralized retrieval may yield low accuracy. In addition, centralized retrieval can use only one retrieval model, whereas federated retrieval can apply a different retrieval model to each dataset and is therefore more flexible. Federated retrieval is an important branch of information retrieval. It searches multiple datasets in a distributed fashion and addresses the low efficiency and low accuracy of centralized retrieval. Federated retrieval first estimates the relevance of each dataset to the query, then sends the query to the most relevant datasets for retrieval, and finally merges the retrieval results and returns them to the user. Because the datasets that are queried are relevant to the query, the accuracy of the search results is higher than with centralized retrieval. This also avoids the problem of a dataset being too large to search effectively. Building on these advantages, this paper proposes a Weibo search technique based on the idea of federated retrieval.
The technique applies federated retrieval to Weibo search, takes the particular characteristics of Weibo text into account, and incorporates the authority of Weibo authors so that document ranking scores are computed more accurately. Experimental results on real Weibo datasets show that the proposed method improves the accuracy of Weibo search. The main work of this paper is as follows:
(1) A Weibo search framework based on federated retrieval is developed. The research focus of this paper is the use of federated retrieval to search Weibo data. To this end, a federated collection of datasets suited to Weibo search is first built, and a description is generated for each dataset. Then, using these descriptions, a dataset selection method computes the matching score between the query and each dataset. The datasets are ranked by relevance and the several most relevant ones are selected; the query is sent to the selected datasets for search; finally, the results returned by the different datasets are merged into a single result list, which is returned to the user.
(2) A result merging algorithm that incorporates the authority of Weibo authors is proposed. Building on previous studies and considering the characteristics of Weibo, this paper proposes a result merging method that incorporates author authority. Experimental results show that it improves the accuracy of search results compared with earlier result merging methods.
(3) A Weibo search system based on the idea of federated retrieval is designed. Building on the two preceding chapters, a prototype Weibo search system based on federated retrieval is designed and implemented. The system comprises three main functional modules: Weibo index building, general search, and federated retrieval. Finally, the paper demonstrates the system.
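The federated pipeline described in the abstract (dataset description, dataset selection, per-dataset search, and authority-aware result merging) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the thesis: the toy collections, the `AUTHORITY` values, the relative-frequency selection score, the term-match local scorer, and the linear interpolation weight `alpha`. The abstract does not give the formulas actually used.

```python
from collections import Counter

# Hypothetical toy collections: each maps a post id to (text, author).
COLLECTIONS = {
    "sports": {
        "s1": ("football match final score", "alice"),
        "s2": ("basketball final tonight", "bob"),
    },
    "tech": {
        "t1": ("new phone release score benchmark", "carol"),
        "t2": ("search engine index update", "dave"),
    },
    "food": {
        "f1": ("noodle recipe spicy", "erin"),
    },
}

# Hypothetical author authority values in [0, 1].
AUTHORITY = {"alice": 0.9, "bob": 0.2, "carol": 0.5, "dave": 0.7, "erin": 0.1}


def describe(collection):
    """Dataset description: term frequencies over all posts in the dataset."""
    counts = Counter()
    for text, _author in collection.values():
        counts.update(text.split())
    return counts


def select_collections(query, descriptions, k):
    """Rank datasets by the share of their terms that match the query; keep the top k."""
    scores = {}
    for name, counts in descriptions.items():
        total = sum(counts.values())
        scores[name] = sum(counts[t] for t in query.split()) / total
    return sorted(scores, key=scores.get, reverse=True)[:k]


def search(query, collection):
    """Local retrieval: count query-term matches, normalised by the best local score."""
    terms = query.split()
    raw = {}
    for doc_id, (text, _author) in collection.items():
        words = text.split()
        hits = sum(words.count(t) for t in terms)
        if hits:
            raw[doc_id] = hits
    best = max(raw.values(), default=1)
    return {doc_id: hits / best for doc_id, hits in raw.items()}


def federated_search(query, k=2, alpha=0.3):
    """Select k datasets, search each, and merge using relevance plus author authority."""
    descriptions = {name: describe(c) for name, c in COLLECTIONS.items()}
    merged = []
    for name in select_collections(query, descriptions, k):
        for doc_id, score in search(query, COLLECTIONS[name]).items():
            author = COLLECTIONS[name][doc_id][1]
            # Assumed merging rule: linear mix of local relevance and authority.
            final = (1 - alpha) * score + alpha * AUTHORITY[author]
            merged.append((doc_id, final))
    return sorted(merged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in federated_search("final score"):
        print(doc_id, round(score, 2))
```

In this sketch the `food` dataset is never searched for the query `final score`, which is the efficiency argument made above, and `alpha` controls how strongly author authority can reorder posts with similar relevance scores.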
[Degree-granting institution]: Hunan University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3; TP393.092