Research on Key Technologies of Microblog Search
[Abstract]: Microblogs have quickly become an important source of real-time information. Microblog search faces two central problems: computing the relevance between query terms and microblog messages, and organizing the search results. Relevance computation measures how similar a message is to the query in both content and semantics. Result organization presents messages concisely and in order, to overcome redundancy and non-standard writing; its main methods include classification and summarization. Taking Twitter as an example, this thesis explores several important tasks in microblog search: relevance computation, query result classification, summarization, and comparative topic summarization.

To address relevance computation, two message ranking models are proposed, one based on learning to rank and one based on a recurrent neural network language model. Compared with the relevance ranking algorithm of the existing microblog search service, the former significantly improves the relevance of the ranked message list, while the latter narrows the gap in semantic relevance computation and improves the coverage of query results. The learning-to-rank model systematically studies the roles that text relevance features, microblog writing features, and author features play in microblog relevance computation. The model based on the recurrent neural network language model introduces semantic similarity into message relevance computation, calculating lexical semantic similarity between messages at the granularity of word vectors.

To address search result classification, a collective classification model based on message associations is proposed, and a topic classification scheme is defined for microblogs. Compared with the feature-based baseline model, the model's accuracy and F-score increase by 5.38% and 4.74%, respectively.
The model applies two kinds of shared-topic relationships between messages to three graph-based collective classification models, which consider both a message's local features and the category distribution of its associated messages. It also classifies a batch of microblog messages at the same time, which reduces the effect of data sparsity. The precision and recall of the classifier improve substantially, and the iterative classification algorithm using the #hashtag relationship performs best.

To address search result summarization, a timeline-based self-reinforcement model of associative interaction is proposed. Compared with the graph-based baseline model, its average ROUGE-1 score increases by 14%. Given the search results for a query, the model divides the query topic into several subtopics in chronological order and computes the importance of each message from its text content, its author's social influence, and its text quality. Messages are then ranked and extracted according to importance and diversity to generate the summary. Experiments show that the author's social influence and text quality effectively improve the measurement of text importance.

To address comparative topic summarization, an optimization model for comparative topic summaries based on message associations is proposed. Compared with the baseline model based on content similarity, the model's comparative attribute coverage and comparative message-pair accuracy improve by 14.7% and 11.6%, respectively. The model makes full use of the similarity relationship between messages and three kinds of shared-topic relationships, and uses the PageRank web page ranking algorithm and the SimRank method to maximize the contrast within message pairs and their topic representativeness. It then generalizes and compares the commonalities and differences in the search results for the query terms to generate a summary.
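The iterative classification over #hashtag links mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that a content-based classifier has already produced per-message label probabilities, that two messages are neighbours when they share a hashtag, and that local and neighbourhood evidence are mixed with a weight `alpha`. The function name, the `alpha` parameter, the synchronous update, and the topic labels are all hypothetical.

```python
from collections import Counter, defaultdict


def iterative_classification(local_probs, hashtags, alpha=0.5, max_iters=10):
    """Label messages by mixing local scores with neighbours' current labels.

    local_probs: {msg_id: {label: prob}} from any content classifier (assumed given).
    hashtags:    {msg_id: set of hashtags}; sharing a hashtag makes two messages neighbours.
    alpha:       weight of the local classifier versus the neighbourhood (hypothetical).
    """
    # Group messages by hashtag, then link every pair inside a group.
    by_tag = defaultdict(set)
    for msg, tags in hashtags.items():
        for tag in tags:
            by_tag[tag].add(msg)
    neighbours = {msg: set() for msg in local_probs}
    for members in by_tag.values():
        for msg in members:
            if msg in neighbours:
                neighbours[msg].update(
                    m for m in members if m != msg and m in neighbours
                )

    # Bootstrap: start from the local classifier's most likely label.
    labels = {msg: max(p, key=p.get) for msg, p in local_probs.items()}

    for _ in range(max_iters):
        updated = {}
        for msg, probs in local_probs.items():
            # Label distribution among this message's neighbours.
            counts = Counter(labels[n] for n in neighbours[msg])
            total = sum(counts.values())
            scores = {}
            for label, p in probs.items():
                # Without neighbours, fall back to the local probability.
                share = counts[label] / total if total else p
                scores[label] = alpha * p + (1 - alpha) * share
            updated[msg] = max(scores, key=scores.get)
        if updated == labels:  # converged: no label changed in this pass
            break
        labels = updated
    return labels


# Toy data (hypothetical): m3 is ambiguous on content alone but shares #nba
# with two confidently labelled sports messages.
local_probs = {
    "m1": {"sports": 0.9, "tech": 0.1},
    "m2": {"sports": 0.8, "tech": 0.2},
    "m3": {"sports": 0.45, "tech": 0.55},
    "m4": {"sports": 0.1, "tech": 0.9},
}
hashtags = {"m1": {"#nba"}, "m2": {"#nba"}, "m3": {"#nba"}, "m4": {"#ios"}}
labels = iterative_classification(local_probs, hashtags)
```

In this toy run, `m3` moves from its locally preferred label to the label of its hashtag neighbours, which shows how classifying a batch of linked messages together can compensate for sparse content in a single short message.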
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.1