
Research on Key Technologies of Microblog Search

Published: 2018-07-28 07:39
[Abstract]: Microblogs have quickly become an important source of real-time information. Microblog search faces two key problems: computing the relevance between a query and microblog messages, and organizing the search results. Relevance computation measures how similar a message is to the query in content and semantics. Result organization presents messages in a concise, ordered way that overcomes redundancy and non-standard writing; its main forms include classification and summarization. Taking Twitter as an example, this thesis explores several important problems in microblog search: relevance computation, search result classification, summarization, and contrastive topic summarization.

For relevance computation, two message ranking models are proposed, one based on learning to rank and the other on a recurrent neural network language model. Compared with the relevance ranking algorithms used in current microblog search services, the former significantly improves the relevance of the ranked message list, while the latter narrows the gap in computing semantic relatedness between messages and improves the coverage of query results. The learning-to-rank model systematically studies the roles that text relevance features, microblog writing features, and author authority features play in microblog relevance computation. The model based on the neural network language model introduces semantic similarity into relevance computation, measuring lexical-semantic similarity between messages at the word-vector level.

For search result classification, a collective classification model based on relations between messages is proposed, and a topic taxonomy is defined for microblogs. Compared with a feature-based baseline, the model improves accuracy and F-score by 5.38% and 4.74%, respectively. The model applies two kinds of shared-topic relations between messages to three graph-based collective classification models. It combines each message's local features with the class distribution of its related messages and classifies a batch of messages at the same time, which reduces the effect of data sparsity and greatly improves the classifier's precision and recall. The iterative classification algorithm using the shared hashtag (#hashtag) relation gives the best results.

For search result summarization, a timeline-based, self-reinforcing summarization model that exploits interactions among related messages is proposed. Compared with a graph-based baseline, the model improves ROUGE-1 by 14% on average. Given the search results for a query, the model divides them chronologically into several sub-topics, computes the importance of each message from its text content, its author's social influence, and its text quality, and then ranks and extracts messages by importance and diversity to generate the summary. Experiments show that the author's social influence and text quality effectively improve the measurement of message importance.

For contrastive topic summarization, an optimization-based contrastive topic summarization model built on relations between messages is proposed. Compared with a baseline based on content similarity, the model improves contrastive attribute coverage and comparative message pair accuracy by 14.7% and 11.6%, respectively. The model makes full use of the similarity relation between messages and three kinds of shared-topic relations, and uses the PageRank algorithm and the SimRank method to maximize both the internal contrast and the topic representativeness of message pairs. The resulting summary captures the commonalities and differences in the search results of the contrasted queries.
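The abstract does not give the details of the iterative classification algorithm over the shared hashtag relation, so the following is only a simplified sketch of the general idea, not the thesis's model. It assumes each message already has class probabilities from some local classifier, links messages that share a hashtag, and repeatedly relabels each message using a weighted mix of its local score and the label distribution of its neighbors. The function names, the mixing weight `alpha`, and the class names in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def hashtag_neighbors(hashtags_per_msg):
    """Link every pair of messages that share at least one hashtag."""
    by_tag = defaultdict(set)
    for idx, tags in enumerate(hashtags_per_msg):
        for tag in tags:
            by_tag[tag].add(idx)
    neighbors = [set() for _ in hashtags_per_msg]
    for members in by_tag.values():
        for idx in members:
            neighbors[idx] |= members - {idx}
    return neighbors


def iterative_classification(local_scores, neighbors, alpha=0.5, max_iters=10):
    """Relabel all messages together until the labels stop changing.

    local_scores: one dict per message mapping label -> probability,
                  assumed to come from a feature-based local classifier.
    alpha:        hypothetical weight on the local score; the remainder
                  goes to the share of neighbors carrying that label.
    """
    # Bootstrap with the purely local prediction.
    labels = [max(scores, key=scores.get) for scores in local_scores]
    for _ in range(max_iters):
        new_labels = []
        for idx, scores in enumerate(local_scores):
            counts = Counter(labels[n] for n in neighbors[idx])
            total = sum(counts.values())
            combined = {}
            for label, prob in scores.items():
                # Messages without neighbors fall back to the local score.
                share = counts[label] / total if total else prob
                combined[label] = alpha * prob + (1 - alpha) * share
            new_labels.append(max(combined, key=combined.get))
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

For instance, with three messages tagged `#nba` and one tagged `#election`, a message whose local score slightly favors "politics" (0.55 against 0.45) is relabeled "sports" once its two hashtag neighbors are confidently "sports", while the isolated message keeps its local label. This illustrates how relational evidence can compensate for sparse local features in short messages.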
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree granted]: 2014
[Classification number]: TP393.092; TP391.1


Article ID: 2149383


Article link: http://sikaile.net/guanlilunwen/ydhl/2149383.html

