

Research and Implementation of Microblog Hot Topic Discovery Technology

Published: 2018-07-20 15:37
[Abstract]: With the boom of Web 2.0 and social networking sites, the Internet has entered an entirely new era of "self-media". Microblogging sites such as Sina Weibo and Twitter have become a focus of public attention, but the huge volume of information they produce has also become a burden. Extracting the latest hot topics from the massive stream of microblog posts has therefore become a pressing need.

Drawing on an analysis of the characteristics of microblog content and on topic detection and tracking methods from China and abroad, this thesis makes three improvements. First, it improves the single-pass clustering algorithm. The improved algorithm computes the centroid of the microblog stream and filters out the large number of posts that lie too far from it, which substantially reduces the computational cost. This solves the problem that clustering a large sample set requires too much computation to run in real time, and it also mitigates the single-pass algorithm's weakness that its accuracy depends heavily on the order in which samples are input. Second, it improves naive Bayes text classification, proposing a method that raises classification accuracy even though microblog texts are short and yield few features. Finally, during text feature extraction, it uses search engine technology to compute the mutual information needed when extracting feature terms, which solves the problem that mutual information is difficult to compute over large-scale short texts.

A microblog hot topic discovery platform was built on these techniques, and long-term use shows that they achieve good results. Compared with traditional algorithms, the approach is better suited to microblog platforms: it is fast and accurate and can process large volumes of data in real time, giving it considerable theoretical significance and practical value.
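The abstract describes the improved single-pass clustering only in outline, so the following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes posts are already word-segmented, represents each post as a raw term-frequency vector, and measures closeness with cosine similarity. The parameters `min_sim_to_centroid` and `join_threshold` are hypothetical. The sketch computes the centroid of the whole batch, drops posts whose similarity to that centroid falls below the cutoff, then runs ordinary single-pass clustering over the remaining posts. The abstract does not say how the thesis reduces order dependence, so that part is not shown.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Raw term-frequency vector of an already word-segmented post."""
    return dict(Counter(tokens))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Arithmetic mean of a list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def centroid_filter(vectors, min_sim_to_centroid):
    """Drop posts that are too dissimilar to (too far from) the stream centroid."""
    c = centroid(vectors)
    return [v for v in vectors if cosine(v, c) >= min_sim_to_centroid]


def single_pass(vectors, join_threshold):
    """Classic single-pass clustering: each post joins its most similar cluster
    if the similarity reaches join_threshold, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"centroid": dict, "members": list}
    for v in vectors:
        best, best_sim = None, 0.0
        for cl in clusters:
            sim = cosine(v, cl["centroid"])
            if sim > best_sim:
                best, best_sim = cl, sim
        if best is not None and best_sim >= join_threshold:
            k = len(best["members"])
            old = best["centroid"]
            # Incremental centroid update: (k * old + v) / (k + 1).
            best["centroid"] = {
                t: (k * old.get(t, 0.0) + v.get(t, 0.0)) / (k + 1)
                for t in set(old) | set(v)
            }
            best["members"].append(v)
        else:
            clusters.append({"centroid": dict(v), "members": [v]})
    return clusters


# Toy usage with made-up, pre-segmented posts.
posts = [
    ["earthquake", "sichuan", "rescue"],
    ["earthquake", "rescue", "team"],
    ["football", "final", "goal"],
    ["football", "goal", "score"],
    ["earthquake", "sichuan", "magnitude"],
    ["lunch", "noodles"],
]
vectors = [tf_vector(p) for p in posts]
kept = centroid_filter(vectors, min_sim_to_centroid=0.3)
clusters = single_pass(kept, join_threshold=0.3)
print(len(kept), sorted(len(c["members"]) for c in clusters))  # prints: 5 [2, 3]
```

In this toy batch the off-topic "lunch noodles" post has a cosine similarity of about 0.25 to the batch centroid and is dropped before clustering. Every post that survives the filter must still be compared against each existing cluster, so discarding distant posts early directly shrinks the work done in the clustering loop.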
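The abstract does not give the exact mutual-information formula the thesis uses. One common way to get co-occurrence statistics from a search engine is to read off document hit counts and compute pointwise mutual information, PMI(x, y) = log2(N * hits(x AND y) / (hits(x) * hits(y))), where N is the number of indexed documents. The sketch below assumes that formulation. The names `TinyIndex`, `hit_count`, and `pmi` are hypothetical, and the in-memory inverted index only stands in for a real search engine (for example, a Lucene index built over the microblog corpus).

```python
import math
from collections import defaultdict


class TinyIndex:
    """Hypothetical stand-in for a search engine: an in-memory inverted index
    that reports how many documents match an AND query."""

    def __init__(self, docs):
        self.n_docs = len(docs)
        self.postings = defaultdict(set)  # term -> set of document ids
        for doc_id, tokens in enumerate(docs):
            for t in set(tokens):
                self.postings[t].add(doc_id)

    def hit_count(self, *terms):
        """Number of documents that contain every term in `terms`."""
        if not terms:
            return self.n_docs
        return len(set.intersection(*(self.postings.get(t, set()) for t in terms)))


def pmi(index, x, y):
    """Pointwise mutual information estimated from document hit counts:
    log2(N * hits(x AND y) / (hits(x) * hits(y))).
    Returns None if either term never occurs, -inf if they never co-occur."""
    hx, hy = index.hit_count(x), index.hit_count(y)
    if hx == 0 or hy == 0:
        return None
    hxy = index.hit_count(x, y)
    if hxy == 0:
        return float("-inf")
    return math.log2(index.n_docs * hxy / (hx * hy))


# Toy usage with made-up, pre-segmented posts.
docs = [
    ["hot", "topic", "weibo"],
    ["hot", "topic", "discovery"],
    ["weather", "report"],
    ["football", "match"],
]
idx = TinyIndex(docs)
print(pmi(idx, "hot", "topic"))    # prints 1.0
print(pmi(idx, "hot", "weather"))  # prints -inf
```

Because each statistic is a document count, the index answers every PMI query with a few set intersections over posting lists rather than a rescan of the corpus, which is what makes hit-count estimates practical for large collections of short posts.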
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[CLC classification number]: TP393.092




Article ID: 2133994


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2133994.html

