
Weibo User Topic Mining Based on Distributed MBUT-LDA

Published: 2018-05-20 14:07

  Thesis topic: Weibo + user topics. Source: Chongqing University, master's thesis, 2014


【Abstract】: As one of today's most widely used social networking platforms, Weibo has become an important means for users to publish and obtain real-time information. Weibo topic modeling can mine, from massive volumes of information, the topics and the other users that a given user is interested in. However, because Weibo messages are short, information is updated quickly, and the data volume is huge, traditional topic modeling methods cannot effectively mine the information users are genuinely interested in.

Building on a study of existing topic modeling methods, this thesis proposes MBUT-LDA, a modeling method based on Weibo users and the time dimension, where MB stands for MicroBlog, U for User, and T for Time. The method has the following characteristics:

⑴ Building on an analysis of existing topic models, and exploiting the fact that the topics of Weibo messages are clearly concentrated in time, each user's Weibo messages are partitioned by time. This addresses the incomplete information caused by short Weibo text and improves the accuracy of the mined user topics.

⑵ Based on an analysis of the relationships between Weibo users and their friends, the concept of "attention degree" is introduced. Combined with the TF-IDF algorithm, it yields a new weighting formula, ATF-IDF, which measures how well a Weibo term predicts a topic.

⑶ The number of Weibo users has grown sharply, and the platform lets users publish instant messages through a variety of mobile clients, so the collection of Weibo documents is very large and a single node easily hits a performance bottleneck when analyzing it. Taking advantage of distributed and virtualization technology, the thesis deploys the proposed topic modeling method on the distributed computing platform Hadoop, implementing a Hadoop-based MBUT-LDA method for mining Weibo user topics.

Using the proposed distributed MBUT-LDA method, the thesis trains a Weibo topic model on a large number of Weibo messages and, on the basis of the trained topics, mines the topics that Weibo users are interested in. Experiments show that MBUT-LDA optimized with ATF-IDF achieves better generalization and topic accuracy than plain MBUT-LDA and U-LDA (topic modeling based on Weibo users). An analysis of distributed MBUT-LDA experiments with different numbers of users and different numbers of nodes shows that adding nodes effectively reduces data processing time and allows very large datasets to be processed.
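The time-based partitioning described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's actual partitioning scheme, so everything here is an assumption made for illustration: the function name `pool_posts_by_time`, the fixed window length `window_days`, and the choice to concatenate all posts a user wrote within one window into a single pseudo-document. The idea shown is only that pooling short posts by user and time produces longer documents for a topic model to work with.

```python
from collections import defaultdict
from datetime import datetime


def pool_posts_by_time(posts, window_days=7):
    """Group each user's posts into pseudo-documents, one per time window.

    posts: iterable of (user_id, timestamp, text) tuples.
    Returns a dict mapping (user_id, window_index) -> concatenated text.
    The fixed-length window is a hypothetical choice, not the thesis's scheme.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    # Sort by user, then time, so texts inside a window stay in posting order.
    for user_id, ts, text in sorted(posts, key=lambda p: (p[0], p[1])):
        window = (ts - epoch).days // window_days
        buckets[(user_id, window)].append(text)
    return {key: " ".join(texts) for key, texts in buckets.items()}


# Toy data (invented for illustration only).
posts = [
    ("u1", datetime(2014, 3, 1, 9, 0), "world cup draw"),
    ("u1", datetime(2014, 3, 2, 21, 30), "world cup tickets"),
    ("u1", datetime(2014, 4, 20, 8, 0), "hadoop cluster setup"),
    ("u2", datetime(2014, 3, 1, 12, 0), "new phone release"),
]
docs = pool_posts_by_time(posts, window_days=7)
```

In this toy input the two posts that `u1` wrote one day apart fall into the same window and are merged, while the post from several weeks later becomes a separate pseudo-document. The pooled texts could then be passed to any LDA implementation in place of the individual posts.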
【Degree-granting institution】: Chongqing University
【Degree level】: Master's
【Year degree conferred】: 2014
【Classification number】: TP393.092;TP391.1





