
Research on Weibo Hot Event Detection Based on Topic Models and Community Detection

Published: 2018-07-17 03:56
[Abstract]: With its simple and fast mechanisms for generating and spreading information, Weibo (microblogging), an emerging social networking service, has become ubiquitous in the Web 2.0 era. Compared with traditional media, Weibo reports and spreads news events in a more timely and efficient way, so hot event detection based on Weibo data has become an active research topic in recent years. However, several characteristics of Weibo make this task challenging. First, the Weibo data stream contains a large number of worthless, meaningless "noise" posts; effectively separating the event posts of interest from this noise is the primary challenge. Second, a post is limited to 140 characters, so the text is extremely sparse, and it often contains spelling and grammatical errors and mixed languages. As a result, traditional text analysis techniques cannot be applied directly to Weibo event detection.

This thesis first surveys existing Weibo hot event detection techniques in China and abroad, and then, in light of their shortcomings, studies and extends hot event detection for both static and dynamic Weibo data. For static event detection, it proposes a text classification method based on a topic model and a Bayesian method to detect event posts in static Weibo data. The method maps the static Weibo data into a topic-space representation, mines the relationship between topics and text types, and then assigns a post's category according to whether its topic belongs to the event class.

For dynamic event detection, it proposes a method based on community detection and graph kernel computation. The method first selects event words using a dynamic event word selection algorithm proposed in the thesis. It then divides the real-time Weibo stream into time slices and builds a Weibo semantic graph for each slice from the event words the posts contain: posts are the nodes, and an edge connects two nodes if they share an event word. A community detection algorithm finds the event communities in each slice's semantic graph and returns the key-node post of each community as the description of the event that the community reflects. The thesis also proposes a topic-semantics-based encoding scheme that assigns each node in an event community graph a bit-array label, producing a labeled event community graph. Finally, a graph kernel algorithm computes the similarity between labeled event community graphs in adjacent time slices, and communities describing the same event are matched according to the result, which achieves event tracking. Both methods were applied to Chinese Weibo data crawled in real time, and the experimental results show that both achieve the expected results.
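The graph-building step for a single time slice can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the event words are assumed to be given, posts are assumed to be whitespace-tokenized (Chinese text would need a word segmenter), connected components stand in for the unspecified community detection algorithm, and the key node is taken to be the post with the most neighbours in its community. All function names and sample posts are hypothetical.

```python
from itertools import combinations


def build_semantic_graph(posts, event_words):
    """Nodes are post indices; an edge joins two posts that share an event word."""
    # Assumption: posts are whitespace-tokenized strings.
    words_of = [set(post.split()) & event_words for post in posts]
    graph = {i: set() for i in range(len(posts))}
    for i, j in combinations(range(len(posts)), 2):
        if words_of[i] & words_of[j]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def find_communities(graph):
    """Stand-in for community detection: connected components of size >= 2."""
    seen, communities = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > 1:  # isolated posts are treated as noise
            communities.append(component)
    return communities


def key_node(graph, community):
    """Assumed key-node rule: the post with the most neighbours in the community."""
    return max(sorted(community), key=lambda n: len(graph[n] & community))


# Hypothetical posts from one time slice.
posts = [
    "earthquake hits city center",
    "rescue teams respond to earthquake",
    "earthquake rescue continues overnight",
    "concert tickets on sale",
    "new concert announced",
    "what a nice lunch",
]
event_words = {"earthquake", "rescue", "concert"}

graph = build_semantic_graph(posts, event_words)
communities = find_communities(graph)
for community in communities:
    print(sorted(community), "->", posts[key_node(graph, community)])
```

The sketch stops at describing each community by its key post. The bit-array labeling and the graph kernel comparison between adjacent time slices are not covered, since the abstract does not specify them.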
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.1


Article ID: 2128903


Article link: http://sikaile.net/guanlilunwen/ydhl/2128903.html

