Research on Sentiment Analysis of Weibo Short Texts Based on an Improved Topic Model

Published: 2018-05-14 19:46

  Topics: topic model + sentiment analysis; Source: Southeast University, master's thesis, 2017


[Abstract]: With the development of Web 2.0 and social media, people increasingly use the Internet to publish and share information. Most user-generated content takes the form of short texts, such as Weibo posts and product reviews. Although each Weibo post is short, such posts are large in volume, rapidly updated, and rich in expressions of personal sentiment. Mining this information is valuable for public opinion monitoring, user analysis, product information analysis, and related fields. However, extracting topics that carry subjective sentiment from Weibo short texts is not easy. Short texts are very sparse, severely lack context, and often contain typos and newly coined words. As a result, existing topic-model-based methods fail to extract high-quality sentiment-bearing topics from short texts. To address these problems, this thesis proposes two sentiment topic models for Weibo short texts, both aimed at extracting high-quality sentiment-bearing topics. Specifically, the main work and contributions are as follows:
(1) A sentiment-topic mixture model that jointly models time and user information is proposed: Time-User Sentiment Latent Dirichlet Allocation (TUS-LDA). It aggregates posts published at the same time or by the same user into a pseudo long document, which enriches the context, alleviates to some extent the data sparsity of short texts, and yields high-quality sentiment-related topics.
(2) A sentiment-topic mixture model that combines time, user, and hashtag information is proposed: the Weibo Sentiment Model (WSM). It extends TUS-LDA and uses the semantic knowledge carried by hashtags to further enrich the context.
(3) Through multiple comparative experiments on three real-world datasets, seven models are evaluated on sentiment-related topic extraction and on sentiment classification. The two proposed models, TUS-LDA and WSM, both outperform the other five baseline models, and WSM performs slightly better than TUS-LDA. Both models extract high-quality sentiment-related topics, which helps product sentiment analysis and public opinion analysis.
(4) WSAS, a Weibo sentiment analysis system built around WSM, is designed and implemented.
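As a rough illustration of the aggregation idea behind TUS-LDA described in the abstract, the following Python sketch pools tokenized posts into pseudo-documents by user and by time slice. The post fields (`user`, `time`, `tokens`), the function name `build_pseudo_documents`, and the one-day time slice are assumptions made for illustration. The abstract does not say how time is discretized or how pooling interacts with model inference, so this covers only the pooling step, not TUS-LDA itself.

```python
from collections import defaultdict
from datetime import datetime


def build_pseudo_documents(posts, key):
    """Pool the tokens of all posts that share key(post) into one pseudo-document."""
    pooled = defaultdict(list)
    for post in posts:
        pooled[key(post)].extend(post["tokens"])
    return dict(pooled)


def by_user(post):
    # Hypothetical field name: the author of the post.
    return post["user"]


def by_day(post):
    # Assumption: a "time slice" is one calendar day.
    return datetime.fromisoformat(post["time"]).date().isoformat()


# Toy, hand-made posts (not data from the thesis).
posts = [
    {"user": "u1", "time": "2017-03-01T09:15:00", "tokens": ["phone", "great"]},
    {"user": "u2", "time": "2017-03-01T21:40:00", "tokens": ["battery", "bad"]},
    {"user": "u1", "time": "2017-03-02T08:05:00", "tokens": ["screen", "nice"]},
]

user_docs = build_pseudo_documents(posts, by_user)
day_docs = build_pseudo_documents(posts, by_day)
```

Each pooled document is longer than any single post, so word co-occurrence statistics become less sparse. This is the effect the abstract attributes to aggregating posts by time or by user.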
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.1



Article ID: 1889222


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1889222.html

