
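A Stream Classification Algorithm Based on K-Associated Graphs and Its Application to Weibo Sentiment Analysis

Published: 2018-05-26 15:30

  Topics: Weibo + data stream; Source: Zhengzhou University, master's thesis, 2014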


[Abstract]: With the arrival of the golden age of information, people are increasingly aware of the importance of data, yet mining useful information from such large volumes of data is becoming increasingly difficult. The rise of Weibo in particular produces a large amount of microblog text every day. These texts are short and carry little information, and are usually called short text streams. Short text streams contain a wealth of opinion resources. Examples include product reviews, which are valuable to both sellers and buyers, and comments on trending events, which help government departments understand public attitudes toward those events. How to mine useful knowledge from short text streams is therefore a matter of wide concern, and these needs have made data stream mining a research hotspot and challenge in recent years.

Building on a survey of several mature data stream classification algorithms, this thesis proposes a data stream classification algorithm based on K-associated graphs (K-associated Graphs Based Classifier, KGBC). The algorithm first represents an entire data chunk as a K-associated graph, which captures the similarity relations between data instances and the purity of subgraphs. Then, according to how the K-associated graph optimization algorithm partitions the data chunk, it selects from the base classifiers those whose concept is similar to that of the chunk currently to be classified. Finally, it combines the selected base classifiers into an ensemble, using concept similarity as the weight of each base classifier when classifying the test data. The algorithm does not need to retrain a classifier every time a new data chunk arrives, which saves time. Experiments show that KGBC achieves good prediction accuracy.

The second contribution of this thesis is sentiment analysis in short text streams. The key to sentiment analysis of short text streams is determining the sentiment orientation of text messages, and the prerequisite for that is building a sentiment lexicon suited to Weibo text. The thesis therefore proposes a dependency-syntax-based algorithm for extracting Weibo sentiment words: templates are summarized from the positions where sentiment words commonly appear in dependency structures, and new sentiment words on the web are recognized automatically from these templates. Considering the diverse ways in which Chinese Weibo posts express meaning, the thesis uses sentiment words, parts of speech, contextual relations, and topic features from Weibo text as features for sentiment classification. Experiments comparing KGBC with traditional sentiment classification algorithms verify the effectiveness of KGBC for sentiment classification of short text streams.
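
The abstract does not give the thesis's exact definitions, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes the usual K-associated graph construction from the literature (each instance is linked to those of its k nearest neighbours that share its class label), takes subgraph purity to be the average in-plus-out degree of a connected component divided by its maximum 2k, and treats the concept similarity of each base classifier as a number that is already given. The function names `k_associated_graph`, `components`, `purity`, and `weighted_vote` are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def k_associated_graph(points, labels, k):
    """Directed edges (i, j): j is one of i's k nearest neighbours AND has i's label.

    Assumption: Euclidean distance; the thesis may use another similarity measure.
    """
    edges = []
    for i, p in enumerate(points):
        # Rank every other instance by its distance to instance i.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            if labels[j] == labels[i]:
                edges.append((i, j))
    return edges


def components(n, edges):
    """Connected components of the graph, ignoring edge direction (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for v in range(n):
        groups[find(v)].append(v)
    return list(groups.values())


def purity(component, edges, k):
    """Average (in + out) degree of the component divided by the maximum 2k.

    Assumption: this is the purity measure commonly paired with K-associated graphs.
    """
    members = set(component)
    # Each directed edge adds one out-degree and one in-degree inside the component.
    degree_sum = 2 * sum(1 for i, j in edges if i in members and j in members)
    return degree_sum / (len(component) * 2 * k)


def weighted_vote(predictions, similarities):
    """Ensemble decision: each base classifier's vote is weighted by its
    (precomputed, assumed given) concept similarity to the current chunk."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, similarities):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    lbl = [0, 0, 0, 1, 1, 1]
    e = k_associated_graph(pts, lbl, k=2)
    for comp in components(len(pts), e):
        print(sorted(comp), purity(comp, e, k=2))
    print(weighted_vote(["pos", "neg", "neg"], [0.9, 0.3, 0.4]))
```

In this sketch a component whose members all have same-class neighbours reaches purity 1, while an instance surrounded by other classes ends up as an isolated component with purity 0. How KGBC derives concept similarity from such components is not described in the abstract, so that step is left as an input here.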
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092; TP311.13





