Research on Trend Analysis and Prediction of Online Hot Topics
Published: 2019-03-08 17:19
[Abstract]: In recent years, natural language processing of social media content has attracted growing attention. Monitoring and early warning of sudden social events and sensitive online information, and analyzing and predicting shifts in the sentiment of public opinion, are all of considerable research value. Working with Sina Weibo data, this thesis analyzes and quantifies the sentiment trends of hot topics, builds trend models from historical Weibo data, and predicts how hot topics will develop. Based on the characteristics of Weibo data, it divides hot topics into long-term and short-term topics, analyzes and predicts event trends for each type separately, and focuses on the features that drive trend development. The main contributions are as follows:
1. A sentiment classification method for Weibo posts based on a joint deep learning model. A convolution re-serializes a plain sequence of word vectors into a new sequence of vectors that carry n-gram information. Experiments show higher sentiment classification accuracy than conventional CNN and LSTM models, and the method ranked first in the sentiment classification task of COAE 2016.
2. Trend analysis and prediction for short-term Weibo hot topics. The method computes, over the sampled data, the values of indicators that affect an event's trend, with time divided into 2-hour periods. Comparing histories of different lengths showed that a history of four periods gives the best predictions. For prediction, features are ranked by category and a regression model is built to predict topic heat. Experiments compare autoregressive, GBDT, and CNN prediction methods: when predicting the trend of short-term topics over the next 2 hours, the GBDT-based method performs best, reaching an accuracy of 79.1% when a prediction whose error is within 5% is counted as correct.
3. A sub-topic separation prediction method for long-term topics. An online LDA model is trained on the Weibo data in each time slice to obtain sub-topic evolution and sub-topic intensity. Topic development is divided into four classes, an SVM classification model is built, and the data between successive peaks are predicted separately. The method classifies topic heat with 86% accuracy, and the overall trend prediction also gives good results.
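Contribution 1 describes a convolution that "re-serializes" a sequence of word vectors into vectors carrying n-gram information. The sketch below, in plain Python to stay dependency-free, shows one reading of that step: each width-n filter slides over the sequence with no padding and a ReLU, so output position i summarizes the n-gram starting at word i. The filter weights, the ReLU, and the name `ngram_conv` are assumptions for illustration; the abstract does not specify them. In a joint CNN-LSTM design (an assumption about what "joint" refers to here), the resulting sequence would then be passed to a recurrent layer, which is not shown.

```python
from typing import List

Vector = List[float]


def ngram_conv(words: List[Vector], filters: List[List[Vector]], bias: List[float]) -> List[Vector]:
    """Re-serialize word vectors into n-gram vectors with a 1-D convolution (hypothetical helper).

    words:   T word vectors of dimension d
    filters: k filters, each n rows of length d (one row per word in the n-gram)
    bias:    one bias per filter
    Returns T - n + 1 vectors of dimension k; vector i summarizes words[i:i + n].
    """
    n = len(filters[0])  # all filters share the same width n
    out: List[Vector] = []
    for i in range(len(words) - n + 1):  # "valid" convolution: no padding
        window = words[i:i + n]
        features = []
        for f, b in zip(filters, bias):
            # Dot product of the n x d filter with the n x d window of word vectors
            s = b + sum(w * x
                        for w_row, x_row in zip(f, window)
                        for w, x in zip(w_row, x_row))
            features.append(max(0.0, s))  # ReLU activation (assumed)
        out.append(features)
    return out


# Toy example: four 2-dimensional word vectors and two bigram filters
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
filters = [
    [[1.0, 0.0], [1.0, 0.0]],   # adds the first coordinate of both words
    [[0.0, 1.0], [0.0, -1.0]],  # second coordinate of word 1 minus word 2
]
bias = [0.0, 0.0]
ngrams = ngram_conv(words, filters, bias)
```

With four words and bigram filters the output has three vectors, one per adjacent word pair, each with one feature per filter.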
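Contribution 2 fixes three concrete choices: 2-hour periods, a history of four periods, and counting a prediction as accurate when its error is within 5%. The sketch below wires those together. Assumptions: heat is measured here as the post count per period (the thesis uses several indicators), the 5% is read as relative error, and `persistence_forecast` is a placeholder baseline standing in for the GBDT regressor, which is not reproduced here. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

PERIOD_SECONDS = 2 * 60 * 60  # one time period = 2 hours (from the abstract)
HISTORY = 4                   # number of past periods used as input (from the abstract)


def bin_counts(timestamps: Sequence[float], start: float, n_periods: int) -> List[int]:
    """Count posts in each 2-hour period starting at `start` (Unix seconds)."""
    counts = [0] * n_periods
    for t in timestamps:
        idx = int((t - start) // PERIOD_SECONDS)
        if 0 <= idx < n_periods:  # ignore posts outside the observed window
            counts[idx] += 1
    return counts


def make_samples(series: Sequence[float], history: int = HISTORY) -> List[Tuple[List[float], float]]:
    """Pair each period's value with the `history` values that precede it."""
    return [(list(series[i - history:i]), series[i]) for i in range(history, len(series))]


def persistence_forecast(window: Sequence[float]) -> float:
    """Placeholder predictor: repeat the last observed value (NOT the thesis's GBDT)."""
    return window[-1]


def is_accurate(true: float, pred: float, tol: float = 0.05) -> bool:
    """Assumed criterion: a relative error of at most `tol` (5%) counts as correct."""
    if true == 0:
        return pred == 0
    return abs(pred - true) / abs(true) <= tol


def accuracy_within(y_true: Sequence[float], y_pred: Sequence[float], tol: float = 0.05) -> float:
    """Fraction of predictions that satisfy `is_accurate`."""
    hits = sum(is_accurate(t, p, tol) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy per-period counts (illustrative numbers, not data from the thesis)
samples = make_samples([100, 120, 130, 150, 156, 200])
preds = [persistence_forecast(x) for x, _ in samples]
truth = [y for _, y in samples]
```

Here persistence predicts 150 and 156 against true values 156 and 200; only the first is within 5%, so `accuracy_within(truth, preds)` is 0.5. Swapping in a trained regressor changes only how `preds` is computed.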
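For contribution 3, a common way to turn topic-model output into a per-slice "sub-topic intensity" is to average each post's topic proportions over the posts in a time slice; the sketch assumes that definition, which the abstract does not spell out. It also shows one hypothetical way to cut a heat series into the stretches between peaks, treating a peak as a strict local maximum shared by the two segments it separates. The online LDA training and the four-class SVM are out of scope here; `theta` is assumed to come from any LDA implementation, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def subtopic_intensity(theta: Sequence[Sequence[float]], slices: Sequence[int]) -> Dict[int, List[float]]:
    """Average document-topic proportions per time slice (assumed intensity definition).

    theta[d][k]: weight of sub-topic k in post d (e.g. from an online LDA model)
    slices[d]:   index of the time slice that post d belongs to
    Returns {slice: [intensity of sub-topic 0, 1, ...]}.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for row, s in zip(theta, slices):
        acc = sums.setdefault(s, [0.0] * len(row))
        for k, w in enumerate(row):
            acc[k] += w
        counts[s] += 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def split_at_peaks(series: Sequence[float]) -> List[List[float]]:
    """Split a heat series into segments bounded by strict local maxima."""
    segments: List[List[float]] = []
    current: List[float] = []
    for i, v in enumerate(series):
        current.append(v)
        is_peak = 0 < i < len(series) - 1 and series[i - 1] < v and v > series[i + 1]
        if is_peak:
            segments.append(current)
            current = [v]  # the next segment starts from this peak
    if current:
        segments.append(current)
    return segments


theta = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
intensity = subtopic_intensity(theta, [0, 0, 1])
segments = split_at_peaks([1, 3, 2, 4, 1])
```

Each segment could then be modeled separately, which is one way to read "the data between successive peaks are predicted separately".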
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
Document ID: 2437029
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2437029.html