Predicting the Trend of the Shanghai Composite Index Through Sina Weibo Data Mining
Published: 2018-12-15 14:59
[Abstract]: Social networks have grown rapidly in recent years. In China, Sina Weibo has broad coverage; its content is easy to produce and spreads quickly, yielding massive amounts of direct and indirect data. This thesis therefore takes Sina Weibo as its data source: it extracts text data from Sina Weibo, combines it with the rise and fall of the Shanghai Composite Index, explores the correlation between the two, and attempts to build a prediction model that offers stock market investors some reference information. The Weibo text data were collected mainly with a purpose-written web crawler. The work focuses on analyzing and solving the problems of user login, advanced search, the per-IP limit on requests per unit time, text extraction, text cleaning, and indicator extraction. The cleaned Sina Weibo text information and the Shanghai Composite Index closing prices were then combined with an artificial neural network algorithm to build a model that predicts the index's closing price from Sina Weibo data. The main innovations of this thesis are: 1. No prior research in China using Sina Weibo data to predict the trend of the Shanghai Composite Index was found; this thesis takes that gap as its starting point. 2. While crawling Sina Weibo text content, it introduces a distributed-system mechanism to get around the anti-crawler restrictions that Sina Weibo imposes at the user level and the IP level. 3. The study is a time series analysis, and it devises a way to use Sina Weibo search so that posts can be retrieved for a specified time interval and specified keywords. 4. It customizes the artificial neural network algorithm, adding a variable data set and an automatic correction feature, which improved the model's prediction accuracy.
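The abstract mentions spreading the crawl across several identities to stay within per-user and per-IP request limits, but gives no implementation details. The following is only a minimal sketch of that general idea, assuming a sliding-window quota per crawler identity. The names `Worker`, `max_requests`, `window_seconds`, `pick_worker`, and `schedule` are hypothetical, the quota values are placeholders, and the code makes no network requests; it is not the thesis's actual crawler.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One crawler identity (hypothetical: e.g. an account/IP pair)."""
    name: str
    max_requests: int        # assumed cap on requests per window
    window_seconds: float    # assumed length of the sliding window
    sent: deque = field(default_factory=deque)  # timestamps of recent requests

    def can_send(self, now: float) -> bool:
        # Discard timestamps that have fallen out of the sliding window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        return len(self.sent) < self.max_requests

    def record(self, now: float) -> None:
        self.sent.append(now)


def pick_worker(workers, now):
    """Return the first worker with quota left, or None if all are exhausted."""
    for worker in workers:
        if worker.can_send(now):
            return worker
    return None


def schedule(queries, workers, clock=time.monotonic, sleep=time.sleep):
    """Assign each query to a worker without exceeding any worker's quota.

    Returns a list of (query, worker_name) pairs. A real crawler would
    issue the request where this sketch only records the assignment.
    """
    plan = []
    for query in queries:
        while True:
            now = clock()
            worker = pick_worker(workers, now)
            if worker is not None:
                break
            sleep(0.1)  # every worker is at its cap; wait for a window to open
        worker.record(now)
        plan.append((query, worker.name))
    return plan
```

Passing `clock` and `sleep` as parameters lets the scheduling logic be exercised with a fake clock instead of real waiting. In a distributed setup the per-worker state would have to live in shared storage, which this single-process sketch does not attempt.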
[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; F832.51
Article ID: 2380849
Article link: http://sikaile.net/jingjilunwen/touziyanjiulunwen/2380849.html