Short-Term Stock Market Prediction Based on Social Network Text Analysis
Keywords: stock market, stock comments, sentiment analysis, stock market prediction. Source: Central China Normal University, 2016 master's thesis. Type: degree thesis.
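【Abstract】: The arrival of the Internet era has profoundly changed the way we live, and people can now obtain almost any information they want online. In particular, as Web technology has shifted from Web 1.0 to Web 2.0, financial information has increasingly been gathered and disseminated online, and interactive venues such as forums and blogs keep emerging. Forums are one such platform. More and more retail investors post their views on the current market in stock forums, producing a large body of online text with considerable research value. These posts often contain investors' comments on the market and hints about their future investment plans, so analyzing such stock comments is an effective way to anticipate investors' future behavior.

Several researchers in China and abroad have already tried to predict short-term stock market movements by analyzing social networks. Work abroad has focused mainly on the relatively mature European and U.S. markets, and it remains to be verified how well those methods describe the less mature Chinese market. Existing work in China is largely exploratory and lacks systematic, quantitative prediction. This thesis therefore extracts and models text resources related to the Chinese stock market, combines them with sentiment analysis, and builds a model that predicts whether the market will rise or fall in the short term. The main work and contributions are as follows.

First, the large volume of stock commentary on the Internet may reflect current market conditions, so these comments can support a degree of market prediction. The thesis proposes two ways to model stock-comment text: one based on the vector space model and one based on word vectors. After the word vectors are learned, k-means clustering partitions them into k clusters (word sets). The thesis then defines mapping rules from a text to these word sets, which map each short text into a k-dimensional vector space and complete the text model. Experiments show that the best accuracy with word-vector modeling (68%) is clearly higher than the best accuracy with the vector space model (63.8%), and both exceed the prediction results reported in related literature.

Second, the prediction methods above rely only on simple, surface-level text features and capture little of the deeper information in the text. The thesis therefore proposes a stock prediction method that incorporates sentiment analysis. A small number of words with labeled sentiment polarity are chosen in advance as seed words. The relatedness between each word of unknown polarity and the seed words is then computed to generate a stock-domain sentiment lexicon automatically, and this lexicon is used to model the text at a deeper level. Experiments show that adding sentiment features yields clearly higher prediction accuracy than simple text features alone.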
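The first modeling approach in the abstract is only outlined, so the sketch below shows one plausible reading of it under explicit assumptions. The 2-D word vectors are toy values standing in for embeddings trained on forum posts. The k-means step is plain Lloyd's algorithm, seeded with the first k points so runs are reproducible. The text-to-word-set rule (the share of a comment's known tokens that fall in each cluster) is a hypothetical stand-in, because the abstract does not spell the rule out. The term-frequency function plays the role of the vector-space-model baseline. All function and variable names are illustrative, not the thesis's code.

```python
from collections import Counter


def vsm_vector(tokens, vocab):
    """Vector-space-model baseline: raw term frequencies over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_vector(tokens, word_cluster, k):
    """Hypothetical text-to-word-set rule: share of a text's known tokens in each cluster."""
    vec = [0.0] * k
    known = [w for w in tokens if w in word_cluster]
    for w in known:
        vec[word_cluster[w]] += 1
    return [v / len(known) for v in vec] if known else vec


# Toy 2-D "word vectors" standing in for embeddings trained on forum posts (assumption).
word_vectors = {"rally": [0.9, 0.1], "bear": [0.2, 0.8], "bull": [0.8, 0.2], "drop": [0.1, 0.9]}
words = list(word_vectors)
labels = kmeans([word_vectors[w] for w in words], k=2)
word_cluster = dict(zip(words, labels))

comment = ["bull", "rally", "rally", "drop", "today"]  # "today" has no vector, so it is ignored
print(vsm_vector(comment, ["bear", "bull", "drop", "rally"]))  # [0, 1, 1, 2]
print(cluster_vector(comment, word_cluster, k=2))  # [0.75, 0.25]
```

One property of this design is that the clustered representation always has exactly k dimensions, however large the vocabulary grows. The term-frequency baseline, by contrast, gains one dimension per vocabulary word.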
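The abstract says the thesis computes the relatedness between unlabeled words and seed words, but it does not name the measure. The sketch below is written under that uncertainty and assumes cosine similarity between word vectors. It also assumes a hypothetical neutrality threshold of 0.1, toy 2-D vectors, and simple positive/negative hit counts as the sentiment features. Those counts are concatenated onto a text vector such as the k-dimensional cluster vector. The names `build_lexicon`, `sentiment_features`, and `fuse` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_lexicon(word_vectors, pos_seeds, neg_seeds, threshold=0.1):
    """Expand seed words into a polarity lexicon: +1 positive, -1 negative, omitted if neutral."""
    lexicon = {w: 1 for w in pos_seeds}
    lexicon.update({w: -1 for w in neg_seeds})
    for word, vec in word_vectors.items():
        if word in lexicon:
            continue  # seed words keep their hand-assigned polarity
        # Assumed score: mean similarity to positive seeds minus mean similarity to negative seeds.
        pos = sum(cosine(vec, word_vectors[s]) for s in pos_seeds) / len(pos_seeds)
        neg = sum(cosine(vec, word_vectors[s]) for s in neg_seeds) / len(neg_seeds)
        score = pos - neg
        if score > threshold:
            lexicon[word] = 1
        elif score < -threshold:
            lexicon[word] = -1
    return lexicon


def sentiment_features(tokens, lexicon):
    """Count positive and negative lexicon hits in one comment."""
    pos = sum(1 for t in tokens if lexicon.get(t) == 1)
    neg = sum(1 for t in tokens if lexicon.get(t) == -1)
    return [pos, neg]


def fuse(text_vec, tokens, lexicon):
    """Feature fusion by concatenation: surface text vector followed by sentiment counts."""
    return list(text_vec) + sentiment_features(tokens, lexicon)


# Toy vectors (assumption); "rally" and "drop" act as the labeled seed words.
word_vectors = {
    "rally": [0.9, 0.1], "drop": [0.1, 0.9],
    "bull": [0.8, 0.2], "bear": [0.2, 0.8],
    "limit_up": [0.95, 0.05], "sideways": [0.5, 0.5],
}
lexicon = build_lexicon(word_vectors, pos_seeds=["rally"], neg_seeds=["drop"])
print(lexicon)  # {'rally': 1, 'drop': -1, 'bull': 1, 'bear': -1, 'limit_up': 1}
tokens = ["bull", "limit_up", "bear", "today"]
print(fuse([0.75, 0.25], tokens, lexicon))  # [0.75, 0.25, 2, 1]
```

Cosine similarity over embeddings is only one option. A pointwise-mutual-information score over co-occurrence counts in the comment corpus (the SO-PMI approach) would fit into the same scoring loop without changing the rest of the pipeline.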
【Degree-granting institution】: Central China Normal University
【Degree level】: Master's
【Year conferred】: 2016
【Classification number】: TP391.1 (Chinese Library Classification)