An Autonomous Trading Agent Based on Reinforcement Learning
Published: 2022-09-29 20:37
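This thesis uses reinforcement learning to build an autonomous trading agent that interacts intelligently with financial markets. Stock market trading can serve as a testbed for evaluating and developing new machine learning methods, reinforcement learning in particular, which must be adapted to the characteristics of the financial trading problem. Predicting stock market movements is a very hard task because the underlying patterns that drive market behavior are non-stationary, meaning that useful predictive patterns learned from the past may not apply in the future. Reinforcement learning has not yet been widely applied in this domain. Compared with other techniques, the reinforcement learning paradigm gives the agent more freedom to learn a trading decision model directly, for example without predefined thresholds on the signals that trigger buy or sell decisions. Price changes can naturally be treated as rewards, so reinforcement learning avoids the cost of labeling examples and building training datasets that supervised learning requires. In reviewing the previous literature, we found that existing work applying reinforcement learning algorithms to generate trading decisions usually does not address the non-stationary environment. The methods proposed there yield a single agent that is not recalibrated over time, and the learned trading strategy sometimes gets stuck in a local optimum. The method proposed in this thesis mitigates these problems by using multiple agents and a multi-stage learning model, in which the agents compete to recommend the best decision. Our method combines online learning with reinforcement learning. Online learning is used to select the recommended trading strategy from a set of agents in real time at each decision point, and can also, based on recent data...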
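The abstract is cut off before it describes how the online selection among agents works, so the following is only a minimal sketch of one common way to do it: a multiplicative-weights (Hedge-style) update over a fixed set of agents. The function names, the position encoding (-1, 0, +1), the learning rate, and the use of position times price change as the reward are all assumptions made for illustration. They are not taken from the thesis.

```python
import math
import random


def select_agent(weights, rng):
    """Sample an agent index with probability proportional to its weight."""
    total = sum(weights)
    threshold = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def update_weights(weights, rewards, learning_rate=0.5):
    """Multiplicative-weights update: agents with higher reward gain weight."""
    updated = [w * math.exp(learning_rate * r) for w, r in zip(weights, rewards)]
    total = sum(updated)
    return [w / total for w in updated]  # renormalise so the weights sum to 1


def run_selection(agent_actions, price_changes, learning_rate=0.5, seed=0):
    """Follow one sampled agent per step and reweight all agents afterwards.

    agent_actions[t][i] is the (hypothetical) position recommended by agent i
    at step t: -1 short, 0 flat, +1 long.
    price_changes[t] is the relative price change realised after step t.
    """
    rng = random.Random(seed)
    n_agents = len(agent_actions[0])
    weights = [1.0 / n_agents] * n_agents
    total_reward = 0.0
    for actions, change in zip(agent_actions, price_changes):
        chosen = select_agent(weights, rng)
        total_reward += actions[chosen] * change
        # Assumed reward: the price change, signed by each agent's position.
        rewards = [a * change for a in actions]
        weights = update_weights(weights, rewards, learning_rate)
    return weights, total_reward
```

In this sketch every agent is scored at every step, including the ones that were not followed, so the weights drift toward whichever agents have matched recent price moves. That is one way a selector could adapt to a non-stationary market without retraining the agents themselves.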
【Number of pages】: 98
【Degree level】: Master's
【Table of contents】:
Abstract (in Chinese)
Abstract
Chapter 1 Introduction
1.1 Motivation
1.2 Objective
1.3 Outline
Chapter 2 Background
2.1 Machine Learning
2.1.1 Sample Data
2.1.2 Supervised Learning
2.1.3 Unsupervised Learning
2.1.4 Reinforcement Learning
2.2 Preprocessing
2.2.1 Feature Selection and the Curse of Dimensionality
2.2.2 Feature Scaling
2.3 Model Evaluation
2.3.1 Regression Metrics
2.3.2 Classification Metrics
2.3.3 Reinforcement Learning Metrics
2.3.4 Underfitting
2.3.5 Overfitting
2.3.6 Cross Validation
2.3.7 Hyperparameter Tuning
2.4 Models
2.4.1 Logistic Regression
2.4.2 Support Vector Machines
2.4.3 Neural Networks
2.4.4 Recurrent Neural Networks
2.4.5 Monte Carlo Method
2.4.6 Q-learning and Deep Q-learning
2.4.7 Neuroevolution of Augmenting Topologies
2.5 Stocks
2.5.1 Stock Market
2.5.2 Stocks Exchanges
2.5.3 Stocks Trading
2.5.4 Stocks Price
2.5.5 Features
2.5.6 Predictability
2.5.7 Risk and Reward
2.5.8 Quantitative Trading
2.6 Previous Works
2.6.1 Reinforcement Learning
2.6.2 Supervised Learning
Chapter 3 Methodology
3.1 Experiment Environment
3.2 Data
3.2.1 Selection of Stock Market
3.3 Preprocessing and Exploratory Data Analysis
3.3.1 Missing Data
3.3.2 Feature Extraction and Domain Knowledge
3.3.3 Feature Scaling
3.3.4 Test and Training Data Selection
3.3.5 Rolling Window of Model Inputs
3.3.6 Cross Validation
3.4 Multi-Agent Autonomous Trading
3.4.1 Single Deep Q-learning Agent
3.4.2 Online Weighted Selection
3.4.3 Experience Replay
Chapter 4 Experiment
4.1 Settings of Experiments
4.1.1 Simulation of Trading
4.1.2 Parameters of Agents
4.2 Performance Measurement
4.2.1 Cumulative Return
4.2.2 Sharpe Ratio
4.3 Results
4.3.1 Agent Experience
4.3.2 Other Hyperparameters and Model Settings
4.3.3 Comparison of Agents Decision Models on S&P 500 stocks
4.3.4 Comparison of Base Agents with Multi-Agent on Different Markets
Conclusion
Conclusion (in Chinese)
References
Papers Published in the Period of Master Education
Acknowledgements
Article ID: 3683269
Article link: http://sikaile.net/guanlilunwen/bankxd/3683269.html