A Capital Inflow and Outflow Prediction Model Based on Feature Extraction and Stepwise Regression Algorithms
Topics: capital flow prediction + feature extraction; Source: University of Science and Technology of China, master's thesis, 2017
[Abstract]: Financial platforms run by commercial companies often serve tens or even hundreds of millions of members, and their business involves large capital inflows and outflows every day. At this scale, fund management is under considerable pressure. Accurately predicting inflows and outflows is therefore essential for keeping liquidity risk to a minimum while still meeting the needs of daily operations. Financial data, however, are affected by social, political, and economic factors and by major events; the trends are unstable and the data are noisy, which makes capital flow hard to predict. Taking the capital flow of a financial platform's users as its research setting, this thesis aims to build an accurate and effective model for predicting capital inflow and outflow, one that tracks the true flow values as closely as possible and so supports fund management. The main contents and results are as follows:

1. Because the initial features of the inflow and outflow dataset are not distinctive, the thesis uses feature extraction to mine relevant features and a feature selection strategy to choose the optimal feature subset. Multiple features related to the target value are constructed from three perspectives (time, user, and interest rate), and the Pearson correlation coefficient is used for a preliminary screening that keeps the most relevant ones. A feature selection strategy then screens further, removing weakly correlated and redundant features to form the optimal subset. Experiments show that the selected feature subset affects different regression algorithms differently; the best subset has 12 feature columns for the purchase value and 10 for the redemption value, and it gives good prediction results for most of the regression algorithms. This subset is therefore adopted as the optimal feature subset for the subsequent prediction algorithms.

2. To cope with the instability and noise of the dataset, a stepwise regression algorithm is used to train on the feature subset and improve the accuracy of regression prediction. The thesis proposes a two-step feature prediction method. The first step uses grey prediction and time series algorithms to predict the features whose future values are unknown, and adds the predicted features to the feature subset already known for the future period. The second step trains a BP neural network on the full feature set to obtain the final prediction. The method is compared with ensemble learning methods: the Adaboost-based gradient boosting regression tree and the Bagging-based random forest regression algorithm are each trained on the dataset. The experimental results show that the two-step feature prediction algorithm reduces the prediction error relative to the other algorithms, and that some of the algorithms outperform the ensemble learning methods.

3. Discrete features in the subset are one-hot encoded into a sparse representation. Because factorization machines are effective on sparse datasets, the algorithm is used for regression prediction. A factorization machine expresses interactions between variables well, which amounts to adding second-order cross features on top of the original feature variables and so describes the dataset better. Its computational complexity is also modest, and it runs efficiently. Experiments show that the factorization machine can improve the prediction accuracy of capital inflow and outflow to some extent.
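The preliminary screening by Pearson correlation coefficient described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the feature names, the toy data, and the 0.3 threshold are all invented here, and the later removal of redundant features is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.3):
    """Keep features with |r| >= threshold, ordered from strongest to weakest.

    The threshold is an assumed value chosen for illustration only.
    """
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: -abs(scores[name]))
    return kept, scores

# Toy daily purchase amounts and candidate features (all values hypothetical).
target = [10.0, 12.0, 9.0, 15.0, 20.0, 18.0]
features = {
    "is_weekend": [0, 0, 0, 0, 1, 1],                  # time-based feature
    "interest_rate": [4.1, 4.0, 4.2, 3.9, 3.5, 3.6],   # rate-based feature
    "noise": [1, -1, 1, -1, 1, -1],                    # weakly related column
}
kept, scores = screen_features(features, target)
print(kept)
```

Ranking by the absolute value keeps strongly negative correlations (here, the interest rate) as well as positive ones.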
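As an illustration of the first step of the two-step method, the sketch below fits a standard GM(1,1) grey model to the history of one feature, forecasts its future values, and attaches them to the rows of features already known for the future period. The abstract does not say which grey model variant was used, so GM(1,1) is an assumption here, as are the feature names `total_balance` and `day_of_week` and the toy data. The second step (training a BP neural network on the combined features) is not shown.

```python
import math

def gm11_forecast(x0, steps):
    """Fit a GM(1,1) grey model to a positive series and forecast `steps` values."""
    n = len(x0)
    if n < 4:
        raise ValueError("this sketch expects at least 4 observations")
    # Accumulated series: x1[k] = x0[0] + ... + x0[k]
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: mean of consecutive accumulated values
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0[k] = -a * z[k] + b, i.e. a line in u = -z
    u = [-v for v in z]
    m = len(y)
    su, sy = sum(u), sum(y)
    suu = sum(v * v for v in u)
    suy = sum(p * q for p, q in zip(u, y))
    a = (m * suy - su * sy) / (m * suu - su * su)
    b = (sy - a * su) / m
    if abs(a) < 1e-12:
        return [b] * steps  # flat series: the model degenerates to a constant

    def x1_hat(k):
        # Time response of the fitted model for the accumulated series
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Difference the accumulated forecast to recover the original scale
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

# Hypothetical history of a feature that is unknown for future days.
history = [100 * 1.1 ** k for k in range(6)]
predicted = gm11_forecast(history, steps=3)

# Features already known for the future period (calendar-based, hypothetical).
future_known = [{"day_of_week": d} for d in (0, 1, 2)]
future_rows = [dict(row, total_balance=p) for row, p in zip(future_known, predicted)]
print(future_rows)
```

Each entry of `future_rows` then has the same columns as a training row, so a regressor trained on historical rows can be applied to it directly.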
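The statement that a factorization machine adds second-order cross features on top of one-hot encoded variables can be made concrete with the standard second-order model, y(x) = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The sketch below only evaluates that formula; the category lists and all parameter values are arbitrary assumptions chosen for illustration, not trained values or anything taken from the thesis.

```python
from itertools import combinations

def one_hot(value, categories):
    """Encode a discrete value as a 0/1 vector over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]

def fm_predict(x, w0, w, V):
    """Second-order factorization machine score in O(k * n) time.

    Uses the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score with the pairwise term written out explicitly, for checking."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return w0 + linear + pairwise

# Hypothetical discrete features: day of week and day type.
days = ["Mon", "Sat", "Sun"]
day_types = ["workday", "holiday"]
x = one_hot("Sat", days) + one_hot("holiday", day_types)  # [0, 1, 0, 0, 1]

# Arbitrary (untrained) parameters: bias, linear weights, 2-dimensional factors.
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4, 0.5]
V = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

score = fm_predict(x, w0, w, V)
print(score)
```

With exactly one active category per field, the pairwise term reduces to the inner product of the factor vectors for "Sat" and "holiday", which is the cross feature the abstract refers to.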
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification numbers]: F224; F832.39; F724.6
Article ID: 1932472
Article link: http://sikaile.net/jingjilunwen/guojimaoyilunwen/1932472.html