Bayesian Classification Model Based on Random Forest Feature Selection and Its Application
[Abstract]: Bayesian analysis is a method for studying uncertainty, which it quantifies as probability. Classification models built on it are interpretable and accurate, and they are now widely used in many fields. With the rapid development of China's economy, credit evaluation has become a topic that deserves attention. Based on the characteristics of credit evaluation data, this paper proposes a Bayesian classification model based on random forest feature selection and uses the German data set from the UCI database for empirical analysis. The results show that, with random forest feature selection, the Bayesian classification model is both structurally simpler and better at classification. The main work and innovations of this paper are as follows: (1) Random forest is a learning algorithm that tolerates noise and is relatively stable. Feature selection based on it can filter the feature variables and delete redundant and irrelevant attributes. Because the naive Bayes model classifies well, this paper constructs a naive Bayes classification model based on random forest feature selection (RF-NB). (2) In practical applications, the "independence assumption" of naive Bayes often does not hold. The tree-augmented naive Bayes (TAN) model can represent dependencies between feature attributes and is therefore more realistic. Accordingly, a tree-augmented naive Bayes classification model based on random forest feature selection (RF-TAN) is constructed. (3) The Bayesian classification models based on random forest feature selection are applied to credit evaluation on the German data set to verify the classification performance of the proposed RF-NB and RF-TAN models, which are compared with the NB and TAN models without feature selection. The experimental results show that the RF-NB and RF-TAN models classify clearly better than the NB and TAN models.
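The RF-NB idea described in the abstract (rank features with a random forest, keep the informative ones, then fit naive Bayes on the reduced set) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation. It assumes scikit-learn, synthetic stand-in data instead of the German data set, a Gaussian naive Bayes variant, and a median-importance cutoff. None of these choices come from the source. The TAN variant is omitted because scikit-learn does not provide one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): this is NOT the German credit data set.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, n_redundant=5,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: random forest feature selection. Features whose impurity-based
# importance is at or above the median are kept (the cutoff is an assumption).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",
)

# Step 2: naive Bayes trained only on the selected features ("RF-NB" here).
rf_nb = make_pipeline(selector, GaussianNB())
rf_nb.fit(X_train, y_train)

# Baseline: naive Bayes on all features, without feature selection.
nb = GaussianNB().fit(X_train, y_train)

# Number of features kept by the fitted selector (first pipeline step).
n_selected = int(rf_nb[0].get_support().sum())
acc_rf_nb = rf_nb.score(X_test, y_test)
acc_nb = nb.score(X_test, y_test)

print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"RF-NB accuracy: {acc_rf_nb:.3f}")
print(f"NB accuracy:    {acc_nb:.3f}")
```

Putting the selector inside the pipeline means the forest is fitted on the training split only, so the held-out accuracy is not influenced by feature selection on test data. The accuracies printed on synthetic data say nothing about the results reported in the thesis.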
[Degree-granting institution]: North China University of Water Resources and Electric Power
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F224