Research on the Imbalanced Classification Problem in Bank Credit Rating
[Abstract]: Credit rating is an important part of bank credit risk management. It is the method by which a bank evaluates a customer's credit status, ability to repay loans, and future prospects, and it guides business decisions by mining customer information. In the current big-data era, banks can obtain more and more customer credit data, and determining a customer's credit grade by mining the information hidden in that data is one of the most important problems they face. In real bank credit data sets, customers with good credit far outnumber those with bad credit, so bank credit rating is essentially an imbalanced classification problem. In imbalanced classification the minority class is usually the focus of attention; in credit rating, for example, banks care most about customers with poor credit. Effectively identifying minority-class samples is therefore the key to solving the problem. Machine learning algorithms often fail to identify minority-class samples when applied to imbalanced data, which makes this problem a focus of research. At present, imbalanced classification is studied mainly at the data level and at the algorithm level. At the data level, resampling methods are used to balance the class distribution; random under-sampling, ROSE, and SMOTE are typical examples. At the algorithm level, ensemble learning algorithms are often used. To verify the effectiveness of resampling methods and ensemble learning algorithms on imbalanced classification, simulation experiments are run on four data sets with different imbalance ratios taken from the UCI and KEEL repositories.
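As a rough illustration of the data-level methods named above, the following sketch shows random under-sampling, a SMOTE-style interpolation step, and the minority-class recognition rate (recall) for a binary problem. It is a minimal sketch in standard-library Python, not the code used in the thesis; the function names `random_under_sample`, `smote_like`, and `minority_recall`, the neighbour count `k`, and the plain-list data layout are all assumptions made for illustration.

```python
import random

def random_under_sample(X, y, minority_label, rng):
    """Drop majority rows at random until both classes have the same size."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    kept = rng.sample(majority, min(len(minority), len(majority)))
    idx = sorted(minority + kept)
    return [X[i] for i in idx], [y[i] for i in idx]

def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic rows, each on the segment between a minority
    row and one of its k nearest minority neighbours (needs >= 2 rows)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def minority_recall(y_true, y_pred, minority_label):
    """Fraction of true minority samples that the model labels as minority."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if t == minority_label and p == minority_label)
    total = sum(1 for t in y_true if t == minority_label)
    return hits / total if total else 0.0
```

Under-sampling discards majority information, while interpolation adds minority points without discarding anything; the two trade off data loss against the risk of over-fitting to synthetic points.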
The experimental results show that resampling methods and ensemble learning algorithms effectively improve the classification model's recognition rate for minority-class samples. ROSE is a method for synthesizing artificial data. Improving its weight coefficient and combining it with random under-sampling yields the RHS (Random Hybrid Sampling) method; using the classical AdaBoost algorithm as the ensemble learner then yields the RHSBoost (Random Hybrid Sampling Boosting) algorithm. The basic idea of the algorithm is as follows: first, a balanced data set is obtained by random under-sampling; then the improved ROSE method synthesizes additional artificial data and changes the weights of the minority-class samples, which boosts the classifier. In this paper, a bank credit data set is used for the experiments. With a decision tree as the base classifier, the RHSBoost algorithm is compared with the RUSBoost algorithm, the resampling methods, and the ensemble learning algorithms. The results demonstrate the feasibility and advantages of the RHSBoost algorithm.
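The general shape of such a hybrid-sampling boosting loop can be sketched as follows. This is only an illustration under stated assumptions, not the RHSBoost algorithm itself: the abstract does not specify the improved weight coefficient, so the sketch uses a fixed Gaussian jitter `h` for the ROSE-style synthetic points, draws their seed rows in proportion to the current AdaBoost weights, and uses one-feature decision stumps instead of full decision trees. Labels are assumed to be +1 (minority) and -1 (majority), and all function names are hypothetical.

```python
import math
import random

def hybrid_sample(X, y, w, rng, h=0.1):
    """Under-sample the majority (-1) to the minority (+1) size, then add
    jittered copies of weight-sampled rows from each class (ROSE-style)."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    n = len(minority)
    kept = minority + rng.sample(majority, min(n, len(majority)))
    Xs = [list(X[i]) for i in kept]
    ys = [y[i] for i in kept]
    for group in (minority, majority):
        seeds = rng.choices(group, weights=[w[i] for i in group], k=n)
        for i in seeds:
            Xs.append([v + rng.gauss(0.0, h) for v in X[i]])  # assumed fixed bandwidth
            ys.append(y[i])
    return Xs, ys

def stump_predict(stump, row):
    f, t, sign = stump
    return sign if row[f] > t else -sign

def fit_stump(X, y):
    """Return the (feature, threshold, sign) with the fewest errors on (X, y)."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for t in cuts:
            for sign in (1, -1):
                errors = sum(1 for row, label in zip(X, y)
                             if stump_predict((f, t, sign), row) != label)
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    return best[1:]

def hybrid_boost(X, y, rounds=10, seed=0):
    """AdaBoost in which every round trains on a freshly resampled set."""
    rng = random.Random(seed)
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        Xs, ys = hybrid_sample(X, y, w, rng)
        stump = fit_stump(Xs, ys)
        preds = [stump_predict(stump, row) for row in X]
        # Weighted error is measured on the original, unsampled data.
        err = sum(wi for wi, p, label in zip(w, preds, y) if p != label)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * label * p) for wi, p, label in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score > 0 else -1
```

Measuring the error on the original data while training on the resampled set follows the structure of RUSBoost, with the hybrid sampler taking the place of plain random under-sampling.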
[Degree-granting institution]: Guangdong University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13;F830.4;TP181
Article ID: 2168719
Article link: http://sikaile.net/jingjilunwen/huobiyinxinglunwen/2168719.html