Research on the Imbalanced Classification Problem in Bank Credit Rating

Published: 2018-08-06 19:13
[Abstract]: Credit rating is an important part of bank credit risk management. It is a method by which a bank evaluates a customer's creditworthiness, ability to repay loans, and future prospects, and a process of mining customer information to guide business decisions. In the current big data era, banks can obtain more and more customer credit data, and determining a customer's credit grade by mining the information hidden in that data is a critical problem for banks. In real bank credit data sets, customers with good credit usually far outnumber customers with bad credit, so bank credit rating is in essence an imbalanced classification problem. In imbalanced classification the minority class is usually the focus of attention; in credit rating, for example, banks care most about customers with bad credit. Effectively distinguishing and identifying minority-class samples is therefore the key to solving the imbalanced classification problem. Machine learning algorithms often fail to identify minority-class samples effectively on imbalanced data, which makes this problem a focus of research.

At present, imbalanced classification is studied mainly at the data level and at the algorithm level. At the data level, resampling methods are used to balance the class distribution; random under-sampling, ROSE, and SMOTE are typical examples. At the algorithm level, ensemble learning algorithms are often used. To verify the effectiveness of resampling methods and ensemble learning algorithms on imbalanced classification, this thesis runs simulation experiments on four data sets with different imbalance ratios drawn from the UCI and KEEL repositories. The results show that resampling methods and ensemble learning algorithms do improve the classification model's recognition rate for minority-class samples.

ROSE is a method for synthesizing artificial data. Modifying its weight coefficients and combining it with random under-sampling yields the Random Hybrid Sampling (RHS) method; using the classical AdaBoost algorithm as the ensemble learner then yields the RHSBoost (Random Hybrid Sampling Boosting) algorithm. The basic idea is as follows: random under-sampling first produces a balanced data set, the improved ROSE method then synthesizes additional artificial data, and AdaBoost adjusts the weights of misclassified minority-class samples, which strengthens the classifier. Experiments on a bank credit data set, with decision trees as the base classifier, compare RHSBoost with RUSBoost, SMOTEBoost, resampling methods, and ensemble learning algorithms, and demonstrate the feasibility and advantages of RHSBoost.
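The sampling stage described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the thesis's implementation: the abstract does not give the modified ROSE weight coefficients, so the code below uses a plain ROSE-style smoothed bootstrap with a Gaussian kernel and a rule-of-thumb bandwidth, and the `shrink` parameter is a hypothetical stand-in for that modification. All function names are illustrative, and the AdaBoost stage is omitted.

```python
import random
import statistics


def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority class, without replacement."""
    return rng.sample(majority, min(target_size, len(majority)))


def smoothed_bootstrap(samples, n_new, rng, shrink=1.0):
    """ROSE-style synthesis: pick a seed row, then add Gaussian noise per feature.

    The bandwidth is a normal-reference rule of thumb. `shrink` is a
    hypothetical knob; the thesis's actual coefficient change is not known.
    """
    n, d = len(samples), len(samples[0])
    factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    # Per-feature standard deviation (0.0 when the class has a single row).
    sigmas = [statistics.stdev(col) if n > 1 else 0.0 for col in zip(*samples)]
    synthetic = []
    for _ in range(n_new):
        seed_row = rng.choice(samples)
        synthetic.append(
            [rng.gauss(x, shrink * factor * s) for x, s in zip(seed_row, sigmas)]
        )
    return synthetic


def random_hybrid_sample(majority, minority, n_synthetic_per_class, seed=0):
    """Undersample the majority class, then enlarge both classes synthetically."""
    rng = random.Random(seed)
    # Step 1: shrink the majority class to the size of the minority class.
    kept_majority = random_undersample(majority, len(minority), rng)
    # Step 2: add synthetic rows to each class (label 0 = majority, 1 = minority).
    data = []
    for label, rows in ((0, kept_majority), (1, minority)):
        rows = [list(r) for r in rows]
        extra = smoothed_bootstrap(rows, n_synthetic_per_class, rng)
        data.extend((row, label) for row in rows + extra)
    rng.shuffle(data)
    return data


# Toy data: 100 majority rows and 10 minority rows with two features each.
majority = [[float(i), float(i % 7)] for i in range(100)]
minority = [[200.0 + i, float(i % 3)] for i in range(10)]
balanced = random_hybrid_sample(majority, minority, n_synthetic_per_class=20)
```

With these toy inputs, `balanced` holds 60 labeled rows, 30 per class: 10 real and 20 synthetic rows for each label. A boosting learner such as AdaBoost over decision trees would then be trained on data prepared this way.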
[Degree-granting institution]: Guangdong University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13;F830.4;TP181


Article ID: 2168719



Article link: http://sikaile.net/jingjilunwen/huobiyinxinglunwen/2168719.html

