A Data-Mining Study of the Factors Influencing P2P Online Loan Outcomes and of Lending Decision Models

Published: 2018-04-20 21:23

  Topics: P2P online lending + random forest model; Source: Shanghai Normal University, master's thesis, 2017


[Abstract]: P2P online lending refers to unsecured loans arranged between lenders and borrowers through an online lending platform rather than a financial institution. P2P lending in China has grown very rapidly since 2015: according to the 2015 Annual Report Briefing of China's P2P Online Lending Industry (《中国P2P网贷行业2015年年报简报》), the number of P2P lending platforms nationwide rose from 2,918 to 5,121 during 2015, and annual transaction volume rose from 252.8 billion yuan in 2014 to 982.304 billion yuan in 2015. However, as of February 2017, 3,547 of the 5,882 P2P lending platforms established nationwide had ceased operations or run into problems, which makes risk control on P2P platforms an urgent issue. Using real loan data from the P2P platform Haodai (好贷网), this thesis identifies, among a set of applicant characteristics, the factors that significantly affect whether a loan application succeeds, and builds lending decision models to predict that outcome. The work proceeds as follows. In data preprocessing, the loan application table and the applicant information table in the raw data are joined with SQL into a single personal loan analysis table; invalid records are removed with logical checks; missing values are imputed with KNN imputation; and outliers are handled with WOE binning. This yields 3,003 valid records with 20 applicant feature variables. To identify the factors that influence the loan outcome, the information value (IV) of each of the 20 variables is computed first, which screens out 14 variables that are significant for the outcome; a random forest model is then used to compute the mean decrease in Gini impurity for each of these variables. The larger the mean decrease, the greater the variable's influence on the loan outcome.
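As a rough illustration of the IV screening step described above, the sketch below computes the weight of evidence (WOE) of each bin of one variable and sums the bins into an information value (IV). The function name `woe_iv`, the bin counts, and the 0.5 smoothing constant are assumptions made for illustration and are not taken from the thesis; the sign convention (approved share over rejected share) is one common choice.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and the total IV for one binned variable.

    bins: list of (n_approved, n_rejected) counts, one pair per bin.
    A 0.5 count is added to every cell so that an empty cell never
    produces log(0); this smoothing is an assumption of the sketch.
    """
    smooth = 0.5
    total_good = sum(g for g, _ in bins) + smooth * len(bins)
    total_bad = sum(b for _, b in bins) + smooth * len(bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        p_good = (good + smooth) / total_good  # share of approvals in this bin
        p_bad = (bad + smooth) / total_bad     # share of rejections in this bin
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe           # each term is non-negative
    return woes, iv


# Hypothetical counts: applicants without a credit card, then with one.
woes, iv = woe_iv([(10, 190), (120, 80)])

# A variable whose bins have identical approval mixes carries no information.
flat_woes, flat_iv = woe_iv([(50, 50), (50, 50)])
```

A variable is kept when its IV exceeds a chosen cutoff; the abstract does not state which cutoff the thesis used.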
The results show that the applicant's past credit record has the greatest influence on the loan outcome, followed by occupation and assets, and then by loan amount and loan term; basic personal characteristics such as gender and marital status have very little influence. The success-to-failure ratio is then used to identify the direction and size of each factor's effect: the success rate of applicants with a credit card is 20 times higher than that of applicants without one, and the highest single-card credit limit, the length of time the card has been held, salary, years of work experience and education level are all significantly positively related to the success rate. For the lending decision model, six of the most commonly used models are considered: logistic regression as the statistical model; SVM and a neural network as non-statistical models; and AdaBoost, GBDT and XGBoost as ensemble models. Applicants are first grouped with K-means clustering and the characteristics of each group are summarized; a separate model is then built for each group, and the pooled per-group predictions are compared with the results of models built on the unclustered data. After clustering, model accuracy, sensitivity and specificity improve by 3.31%, 17.39% and 11.05%, respectively, which means that, relative to the unclustered models, the clustered models would bring a P2P lending platform 17.39% more business and reduce the risk of misclassification by 11.05%. The conclusion is that applicants differ substantially from one another, and that modeling all applicants together ignores these differences and lowers model accuracy.
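For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity and specificity are derived from confusion-matrix counts. It assumes that the positive class is "loan granted", which matches the abstract's reading of sensitivity as business gained and specificity as misclassification risk avoided. The function name and the counts are hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion counts.

    tp: applicants who obtained a loan and were predicted to
    fn: applicants who obtained a loan but were predicted not to
    tn: applicants who were refused and were predicted to be
    fp: applicants who were refused but were predicted to succeed
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts, not results from the thesis.
rates = classification_rates(tp=40, fn=10, tn=45, fp=5)
```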
Grouping applicants with K-means clustering first, and then building a model within each group, significantly strengthens the models' ability to capture the characteristics of different types of applicants and thereby improves their risk-control capability.
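The cluster-then-model idea can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: the two-feature data are invented, the K-means routine uses a deterministic farthest-point initialisation for reproducibility, and each per-group "model" is a simple majority-class rule standing in for the classifiers named in the abstract. The comparison is made on the training data itself, so it shows the routing mechanism rather than any real gain in accuracy.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iterations=10):
    # Farthest-point initialisation: a simplification chosen for determinism.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def fit_segmented(points, labels, k):
    """Cluster applicants, then fit one majority-class rule per cluster."""
    centers = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest(p, centers)][y] += 1
    overall = Counter(labels).most_common(1)[0][0]
    rules = [v.most_common(1)[0][0] if v else overall for v in votes]
    return centers, rules


def predict(point, centers, rules):
    """Route an applicant to the nearest cluster and apply that cluster's rule."""
    return rules[nearest(point, centers)]


# Hypothetical applicants: (scaled salary, scaled card limit); 1 = loan granted.
points = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.0), (1.0, 1.0),
          (2.5, 1.5), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

centers, rules = fit_segmented(points, labels, k=2)
global_rule = Counter(labels).most_common(1)[0][0]
segmented_hits = sum(predict(p, centers, rules) == y for p, y in zip(points, labels))
global_hits = sum(global_rule == y for y in labels)
```

In practice each majority rule would be replaced by a fitted classifier, and the comparison would be made on held-out data.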
[Degree-granting institution]: Shanghai Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: F724.6;F832.4


Article ID: 1779511


Article link: http://sikaile.net/jingjilunwen/touziyanjiulunwen/1779511.html

