
Research on Factors Influencing P2P Online Loan Outcomes and on Lending Decision Models, Based on Data Mining

Published: 2018-04-20 21:23

  Topics: P2P online lending + random forest model; Source: Shanghai Normal University, master's thesis, 2017


[Abstract]: P2P online lending refers to unsecured loans arranged between lenders and borrowers through an online lending platform rather than through a financial institution. P2P online lending in China has grown very rapidly since 2015. According to the Annual Report Brief of China's P2P Online Lending Industry 2015, the number of P2P lending platforms nationwide rose from 2,918 to 5,121 during 2015, and annual cumulative transaction volume rose from 252.8 billion yuan in 2014 to 982.304 billion yuan in 2015. However, as of February 2017, 3,547 of the 5,882 P2P lending platforms established nationwide had ceased operation or run into problems, which shows that risk control for P2P lending platforms is an urgent issue. Using real loan data from the P2P platform "好贷网" (Haodai), this thesis identifies, from a set of applicant feature variables, the factors that significantly affect whether an application is granted, and builds a lending decision model to predict each applicant's loan outcome. The work has three parts.

Data preprocessing. The loan application table and the applicant information table in the raw data are joined with SQL into a single personal loan analysis table. Invalid records are removed by logical checks, missing values are imputed with KNN imputation, and outliers are handled with WOE binning. The result is 3,003 valid records with 20 applicant feature variables.

Identifying the factors that influence the loan outcome. The IV (information value) of each of the 20 variables is computed first, and the 14 variables that are significant for the loan outcome are retained. A random forest model is then used to compute the mean decrease in Gini for each retained variable; the larger the mean decrease, the greater the variable's influence on the loan outcome. The most influential factor is the applicant's past credit record, followed by occupation and assets, and then loan amount and loan term. Basic personal characteristics such as gender and marital status have very little influence. The success-to-failure ratio is then used to determine the direction and size of each factor's effect. Applicants with a credit card have a loan success rate 20 times higher than applicants without one, and the highest single-card credit limit, card-opening time, salary, years of work, and education level are all significantly and positively associated with the loan success rate.

Building the lending decision model. Six of the most common models are used: the Logistic regression model (a statistical model), the SVM and neural network models (non-statistical models), and the AdaBoost, GBDT, and XGBoost models (ensemble models). Applicants are first grouped with K-means clustering and the characteristics of each cluster are summarized. A separate model is then built for each cluster, and the per-cluster predictions are pooled and compared with the results of the model built before clustering. After clustering, accuracy, sensitivity, and specificity improve significantly, by 3.31%, 17.39%, and 11.05% respectively. Compared with the unclustered model, the clustered model can therefore bring the P2P lending platform 17.39% more business and reduce the risk of misjudgment by 11.05%.

The conclusions are as follows. Applicants differ considerably from one another, and modeling all applicants as a whole ignores these differences and lowers model accuracy. Grouping applicants with K-means clustering first, and then building a model within each group, significantly improves the model's ability to capture the characteristics of different types of applicants and so strengthens its risk-control capability.
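
The IV screening step mentioned in the abstract can be illustrated with a small worked example. The sketch below computes the weight of evidence (WOE) of each bin and the information value (IV) of one already-binned variable, using the standard definitions WOE = ln(share of granted loans in the bin / share of rejected loans in the bin) and IV = sum over bins of (difference of the two shares) × WOE. The function name `woe_iv`, the additive smoothing constant, and the toy data are assumptions made for illustration; the thesis does not publish its binning rules, its IV cutoff, or its code.

```python
import math
from collections import Counter


def woe_iv(bins, outcomes, smoothing=0.5):
    """Return ({bin label: WOE}, IV) for one binned variable.

    bins      -- bin label of each applicant (hypothetical example data)
    outcomes  -- 1 if the loan was granted, 0 if it was rejected
    smoothing -- small count added to every cell so that an empty cell
                 does not produce log(0); an assumption of this sketch
    """
    granted = Counter(b for b, y in zip(bins, outcomes) if y == 1)
    rejected = Counter(b for b, y in zip(bins, outcomes) if y == 0)
    labels = sorted(set(bins))

    # Smoothed column totals, so the per-bin shares still sum to 1.
    n_granted = sum(granted.values()) + smoothing * len(labels)
    n_rejected = sum(rejected.values()) + smoothing * len(labels)

    woe, iv = {}, 0.0
    for label in labels:
        p_granted = (granted[label] + smoothing) / n_granted
        p_rejected = (rejected[label] + smoothing) / n_rejected
        woe[label] = math.log(p_granted / p_rejected)
        iv += (p_granted - p_rejected) * woe[label]
    return woe, iv


# Toy data for 12 invented applicants (not taken from the thesis).
outcomes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
has_card = ["yes"] * 6 + ["no"] * 6   # strongly related to the outcome
gender = ["m", "f"] * 6               # unrelated to the outcome

woe_card, iv_card = woe_iv(has_card, outcomes)
woe_gender, iv_gender = woe_iv(gender, outcomes)

print("has_card:", woe_card, round(iv_card, 3))
print("gender:  ", woe_gender, round(iv_gender, 3))
```

In this toy data the credit-card variable separates granted from rejected applications and receives a large IV, while the gender variable is split evenly in both groups and receives an IV of zero. A screening step of the kind the abstract describes would keep the first variable and drop the second.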
[Degree-granting institution]: Shanghai Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: F724.6; F832.4


Article ID: 1779511


Link to this article: http://sikaile.net/jingjilunwen/touziyanjiulunwen/1779511.html

