

Applied Research on Ensemble Learning for Imbalanced Classification

Published: 2018-07-02 19:40

Topics: imbalanced classification + ensemble learning; Source: master's thesis, Nanjing University of Information Science and Technology, 2017


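【Abstract】:Data sets with skewed class distributions are common in the real world, and in many domains correctly classifying minority-class samples matters more than correctly classifying majority-class samples. Most classical classification algorithms, however, build their models on the assumption that the training set has a balanced class distribution or that all classes carry the same misclassification cost, so a skewed class distribution degrades their performance to some extent. Information from the minority class tends to be masked by the majority class, and the error rate on minority-class samples ends up far higher than on majority-class samples. Imbalanced classification has therefore drawn growing attention and has become both a hot topic and a hard problem in applied data mining.

Before discussing applications, this thesis introduces the scope and current state of research on imbalanced classification and gives a detailed review of sampling methods and classification algorithms. Because ensemble learning algorithms generally outperform single classifiers on imbalanced data, it then examines how ensemble combination methods handle imbalanced classification and describes related applications in detail.

The thesis applies ensemble models to imbalanced classification in two parts. The first part uses 1,000 records of 2014 financial data from companies listed on the Shanghai A-share market to build a financial early-warning model for listed companies, framed as an imbalanced classification of ST (special treatment) stocks, using a Hellinger Distance based Random Forest (HDRF). HDRF combines the diversity of a random forest with the skew insensitivity of Hellinger distance decision trees. The comparison experiments cover a traditional random forest; Bagging, AdaBoost, and Rotation Forest ensembles with C4.5 decision trees as base classifiers; and the corresponding ensembles with Hellinger distance decision trees as base classifiers. The results show that on the imbalanced ST-stock classification task the HDRF ensemble achieves relatively better overall performance in area under the ROC curve (AUC) and F-measure, and that using Hellinger distance decision trees as base classifiers improves the imbalanced-classification performance of the ensembles.

The second part extends the application of imbalanced classification models to customer retention in customer relationship management, focusing on customer churn at commercial banks. It applies CVParameterSelection to tune the parameters of a combined-kernel support vector machine and builds an EasyEnsemble-based Relief-SVM churn prediction model. Experiments on commercial bank customer data show that this model improves AUC and F-measure over both the single-kernel EasyEnsemble-Relief-SVM model and traditional Bagging and AdaBoost ensembles with C4.5 decision trees as base classifiers. With tuned parameters, the combined-kernel EasyEnsemble Relief-SVM model is thus an effective method for churn prediction at commercial banks: it identifies potential churners more accurately while maintaining overall classification accuracy, which makes targeted retention decisions possible and ultimately supports customer retention.

Finally, the thesis summarizes the two applications of ensemble-based imbalanced classification, analyzes their limitations, and outlines directions for future work, with the aim of enabling effective knowledge discovery from imbalanced data in economics and management.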
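The abstract attributes HDRF's advantage to the skew insensitivity of Hellinger distance decision trees. A minimal sketch of that node-level split criterion, assuming the usual binary-split form d_H = sqrt( sum over branches j of ( sqrt(|X+j| / |X+|) - sqrt(|X-j| / |X-|) )^2 ), where X+ and X- are the minority and majority samples at the node: because each class is normalized by its own count, scaling the majority class by a constant leaves the score unchanged. The function names are hypothetical, and the sketch omits the bootstrap sampling and random feature selection that turn such trees into a random forest.

```python
import math

def hellinger_split(left_pos, left_neg, right_pos, right_neg):
    """Hellinger distance of a binary split (hypothetical helper).

    pos = minority-class count, neg = majority-class count in each branch.
    Only within-class proportions enter the formula, so the score does not
    depend on the class ratio at the node.
    """
    total_pos = left_pos + right_pos
    total_neg = left_neg + right_neg
    if total_pos == 0 or total_neg == 0:
        return 0.0  # a single-class node has nothing to separate
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in ((left_pos, left_neg), (right_pos, right_neg))
    ))

def best_threshold(values, labels):
    """Scan one numeric feature for the cut with the largest Hellinger distance.

    labels: 1 = minority class, 0 = majority class.
    Returns (distance, threshold); threshold is None if no cut scores above 0.
    """
    pairs = sorted(zip(values, labels))
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    left_pos = left_neg = 0
    best = (0.0, None)
    for (v, lab), (v_next, _) in zip(pairs, pairs[1:]):
        if lab == 1:
            left_pos += 1
        else:
            left_neg += 1
        if v == v_next:
            continue  # cannot cut between identical values
        d = hellinger_split(left_pos, left_neg,
                            total_pos - left_pos, total_neg - left_neg)
        if d > best[0]:
            best = (d, (v + v_next) / 2)
    return best

# Same within-class proportions, majority class 10x larger: identical scores.
print(hellinger_split(8, 10, 2, 90), hellinger_split(8, 100, 2, 900))  # both ≈ 0.765
print(best_threshold([1, 2, 3, 4, 5, 6], [1, 1, 0, 0, 0, 0]))  # (1.414..., 2.5)
```

A perfect split reaches the maximum value sqrt(2) regardless of how many majority samples the node holds, which is the property that keeps the criterion from favouring the majority class.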
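The churn model in the second part rests on EasyEnsemble: the majority class is randomly undersampled into several subsets the size of the minority class, one classifier is trained per balanced subset, and their outputs are combined. The sketch below illustrates that idea under stated assumptions and is not the thesis's model. The SVM is replaced by a kernel nearest-centroid learner so the code runs without external libraries. The combined kernel is assumed to be a fixed 0.7/0.3 mix of an RBF kernel and a linear kernel, whereas the thesis tunes its kernel parameters with Weka's CVParameterSelection, which is not reproduced here. Relief feature weighting is omitted. All function names and the synthetic data are hypothetical. AUC and F-measure, the two metrics reported in the abstract, are computed with the minority (churn) class as positive.

```python
import math
import random

# Combined kernel. ASSUMPTION: fixed convex mix of RBF and linear kernels;
# the thesis tunes its kernel parameters by cross-validation instead.
def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def combined_kernel(x, z, weight=0.7):
    # A non-negative weighted sum of valid kernels is itself a valid kernel.
    return weight * rbf_kernel(x, z) + (1 - weight) * linear_kernel(x, z)

class KernelCentroid:
    """Stand-in base learner (NOT an SVM): nearest class centroid in kernel feature space."""

    def __init__(self, kernel):
        self.kernel = kernel

    def fit(self, X, y):
        self.groups = {c: [x for x, t in zip(X, y) if t == c] for c in (0, 1)}
        # Squared norm of each class centroid in feature space: mean of K over all pairs.
        self.centroid_sq = {
            c: sum(self.kernel(a, b) for a in g for b in g) / len(g) ** 2
            for c, g in self.groups.items()
        }
        return self

    def _sq_dist(self, x, c):
        g = self.groups[c]
        cross = sum(self.kernel(x, a) for a in g) / len(g)
        return self.kernel(x, x) - 2 * cross + self.centroid_sq[c]

    def score(self, x):
        # Positive when x is closer to the minority (label 1) centroid.
        return self._sq_dist(x, 0) - self._sq_dist(x, 1)

def easy_ensemble(X, y, make_learner, n_subsets=5, seed=0):
    """Train one learner per balanced subset; return a function averaging their scores."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    learners = []
    for _ in range(n_subsets):
        # Independent random undersample of the majority class, matched to minority size.
        subset = rng.sample(majority, len(minority))
        labels = [1] * len(minority) + [0] * len(subset)
        learners.append(make_learner().fit(minority + subset, labels))
    return lambda x: sum(m.score(x) for m in learners) / len(learners)

def f_measure(y_true, y_pred):
    """F1 score with the minority class (label 1) as positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(rng, n_major, n_minor):
    """Hypothetical 2-D Gaussian data; label 1 marks the minority class."""
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_major)]
    X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor

if __name__ == "__main__":
    rng = random.Random(1)
    X_train, y_train = make_data(rng, 200, 20)  # 10:1 imbalance
    X_test, y_test = make_data(rng, 200, 20)
    model = easy_ensemble(X_train, y_train, lambda: KernelCentroid(combined_kernel))
    scores = [model(x) for x in X_test]
    preds = [1 if s > 0 else 0 for s in scores]
    print("AUC:", round(auc(y_test, scores), 3))
    print("F-measure:", round(f_measure(y_test, preds), 3))
```

Swapping `KernelCentroid` for an SVM trained with the same kernel (scikit-learn's `SVC`, for instance, accepts a custom kernel) would bring the sketch closer to the thesis's setup; the per-subset undersampling and the score averaging would stay the same.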
【Degree-granting institution】:Nanjing University of Information Science and Technology
【Degree level】:Master's
【Year awarded】:2017
【Classification number】:F832.51; F275



