Modeling and Application of an Imbalanced Support Vector Machine Based on Improved SMOTE

Published: 2018-08-11 11:28
[Abstract]: The support vector machine (SVM) is a classical classification method in machine learning. It offers good classification performance and fast training, and performs especially well in nonlinear classification. Grounded in rigorous mathematical derivation and solid statistical theory, SVMs are now widely applied in industrial production, intrusion detection, medical diagnosis, user recommendation, management evaluation, decision systems, financial credit reporting, and the biological sciences. Meanwhile, as society and the economy develop, personal credit reporting has become increasingly important. With continuing advances in data mining, machine learning methods based on big data have gradually replaced manual screening and play a growing role in the credit reporting industry.

However, as technology advances and the cost of collecting and storing data falls rapidly, the complexity of the data in classification problems grows along with the sharp increase in data volume: dimensionality keeps rising and class distributions become increasingly skewed toward one side. These changes pose growing challenges for classification, and for SVMs they seriously degrade the performance of the classical classifier in certain scenarios. Addressing them requires starting from the intrinsic properties of the SVM and the root causes of the degraded performance, taking full account of how imbalanced data and indicator complexity affect the classification results. Only then can the classical SVM be improved in a targeted way, raising its practical value while preserving its rigorous theoretical foundation.

This thesis systematically studies the theory and properties of the classical SVM, discusses the data imbalance problem in SVMs together with the modeling and concrete implementation of solutions, and proposes an improved SVM algorithm that is adaptive and robust to imbalanced data. The credit risk assessment of customers of a microfinance company serves as the practical case; in testing, the proposed method improved classification accuracy for potentially defaulting customers. The main research contents are as follows:

(1) The modeling and application of SVM classifiers under fuzzy conditions are studied, with a focus on SVM classifiers based on interval numbers. For samples that contain interval numbers, a sampling method based on hypercube vertex sampling is proposed, and an algorithm that uses a binary tree to sample interval-number samples is given.

(2) Two drawbacks of the traditional SMOTE algorithm on imbalanced data are analyzed: it ignores the meaning of the samples themselves, and it operates on the entire minority class. Building on SMOTE's interpolation of minority-class samples, an improved oversampling method based on key-indicator selection is proposed. The classification properties of the interval-number SVM are used to improve the distribution of the newly synthesized samples. Finally, the complete model and algorithm flow of the improved SMOTE SVM for imbalanced data are given.

(3) The influence of the chosen key indicators and related parameters on the classification results of the improved SMOTE is analyzed, and an optimized SMOTE SVM algorithm based on information gain is proposed. First, a hypercube-vertex-sampling SMOTE SVM based on information gain is established; then an optimization algorithm automatically tunes the parameters of the improved SMOTE-SVM model. This makes the parameter settings more reasonable and improves classification performance. The specific flow of the combined algorithm is also given.

(4) The practical problems that microfinance companies face in credit risk assessment are studied, and their disadvantages in assessing customer credit are analyzed. A credit risk assessment indicator system is constructed according to the operating practice of microfinance companies. The improved SVM algorithm proposed in this thesis is applied to the real problem and compared with other classical classification algorithms on overall classification performance. The distribution of customer defaults under the key indicators is then analyzed, and finally user profiles are built from the typical characteristics of the two types of users.
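The interpolation step of standard SMOTE, which item (2) takes as its starting point, can be sketched as below. This is a minimal illustration of classic SMOTE only, not the thesis's improved variant with key-indicator selection or interval-number SVMs, whose details the abstract does not give. The function name `smote`, its parameters, and the sample data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by classic SMOTE interpolation.

    minority: list of equal-length numeric feature lists (minority class only).
    Returns a list of n_synthetic new samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote(minority_samples, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it inherits no notion of which features matter. The abstract names this as the weakness that motivates the key-indicator-based variant.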
[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181;F832.4


Article ID: 2176868


Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2176868.html

