Modeling and Application of an Imbalanced Support Vector Machine Based on Improved SMOTE

Published: 2018-08-11 11:28

[Abstract]: The support vector machine (SVM) is a classical classification method in machine learning. It offers good classification performance and fast training, and performs particularly well in nonlinear classification scenarios. Grounded in rigorous mathematical derivation and solid statistical methods, SVM has been widely applied in industrial production, intrusion detection, medical identification, user recommendation, management evaluation, decision systems, financial credit reporting, the biological sciences, and other fields. Meanwhile, with social and economic development, personal credit reporting has become increasingly important. As data mining technology advances, machine learning methods based on big data have gradually replaced manual screening and play an ever larger role in the credit reporting industry.

However, as the cost of collecting and storing data falls rapidly, the complexity of the data in classification problems grows along with the sharp increase in data volume: dimensionality keeps rising and class distributions become increasingly skewed toward one side. These changes pose growing challenges for classification, and for SVM they seriously degrade the performance of the classical classifier in specific scenarios. Addressing them requires starting from the inherent characteristics of SVM, fully considering how imbalanced data and indicator complexity affect classification results, and tracing the root causes of the performance loss. Only then can the classical SVM be improved in a targeted way, raising its practical value while preserving its rigorous theoretical foundation.

This thesis systematically studies the theory and properties of the classical SVM, discusses both the modeling of solutions to the data imbalance problem in SVM and their concrete implementation, and proposes an improved SVM algorithm that is adaptive and robust to imbalanced data. Customer credit risk assessment at a microloan company serves as the practical application case; in testing, the proposed method improved classification accuracy for potential defaulting customers. The main research contents are as follows:

(1) The modeling and application of SVM classifiers under fuzzy conditions are studied, including an SVM classifier based on interval numbers. For samples that contain interval numbers, a sampling method based on hypercube vertex sampling is proposed, and an algorithm that uses a binary tree to sample interval-number samples is given.

(2) Two drawbacks of the traditional SMOTE algorithm on imbalanced data are analyzed: it ignores the meaning of the samples themselves, and it operates on the entire minority class. Building on SMOTE's interpolation of minority-class samples, an improved oversampling method based on the selection of key indicators is proposed. The classification properties of the interval-number SVM are used to improve the distribution of the newly synthesized samples. Finally, the complete model and algorithm flow of the improved SMOTE support vector machine for imbalanced data are given.

(3) The effect on classification results of the key indicators and related parameters set when using the improved SMOTE is analyzed, and an optimized SMOTE support vector machine algorithm based on information gain is proposed. A hypercube vertex-sampling SMOTE support vector machine based on information gain is first established, and an optimization algorithm then automatically tunes the parameters of the improved SMOTE-SVM model. This makes the parameter settings more reasonable and improves classification performance. The specific flow of the combined algorithm is given.

(4) The practical problems that microloan companies face in credit risk assessment are studied, and their disadvantages in assessing customer credit are analyzed. A credit risk assessment indicator system is constructed according to the actual operations of microloan companies. The improved SVM algorithm proposed in this thesis is applied to the practical problem and compared with other classical classification algorithms on overall classification performance. Starting from the key indicators, the distribution of customer defaults under those indicators is analyzed, and user profiles are finally drawn from the typical characteristics of the two classes of users.
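For readers unfamiliar with the baseline that the abstract refers to, the sketch below illustrates the traditional SMOTE interpolation step: pick a minority-class point, pick one of its k nearest minority-class neighbours, and place a synthetic point at a random position on the segment between them. It shows only the classic technique, not the thesis's improved variant based on key indicators, information gain, or hypercube vertex sampling. The function name `smote`, its parameters `k` and `seed`, and the toy data are illustrative assumptions, and the code assumes at least two minority samples.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by classic SMOTE interpolation.

    minority: list of equal-length numeric tuples (assumed: at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment from base towards the chosen neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class, for illustration only
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already spanned by the minority class. It also makes the drawback named in item (2) visible: every minority point and every feature is treated alike, regardless of what the features mean.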
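[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181;F832.4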


Article ID: 2176868


Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2176868.html
