Improvement of the Spectral Clustering Algorithm and Its Application in Personal Credit Evaluation
Published: 2019-04-17 18:21
[Abstract]: Cluster analysis has long been an important research topic in machine learning and data mining, because it helps reveal the relationships among objects. Spectral clustering, which has attracted growing attention in recent years, is an efficient clustering method. Unlike many earlier clustering algorithms, it can be applied to data sets of arbitrary shape and therefore handles a wider range of problems; using it to extract knowledge from massive data is a promising research direction. With the recent surge of interest in artificial intelligence, improving accuracy and reducing time complexity have become pressing concerns. This thesis improves the spectral clustering algorithm using independent component analysis (ICA) and information entropy theory, extending the body of work on spectral clustering and offering a new approach to clustering problems. The research focuses on three aspects.

First, the result of spectral clustering changes with the scale parameter of the similarity measure function. The similarity measure function describes the similarity between samples; choosing a different function, or merely a different scale parameter, can strongly affect the final clustering. The results of this thesis indicate that a good similarity measure function should reflect the distribution of the data, so information entropy theory is introduced and the scale parameter is optimized by minimizing the information entropy.

Second, the final result of spectral clustering depends on how the eigenvectors of the Laplacian matrix are selected. The Laplacian matrix arises from relaxing the spectral graph partitioning criterion, and its effect on the data can be understood as feature extraction from the data set. The Laplacian matrix is central to spectral clustering, but the number of its eigenvectors that are retained affects the final classification result. Replacing the Laplacian matrix with ICA, which has performed better in feature extraction in recent years, is a bold attempt made in this thesis. Theoretical analysis and experimental verification show that the spectral clustering algorithm improved with ICA and information entropy theory obtains better classification results.

Third, the ICASC algorithm is combined with a personal credit evaluation system and applied to consumer finance. Personal credit evaluation is an important reference for customer classification in the consumer finance industry; it can effectively reduce the bad-debt rate and avoid unnecessary financial losses. The thesis combines spectral clustering with consumer-finance risk control, and the empirical study concludes that the spectral clustering algorithm can effectively identify "bad" customers.

The innovations of this thesis lie in the same three aspects: optimized selection of the scale parameter s of the similarity measure function; replacing the Laplacian matrix transformation with ICA to extract features from the similarity matrix; and an application innovation, namely applying the improved spectral clustering algorithm to the risk control system in consumer finance. Finally, an outlook is given: future research can focus on the robustness and interpretability of spectral clustering, combining theory with practical application scenarios.
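The first contribution described in the abstract, choosing the scale parameter by minimizing an information entropy, can be illustrated with a small sketch. The abstract does not say which entropy the thesis minimizes, so everything specific here is an assumption made for illustration: the Gaussian similarity function, the mean binary entropy of the pairwise affinities as the criterion, the toy data, and the function names `gaussian_affinity`, `binary_entropy`, `affinity_entropy`, and `select_sigma`. Because this particular entropy also tends to zero when the scale parameter becomes very small or very large, the search is restricted to a bounded candidate grid, which is another assumption.

```python
import math
from itertools import combinations


def gaussian_affinity(x, y, sigma):
    """Gaussian similarity exp(-||x - y||^2 / (2 * sigma^2)) (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def binary_entropy(w):
    """Entropy in nats of a Bernoulli variable with parameter w (0 at w = 0 or 1)."""
    total = 0.0
    for p in (w, 1.0 - w):
        if p > 0.0:
            total -= p * math.log(p)
    return total


def affinity_entropy(points, sigma):
    """Mean binary entropy over all pairwise affinities (assumed criterion).

    Low values mean most affinities are close to 0 or 1, i.e. the
    similarity function separates "similar" from "dissimilar" pairs sharply.
    """
    pairs = list(combinations(points, 2))
    return sum(
        binary_entropy(gaussian_affinity(x, y, sigma)) for x, y in pairs
    ) / len(pairs)


def select_sigma(points, candidates):
    """Return the candidate scale parameter with the lowest entropy."""
    return min(candidates, key=lambda s: affinity_entropy(points, s))


# Toy data: two well-separated groups of four points each (hypothetical).
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
# Bounded candidate grid; the bounds are an assumption, see the note above.
candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

best_sigma = select_sigma(points, candidates)
for s in candidates:
    print(f"sigma={s:>4}: mean entropy={affinity_entropy(points, s):.4f}")
print("selected sigma:", best_sigma)
```

In a full pipeline the selected value would be used to build the similarity matrix that the later feature-extraction and clustering steps operate on; those steps are not shown here.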
[Degree-granting institution]: Shenzhen University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F832.4
Article ID: 2459678
Article link: http://sikaile.net/jingjilunwen/huobiyinxinglunwen/2459678.html