
Research and Application of Subspace Clustering Algorithms

Published: 2018-09-12 13:13
【Abstract】: Cluster analysis is one of the core techniques in data mining: it partitions all the data in a dataset into classes or clusters according to their mutual similarity. Because clustering is unsupervised, it has been widely applied in many fields, including e-commerce, bioinformatics, Web log analysis, and financial transactions. However, owing to the "curse of dimensionality", the accuracy and stability of clustering results drop sharply when traditional clustering algorithms are applied to high-dimensional data. In recent years, clustering high-dimensional data has become one of the difficult and active research topics in artificial intelligence. Subspace clustering emerged in response. Its basic idea is to partition the feature space of the original data into different feature subsets, to evaluate, according to certain rules, how meaningful each data partition is from the viewpoint of the different subspaces, and at the same time to find the feature subspace that corresponds to each group of data. Subspace clustering algorithms have already achieved fairly good results on high-dimensional datasets. Addressing shortcomings of the two existing families of subspace clustering algorithms, those based on membership representation and those based on self-representation models, this thesis analyzes the existing algorithms and proposes new subspace clustering algorithms with targeted improvements. The main contributions are as follows:

(1) Existing soft subspace clustering algorithms choose cluster centers by randomly selecting sample points from the dataset, which makes them prone to local optima. To address this, within the soft subspace clustering framework, we propose a new method that optimizes the soft subspace clustering objective function by combining the quantum-behaved particle swarm optimization (QPSO) algorithm with gradient descent. QPSO's global search capability is used to solve for the cluster centers in the subspaces, while the fast convergence of gradient descent is used to solve for the fuzzy feature weights and the membership degrees of the sample points. Experimental results on UCI datasets show that the improved algorithm increases both clustering accuracy and the stability of the clustering results.

(2) The objective function of traditional soft subspace clustering is based on Euclidean distance. When the dimensionality of the sample points is very high, the "curse of dimensionality" arises and the Euclidean distance metric loses its effectiveness. To overcome the limitation of a purely Euclidean objective, we incorporate correntropy into the objective function. In the new objective function, newly derived iterative update formulas are used to solve for the fuzzy weights and the membership matrix of the sample points, while the QPSO algorithm is again used to solve for the cluster centers. On UCI datasets we report an experimental analysis in terms of the Rand index, normalized mutual information, and statistical significance tests; the results show that our algorithm achieves better clustering performance.

(3) Traditional subspace clustering algorithms based on self-representation models fall mainly into sparse representation and low-rank representation methods. These algorithms incorporate the errors, noise, and other corruption of the sample points into the objective function, solve for the coefficient representation matrix with the alternating direction method of multipliers (ADMM), and then construct a similarity matrix from the coefficient matrix. However, these methods require the structure of the errors in the data as prior knowledge, and their iterative optimization is time-consuming. To address these two problems, we apply an affine constraint on the coefficients together with the Lagrange multiplier method to subspace clustering, take a ridge regression formulation as the objective function, and solve for the coefficient matrix in closed form. Experiments on popular face databases show that our algorithm improves clustering accuracy and the stability of the clustering results while also reducing computational complexity.
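The abstract states the soft subspace objectives of contributions (1) and (2) only verbally. For orientation, a common fuzzy-weighting soft subspace clustering objective, and the correntropy-induced variant alluded to in (2), can be written as follows; the fuzzifiers m and τ, the constraints, and the kernel width σ are standard modeling choices and may not match the thesis's exact formulation.

```latex
% Assumed generic form, not necessarily the thesis's exact objective.
% u_{ik}: membership of sample x_i in cluster k;  w_{kd}: weight of feature d in cluster k;
% v_k: cluster center;  m, \tau > 1: fuzzifiers;  \sigma: correntropy kernel width.
\begin{aligned}
\min_{U,W,V}\; J &= \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ik}^{m}\sum_{d=1}^{D} w_{kd}^{\tau}\,(x_{id}-v_{kd})^{2}
  \quad\text{s.t.}\ \sum_{k} u_{ik}=1,\ \sum_{d} w_{kd}=1, \\
\min_{U,W,V}\; J_{c} &= \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ik}^{m}\sum_{d=1}^{D} w_{kd}^{\tau}
  \Bigl(1-\exp\!\bigl(-\tfrac{(x_{id}-v_{kd})^{2}}{2\sigma^{2}}\bigr)\Bigr).
\end{aligned}
```

Replacing the squared error with the bounded Gaussian-kernel loss in J_c is what gives the correntropy-based objective its robustness: large per-feature residuals saturate instead of dominating the sum.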
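Contribution (1) couples QPSO's global search with gradient-descent updates. As a point of reference, here is a minimal Python sketch of the standard quantum-behaved PSO position update (Sun et al.'s formulation); the particle encoding, the fitness function, and the alternating gradient steps for weights and memberships are assumptions made for illustration and would follow the thesis's own derivations.

```python
import numpy as np

def qpso_step(positions, pbest, gbest, beta=0.75, rng=None):
    """One quantum-behaved PSO (QPSO) position update.

    positions, pbest : arrays of shape (n_particles, dim)
    gbest            : array of shape (dim,)
    beta             : contraction-expansion coefficient (often annealed from ~1.0 to ~0.5)

    In a soft subspace clustering setting, each particle could encode the flattened
    K x D matrix of cluster centers, with fitness given by the clustering objective
    evaluated at the current weights/memberships (an illustrative encoding, not
    taken from the thesis).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = positions.shape
    mbest = pbest.mean(axis=0)                        # mean of the personal best positions
    phi = rng.random((n, d))
    attractor = phi * pbest + (1.0 - phi) * gbest     # per-dimension local attractor
    u = 1.0 - rng.random((n, d))                      # uniform in (0, 1]
    sign = np.where(rng.random((n, d)) < 0.5, -1.0, 1.0)
    # New position sampled around the attractor, with spread set by |mbest - x|.
    return attractor + sign * beta * np.abs(mbest - positions) * np.log(1.0 / u)
```

In the hybrid scheme of (1), an update like this for the centers would alternate with gradient-descent updates of the fuzzy weights and memberships until the objective stops improving.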
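Contribution (3) replaces iterative ADMM solvers with a ridge regression objective whose coefficient matrix can be computed analytically. The sketch below shows the unconstrained variant, min_C ||X - XC||_F^2 + λ||C||_F^2, followed by spectral clustering on the induced affinity; the affine constraint and its Lagrange multiplier treatment described in the thesis are omitted here, and the function name and λ value are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def ridge_self_expressive_clustering(X, n_clusters, lam=1e-2):
    """Subspace clustering from a ridge-regression self-representation.

    X : (D, N) data matrix with one sample per column.
    Solves  min_C ||X - X C||_F^2 + lam ||C||_F^2  in closed form,
        C = (X^T X + lam I)^{-1} X^T X,
    then runs spectral clustering on the affinity |C| + |C|^T.
    (Unconstrained sketch; the affine constraint from the thesis is not enforced.)
    """
    N = X.shape[1]
    G = X.T @ X                                    # Gram matrix, shape (N, N)
    C = np.linalg.solve(G + lam * np.eye(N), G)    # closed-form coefficient matrix
    np.fill_diagonal(C, 0.0)                       # optional: drop trivial self-representation
    W = np.abs(C) + np.abs(C).T                    # symmetric affinity matrix
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed",
                                assign_labels="kmeans",
                                random_state=0).fit_predict(W)
    return labels, C
```

Because C comes from a single N x N linear solve rather than an iterative ADMM loop, the per-run cost is dominated by that solve, which is the source of the speed advantage the abstract claims over sparse and low-rank solvers.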
【Degree-granting institution】: Jiangnan University
【Degree level】: Master's
【Year conferred】: 2016
【CLC number】: TP311.13

Article ID: 2239111


Link to this article: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2239111.html



版權(quán)申明:資料由用戶34182***提供,本站僅收錄摘要或目錄,作者需要刪除請E-mail郵箱bigeng88@qq.com