Research on New Subspace Clustering Algorithms and Their Applications

Published: 2018-01-05 21:18

Keywords: research on new subspace clustering algorithms and their applications. Source: Jiangnan University, 2017 doctoral dissertation. Document type: degree thesis.

[Abstract]: High-dimensional data is ubiquitous across many fields, especially in the era of big data, and it poses a serious challenge to traditional clustering algorithms. Subspace clustering, as an effective approach to clustering high-dimensional data, has attracted wide attention from researchers. Recently, subspace clustering algorithms based on sparse representation (SR) and low-rank representation (LRR) have become a new research focus owing to their strong performance. This thesis concentrates on SR-based and LRR-based subspace clustering algorithms, analyzes them in depth, and proposes improvements that raise their performance on specific problems. The main contributions are as follows:

1. A robust structure-constrained low-rank representation algorithm (RSLRR) is proposed. LRR has been applied successfully to uncovering the subspace structure of data. However, LRR-based algorithms are usually split into two separate steps: first, an affinity graph is constructed by solving a rank-minimization problem; second, a spectral clustering algorithm partitions the affinity graph to obtain the final segmentation. Affinity graph construction and spectral clustering are in fact interdependent, so conventional LRR-based algorithms, which treat them separately, cannot guarantee that the final result is globally optimal. RSLRR combines affinity graph construction and spectral clustering in a single optimization framework, and joint optimization yields the clustering result and the low-rank representation structure of the data set at the same time. Experiments on multiple data sets demonstrate the effectiveness of the algorithm.

2. A manifold locality constrained low-rank representation algorithm (MLCLRR) is proposed. LRR can effectively uncover the low-dimensional subspace structure of a data set. However, most LRR-based algorithms do not consider the nonlinear geometric structure of the data, so local structure information and similarity information are lost during processing, even though this information also plays an important role in data analysis. To improve LRR in this respect, MLCLRR introduces the local manifold structure of the data into the algorithm framework. The resulting algorithm preserves the global low-dimensional subspace structure of the data while also capturing its local nonlinear geometric structure. Experiments on several computer vision tasks demonstrate the effectiveness of the algorithm.

3. A Latent Space structure-constrained low-rank representation algorithm (Lat RSLRR) is proposed. Most existing SR-based and LRR-based subspace clustering algorithms process the data set in the original space, and when the dimensionality of the original data is high, the time cost grows considerably. The proposed algorithm solves for the low-rank representation coefficients in a low-dimensional latent space, which greatly improves computational efficiency. In addition, most LRR algorithms use the data set itself as the dictionary, and when the data contain substantial noise and many outliers, final performance is severely degraded. The proposed algorithm instead uses a discriminative dictionary, obtained through matrix recovery, as the dictionary for the low-rank representation. Experiments on subspace clustering problems demonstrate the effectiveness of the algorithm.

4. Semi-supervised learning is combined with low-rank representation: by unifying graph embedding learning and sparse regression in one optimization framework, an LRR-based semi-supervised learning algorithm is proposed. Most current graph-based semi-supervised learning algorithms consider the local neighborhood information of the data but ignore the global structure of the samples. The proposed method learns a low-rank weight matrix by projecting the data into a low-dimensional subspace, and it makes full use of the labeled samples when constructing the affinity graph. During dimensionality reduction, the algorithm effectively preserves the global structure of the data set, and the learned low-rank weight matrix effectively reduces the influence of noisy data on the final result. Experiments on multiple data sets show that the algorithm achieves relatively high classification accuracy.

5. An entropy-weighted transfer soft subspace clustering algorithm is proposed. To achieve high clustering accuracy, traditional clustering algorithms usually need the support of a large amount of historical sample data. As a consequence, if information is lost in the current data collection environment, or if the partition relationships among the data are unclear, the clustering algorithm fails. Transfer learning works well when data samples are insufficient, and by exploiting the historical information of the data set, the thesis proposes an entropy-weighted soft subspace clustering algorithm. Experiments on multiple UCI benchmark data sets and high-dimensional gene expression data sets show that the algorithm makes full use of historical information to compensate for the small size of the current sample and improves clustering accuracy.
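Contribution 5 builds on entropy-weighted soft subspace clustering. As a rough illustration only, the sketch below implements the standard non-transfer entropy-weighted k-means idea, not the thesis's transfer algorithm, whose objective is not given in this abstract. Each cluster keeps its own feature weights, updated as w_j proportional to exp(-D_j / gamma), where D_j is the within-cluster dispersion of feature j. The names `entropy_weights`, `ewkm`, and `gamma`, the hard assignments, and the toy data are all assumptions made for this example.

```python
import math


def entropy_weights(dispersion, gamma):
    """Feature weights for one cluster: w_j proportional to exp(-D_j / gamma).

    Features with small within-cluster dispersion receive large weights.
    """
    shift = min(dispersion)  # subtract the minimum for numerical stability
    exps = [math.exp(-(d - shift) / gamma) for d in dispersion]
    total = sum(exps)
    return [e / total for e in exps]


def ewkm(points, centers, gamma=1.0, n_iter=20):
    """Entropy-weighted k-means sketch (hypothetical helper, not the thesis's method).

    points  : list of equal-length numeric sequences
    centers : initial cluster centers, one per cluster
    gamma   : entropy regularization strength (larger gives more uniform weights)
    """
    k, dim = len(centers), len(points[0])
    centers = [list(c) for c in centers]
    weights = [[1.0 / dim] * dim for _ in range(k)]  # start with uniform weights
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center under each cluster's own weighted distance.
        for i, x in enumerate(points):
            dists = [
                sum(w * (a - c) ** 2 for w, a, c in zip(weights[l], x, centers[l]))
                for l in range(k)
            ]
            labels[i] = dists.index(min(dists))
        for l in range(k):
            members = [x for x, lab in zip(points, labels) if lab == l]
            if not members:
                continue  # keep the old center and weights for an empty cluster
            # Center step: per-feature mean of the cluster members.
            centers[l] = [sum(col) / len(members) for col in zip(*members)]
            # Weight step: per-feature dispersion, then the entropy-based update.
            disp = [
                sum((x[j] - centers[l][j]) ** 2 for x in members) for j in range(dim)
            ]
            weights[l] = entropy_weights(disp, gamma)
    return labels, centers, weights


# Toy data (made up): cluster 0 is tight in feature 0, cluster 1 in feature 1.
points = [
    (0.0, 0.0), (0.1, 5.0), (-0.1, -5.0), (0.0, 2.0),
    (10.0, 20.0), (14.0, 20.1), (6.0, 19.9), (12.0, 20.0),
]
labels, centers, weights = ewkm(points, [(0.0, 0.0), (10.0, 20.0)], gamma=1.0)
print(labels)
print(weights)
```

In this toy run each cluster places nearly all of its weight on the feature along which it is compact, which is the "soft subspace" behavior: every cluster effectively lives in its own weighted subset of features. A transfer variant would additionally bring in information from historical data, which this sketch does not attempt.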

[Degree-granting institution]: Jiangnan University
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TP311.13

Article ID: 1384872
