Research on a Latent Subspace Clustering Algorithm Based on Improved Dictionary Learning
Published: 2019-06-12 14:02
[Abstract]: Cluster analysis is a data-analysis tool that groups abstract data objects into multiple clusters, and it is widely applied in pattern recognition, machine learning, document retrieval, data mining, and other fields. In recent years, the spread of the Internet and advances in computer imaging have added large volumes of image and video data, and as demands for image and video resolution keep rising, high-dimensional datasets reaching hundreds of terabytes have appeared. Most traditional clustering algorithms were designed for low-dimensional data and therefore have difficulty processing high-dimensional data efficiently. Subspace clustering, an extension of traditional clustering, is an effective approach to clustering high-dimensional data. This thesis improves the sparse-representation-based latent subspace clustering algorithm in order to raise its clustering performance. The main contributions are as follows:
1. The basic principles of the sparse representation model and the dictionary learning model are introduced in detail, and the steps, advantages, and disadvantages of classical algorithms in both fields, including MP, OMP, MOD, and K-SVD, are explained. Background on subspace clustering and spectral clustering is then presented, and the spectral clustering procedure is derived in detail, laying the groundwork for the improved algorithms.
2. A subspace clustering algorithm built on spectral clustering, sparse representation, and dictionary learning, namely the latent subspace clustering algorithm (LSC), is described comprehensively, including its main idea and the related derivations.
3. Because the dictionary trained by LSC lacks stability and discriminability, an improved LSC algorithm based on discriminative dictionary learning (ILSC) is proposed. In the dictionary learning stage, ILSC uses the label information of a small portion of the training samples to improve the dictionary learning model: in addition to the original reconstruction error term, it adds a sparse coding error term, producing a discriminative adaptive dictionary. This makes the sparse representation of each signal more accurate and thereby improves clustering accuracy.
4. Because ILSC adds two error terms to strengthen dictionary discriminability, its dictionary learning stage takes several times longer. To address this drawback, I2LSC, an improvement of ILSC based on incremental dictionary training, is proposed. Following the idea of incremental algorithms, I2LSC reads a small batch of training data at a time and incrementally updates the dictionary and the corresponding error terms, greatly reducing the time spent on dictionary learning while preserving the dictionary's discriminability.
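Item 1 surveys MP, OMP, MOD, and K-SVD. As a minimal sketch (not the thesis's code; it assumes NumPy, one signal per column, and unit-norm atoms), the following pairs OMP sparse coding with the MOD dictionary update D = X A^T (A A^T)^{-1}. The function names `omp` and `mod_dictionary_learning` are illustrative. K-SVD would differ only in the dictionary step, replacing the one-shot least-squares update with atom-by-atom rank-one SVD updates.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x using at most k columns (atoms) of D.

    Assumes the atoms of D have unit norm.
    """
    residual = x.astype(float)
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Selection rule shared by MP and OMP: the atom most correlated with the residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # OMP re-fits all selected coefficients by least squares; plain MP would
        # only adjust the coefficient of the newly chosen atom.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def mod_dictionary_learning(X, n_atoms, k, n_iter=20, seed=0):
    """Alternate OMP sparse coding with the MOD update D = X A^T (A A^T)^{-1}.

    X has one training signal per column; returns the dictionary D and the codes A.
    """
    rng = np.random.default_rng(seed)
    # Initialise with randomly chosen training signals, normalised to unit length.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = np.column_stack([omp(D, x, k) for x in X.T])
        # MOD: closed-form least-squares update of the whole dictionary at once;
        # pinv(A) equals A^T (A A^T)^{-1} when A A^T is invertible.
        D = X @ np.linalg.pinv(A)
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        norms[norms == 0] = 1.0  # leave unused (all-zero) atoms untouched
        D /= norms
    # Re-code with the final, renormalised dictionary.
    A = np.column_stack([omp(D, x, k) for x in X.T])
    return D, A
```

For instance, with `D = np.eye(4)`, `omp(D, np.array([0.0, 3.0, 0.0, -2.0]), 2)` recovers the coefficients `[0, 3, 0, -2]`, because each selected atom is re-fitted jointly with the earlier ones.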
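Item 1 also derives the spectral clustering procedure on which LSC is partly based. A minimal sketch of one standard variant, normalised spectral clustering in the style of Ng, Jordan and Weiss (2001), is given below, assuming NumPy and an affinity matrix `W` that has already been built from the data. The abstract does not state how LSC constructs `W`; in sparse subspace clustering, for example, it is commonly taken as |C| + |C|^T for a sparse self-representation matrix C. The small k-means helper is included only to keep the sketch self-contained.

```python
import numpy as np


def kmeans(Y, n_clusters, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding (rows of Y are points)."""
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        # Squared distance from every point to its nearest existing center.
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(W, n_clusters):
    """Normalised spectral clustering on a non-negative affinity matrix W."""
    W = (W + W.T) / 2.0  # make the graph undirected
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # D^{-1/2} W D^{-1/2}, i.e. the identity minus the symmetric normalised Laplacian.
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    # Eigenvectors of the n_clusters largest eigenvalues form the spectral embedding.
    U = eigvecs[:, -n_clusters:]
    # Row-normalise the embedding before running k-means on it.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, n_clusters)
```

With a block-diagonal affinity made of two all-ones blocks of sizes 3 and 4, the first three points receive one label and the last four the other, since each connected block maps to its own direction in the embedding.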
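Items 3 and 4 say that ILSC uses labels to add error terms to the reconstruction objective but do not give the exact formula. Purely as an illustration of that structure, and not as the thesis's model, here is a well-known discriminative dictionary learning objective of the same shape, Label Consistent K-SVD (LC-KSVD; Jiang, Lin and Davis, CVPR 2011). The symbols are assumptions: X holds the training signals, D the dictionary, Z the sparse codes, Q label-derived target codes, H the class labels, A and W linear maps, α and β weights, and T the sparsity level.

```latex
\min_{D,\,A,\,W,\,Z}\;
  \underbrace{\|X - DZ\|_F^2}_{\text{reconstruction}}
  + \alpha\,\underbrace{\|Q - AZ\|_F^2}_{\text{sparse-code (label) error}}
  + \beta\,\underbrace{\|H - WZ\|_F^2}_{\text{classification error}}
  \quad\text{s.t. } \|z_i\|_0 \le T \;\; \forall i

% Stacking the three terms turns the problem back into ordinary dictionary
% learning, which K-SVD can solve directly:
\min_{\tilde D,\,Z}\;
  \left\|
    \begin{bmatrix} X \\ \sqrt{\alpha}\,Q \\ \sqrt{\beta}\,H \end{bmatrix}
    - \begin{bmatrix} D \\ \sqrt{\alpha}\,A \\ \sqrt{\beta}\,W \end{bmatrix} Z
  \right\|_F^2
  \quad\text{s.t. } \|z_i\|_0 \le T \;\; \forall i
```

Expanding the stacked norm gives back exactly the three weighted terms, so any solver for the plain reconstruction problem can be reused on the augmented matrices.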
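Item 4 describes reading a small batch of training data at a time and updating the dictionary incrementally. One established way to do this is online dictionary learning (Mairal, Bach, Ponce and Sapiro, ICML 2009), which keeps only the running sums A = Σ z zᵀ and B = Σ x zᵀ, so each batch costs the same no matter how much data came before. The sketch below applies that scheme to the reconstruction term only and assumes NumPy; `IncrementalDictionary` and `partial_fit` are hypothetical names, and this is not the thesis's I2LSC. Label-based error terms could, as an assumption, be handled by stacking label targets under each signal in the LC-KSVD manner.

```python
import numpy as np


def omp(D, x, k):
    """Greedy k-sparse coding of x over the columns of D (Orthogonal Matching Pursuit)."""
    residual, support, sol = x.astype(float), [], np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support or np.linalg.norm(residual) < 1e-10:
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


class IncrementalDictionary:
    """Hypothetical mini-batch dictionary learner (online dictionary learning style).

    Only the running statistics A = sum(z z^T) and B = sum(x z^T) are stored, so the
    cost of each update does not grow with the amount of data already seen.
    """

    def __init__(self, D0, k):
        self.D = D0 / np.linalg.norm(D0, axis=0, keepdims=True)
        self.k = k
        n_features, n_atoms = D0.shape
        self.A = np.zeros((n_atoms, n_atoms))
        self.B = np.zeros((n_features, n_atoms))

    def partial_fit(self, X_batch, n_sweeps=1):
        """Sparse-code one batch (one signal per column), then update every atom."""
        Z = np.column_stack([omp(self.D, x, self.k) for x in X_batch.T])
        # Fold the batch into the running statistics; earlier batches are never re-read.
        self.A += Z @ Z.T
        self.B += X_batch @ Z.T
        for _ in range(n_sweeps):
            for j in range(self.D.shape[1]):
                if self.A[j, j] < 1e-12:
                    continue  # atom unused so far, keep it as is
                # Block-coordinate descent step on atom j, then projection onto the
                # unit ball (atoms may end up slightly shorter than unit length).
                u = self.D[:, j] + (self.B[:, j] - self.D @ self.A[:, j]) / self.A[j, j]
                self.D[:, j] = u / max(np.linalg.norm(u), 1.0)
        return Z
```

In use, each incoming batch is passed once to `partial_fit`; memory stays fixed at the size of `D`, `A`, and `B`, which is the property that makes the incremental approach cheaper than retraining on all data seen so far.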
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year conferred]: 2017
[CLC classification number]: TP311.13
Document ID: 2498080