

Research on Self-Representation Attribute Selection

Published: 2018-03-14 23:24

Topic: data mining. Focus: graph learning. Source: master's thesis, Guangxi Normal University, 2017. Document type: degree thesis.


【摘要】:高維數(shù)據(jù)通常含有噪音以及冗余。特別是,數(shù)據(jù)的高屬性維度不僅會增加儲存空間,而且屬性維數(shù)在達到某一臨界值后,特定數(shù)據(jù)挖掘算法的性能反而下降,即所謂的“維度災難”。另一方面,由于資源所限等原因數(shù)據(jù)的類標簽在實際應用中很難獲取,因此,無監(jiān)督的屬性約簡通過降低無標簽數(shù)據(jù)的維度以解決上述問題,在數(shù)據(jù)挖掘領域具有重要意義,F(xiàn)有的屬性約簡方法可分為子空間學習和屬性選擇。子空間學習比屬性選擇更高效,但屬性選擇方法得到的結(jié)果更具有可解釋性。本文結(jié)合子空間學習和屬性選擇思想提出兩種無監(jiān)督屬性選擇方法,即從輸入的高維數(shù)據(jù)中選取有意義的屬性(也就是說去除屬性的冗余和噪音),使得輸出的低維數(shù)據(jù)既能提升數(shù)據(jù)的學習效果,又具有可解釋性。本文具體的內(nèi)容和創(chuàng)新點為:(1)基于樣本自表達方法的成功運用,本文利用屬性自表達能力,提出了一種簡單而且有效的無監(jiān)督屬性選擇框架一基于稀疏學習的魯棒自表達屬性選擇算法(SRFS算法)。具體來說,SRFS算法首先采用包含屬性自表達的損失函數(shù),將數(shù)據(jù)每個屬性用其他屬性線性表示來取得自表達系數(shù)矩陣;然后結(jié)合稀疏學習的理論(即用系數(shù)矩陣的l2,1-范數(shù)作為稀疏正則化項)取得稀疏的系數(shù)矩陣。在優(yōu)化所得的目標函數(shù)時,稀疏正則化因子導致重要的屬性對應的自表達系數(shù)值,相對于冗余屬性或者不相關(guān)屬性的值要大,以此區(qū)別屬性的重要性從而達到屬性選擇的目的。SRFS算法利用屬性自表達的方法,使得每個屬性都能被全體屬性很好的表現(xiàn)出來,不重要的屬性或噪音冗余屬性在自表達過程中被賦予很小的權(quán)重或零權(quán)重。在真實數(shù)據(jù)的模擬實驗中,使用支持向量機(SVM)作為屬性選擇的評價方法進行分類,分別作用于被SRFS方法和其他屬性約簡算法處理過的數(shù)據(jù),結(jié)果表明SRFS優(yōu)于其他對比算法。(2)傳統(tǒng)的屬性選擇方法通常不考慮屬性間的關(guān)系,如:數(shù)據(jù)的局部結(jié)構(gòu)或整體結(jié)構(gòu)。而噪聲或離群點會增加數(shù)據(jù)矩陣秩,基于以上事實,本文結(jié)合低秩約束、流形學習、超圖理論和屬性自表達在同一個框架下進行無監(jiān)督屬性選擇,即提出了“基于超圖的屬性自表達無監(jiān)督低秩屬性選擇算法”(SHLFS算法)。具體來說,SHLFS算法首先擴展上述屬性自表達理論,即將各個屬性用其他屬性來表示,然后嵌入一個低秩約束項來去除噪音和離群點的影響。此外,鑒于超圖(Hypergraph)能比一般圖捕獲更復雜的關(guān)系,SHLFS算法使用一個超圖正則化因子來考慮數(shù)據(jù)的高階關(guān)系和局部結(jié)構(gòu),且使用l2,1-范數(shù)正則化實現(xiàn)系數(shù)矩陣的稀疏性。本文進一步證明了所用的低秩約束導致SHLFS算法具有子空間學習的效果。最終,SHLFS算法既考慮了全局的數(shù)據(jù)結(jié)構(gòu)(通過低秩約束)又考慮了局部數(shù)據(jù)結(jié)構(gòu)(通過超圖正則化),而且在進行屬性選擇的同時進行了子空間學習,使得得到的屬性選擇模型既具有可解釋性且性能優(yōu)異。由于比上一方法使用了更強的約束,且考慮了數(shù)據(jù)間的關(guān)系,SHLFS算法比之前的模型更健壯。在實驗部分,使用SVM分類和k-means聚類兩種評價方法,在多類和二類數(shù)據(jù)集上進行實驗,經(jīng)多個評價指標驗證,SHLFS方法比對比屬性約簡方法具有更好的效果。本論文主要針對高維數(shù)據(jù)的特點,設計新的屬性選擇方法。具體地說,本文創(chuàng)新的使用屬性自表達來實現(xiàn)無監(jiān)督屬性選擇,另一方面使用超圖模型和低秩約束表示數(shù)據(jù)之間的高階關(guān)系,并結(jié)合稀疏學習理論給每個屬性賦予不同的權(quán)重以判別屬性的重要性。為保證設計方法的有效性,模擬實驗部分在多個公開數(shù)據(jù)集上進行,對比算法包括近幾年流行的算法和領域經(jīng)典算法,使用分類和聚類作為評價方法,分類準確率(ACC)和標準化互信息(NMI)等多個評價指標。實驗結(jié)果顯示,本文提出的方法均獲得最優(yōu)的效果。后續(xù)的工作擬探索半監(jiān)督學習和深度學習框架設計新的屬性選擇方法。
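As a companion sketch for SHLFS, the snippet below builds the normalized hypergraph Laplacian of Zhou et al. (2006), the kind of regularizer the abstract describes for capturing high-order, local relations among samples: each sample together with its k nearest neighbours forms one hyperedge. The kNN construction and unit edge weights are assumptions for illustration, not necessarily the thesis's exact formulation.

```python
# Sketch of the normalized hypergraph Laplacian from kNN hyperedges.
import numpy as np

def hypergraph_laplacian(X, k=5):
    """Return L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for kNN hyperedges."""
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    H = np.zeros((n, n))                     # incidence matrix: vertex x edge
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k + 1]  # hyperedge i = {i} U kNN(i)
        H[nbrs, i] = 1.0
    w = np.ones(n)                           # unit hyperedge weights
    Dv = H @ w                               # vertex degrees
    De = H.sum(axis=0)                       # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(Dv))
    Theta = Dv_inv_sqrt @ H @ np.diag(w / De) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta

# Schematically, SHLFS would plug L into an objective of the form
#     min_{W = A B}  ||X - X W||_F^2 + alpha * tr(W^T X^T L X W)
#                    + beta * ||W||_{2,1},
# where the factorization W = A B enforces the low-rank constraint.
```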
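Finally, a hedged sketch of the evaluation protocol the abstract describes: keep the top-k ranked features, then score the reduced data with SVM classification accuracy (ACC) and k-means clustering NMI. The dataset, the value of k, and the cross-validation setup are illustrative assumptions, not the thesis's exact experimental configuration.

```python
# Sketch of the ACC/NMI evaluation of a feature-selection result.
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.metrics import normalized_mutual_info_score

def evaluate_selection(X, y, scores, k=50):
    top = np.argsort(scores)[::-1][:k]      # indices of the k best features
    Xk = X[:, top]
    acc = cross_val_score(SVC(), Xk, y, cv=5).mean()            # ACC via SVM
    labels = KMeans(n_clusters=np.unique(y).size,
                    n_init=10).fit_predict(Xk)
    nmi = normalized_mutual_info_score(y, labels)               # NMI
    return acc, nmi
```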

Article ID: 1613423


Permalink: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1613423.html



版權(quán)申明:資料由用戶d0d93***提供,本站僅收錄摘要或目錄,作者需要刪除請E-mail郵箱bigeng88@qq.com