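Research on SVDD-Based Feature Selection Methods and Their Applications
Published: 2018-03-15 21:49
Topic: Support Vector Data Description  Focus: Feature Selection  Source: Soochow University, 2015 master's thesis  Document type: Degree thesis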
[Abstract]: In cancer classification, gene expression data have thousands of dimensions, and some features are correlated with one another. How to quickly extract informative low-dimensional data from large volumes of high-dimensional gene expression data has therefore attracted growing attention from researchers. This thesis studies feature selection methods based on Support Vector Data Description (SVDD) and applies them to gene selection in expression data, removing irrelevant and redundant genes and retaining the informative ones in order to improve cancer classification performance. The contributions are as follows.

First, a fast feature selection algorithm based on the SVDD model is proposed. A feature selection method based on support vector data description has been proposed previously, but it is computationally expensive and takes too long to select features. To address this problem, the thesis proposes a fast SVDD-based feature selection algorithm. The new method selects features by ranking the energy along the direction of the center of the hypersphere formed by SVDD, and uses recursive feature elimination to gradually remove redundant features. Experiments on the Leukemia and Colon Tumor datasets show that the new method selects features quickly and that the selected features are effective for subsequent cancer classification.

Second, a fast feature selection algorithm based on multiple SVDD models is proposed. The SVDD-based feature selection algorithm described above trains on only one class of data and ignores the other classes, so it is suitable only for one-class or two-class data. In practice, however, multi-class data are more common. For multi-class problems, the thesis proposes a fast feature selection algorithm based on multiple SVDDs. The algorithm builds one SVDD feature selection model for each class of data, which yields several feature subsets, and then fuses the selected subsets to obtain a more effective feature subset. Experiments on two two-class cancer datasets and three multi-class cancer datasets confirm that the proposed method selects more discriminative feature subsets.
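The abstract does not give the thesis's exact formulation, so the following is only a rough sketch of the two ideas it describes, under several assumptions that are not from the source. It assumes a hard-margin SVDD with a linear kernel, which is equivalent to the minimum enclosing ball and is approximated here with Badoiu-Clarkson iterations. It also assumes that the per-feature "energy" is the squared component of the sphere center, and that the per-class subsets are fused by taking their union. The function names `svdd_center`, `svdd_rfe`, and `multi_svdd_select` are hypothetical.

```python
import numpy as np


def svdd_center(X, n_iter=200):
    """Approximate the center of the hard-margin, linear-kernel SVDD sphere.

    Assumption: this is the minimum enclosing ball of the samples, found
    with Badoiu-Clarkson iterations rather than a QP solver.
    """
    X = np.asarray(X, dtype=float)
    center = X[0].copy()
    for t in range(1, n_iter + 1):
        # Take a shrinking step toward the sample currently farthest away.
        farthest = X[np.argmax(((X - center) ** 2).sum(axis=1))]
        center += (farthest - center) / (t + 1)
    return center


def svdd_rfe(X, n_keep, drop_per_round=1):
    """Recursive feature elimination ranked by squared center components."""
    X = np.asarray(X, dtype=float)
    remaining = np.arange(X.shape[1])
    while remaining.size > n_keep:
        center = svdd_center(X[:, remaining])
        energy = center ** 2  # assumed per-feature "energy"
        n_drop = min(drop_per_round, remaining.size - n_keep)
        # Discard the n_drop lowest-energy features, keep the rest in order.
        keep = np.sort(np.argsort(energy)[n_drop:])
        remaining = remaining[keep]
    return remaining


def multi_svdd_select(X, y, n_keep):
    """Run one selector per class and fuse the subsets (assumed: union)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    subsets = [svdd_rfe(X[y == label], n_keep) for label in np.unique(y)]
    return np.unique(np.concatenate(subsets))
```

Calling `multi_svdd_select(X, y, n_keep)` on a samples-by-genes matrix `X` with class labels `y` returns the sorted indices of the fused feature subset. A soft-margin or kernel SVDD would need a proper quadratic programming solver in place of `svdd_center`.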
[Degree-granting institution]: Soochow University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: R730.4; TP18