Research on the Classification of Tumor Gene Expression Profile Data Based on Back-Projection Representation
[Abstract]: With the rapid development of gene chip technology, tumor gene expression profile data can now be obtained quickly and accurately. Feature selection and sample classification are the two basic problems in tumor classification based on gene expression profile data, and the analysis of these data provides a powerful tool for early diagnosis and molecular research. In recent years, tumor classification based on sparse representation has attracted increasing attention. However, sparse-representation-based classifiers have three problems: (1) they depend heavily on a sufficient number of training samples; (2) they ignore the information contained in the test samples; and (3) classification by reconstruction error is unstable. There is also a growing need for gene selection methods that are both efficient and biologically meaningful. To address these problems, this thesis carries out the following work.
On the one hand, a tumor classification method based on back-projection representation and the class contribution rate is proposed, and its feasibility and stability are proved theoretically. First, by mining the information embedded in the test samples, a new back-projection representation model is constructed to reduce the dependence on the number of training samples. Then, to match this model, a new classification criterion, the class contribution rate, is proposed to complete the classification, together with a new statistical index, the classification stability index, which quantifies the stability of different classification criteria.
On the other hand, building on this work, a tumor classification method that combines a two-stage hybrid gene selection model with the back-projection representation model is proposed. In the first stage of the hybrid gene selection method, genes are preselected with three filter methods: the BW ratio (between-group to within-group sum of squares), the signal-to-noise ratio (SNR), and the F-test.
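The three first-stage filters are standard per-gene scores. The sketch below, assuming a samples-by-genes matrix given as a list of rows and one class label per sample, computes the BW ratio, a Golub-style signal-to-noise ratio |mean1 - mean2| / (sd1 + sd2) for two classes, and the one-way ANOVA F statistic. The abstract does not say how the three filters are combined, so the `preselect` rule (union of each filter's top-k genes) and all function names here are hypothetical illustrations, not the thesis's own procedure.

```python
from statistics import mean, stdev


def gene_columns(X):
    """Turn a samples-by-genes matrix (list of rows) into a list of gene columns."""
    return [list(col) for col in zip(*X)]


def bw_ratio(x, y):
    """Between-group / within-group sum of squares for one gene.

    Assumes some within-class variation exists (otherwise the denominator is zero).
    """
    overall = mean(x)
    between = within = 0.0
    for k in set(y):
        xk = [v for v, label in zip(x, y) if label == k]
        mk = mean(xk)
        between += len(xk) * (mk - overall) ** 2
        within += sum((v - mk) ** 2 for v in xk)
    return between / within


def f_statistic(x, y):
    """One-way ANOVA F = (SSB / (K - 1)) / (SSW / (n - K)) = BW * (n - K) / (K - 1)."""
    n, k = len(x), len(set(y))
    return bw_ratio(x, y) * (n - k) / (k - 1)


def snr(x, y):
    """Signal-to-noise ratio |mean1 - mean2| / (sd1 + sd2); two classes only.

    Uses the sample standard deviation, so each class needs at least 2 samples.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this SNR sketch handles exactly two classes")
    a = [v for v, label in zip(x, y) if label == classes[0]]
    b = [v for v, label in zip(x, y) if label == classes[1]]
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))


def top_k(scores, k):
    """Indices of the k highest scores (ties keep the lower index first)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def preselect(X, y, k):
    """HYPOTHETICAL combination rule: union of each filter's top-k genes."""
    cols = gene_columns(X)
    chosen = set()
    for score in (bw_ratio, snr, f_statistic):
        chosen.update(top_k([score(c, y) for c in cols], k))
    return sorted(chosen)


# Toy data: 4 samples x 3 genes; gene 0 separates the two classes clearly.
X = [[1.0, 5.0, 0.1],
     [1.2, 5.1, 0.3],
     [3.0, 4.9, 0.2],
     [3.1, 5.2, 0.4]]
y = [0, 0, 1, 1]
print(preselect(X, y, 1))  # [0]
```

Because the F statistic equals the BW ratio multiplied by (n - K)/(K - 1), on a fixed data set the two rank genes in exactly the same order. The SNR as written applies only to two-class problems; a multi-class data set would need an extension such as one-vs-rest.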
In the second stage, informative genes are selected with the statistical Lasso method to obtain candidate pathogenic genes, and classification is then completed with the back-projection representation model. In the experiments for the first part of the work, the effectiveness of back-projection representation on the small-sample problem is verified first; the stability of the classification criterion based on the class contribution rate is then verified with the classification stability index; finally, the robustness of the classification method is tested. For the second part, the necessity of gene selection and the feasibility of the Lasso are demonstrated, and the effectiveness of the two-stage hybrid gene selection method is verified through principal component analysis (PCA) projection plots and through classification performance at the different stages. In addition, candidate pathogenic genes were further screened with this method and analyzed biologically.
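The second stage can be sketched as a Lasso fit restricted to the genes kept by the first stage, discarding every gene whose coefficient is shrunk exactly to zero. This sketch assumes the plain squared-error Lasso objective (1/2n)||y - Xb||^2 + lam * ||b||_1 with the class labels used as a numeric response, solved by cyclic coordinate descent with soft thresholding. The abstract does not specify the loss, the solver, or how the penalty is chosen, and `lasso_cd` and `lasso_select` are hypothetical names.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Columns and response are mean-centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - X b, starting from b = 0
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant gene: coefficient stays 0
            # Correlation of gene j with the partial residual (gene j added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * beta[j]
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_select(X, y, lam, candidates):
    """Keep the candidate genes (column indices of X) with non-zero coefficients."""
    sub = [[row[j] for j in candidates] for row in X]
    beta = lasso_cd(sub, y, lam)
    return [g for g, b in zip(candidates, beta) if b != 0.0]


# Toy data: gene 0 tracks the labels, gene 2 is orthogonal to them,
# and gene 1 is not among the candidates passed on from the first stage.
X = [[1.0, 9.0, 1.0],
     [2.0, 7.0, 0.0],
     [3.0, 8.0, 0.0],
     [4.0, 6.0, 1.0]]
y = [0, 0, 1, 1]
print(lasso_select(X, y, 0.1, [0, 2]))  # [0]
```

A larger `lam` zeroes out more coefficients and so keeps fewer genes; in practice the penalty would be tuned, for example by cross-validation.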
[Degree-granting institution]: Henan University
[Degree level]: Master's
[Year of degree]: 2017
[Classification number]: R73-3