Application of Boosting Methods in Discriminant Analysis of Gene Microarray Data

Published: 2018-10-23 21:19

[Abstract]: The rapid development of high-throughput microarray technology has provided statisticians with large amounts of microarray data. Such "small sample, high dimension" data (m >> n) pose an unprecedented challenge to traditional classification and discrimination methods. Boosting, a member of the family of ensemble algorithms, has long attracted researchers and practitioners with its "perfect" classification ability. After systematically introducing the basic idea of Boosting and the basic procedures of two of its algorithms, AdaBoost and LogitBoost, this study uses both algorithms to build discriminant prediction models on simulated data and on lower-dimensional data, and compares their predictive performance with that of two other ensemble algorithms (Bagging and Random-Forest) and three traditional discriminant analysis methods (Fisher's linear discriminant, Fisher's quadratic discriminant, and logistic regression discriminant). In view of the special nature of gene microarray data, two datasets from online databases, a leukemia dataset and a breast cancer dataset, were analyzed as follows: (1) P values were corrected with an FDR-controlling procedure, and gene variables were screened using P ≤ 0.05 or P ≤ 0.01 as the criterion so that the dimension fell below the sample size; discriminant prediction models were then built, and Boosting was compared with the two other ensemble algorithms and the three traditional methods; (2) different numbers of gene predictors were selected according to the ranking of their P values, a discriminant prediction model was built for each, and the relative advantage of Boosting (in prediction accuracy and sensitivity) was examined; (3) principal components were extracted and used for principal component discriminant analysis to examine the advantage of Boosting. In all cases, cross-validation was used to assess the predictive performance of the models and the stability of the predictions. Main conclusions: 1. The overall predictive performance of Boosting is generally better than that of Bagging, Random-Forest, and the traditional ...
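The gene screening in step (1) can be illustrated with a small sketch. The abstract does not say which FDR-controlling procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the function names `bh_adjust` and `select_genes` as well as the p-values are hypothetical, chosen only for illustration.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the study's "FDR-controlling procedure" is taken to be
    Benjamini-Hochberg; the abstract does not name the procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(p_values, alpha=0.05):
    """Indices of genes with adjusted p-value <= alpha, most significant first."""
    adjusted = bh_adjust(p_values)
    keep = [i for i, q in enumerate(adjusted) if q <= alpha]
    return sorted(keep, key=lambda i: p_values[i])


# Hypothetical raw p-values for six genes (illustrative, not from the study).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
selected = select_genes(raw_p, alpha=0.05)
print(selected)
```

With these made-up values, the third and fourth genes have raw p-values below 0.05 but are dropped after adjustment. This is how the screening reduces the number of gene variables; whether the retained count actually falls below the sample size depends on the data and on the threshold (0.05 or 0.01).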

Article ID: 2290491

Article link: http://sikaile.net/yixuelunwen/binglixuelunwen/2290491.html
