
Statistical Learning Models for Analyzing the Effect of Protein Expression on Breast Cancer Cell Proliferation

Published: 2018-09-04 09:40
[Abstract]: As people come into contact with harmful substances more and more often in daily life, the incidence of cancer is gradually rising. In the era of big data, selecting the useful parts of complex, tangled data has become very important, and because statistical learning methods are better at extracting useful information, they have become an important research topic. This thesis studies a set of reverse phase protein array (RPPA) and cell proliferation data from MD Anderson for the breast cancer cell line MDA-MB-231. These data are used to train linear regression, support vector machine (SVM), and random forest (RF) models in order to find the key proteins that control breast cancer cell proliferation. These key proteins are then treated as potential targets for cancer drugs. The data used here are highly variable, so to reduce their impact on statistical power, the RPPA data are preprocessed first. The preprocessed RPPA data then serve as the input and cell proliferation as the output for training the linear regression, SVM, and RF models separately. For the linear regression model, a method that combines principal component analysis (PCA) with linear regression is proposed and used. Finally, the results of the three models are compared to obtain a model that is both accurate and able to screen out the combinations of proteins with key influence. The results show that the linear regression model is highly accurate, that the SVM model can screen out the protein combinations that play a key role in breast cancer cell proliferation, and that RF performs very well in both respects. RF was then used to analyze the RPPA data, yielding 28 proteins with a large effect on breast cancer cells; a literature search confirmed that 21 of them strongly influence breast cancer cell proliferation.
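The abstract does not say how PCA and linear regression were combined, so the following is only a minimal sketch of one common combination, principal component regression, and not the thesis's actual method. It runs on synthetic data standing in for the RPPA measurements; the function names (`fit_pcr`, `predict_pcr`, `r_squared`), the sample and protein counts, and the "true" weights are all hypothetical.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on standardized X, then least squares."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]            # (k, n_features)
    scores = Z @ components.T                 # (n_samples, k)
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Map component coefficients back to one weight per (standardized) feature.
    weights = components.T @ coef[1:]
    return {"mean": mean, "std": std, "intercept": coef[0], "weights": weights}


def predict_pcr(model, X):
    Z = (X - model["mean"]) / model["std"]
    return model["intercept"] + Z @ model["weights"]


def r_squared(y, pred):
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in (assumption): 60 samples, 20 "proteins",
# of which three hypothetical ones drive the "proliferation" response.
rng = np.random.default_rng(0)
n_samples, n_proteins = 60, 20
X = rng.normal(size=(n_samples, n_proteins))
true_w = np.zeros(n_proteins)
true_w[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Keep only the first 10 principal components before regressing.
model = fit_pcr(X, y, n_components=10)
r2 = r_squared(y, predict_pcr(model, X))
ranking = np.argsort(-np.abs(model["weights"]))  # features by absolute weight
print(f"in-sample R^2 with 10 components: {r2:.3f}")
print("highest-weight feature indices:", ranking[:5].tolist())
```

Ranking features by absolute weight is just one simple way to flag candidate proteins. In practice the number of components and the quality of the fit would be judged on held-out data rather than in-sample.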
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: R737.9;Q811.4


Article ID: 2221705


Article link: http://sikaile.net/yixuelunwen/swyx/2221705.html

