Tumor Subtype Clustering Analysis Based on Sparse Low-Rank Regression
[Abstract]: Cancer is currently one of the leading causes of human death. With the development of second-generation sequencing technology, researchers worldwide have carried out large-scale cancer genome sequencing projects (such as TCGA) and obtained large amounts of different types of biological data (such as mRNA expression data, DNA methylation data, and somatic mutation data). These data help in understanding the pathogenesis of cancer, identifying accurate cancer subtypes, and designing effective drugs for cancer treatment. They also raise a new problem: how to fully integrate and exploit multi-omics sequencing data when designing tumor subtype clustering algorithms has become one of the hot topics in bioinformatics. Commonly used tumor subtype clustering methods assign samples in a semi-supervised or unsupervised way based on a single type of omics data. The disadvantage of such methods is that a single-data clustering method cannot use multiple correlated data types, which easily causes information loss. In recent years, a number of tumor subtype clustering algorithms based on multi-omics data have been proposed. However, these methods are still at an early stage of development and many problems remain open, for example, how to pre-screen genes and how to construct a realistic data integration model so as to obtain more accurate results. New data analysis methods are therefore urgently needed. The core idea of this work is to use sparse low-rank regression to project high-dimensional multi-omics data into a low-dimensional subspace that contains the major biological processes, thereby achieving data fusion and fast clustering.
Chapter 1 introduces the research background and significance of subtype analysis based on multi-omics data, as well as the current state of research and the main research methods in China and abroad. Chapter 2 introduces the data commonly used in cancer subtype analysis and reviews some representative clustering algorithms that integrate multiple types of data. Chapter 3 presents the theory behind optimizing the iCluster algorithm with sparse low-rank regression. The sparse low-rank regression method is used in place of the PCA algorithm: it computes an initial value of the coefficient matrix that has sparse low-rank structure, which supports the estimation of the optimal posterior probabilities in the subsequent iterations. Comparison experiments against the original iCluster algorithm verify the effectiveness of the improved algorithm. Chapter 4 presents a clustering algorithm based on sparse low-rank regression. It uses a suitable sparse low-rank regression method to find an effective low-dimensional subspace for each type of biological data, integrates these subspaces into a sample-sample similarity matrix, and finally identifies cancer subtypes by spectral clustering. Experimental results on three different cancer data sets show that the proposed clustering method is more effective in predicting survival. In the GBM subtype analysis, based on the integration of expression and methylation data, our method captures biological features more effectively, finds sub-clusters within subtypes, and reveals a new hidden subtype. Chapter 5 discusses some problems encountered in the research, summarizes the thesis, and outlines directions for future work.
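The Chapter 4 pipeline (a low-dimensional subspace per data type, a fused sample-sample similarity matrix, then spectral clustering) can be illustrated with a minimal sketch. It is not the thesis's implementation: a truncated SVD with soft-thresholded loadings stands in for the sparse low-rank regression step, and the Gaussian kernel, the averaging of per-view similarities, and all function and parameter names (`low_rank_embedding`, `cluster_subtypes`, `rank`, `sparsity`) are assumptions made for illustration.

```python
import numpy as np

def low_rank_embedding(X, rank=2, sparsity=0.05):
    """Project samples (rows of X) onto `rank` sparse directions.
    ASSUMPTION: truncated SVD + soft-thresholding is only a stand-in
    for the sparse low-rank regression described in the abstract."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:rank].T                               # features x rank loadings
    V = np.sign(V) * np.maximum(np.abs(V) - sparsity, 0.0)  # sparsify loadings
    return Xc @ V                                 # samples x rank embedding

def gaussian_similarity(Z):
    """Sample-sample similarity from an embedding (assumed Gaussian kernel)."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(sq[sq > 0])                # data-driven bandwidth
    return np.exp(-sq / sigma2)

def kmeans(Y, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(W, k):
    """Normalized spectral clustering on a symmetric similarity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(L)                         # eigenvalues ascending
    Y = vecs[:, -k:]                                    # top-k eigenvectors
    Y = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12)
    return kmeans(Y, k)

def cluster_subtypes(views, k, rank=2, sparsity=0.05):
    """Fuse per-view similarities by averaging, then cluster the samples."""
    W = sum(gaussian_similarity(low_rank_embedding(X, rank, sparsity))
            for X in views) / len(views)
    return spectral_clustering(W, k)
```

Here `views` would be a list of samples-by-features matrices over the same samples, for example one expression matrix and one methylation matrix, and `cluster_subtypes(views, k=3)` returns one cluster label per sample. Averaging the per-view similarity matrices is the simplest fusion rule; the thesis may integrate the subspaces differently.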
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification numbers]: R730.2; O212.1
Ge Shuguang (葛曙光); Tumor Subtype Clustering Analysis Based on Sparse Low-Rank Regression [D]; Anhui University; 2017
Article ID: 2134453
Article link: http://sikaile.net/kejilunwen/yysx/2134453.html