Cancer Differentiation Degree Prediction and Biological Pathway Analysis Based on an Improved Multiple Kernel Learning Algorithm
Published: 2018-05-07 14:22
Topics: multiple kernel learning + feature selection; Source: Jilin University, master's thesis, 2017
[Abstract]: The rapid development of high-throughput sequencing technologies for different types of omics data has transformed bioinformatics research, making it possible to generate large volumes of omics data quickly and at low cost. These data include genomic, transcriptomic, epigenetic, proteomic, and metabolomic data. More importantly, studies that obtain several types of omics data by sequencing the same samples are becoming increasingly common. At the same time, many public databases containing high-quality, high-confidence omics data have appeared, which makes it easier to collect omics data for research. A key goal of integrative analysis of these data is to identify an effective model that can predict phenotypic traits and related outcomes, find important biomarkers, or explain the genetic basis of complex traits. Current strategies for integrating multi-omics data fall into two main groups: multi-stage analysis and meta-dimensional analysis. Multi-stage analysis assumes that the relationships between the different omics data and a complex trait are linear and hierarchical; it integrates two related omics data types at a time and builds the model step by step from the results of each analysis. In the vast majority of cases, however, a complex trait results from simultaneous changes across different omics data types, so multi-stage analysis cannot model complex traits effectively. Meta-dimensional analysis, by contrast, builds the model by integrating multiple omics data types at the same time.

The degree of cancer differentiation, a complex trait of cancer, describes the extent of abnormality of cancer cells in cell morphology and tissue structure. It carries important information about the clinical behavior of cancer, such as progression and invasion, and plays an important role in planning clinical treatment and improving prognosis. Predicting the degree of cancer differentiation can greatly improve the early detection rate of cancer and effectively guide treatment. Although many researchers have recognized its importance and some studies on predicting cancer differentiation have appeared, few of them address the problem by integrating multiple types of omics data. An advanced algorithm that can use multiple omics data types to predict the degree of cancer differentiation is therefore needed.

In this thesis, we first propose a multiple kernel learning algorithm that is based on the meta-dimensional analysis strategy and subject to a p-norm regularization constraint, and we improve its computational efficiency using the sequential minimal optimization (SMO) algorithm. We also add biological pathway information to the original model so that it can be used to evaluate the importance of different biological pathways across different degrees of cancer differentiation. Finally, taking breast cancer as a case study, we use the proposed algorithm to integrate gene expression data and methylation data obtained after feature selection, and construct predictors for the different degrees of breast cancer differentiation. Our experimental results show that the proposed model outperforms other currently popular multi-omics data integration models in prediction performance and provides a pathway-level explanation for differences in breast cancer differentiation. In addition, the model can further reveal the relationships between the relevant omics data and the degree of breast cancer differentiation, giving deeper insight into the biological model underlying those differences.
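The meta-dimensional idea in the abstract, one kernel per omics data type combined under a p-norm constraint on the kernel weights, can be illustrated with a small sketch. This is not the thesis's algorithm: the weights below come from a simple kernel-target alignment heuristic rather than from SMO-based training, and the toy "expression" and "methylation" matrices, the labels, and all function names are hypothetical, chosen only for illustration.

```python
import math

def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(v * v for row in K for v in row))
    return num / (k_norm * n)  # ||yy^T||_F equals n for labels in {-1, +1}

def p_norm_weights(scores, p):
    """Rescale non-negative scores so the weight vector has unit p-norm."""
    scores = [max(s, 0.0) for s in scores]
    norm = sum(s ** p for s in scores) ** (1.0 / p)
    if norm == 0.0:
        raise ValueError("all kernel scores are zero")
    return [s / norm for s in scores]

def combine(kernels, weights):
    """Weighted sum of Gram matrices: K = sum_m w_m * K_m."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Hypothetical toy data: 4 samples, two omics "views", binary grade labels.
expr = [[2.0, 0.1], [1.8, 0.0], [-2.1, 0.2], [-1.9, -0.1]]  # informative view
meth = [[0.5, 1.0], [-0.4, 1.1], [0.6, 0.9], [-0.5, 1.0]]   # weakly informative
y = [1, 1, -1, -1]

kernels = [linear_kernel(expr), linear_kernel(meth)]
weights = p_norm_weights([alignment(K, y) for K in kernels], p=2.0)
K_combined = combine(kernels, weights)
print("kernel weights (unit 2-norm):", weights)
```

The combined Gram matrix could then be passed to any kernel classifier that accepts a precomputed kernel. In a trained multiple kernel learning model the weights would instead be optimized jointly with the classifier; the heuristic here only shows how the p-norm constraint shapes the weight vector and how a more informative data type receives a larger weight.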
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R73-3; TP18
[Citation]: Song Tianci (宋天賜); Cancer Differentiation Degree Prediction and Biological Pathway Analysis Based on an Improved Multiple Kernel Learning Algorithm [D]; Jilin University; 2017
Article ID: 1857269
Article link: http://sikaile.net/shoufeilunwen/mpalunwen/1857269.html