A Multi-Resolution Shape Clustering Algorithm Based on Transcriptome Data and the Identification of lncRNA-Related Molecular Targets in Cancer
Topic: bioinformatics + time-series gene expression profile data; Source: Jilin University, master's thesis, 2017
[Abstract]: Gastric cancer has one of the highest mortality rates among cancers, and patients in China account for about 40% of gastric cancer patients worldwide. In recent years, researchers have found that some long non-coding RNAs (lncRNAs) are abnormally expressed in many cancers, which has raised interest in the relationship between lncRNAs and cancer. The detailed molecular mechanism of gastric cancer remains poorly understood, and this is especially true of the role of lncRNAs. Biological experiments can reveal interactions between lncRNAs and cancer, but experimentally identifying cancer-related lncRNAs is usually costly and time-consuming. This thesis proposes a computational method that reuses exon-based gastric cancer arrays to determine the relationship between lncRNAs and gastric cancer. The analysis identified a specific lncRNA (LINC00365) and its differentially expressed target genes, whose products are predicted to be secreted into blood, urine, or saliva and were identified as candidate combined biomarkers for gastric cancer. Multi-source biological knowledge can then be used to infer the bioinformatic functions and molecular mechanisms of gastric cancer-related lncRNAs and coding-gene biomarkers.
The first part of the thesis is the lncRNA analysis. First, the probes in tumor-related exon array data from the Human Exon 1.0 ST Array platform in the GEO database were re-annotated to obtain expression profiles for lncRNAs and coding genes. Next, a rank-sum test was applied to the gastric cancer data, yielding for each gene a p-value and the fold change between tumor and normal samples. A gene was considered significantly differentially expressed when its fold change was greater than 1.5 or less than 1/1.5 and its p-value was less than 0.01.
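The screening rule described here (rank-sum test plus a fold-change cutoff) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual pipeline. The function names `rank_sum_p` and `is_differential` are hypothetical. Expression values are assumed to be positive and on a linear (not log) scale. Fold change is taken as the ratio of group means. The p-value uses the normal approximation without tie or continuity correction, which is rough for small sample sizes.

```python
import math
from statistics import mean


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value using the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    """
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; store their average.
        rank_of[values[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    w = sum(rank_of[v] for v in x)                    # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2                       # expected rank sum under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # its standard deviation
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))           # two-sided tail probability


def is_differential(tumor, normal, fc_cut=1.5, p_cut=0.01):
    """Apply the thresholds from the abstract: FC > 1.5 or FC < 1/1.5, and p < 0.01."""
    fc = mean(tumor) / mean(normal)                   # assumed definition of fold change
    p = rank_sum_p(tumor, normal)
    flagged = (fc > fc_cut or fc < 1 / fc_cut) and p < p_cut
    return flagged, fc, p


# Made-up expression values for one gene, for illustration only.
tumor = [10, 11, 12, 13, 14, 15, 16, 17]
normal = [3, 4, 5, 6, 7, 8, 9, 9.5]
flagged, fc, p = is_differential(tumor, normal)
print(flagged, round(fc, 2), p)
```

In practice a library routine such as SciPy's `scipy.stats.ranksums` would normally replace the hand-written test. The standard-library version is used here only to keep the sketch self-contained.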
Pearson and Spearman correlation coefficients were then calculated to construct a co-expression network, with the two coefficients combined into edge weights through a logistic function transformation to highlight significant co-expression relationships. GO and pathway analysis of the coding genes co-expressed with differentially expressed lncRNAs was used to infer the biological functions in which the lncRNAs participate. Finally, the coding genes related to the lncRNAs were assessed for secretion into body fluids, and combined biomarkers for gastric cancer were identified and validated.
The second part of the thesis studies time-series gene expression profile data. Such data are dynamic, changing over time, and their analysis can yield meaningful statistical properties and salient biological features. Interest in time-series data mining has grown steadily in recent years. Expression values differ at every time point, and developing effective analysis methods remains a major challenge. Time-series expression experiments offer an opportunity to explore how gene expression profiles change over time and to understand the dynamic behavior of gene expression, which is vital to the study of biology and disease development. Based on multi-resolution fractal features and a mixture clustering model, this thesis explores patterns of gene expression over time at different resolutions. The multi-resolution fractal features are obtained by wavelet decomposition. Set in a probabilistic framework, this multi-resolution shape mixture model algorithm provides a more natural and robust approach to cluster analysis, and the gene groups it identifies have stronger biological meaning. Applying the multi-resolution shape clustering algorithm to tumor-related time-series gene expression profile data therefore yields global and local fractal features and divides the data into clusters with significant biological meaning.
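To make the idea of global and local features from wavelet decomposition concrete, here is a minimal sketch using the Haar wavelet. The abstract does not say which wavelet family or feature definition the thesis uses, so the choice of Haar, the function names `haar_decompose` and `multires_features`, and the power-of-two profile length are all assumptions made for illustration. The mixture-model clustering step is not shown.

```python
import math


def haar_decompose(profile):
    """Full orthonormal Haar decomposition of a profile whose length is a power of two.

    Returns (approx, details): approx is the single coarsest coefficient
    (global level); details[0] is the finest scale, details[-1] the coarsest.
    """
    n = len(profile)
    if n == 0 or n & (n - 1):
        raise ValueError("profile length must be a power of two")
    approx = [float(v) for v in profile]
    details = []
    s = math.sqrt(2)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / s for a, b in pairs])  # local change at this scale
        approx = [(a + b) / s for a, b in pairs]         # smoothed, coarser profile
    return approx, details


def multires_features(profile):
    """Concatenate coefficients coarse-to-fine: global trend first, local detail last."""
    approx, details = haar_decompose(profile)
    feats = list(approx)
    for level in reversed(details):
        feats.extend(level)
    return feats


# Made-up expression values of one gene at 8 time points, for illustration only.
profile = [2, 4, 6, 8, 10, 12, 14, 16]
print(multires_features(profile))
```

Feature vectors of this kind, one per gene, could then be passed to a mixture-model clusterer. For real data a library such as PyWavelets would usually handle arbitrary profile lengths and other wavelet families.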
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: R735.2;TP311.13