Research on Multi-Stem-Loop pre-miRNA Prediction Based on Computational Intelligence Methods
Topics: pre-miRNA; SMOTE. Source: University of Jinan, master's thesis, 2016
[Abstract]: Genetic information in DNA is transcribed into mRNA, and mRNA is translated into protein on the ribosome. This is how the central dogma of biology has long been understood, but the discovery of microRNAs (miRNAs) changed that original understanding. miRNAs are an important class of short non-coding RNA genes, about 21 to 23 nucleotides long. A miRNA binds a target mRNA through complementary base pairing, and this determines whether the mRNA is degraded or its translation is repressed; in this way the miRNA influences gene expression. Recent studies have found that miRNAs regulate roughly 20% to 30% of human gene expression. miRNAs take part in physiological metabolism, growth and development, and cell proliferation and apoptosis. Experiments have also shown that they have an intricate relationship with the development of cancer. In-depth study of miRNAs will therefore help reveal how gene regulatory networks operate, and it also provides important guidance for research on biological evolution. Our work consists of four parts:
(1) We extracted 695 human pre-miRNA samples from the miRBase database; after redundant entries were removed, 691 remained. We obtained 8494 non-redundant pseudo-hairpin sequences from human RefSeq genes. We also extracted 1020 non-coding RNA sequences (excluding miRNAs) from the manually annotated human non-coding RNA database compiled by Lander; removing redundant sequences and sequences longer than 150 bases left 754. Because the resulting dataset is imbalanced, we balanced the positive and negative sets using both sample-level data preprocessing and internal (algorithm-level) methods.
(2) We started from the 29 global and intrinsic features used by miPred, the method with the best prediction performance at the time, and added 19 physicochemical and structural features. Selecting the most discriminative features reduces system complexity and improves the efficiency of the prediction model. We therefore applied wrapper and filter methods to choose an optimal subset of these 48 features. The final set has 21 features: 7 from miPred and 14 newly introduced structural features, which indicates that the new structural features are more discriminative than the sequence features.
(3) Artificial neural networks are self-learning, self-adaptive and self-organizing, so we chose an artificial neural network as our first prediction model. Under 5-fold cross-validation it reached an accuracy of 93.58%, clearly higher than other methods such as triplet-SVM and MiPred.
(4) We applied the neural network model to 6095 pre-miRNAs from other (non-human) animals and to 139 viral pre-miRNAs from miRBase. It achieved accuracies of 97.18% and 94.24%, respectively. These substantially improved results show that the artificial neural network model we built predicts miRNAs effectively, and they offer a new approach to miRNA prediction.
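The dataset cleaning described in (1) can be sketched as follows. The abstract does not say how redundancy was defined. This minimal sketch therefore assumes that two sequences are redundant only if they are identical after upper-casing and converting DNA T to RNA U. Clustering by a similarity threshold (for example with CD-HIT) would be a stricter alternative. The function name `deduplicate_and_filter` is hypothetical.

```python
def deduplicate_and_filter(seqs, max_len=150):
    """Remove exact duplicates and sequences longer than max_len nucleotides.

    Assumption: two sequences are redundant only if they are identical after
    upper-casing and replacing T with U. The first occurrence is kept and the
    input order is preserved.
    """
    seen = set()
    kept = []
    for s in seqs:
        norm = s.strip().upper().replace("T", "U")
        if len(norm) > max_len or norm in seen:
            continue
        seen.add(norm)
        kept.append(norm)
    return kept


if __name__ == "__main__":
    raw = ["acgu" * 10, "ACGU" * 10, "ACGT" * 10, "G" * 151, "UUGCA"]
    print(len(deduplicate_and_filter(raw)))  # 2
```

The three 40-nt inputs collapse to a single sequence, and the 151-nt sequence is dropped by the length filter.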
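The keywords name SMOTE as the data-level balancing method. Here the 691 human pre-miRNAs are far outnumbered by the 8494 pseudo-hairpin negatives, so oversampling would target the positives. The NumPy sketch below implements standard SMOTE (Chawla et al., 2002). Each synthetic point is interpolated between a minority sample and one of its k nearest minority neighbours. The abstract gives neither k nor the oversampling ratio, so k=5 and oversampling to full balance are assumptions. The function name `smote` is hypothetical.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Return n_new synthetic minority samples generated by SMOTE.

    Each synthetic point is x + u * (x_nn - x), where x is a random minority
    sample, x_nn is one of its k nearest minority neighbours (Euclidean) and
    u ~ Uniform(0, 1).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise distances between minority samples; a sample is not its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]      # shape (n, k)
    base = rng.integers(0, n, size=n_new)             # starting sample per new point
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                      # interpolation factor per point
    return X_min[base] + gap * (X_min[nn] - X_min[base])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pos = rng.normal(size=(20, 3))                  # minority class (toy data)
    X_neg = rng.normal(loc=2.0, size=(200, 3))        # majority class (toy data)
    X_syn = smote(X_pos, n_new=len(X_neg) - len(X_pos), k=5, rng=1)
    X_pos_balanced = np.vstack([X_pos, X_syn])
    print(X_pos_balanced.shape)                       # (200, 3)
```

When SMOTE is combined with cross-validation, it should be applied to the training folds only. Synthetic points interpolated from a held-out sample would otherwise leak test information into training.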
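Item (2) builds on miPred's 29 global and intrinsic features. In the miPred literature, the base-composition part of that set is usually given as the 16 dinucleotide frequencies plus the G+C content. The remaining features are derived from the predicted secondary structure and its minimum free energy (for example, RNAfold output).

The pure-Python sketch below covers only those 17 base-composition values. It normalizes dinucleotide counts by the number of overlapping pairs (L - 1), which is an assumption. The thesis's own 19 physicochemical and structural features are not specified in the abstract and are not reproduced. `base_composition` is a hypothetical name.

```python
from itertools import product

# All 16 RNA dinucleotides in a fixed order: AA, AC, AG, AU, CA, ..., UU.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def base_composition(seq):
    """Return 17 features: 16 dinucleotide frequencies followed by G+C content.

    Dinucleotides are counted over overlapping windows and divided by the
    number of windows (len(seq) - 1). Pairs containing other symbols (e.g. N)
    are not counted but still occupy a window.
    """
    seq = seq.upper().replace("T", "U")
    n_pairs = len(seq) - 1
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    freqs = [counts[d] / n_pairs if n_pairs > 0 else 0.0 for d in DINUCLEOTIDES]
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return freqs + [gc]


if __name__ == "__main__":
    features = base_composition("GGCC")
    print(len(features), features[DINUCLEOTIDES.index("GC")], features[-1])
    # 17 0.3333333333333333 1.0
```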
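Item (2) combines filter and wrapper feature selection without naming the criteria. The NumPy sketch below illustrates the filter step only, ranking features by the F-score of Chen and Lin (2006), a common criterion for binary classification. The choice of criterion and the helper names `f_scores` and `select_top_k` are assumptions.

A wrapper method would instead evaluate candidate feature subsets by the cross-validated accuracy of the classifier itself. That is costlier, but it accounts for interactions between features.

```python
import numpy as np


def f_scores(X, y):
    """F-score of every feature column for binary labels y in {0, 1}.

    F = ((m_pos - m)^2 + (m_neg - m)^2) / (s2_pos + s2_neg), where m is the
    overall mean and s2 are within-class sample variances (ddof=1).
    Larger values mean more discriminative features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    m = X.mean(axis=0)
    between = (pos.mean(axis=0) - m) ** 2 + (neg.mean(axis=0) - m) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    safe = np.where(within > 0, within, 1.0)
    # Zero within-class variance: infinite score if the class means differ, else 0.
    return np.where(within > 0, between / safe, np.where(between > 0, np.inf, 0.0))


def select_top_k(X, y, k):
    """Column indices of the k highest-scoring features, best first."""
    return np.argsort(f_scores(X, y))[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([1] * 50 + [0] * 50)
    X = np.column_stack([
        3.0 * y + rng.normal(size=100),  # informative feature
        rng.normal(size=100),            # pure noise
        np.ones(100),                    # constant feature
    ])
    print(select_top_k(X, y, 2))         # [0 1]
```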
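Item (3) reports 93.58% accuracy for an artificial neural network under 5-fold cross-validation. The abstract does not give the architecture or the training algorithm. The NumPy sketch below therefore assumes a single hidden layer of 10 tanh units with a sigmoid output, trained by full-batch gradient descent, and evaluates it with stratified 5-fold cross-validation. All helper names (`train_mlp`, `predict`, `stratified_folds`, `cross_validate`) and hyperparameters are hypothetical.

```python
import numpy as np


def train_mlp(X, y, hidden=10, epochs=500, lr=0.5, seed=0):
    """Fit a one-hidden-layer MLP (tanh hidden units, sigmoid output).

    Trained by full-batch gradient descent on mean binary cross-entropy.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(X)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations (n, hidden)
        z = np.clip(H @ w2 + b2, -30.0, 30.0)     # output logits, clipped for exp
        p = 1.0 / (1.0 + np.exp(-z))              # P(y = 1 | x)
        g = (p - y) / n                           # dLoss/dlogit
        gH = np.outer(g, w2) * (1.0 - H ** 2)     # backprop through tanh (old w2)
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
    return W1, b1, w2, b2


def predict(params, X):
    """Class labels: 1 where the output logit is positive (p > 0.5)."""
    W1, b1, w2, b2 = params
    return (np.tanh(X @ W1 + b1) @ w2 + b2 > 0).astype(int)


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds that keep the class ratio."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for i, j in enumerate(idx):
            folds[i % k].append(j)
    return [np.array(f) for f in folds]


def cross_validate(X, y, k=5):
    """Mean test-fold accuracy over k stratified folds."""
    accs = []
    for test_idx in stratified_folds(y, k):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Standardise with training-fold statistics only.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        params = train_mlp((X[train] - mu) / sd, y[train])
        pred = predict(params, (X[test_idx] - mu) / sd)
        accs.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accs))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(3.0, 1.0, (100, 2)), rng.normal(-3.0, 1.0, (100, 2))])
    y = np.array([1] * 100 + [0] * 100)
    print(f"5-fold CV accuracy: {cross_validate(X, y, k=5):.3f}")
```

Features are standardized with statistics from the training folds only, so nothing from the held-out fold reaches the model. Stratification keeps the positive-to-negative ratio roughly equal across folds, which matters when the classes are imbalanced.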
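The accuracies in (3) and (4) are easier to interpret alongside sensitivity and specificity. The helper below, a hypothetical name, computes all three from confusion-matrix counts. When a test set contains only true pre-miRNAs, as the cross-species sets in (4) appear to, there are no negatives. In that case accuracy equals sensitivity and specificity is undefined.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    A ratio whose denominator is zero is returned as NaN (undefined).
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan   # recall on positives
    specificity = tn / (tn + fp) if tn + fp else nan   # recall on negatives
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total if total else nan
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


if __name__ == "__main__":
    # Positive-only test set: 90 of 100 true pre-miRNAs recognised.
    print(classification_metrics(tp=90, fn=10, tn=0, fp=0))
    # {'sensitivity': 0.9, 'specificity': nan, 'accuracy': 0.9}
```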
[Degree-granting institution]: University of Jinan
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: Q52; TP183