Construction of Gene Regulatory Networks Based on an Improved SVM and Its Implementation on Spark
Keywords: gene regulatory network; improved support vector machine; Spark; transcription factor pairs. Source: Tianjin University of Technology, master's thesis, 2017. Document type: degree thesis.
[Abstract]: The study and construction of gene regulatory networks is a central problem in bioinformatics: understanding how gene expression is regulated helps explain both biological processes and the mechanisms by which diseases arise. The steady development and maturation of microarray technology supplies the data and technical support this research needs. Combining machine learning methods with the Spark big data platform has become an effective way to construct gene regulatory networks. For data as massive as biological gene sequences, traditional gene identification techniques are costly, complex in principle, poorly reproducible, and slow, and they fall far short of the needs of modern research. Mining biological data with machine learning on Spark has therefore become a new method in bioinformatics research. This thesis uses an improved support vector machine (SVM) and the Spark platform, together with known transcription factor data, to predict and build gene regulatory networks at whole-genome scale. It establishes a gene regulatory network model based on the improved SVM and uses the model to predict transcription factor pairs of Arabidopsis thaliana in the AtGenExpress database; the recognition rate reaches 93%, and some previously unknown transcriptional relationships are also predicted. The model is then deployed on the Spark big data processing platform, where experiments show that the experimental cycle is about 7 times faster than in the earlier single-machine mode. Through effective preprocessing of the transcription factor sequences, combined with the improved SVM and the Spark platform, the predictions surpass the results of some earlier differential-equation and cluster-analysis algorithms in both accuracy and time efficiency. In the future, a complete gene regulatory network could show clearly which gene, or which genes acting together, hold the key to treating a given disease, providing theoretical support for the diagnosis and treatment of related diseases.
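The abstract frames network construction as classifying candidate (transcription factor, gene) pairs. As a rough illustration only, the sketch below trains a plain linear SVM with Pegasos-style sub-gradient descent on toy expression profiles. It is not the thesis's improved SVM, whose details the abstract does not give. The feature construction (element-wise product of mean-centered profiles), the function names, the profiles, and the labels are all assumptions made up for this example.

```python
import random

def center(profile):
    """Subtract the mean so values describe up/down movement across conditions."""
    mean = sum(profile) / len(profile)
    return [v - mean for v in profile]

def pair_features(tf_profile, gene_profile):
    """Assumed feature map: element-wise product of the centered profiles.
    Entries are positive where the TF and the gene move in the same direction."""
    return [a * b for a, b in zip(center(tf_profile), center(gene_profile))]

def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient descent on the regularized hinge loss.
    Labels are +1 / -1; no bias term is used in this sketch."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization shrink: (1 - eta * lam) equals (1 - 1/t).
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1.0:               # hinge loss is active for this sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 (predicted regulatory pair) or -1 (predicted non-pair)."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score > 0 else -1

# Hypothetical toy data: one TF profile against four gene profiles.
tf = [2.0, 4.0, 8.0, 10.0]
training_pairs = [
    (tf, [1.0, 2.0, 5.0, 8.0], 1),    # moves with the TF: labelled as a pair
    (tf, [3.0, 5.0, 9.0, 11.0], 1),
    (tf, [9.0, 7.0, 3.0, 1.0], -1),   # moves against the TF: labelled as a non-pair
    (tf, [8.0, 5.0, 2.0, 1.0], -1),
]
samples = [pair_features(a, b) for a, b, _ in training_pairs]
labels = [y for _, _, y in training_pairs]
w = train_linear_svm(samples, labels)

unseen = pair_features([1.0, 2.0, 4.0, 5.0], [2.0, 3.0, 7.0, 8.0])
print(predict(w, unseen))
```

Labelling anti-correlated pairs as negatives is a toy choice; a real repressor can anti-correlate with its target, so real labels would come from known transcription factor data as the abstract describes. Because each pair's features are computed independently, that step is the natural one to distribute across workers on a platform such as Spark.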
[Degree-granting institution]: Tianjin University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q78; TP18
Article ID: 1499784
Article link: http://sikaile.net/kejilunwen/jiyingongcheng/1499784.html