
Research on Fast Migration Algorithms Between Reference-Based Compressed Gene Databases

Published: 2018-02-09 06:44

  Keywords: reference-genome-based compression; DNA data compression; reference sequence conversion; FASTA; Loongson. Source: Shenzhen University, 2017 master's thesis. Document type: degree thesis


[Abstract]: With sequencing costs falling and emerging technologies such as precision medicine and deep learning on genomic data demanding large volumes of genomic data, we have entered an era of explosive growth in genomic data. How to store and transmit such massive data has become an active research topic, and reference-genome-based compression algorithms are widely used in major gene databases because of their high compression ratios. However, these algorithms depend on the reference genome data, which severely restricts the sharing, merging, and transfer of the compressed data they produce. This thesis studies the problem that compressed gene databases built on different reference genomes cannot be shared directly, and proposes a set of methods for quickly converting the reference sequence of compressed data built on different reference genomes. The main work is as follows. (1) We classify a variety of gene compression algorithms, discuss the characteristics of each class, and analyze several recent reference-genome-based compression algorithms in detail. (2) For data compressed by the same algorithm against different reference sequences, we study a fast reference conversion algorithm. The algorithm mainly exploits the similarity between reference genome sequences to migrate the reference sequence quickly. Experimental results show that migration takes far less time than the original decompress-then-recompress approach, and they also point the way for the later work. (3) Extending (2), we study migration across different compression algorithms and different reference sequences. We select three compression algorithms, analyze them to extract what they have in common, and, taking the characteristics of all three into account, improve the compression ratio of the migrated data on top of the fast migration algorithm from (2). We design two migration algorithms that enable mutual migration among the three compression algorithms, and verify their efficiency through extensive experiments. (4) Finally, for the Loongson (Godson) platform we implement TReC, a complete genomic data management tool with compression, migration, and decompression functions. We analyze its performance and then optimize it with multiple processes so that it can make full use of Loongson's multiple cores to speed up TReC. Starting from the observation that reference-genome-based compression depends too heavily on the reference sequence, this thesis proposes two effective migration algorithms with a large advantage in migration time. These techniques can effectively alleviate the problem of migrating between reference-genome-based compressed gene databases and offer experience and reference for subsequent related research.
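To make the baseline in the abstract concrete, the following is a minimal Python sketch of a toy reference-based compressor and of the naive "decompress, then recompress" migration that the thesis reports its algorithms outperform. It is not the thesis's method or the TReC implementation: the function names (`compress`, `decompress`, `migrate_naive`), the greedy matching strategy, and the `min_match` parameter are all assumptions made for illustration. The abstract does not describe how the proposed algorithms use reference similarity to avoid full decompression, so that part is not sketched here.

```python
from typing import List, Tuple, Union

# An op is either a (position, length) match into the reference,
# or a literal string of bases that had no sufficiently long match.
Op = Union[Tuple[int, int], str]


def compress(target: str, reference: str, min_match: int = 4) -> List[Op]:
    """Toy greedy encoder (hypothetical): at each target position, take the
    longest substring that also occurs in the reference, if it has at least
    min_match bases; otherwise emit the base as a literal."""
    ops: List[Op] = []
    literal: List[str] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        length = min_match
        # Grow the candidate match one base at a time while it still occurs.
        while i + length <= len(target):
            pos = reference.find(target[i:i + length])
            if pos == -1:
                break
            best_pos, best_len = pos, length
            length += 1
        if best_len:
            if literal:  # flush pending literal bases before the match
                ops.append("".join(literal))
                literal = []
            ops.append((best_pos, best_len))
            i += best_len
        else:
            literal.append(target[i])
            i += 1
    if literal:
        ops.append("".join(literal))
    return ops


def decompress(ops: List[Op], reference: str) -> str:
    """Rebuild the target by copying matches from the reference."""
    parts = []
    for op in ops:
        if isinstance(op, str):
            parts.append(op)
        else:
            pos, length = op
            parts.append(reference[pos:pos + length])
    return "".join(parts)


def migrate_naive(ops: List[Op], old_ref: str, new_ref: str,
                  min_match: int = 4) -> List[Op]:
    """Baseline migration: fully decompress against the old reference,
    then recompress against the new one."""
    return compress(decompress(ops, old_ref), new_ref, min_match)


if __name__ == "__main__":
    old_ref = "ACGTACGTTTGACCAGT"
    new_ref = "ACGTACGATTGACCAGTAA"
    target = "ACGTTTGACCAGGT"
    ops_old = compress(target, old_ref)
    ops_new = migrate_naive(ops_old, old_ref, new_ref)
    print(ops_old)
    print(ops_new)
```

The sketch shows why compressed data is tied to its reference: the `(position, length)` pairs are meaningless without the exact reference they index into, so a database built on a different reference cannot read them directly.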
[Degree-granting institution]: Shenzhen University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4

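[Related literature]

Related journal articles (top 1)

1 Lin Xiaochun; Chinese scientists lead the "decoding" of the sheep genome [J]; Keji Zhifu Xiangdao; 2014, No. 17

Related major newspaper articles (top 2)

1 Reporter Bai Yi; Highest-quality reference gene set database for human gut microbes released [N]; Zhongguo Yiyao Bao; 2014

2 Reporter Ma Fang; Humanity obtains its first collection of its own reference genome data [N]; Nanfang Daily; 2010

Related doctoral dissertations (top 3)

1 SAMMINA MAHMOOD; [D]; Huazhong Agricultural University; 2016

2 Yi Huiguang; Comparative genomics without a reference genome [D]; Fudan University; 2013

3 Chen Geng; Integrating multi-level data to analyze and annotate the human transcriptome from multiple perspectives [D]; East China Normal University; 2014

Related master's theses (top 4)

1 Zhang Xueying; Transcriptome analysis of the powdery mildew resistance response in wheat near-isogenic lines [D]; Shandong Agricultural University; 2015

2 Wu Xinxin; Proteome and transcriptome analysis related to petal coloration in the mei (Prunus mume) cultivar 'Fuban Tiaozhi' [D]; Nanjing Agricultural University; 2014

3 Tan Yuntao; Constructing a high-density linkage map of tobacco using RAD (Restriction Site Associated DNA) technology [D]; Kunming University of Science and Technology; 2016

4 Zhang Yijun; Research on fast migration algorithms between reference-based compressed gene databases [D]; Shenzhen University; 2017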


Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/1497322.html

