天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

當(dāng)前位置:主頁 > 科技論文 > 基因論文 >

基于Hadoop基因序列比對(duì)BWT索引的建立方法研究

發(fā)布時(shí)間:2018-02-04 12:12

  本文關(guān)鍵詞: 基因序列 BWT索引 后綴數(shù)組 Hadoop MapReduce 出處:《內(nèi)蒙古農(nóng)業(yè)大學(xué)》2016年碩士論文 論文類型:學(xué)位論文


【摘要】:由于基因數(shù)據(jù)的增長(zhǎng)速度飛快,人工進(jìn)行序列比對(duì)已經(jīng)無法滿足科研人員的需求,那么機(jī)器比對(duì)已經(jīng)走上了舞臺(tái),基因序列比對(duì)是基因數(shù)據(jù)分析和處理的基礎(chǔ)。而現(xiàn)在的序列比對(duì)算法大致分為兩類,一類是精確比對(duì)算法,另一類是模糊比對(duì)算法。目前,大部分的基因序列比對(duì)方法都是啟發(fā)式算法,該類算法大致分為兩步:建立索引和序列比對(duì),所以無論是精確比對(duì)算法和非精確比對(duì)算法都離不開索引結(jié)構(gòu)。由此可見,建立索引是基因序列比對(duì)算法的重要步驟,常見的索引構(gòu)建算法大致分為兩類,一類是基于哈希表的算法,另一類是基于后綴樹或后綴數(shù)組的算法。而BWT(Burrows-Wheeler Transform)索引是基于后綴數(shù)組中比較重要的索引結(jié)構(gòu)。目前,構(gòu)建較大基因組序列(例如,人類基因組序列)的BWT索引需要幾個(gè)小時(shí)的串行計(jì)算。本文提出一種基于Hadoop的并行計(jì)算方法構(gòu)建后綴數(shù)組和BWT索引。算法使用MapReduce的數(shù)據(jù)處理功能,并且更改了原有的使用哈希方式的Partitioner,本文使用直接分配任務(wù)來建立索引。本文依次將基因鏈?zhǔn)椎囊粋(gè)堿基輪轉(zhuǎn)到基因鏈尾并與鏈尾的17個(gè)字符形成一個(gè)Key以及相應(yīng)的Map任務(wù),將這些Map任務(wù)根據(jù)新改寫Partitioner分配給Reduce。最終得到全序的后綴數(shù)組和BWT索引,減少建立索引的時(shí)間。通過實(shí)驗(yàn)數(shù)據(jù)表明,本文提出的方法可以節(jié)省索引構(gòu)建的時(shí)間,達(dá)到了預(yù)期目的,并驗(yàn)證了算法的有效性。
[Abstract]:Because of the rapid growth of genetic data, artificial sequence alignment has been unable to meet the needs of researchers, so machine alignment has been on the stage. Sequence alignment is the basis of gene data analysis and processing. Now, sequence alignment algorithms are divided into two categories, one is accurate alignment algorithm, the other is fuzzy alignment algorithm. Most gene sequence alignment methods are heuristic algorithms, which can be divided into two steps: indexing and sequence alignment. Therefore, both accurate alignment algorithm and imprecise alignment algorithm can not be separated from index structure. Thus, building index is an important step of gene sequence alignment algorithm, and the common index construction algorithm can be divided into two categories. One is the algorithm based on hash table. The other is the algorithm based on suffix tree or suffix array. And BWT(Burrows-Wheeler transform). Index is based on the suffix array of the more important index structure. Construction of large genome sequences (for example. Human genome sequence. This paper presents a parallel computing method based on Hadoop to construct suffix array and BWT index. The algorithm uses the data processing of MapReduce. Function. And changed the original hash Partitioner. In this paper, the direct assignment task is used to build the index. In this paper, one base of the first gene chain is rotated to the tail of the gene chain and a Key and the corresponding Map task are formed with the 17 characters of the tail. Assign these Map tasks to Reducebased on the new overwrite Partitioner, resulting in a fully ordered suffix array and BWT index. The experimental data show that the proposed method can save the time of index construction, achieve the expected purpose, and verify the effectiveness of the algorithm.
【學(xué)位授予單位】:內(nèi)蒙古農(nóng)業(yè)大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2016
【分類號(hào)】:TP391.3
,

本文編號(hào):1490179

資料下載
論文發(fā)表

本文鏈接:http://sikaile.net/kejilunwen/jiyingongcheng/1490179.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶79765***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請(qǐng)E-mail郵箱bigeng88@qq.com
99视频精品免费视频| 91超频在线视频中文字幕| 少妇人妻一级片一区二区三区| 亚洲中文字幕高清乱码毛片| 国产日产欧美精品视频| 日本加勒比在线观看不卡| 色无极东京热男人的天堂| 午夜精品久久久免费视频| 欧美一区二区三区播放| 国产又大又猛又粗又长又爽| 一区二区在线激情视频| 日韩黄色一级片免费收看| 国产高清一区二区白浆| 高清免费在线不卡视频| 国产欧美日韩精品成人专区| 在线免费看国产精品黄片| 欧美尤物在线观看西比尔| 久久热中文字幕在线视频| 丰满人妻熟妇乱又乱精品古代| 91人妻久久精品一区二区三区 | 亚洲国产av在线视频| 亚洲性日韩精品一区二区| 国产成人av在线免播放观看av| 大香蕉伊人精品在线观看| 中文字幕一区二区三区中文| 五月婷婷缴情七月丁香| 精品午夜福利无人区乱码| 日本丁香婷婷欧美激情| 久久久精品区二区三区| 在线日韩欧美国产自拍| 国产又黄又爽又粗视频在线| 日韩一级一片内射视频4k| 日本男人女人干逼视频| 加勒比人妻精品一区二区| 日韩精品免费一区二区三区| 91熟女大屁股偷偷对白| 免费精品国产日韩热久久| 久久精品国产亚洲av麻豆尤物 | 亚洲一区二区三区三区| 日韩一区二区三区18| 丰满的人妻一区二区三区|