Storage and Compression of DNA Sequence Alignment Results
Published: 2018-04-08 17:00
Topic: DNA sequence alignment results. Focus: storage. Source: Fudan University, master's thesis, 2012
[Abstract]: With advances in bioinformatics, molecular biology, and related disciplines, and with the completion of the Human Genome Project, a growing number of genes from humans and other model organisms have been sequenced. Sequence alignment is a method for processing sequencing results; it can reveal structural, functional, and evolutionary relationships among biological sequences and is a foundation of bioinformatics.

As these sequencing projects proceed, massive amounts of DNA sequence data are produced every day, and aligning these sequences produces alignment result data in turn. The rapid development of storage devices has, to some extent, eased the problem of rapidly growing data volumes. However, as alignment research deepens, simply adding hardware can no longer keep up with the rapid growth of DNA alignment result data, and the cost of storing and using these data will eventually become unaffordable.

Next-generation sequencing (NGS) platforms have greatly reduced sequencing costs, making gene sequence analysis feasible in practical medical settings. From the standpoint of both storage and application, therefore, compressing sequence alignment results plays an important role in storing, managing, and transmitting DNA data. The compression of DNA sequence data has attracted wide attention from researchers in China and abroad, yet few have studied how to compress alignment results in real medical scenarios. Storing gene alignment results still faces major challenges.

In this thesis, we start from the needs of medical applications and design a storage structure that meets them, then design two different compression strategies on top of it to reduce storage cost. Experimental data show that as coverage increases, our compression scheme slightly outperforms standard RAR and ZIP compression. Based on these methods, we built a "DNA Sequence Alignment Result Storage and Compression System" that stores massive DNA alignment results and provides a graphical interface.
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333
Article ID: 1722517
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1722517.html