A Base-Precision Algorithm for Identifying Haplotype Heterogeneity in Tumor Genomes
Published: 2019-01-08 16:41
[Abstract]: To resolve the subclonal structure underlying tumor tissue heterogeneity, this paper proposes an algorithm that identifies haplotype heterogeneity from the somatic mutation patterns of multi-level subclones. Using multi-library sequencing data from tumor tissue, the algorithm extracts library features and paired-end read constraints, and clusters the variant allele frequencies of somatic mutation sites to estimate a prior on the number of subclones. It also includes an assembly-based identification procedure that builds haplotype sequences by traversing the reads covering each site; the assembled haplotype sequences are accurate to the level of individual bases. Maximum likelihood estimation of the posterior probability is then used to solve for the number, proportions, and evolutionary relationships of the subclones. Simulation experiments show that, once the base library reaches a certain sequencing coverage, the algorithm identifies haplotype heterogeneity with an accuracy above 99%, and that it can replace the two-step method commonly used in current data analysis while yielding highly accurate results.
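The abstract does not say which clustering method is applied to the variant allele frequencies (VAFs). As a rough illustration only, the sketch below uses a simple gap-based one-dimensional clustering as a stand-in: VAFs are sorted, and a new cluster starts wherever two neighbouring values differ by more than a threshold. The function names, the `min_gap` threshold, and the toy read counts are all assumptions made for this example and do not come from the paper.

```python
from statistics import mean


def cluster_vafs(vafs, min_gap=0.08):
    """Group variant allele frequencies into clusters of nearby values.

    A new cluster starts whenever two consecutive sorted values differ
    by more than `min_gap` (a hypothetical threshold, not from the paper).
    """
    if not vafs:
        return []
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > min_gap:
            clusters.append([value])      # large gap: open a new cluster
        else:
            clusters[-1].append(value)    # small gap: extend current cluster
    return clusters


def estimate_subclone_prior(alt_counts, depths, min_gap=0.08):
    """Return (number of VAF clusters, rounded cluster means) from read counts."""
    # VAF = reads supporting the variant allele / total reads at the site.
    vafs = [alt / depth for alt, depth in zip(alt_counts, depths) if depth > 0]
    clusters = cluster_vafs(vafs, min_gap)
    return len(clusters), [round(mean(c), 3) for c in clusters]


# Toy data (invented): eight somatic sites, each sequenced to depth 100.
alt_counts = [48, 52, 50, 24, 26, 25, 9, 11]
depths = [100] * 8
n_clusters, centers = estimate_subclone_prior(alt_counts, depths)
print(n_clusters, centers)
```

The cluster count here plays the role of the prior on the number of subclones; in the approach the abstract describes, that prior would then be refined by the maximum likelihood step rather than taken as final.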
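[Author affiliations]: College of Public Basic Sciences, Jinzhou Medical University; Department of Computer Science and Technology, Xi'an Jiaotong University; Shaanxi Engineering Research Center of Medical and Health Big Data, Xi'an Jiaotong University; State Key Laboratory of Cancer Biology, Fourth Military Medical University; School of Management, Xi'an Jiaotong University
[Funding]: National Natural Science Foundation of China (81400632); Natural Science Foundation of Shaanxi Province (2014JM8350); Fundamental Research Funds for the Central Universities (GLIJ002)
[Classification number]: R73-3
Article ID: 2404860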
Article link: http://sikaile.net/yixuelunwen/zlx/2404860.html