Research on Gene Interaction Detection Algorithms Based on Markov Blanket
Published: 2019-01-14 16:33
[Abstract]: A Bayesian network is a graphical model based on probabilistic reasoning. Within a Bayesian network, the Markov blanket approach identifies the set of variables related to a target variable by testing conditional associations among variables. Studies have found that the Markov blanket approach is well suited to epistasis detection in genome-wide association studies. In recent years, a series of Markov blanket-based epistasis detection algorithms have been proposed, but on large-scale genome-wide data they still suffer from low detection efficiency and high false positive rates. This thesis studies Markov blanket-based epistasis detection further with these problems in mind. To improve on existing algorithms, it proposes an optimized Markov blanket-based epistasis detection algorithm, OMBED (Optimized Markov Blanket for Epistasis Detection). The algorithm has three phases: removal, forward, and backward. The removal phase uses conditional independence tests to remove irrelevant variables from the candidate set. The forward phase uses the G2 test value to measure the strength of association between variables: strongly associated variables are added to the target set and weakly associated variables are removed from the candidate set, yielding a minimal Markov blanket. The backward phase removes false positive variables from the Markov blanket. Compared with the original algorithm, OMBED optimizes the add and remove operations in the forward phase, which reduces the number of G2 tests and lowers the algorithm's complexity. Experiments on a series of simulated data sets and on real data sets show that the algorithm achieves good detection efficiency and a lower false positive rate.
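The abstract relies on the G2 test to measure association strength but does not spell it out. As a rough illustration only, the sketch below computes the standard G2 (likelihood-ratio) statistic for testing whether a discrete variable X is independent of Y given a conditioning set, G2 = 2 Σ N_xyz ln(N_xyz N_z / (N_xz N_yz)). It is not the OMBED implementation: the function name `g2_statistic`, the toy genotype data, and the degrees-of-freedom convention are all assumptions made for this example.

```python
import math
from collections import Counter


def g2_statistic(x, y, cond=()):
    """Return (G2, degrees of freedom) for testing X independent of Y given cond.

    x, y: equal-length sequences of discrete values, e.g. SNP genotypes
    (0/1/2) and a case/control label. cond: tuple of sequences to condition on.
    Hypothetical helper for illustration; not taken from the thesis.
    """
    n = len(x)
    # Conditioning configuration of each sample, e.g. (0, 2) for two SNPs.
    z = [tuple(c[i] for c in cond) for i in range(n)]
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)

    g2 = 0.0
    for (xi, yi, zi), observed in n_xyz.items():
        # Expected count under conditional independence of X and Y given Z.
        expected = n_xz[(xi, zi)] * n_yz[(yi, zi)] / n_z[zi]
        g2 += 2.0 * observed * math.log(observed / expected)

    # Assumed convention: (|X| - 1)(|Y| - 1) per observed configuration of Z.
    dof = (len(set(x)) - 1) * (len(set(y)) - 1) * len(n_z)
    return g2, dof


# Toy data (made up): the disease label copies snp_b, and snp_a copies it too,
# so snp_a looks associated with the label until snp_b is conditioned on.
snp_b = [0, 0, 0, 0, 1, 1, 1, 1]
snp_a = list(snp_b)
label = list(snp_b)

marginal_g2, marginal_dof = g2_statistic(snp_a, label)
conditional_g2, conditional_dof = g2_statistic(snp_a, label, cond=(snp_b,))
print("marginal:", marginal_g2, marginal_dof)
print("given snp_b:", conditional_g2, conditional_dof)
```

A larger G2 indicates a stronger association. To turn the statistic into a p-value, it is usually compared against a chi-square distribution with the returned degrees of freedom (for example with `scipy.stats.chi2.sf(g2, dof)` if SciPy is available). A Markov blanket search would call such a test repeatedly, which is why reducing the number of G2 tests matters for run time.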
[Degree-granting institution]: Hunan Normal University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP18
Article ID: 2408880
Article link: http://sikaile.net/kejilunwen/jiyingongcheng/2408880.html