

Research on Fusion Methods and Strategies for Missing Data Imputation in Gene Expression Profiles

Published: 2018-09-12 16:55
[Abstract]

Background and significance: Missing values are widespread in gene expression profile data and seriously reduce the accuracy of downstream analysis. How to impute missing data effectively based on the characteristics of the available data set, how to construct imputation strategies, and how to assess the impact of different imputation methods on the goals of downstream analysis are scientifically important topics in functional genomics and cancer genomics. They are also key difficulties in data analysis research within statistics and bioinformatics. Solving these problems could further improve the performance of analytical techniques through more accurate imputation strategies, allowing researchers to make better use of the information in gene expression profile data and to carry out disease diagnosis and treatment more effectively.

Methods: Theoretical and literature research methods from statistics, computer science, biomedicine, and related disciplines were used to explore and verify the main content of the project. Specifically, a nonparametric multiple imputation fusion method based on support vector regression (Support Vector Regression Nonparametric Multiple Imputation, SVR-NPMI) and the nonparametric missing forest method were used to estimate and impute missing values in six gene expression profile data sets of different series types, under different missingness mechanisms and different missing proportions. The results were compared with those of the K-nearest neighbor (KNN) method, Bayesian principal component analysis (BPCA), and multiple imputation (MI). Based on a set of principles for constructing imputation strategies, and taking the performance of each method into account, imputation strategies were built for different series types, missingness mechanisms, and missing proportions, and the biological impact of the different imputation methods on different downstream analysis goals was clarified.

Results: (1) Five methods were applied to gene expression data sets with different characteristics. The comparison showed that the Normalized Root Mean Square Error (NRMSE) increases as the missing proportion increases. For the non-time-series liver cancer data set at a missing proportion of 30%, the NRMSE values of BPCA, KNN, missing forest, Monte Carlo multiple imputation, and SVR-NPMI were 0.2877, 0.3335, 0.2018, 0.2550, and 0.1621, respectively. For the time-series breast cancer data under missing at random with a missing proportion of 20%, the NRMSE values of the five methods, in the same order, were 0.1810, 0.3874, 0.0780, 0.0917, and 0.0744. For the non-time-series lymphoma data set at a missing proportion of 10%, the Conserved Pairs Proportion (CPP), a measure of how well cluster structure is preserved, was 0.8762, 0.8753, 0.8972, 0.8811, and 0.9797 for the five methods, in the same order. Overall, SVR-NPMI was fairly robust and gave the best imputation results, followed by missing forest and multiple imputation, while KNN performed worst. Imputation results on the other data sets were consistent with these. (2) CPP tends to decrease as the missing proportion of a data set increases, and an inappropriate imputation method can mislead subsequent gene expression studies. Among the methods compared, SVR-NPMI was fairly robust, and clustering on data imputed with SVR-NPMI was better than with the other four methods. (3) Imputation strategies for different gene expression data sets with missing values were summarized through case analysis. SVR-NPMI imputes well under all the factors examined, but it has high computational complexity and a long running time. Missing forest gives good results on data sets with few genes and many experimental conditions. MI gives acceptable results when the data set is approximately normal and low-dimensional and the missing proportion is low. The quality of BPCA and KNN imputation depends on the choice of key parameters.

Conclusions: The SVR-NPMI fusion method proposed in this study develops and enriches the models available for imputing missing data in gene expression profiles, promotes the development of new methods in bioinformatics analysis, and offers a methodological reference for the analysis of large data sets in biomedicine and related fields, so it has academic and theoretical value. The imputation analysis strategy for gene expression profile missing data, constructed here for the first time, and the accompanying software, "Gene Expression Profile Missing Data Imputation Analysis System", can help researchers identify a suitable imputation method for their data set more quickly and carry out data analysis more conveniently.
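The evaluation described in the abstract (hide known values at a given missing proportion, impute them, then score the result with NRMSE) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not give the NRMSE formula, so the code uses one common definition (root of the mean squared error divided by the variance of the true values); the expression matrix is synthetic; and row-mean imputation is only a simple baseline, not SVR-NPMI or any of the five methods compared in the thesis. The function names `nrmse`, `mask_at_random`, and `row_mean_impute` are hypothetical.

```python
import math
import random

def nrmse(true_vals, imputed_vals):
    """Assumed definition: sqrt(mean squared error / variance of the true values)."""
    n = len(true_vals)
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var_true)

def mask_at_random(matrix, ratio, rng):
    """Hide a fraction of entries completely at random (set them to None)."""
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(ratio * len(cells))))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def row_mean_impute(masked):
    """Baseline only: fill each missing value with the mean of its gene (row)."""
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        filled.append([fill if v is None else v for v in row])
    return filled

rng = random.Random(0)
# Synthetic "expression matrix": 50 genes x 8 conditions (not real data).
complete = [[rng.gauss(g % 5, 1.0) for _ in range(8)] for g in range(50)]

scores = {}
for ratio in (0.1, 0.2, 0.3):
    masked, hidden = mask_at_random(complete, ratio, rng)
    filled = row_mean_impute(masked)
    truth = [complete[i][j] for i, j in hidden]
    guess = [filled[i][j] for i, j in hidden]
    scores[ratio] = nrmse(truth, guess)
    print(f"missing ratio {ratio:.0%}: NRMSE = {scores[ratio]:.4f}")
```

Under this definition, an NRMSE of 0 means the hidden values were recovered exactly, and filling every hidden value with the mean of the true hidden values gives exactly 1, so lower is better. Swapping `row_mean_impute` for another imputer would allow methods to be compared on the same masks.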
[Degree-granting institution]: Third Military Medical University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: R3416


Article ID: 2239636


Article link: http://sikaile.net/kejilunwen/jiyingongcheng/2239636.html

