

Research on Informative Gene Selection and Classification Methods for Tumors

Published: 2019-04-04 11:23
[Abstract]: Tumors result from the joint action of multiple genes and the environment. The emergence and rapid development of large-scale gene expression profiling provide a brand-new technology platform for tumor research. Data mining based on gene expression profiles is of great significance for discovering disease-causing genes, diagnosing tumors clinically, judging drug efficacy, and elucidating pathogenesis. Tumor gene expression data typically have high feature dimensionality, small or relatively small sample sizes, large differences in sample background, non-random noise such as batch effects, high redundancy, nonlinearity, and interaction effects between genes, so traditional statistical and pattern recognition methods are of limited use. Addressing these characteristics, this thesis studies informative gene selection methods and classifier construction. The main results are as follows.

(1) A new high-dimensional feature selection method based on the support vector machine, the Binary Matrix Shift Filter (BMSF), was developed. Most informative gene selection methods consider only the effect of single genes or gene pairs and ignore interactions among multiple genes. BMSF takes multi-gene interactions into account: by introducing a randomly generated intermediate (0,1) binary matrix, it converts the classification problem into a regression problem and achieves support-vector-machine-based high-dimensional feature selection with the kernel function parameters optimized. During gene selection, the subset of genes retained in the model is recursively optimized and repeatedly updated according to each gene's contribution to tumor classification relative to the other genes. On 9 binary-class cancer gene expression data sets, BMSF obtained leave-one-out prediction accuracy far better than that reported in the literature while using smaller informative gene subsets, and the selected subsets improved the leave-one-out prediction accuracy of several classifiers at once.

(2) A robust high-dimensional feature selection method and a training-free direct classification algorithm, TSG (Top-scoring genes), were developed based on the chi-square test. Prediction accuracy depends on both feature selection and the classifier, and training is the main cause of overfitting in most classifiers. The mainstream TSP (Top score pairs) family of algorithms serves as both a feature selection method and a classifier. TSG is proposed to overcome the shortcomings of TSP: it cannot reflect sample size, the number of selected informative genes is always even, and the algorithm is cumbersome for multi-class problems. TSG proposes and implements training-free direct classification based on transductive inference. Its decision process is: first assume that a test sample belongs to the positive (+) class and merge it with the training samples to obtain the chi-square value Chi+; then assume that the test sample belongs to the negative (-) class and merge it with the training samples to obtain the chi-square value Chi-; if Chi+ > Chi-, the test sample is assigned to the positive class, otherwise to the negative class. Multi-class problems are handled analogously. The feature selection process of TSG is: first select the top-scoring gene pair, TS2, as the initial informative gene subset; then, at each step, pick from the remaining genes the one with the best joint effect with the genes already selected and add it to the subset; the final informative gene subset is determined automatically from the leave-one-out accuracy on the training set. In independent prediction on 9 binary-class and 10 multi-class data sets, TSG obtained results clearly better than those reported in the literature. In particular, its leave-one-out accuracy on the training set was very close to its accuracy on the independent test set, and on some data sets the independent test accuracy was even higher, which shows that TSG's distinctive training-free direct classification effectively controls overfitting.

(3) A new informative gene selection method, χ²-IRG-DC (Chi-square test-based Integrated Rank Gene and Direct Classifier), was developed based on interaction and the chi-square test. Its feature selection process is: first use the single-gene chi-square values and the pairwise gene-interaction chi-square values to compute an integrated weighted score for each gene, which yields a ranking of gene importance; then introduce the ranked genes sequentially based on the χ²-DC classifier and remove redundancy using the leave-one-out accuracy on the training set as the first criterion and the chi-square gain as the second criterion, which yields a more robust informative gene subset; finally, carry out independent prediction based on χ²-DC and the informative genes. While inheriting the advantages of TSG, χ²-IRG-DC greatly reduces algorithmic complexity through the integrated weighted gene score and strengthens the robustness of feature selection by introducing the chi-square gain as a second criterion. Independent prediction accuracy on 9 binary-class and 10 multi-class tumor gene expression data sets shows that the χ²-IRG-DC model is clearly better than results reported in the literature. As a feature selection method, χ²-IRG-DC clearly outperforms the four reference feature selection methods mRMR, SVM-RFE, HC-K-TSP, and TSG; as a classifier, χ²-DC clearly outperforms reference classifiers such as NB and KNN and is comparable to the SVM classifier. The methods in this thesis have important theoretical significance and practical value for advancing high-dimensional feature selection and tumor classification.
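The training-free decision rule in point (2) of the abstract can be illustrated with a small Python sketch. The abstract does not say how the chi-square value is computed from expression data, so everything below the rule itself is an assumption: each gene pair is reduced to a TSP-style binary indicator (is gene i expressed below gene j?), a Pearson chi-square statistic is computed on the 2x2 table of indicator versus class, and the statistics are summed over all pairs of the selected genes. The function names `chi_square_2x2`, `total_chi_square`, and `direct_classify` are hypothetical and are not taken from the thesis.

```python
from itertools import combinations

import numpy as np


def chi_square_2x2(indicator, labels):
    """Pearson chi-square statistic for a binary indicator vs. binary labels."""
    table = np.zeros((2, 2))
    for a, y in zip(indicator, labels):
        table[int(a), int(y)] += 1
    total = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / total
    mask = expected > 0  # skip cells whose row or column is empty
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())


def total_chi_square(X, y, genes):
    """Sum the chi-square statistics over all pairs of the selected genes.

    Assumption: a pair (i, j) is scored through the indicator X[:, i] < X[:, j].
    """
    return sum(
        chi_square_2x2(X[:, i] < X[:, j], y) for i, j in combinations(genes, 2)
    )


def direct_classify(X_train, y_train, x_test, genes):
    """Label x_test by the hypothesis (1 = positive, 0 = negative) that gives
    the larger chi-square value once the test sample is merged with the
    training samples. No model is fitted."""
    X_all = np.vstack([X_train, x_test])
    chi_pos = total_chi_square(X_all, np.append(y_train, 1), genes)
    chi_neg = total_chi_square(X_all, np.append(y_train, 0), genes)
    return 1 if chi_pos > chi_neg else 0


# Toy data: gene 0 is below gene 1 in the positive class, above it otherwise.
X_train = np.array([[1, 2], [1, 3], [2, 5], [3, 1], [4, 2], [5, 1]], dtype=float)
y_train = np.array([1, 1, 1, 0, 0, 0])
print(direct_classify(X_train, y_train, np.array([1.0, 4.0]), genes=[0, 1]))
print(direct_classify(X_train, y_train, np.array([5.0, 2.0]), genes=[0, 1]))
```

Because the test sample is only ever merged into the contingency tables, there are no fitted parameters; this is the sense in which the abstract calls the classification transductive and training-free. Ties are sent to the negative class here, which is an arbitrary choice of this sketch.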
[Degree-granting institution]: Hunan Agricultural University
[Degree level]: Doctoral
[Year degree conferred]: 2015
[Classification number]: R730.2


[Citation]: Zhang Hongyan (張紅燕). Research on Informative Gene Selection and Classification Methods for Tumors [D]. Hunan Agricultural University, 2015.


Article ID: 2453761


Article link: http://sikaile.net/yixuelunwen/zlx/2453761.html

