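Machine Learning-Based Prediction of Cross-Species Transmission and Antigenic Relationships of Influenza A Viruses
Posted: 2018-06-30 08:38
Topics: machine learning + support vector machine; Source: Huazhong University of Science and Technology, 2012 doctoral dissertation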
[Abstract]: Avian influenza viruses are avian-adapted influenza A viruses. Over the past decade or so, their cross-species transmission has caused major losses of human life and property and has drawn wide public concern. The H3N2 subtype is another influenza A virus with a major impact on human society: its antigenic variation renders vaccines ineffective and makes global influenza surveillance considerably harder. Studying the cross-species transmission and antigenic relationships of these two groups of influenza A viruses is therefore of both theoretical and practical importance. Using machine learning, information theory, and feature selection, this work developed and improved models for predicting avian-to-human cross-species transmission of avian influenza viruses and the antigenic relationships of H3N2 influenza viruses. It also identified 90 characteristic amino acid positions associated with avian-to-human transmission and 18 key amino acid positions underlying H3N2 antigenic variation. These results can provide early warning for public health and suggest directions for research on the relevant molecular determinants and underlying mechanisms.
First, no avian influenza virus has yet been experimentally shown to be incapable of avian-to-human transmission, so reliable negative samples are hard to define; a one-class SVM is suited to exactly this situation. We therefore explored the feasibility of predicting avian-to-human transmission with a one-class SVM. Avian influenza protein sequences were encoded by amino acid composition, dipeptide composition, and autocorrelation coefficients, and a one-class SVM prediction model was built on these encodings. Its prediction accuracy exceeded that of the existing back-propagation neural network model.
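The sequence encoding described above can be sketched as follows. This is a minimal illustration, not the dissertation's code: the abstract does not say which property the autocorrelation uses or how many lags, so the Kyte-Doolittle hydropathy scale, the Moreau-Broto form, and `max_lag=5` are assumptions, and the helper names are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Kyte-Doolittle hydropathy (assumed property for the autocorrelation term).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def clean(seq):
    """Upper-case the sequence and drop non-standard symbols (X, B, gaps, ...)."""
    return "".join(c for c in seq.upper() if c in AMINO_ACIDS)


def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies."""
    s = clean(seq)
    if not s:
        raise ValueError("sequence has no standard residues")
    return [s.count(a) / len(s) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector of overlapping dipeptide frequencies."""
    s = clean(seq)
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    if not pairs:
        raise ValueError("sequence too short for dipeptides")
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    return [counts.get(d, 0) / len(pairs) for d in DIPEPTIDES]


def autocorrelation(seq, max_lag=5, scale=KYTE_DOOLITTLE):
    """Moreau-Broto style autocorrelation of one property at lags 1..max_lag."""
    values = [scale[c] for c in clean(seq)]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    return [
        sum(values[i] * values[i + d] for i in range(n - d)) / (n - d)
        for d in range(1, max_lag + 1)
    ]


def encode(seq, max_lag=5):
    """Concatenate composition (20) + dipeptide (400) + autocorrelation features."""
    return (
        amino_acid_composition(seq)
        + dipeptide_composition(seq)
        + autocorrelation(seq, max_lag)
    )
```

The resulting 425-dimensional vectors could then be fitted with a one-class SVM, for example scikit-learn's `sklearn.svm.OneClassSVM`, trained only on sequences of viruses known to have infected humans; its `predict` method returns +1 for inliers and -1 for outliers. The kernel and `nu` values used in the dissertation are not stated here.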
Second, while constructing negative samples for testing in the preceding work, we found them to be more reliable than the negative samples used in existing prediction models. We therefore enlarged both classes of samples and adopted a conventional binary (two-class) classification approach, both to improve prediction of avian-to-human cross-species transmission and to mine biologically meaningful features. An information-entropy method first selected 90 characteristic amino acid positions. After these positions were encoded by physicochemical properties, several feature selection methods (Relief, mRMR, information gain, and a genetic algorithm) were used to choose an optimal feature subset, and the model built on this subset performed substantially better. Moreover, the finally selected physicochemical properties differed markedly between the two classes of samples, indicating that the features are effective, and two of these properties are supported by the results of multiple biological studies.
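Two of the steps above, ranking alignment positions by an entropy-based score and greedy mRMR selection, can be sketched as below. This is an illustrative sketch under assumptions: the abstract does not specify the exact entropy criterion, so positions are ranked here by information gain (mutual information between the residue at a column and the class label), and mRMR uses the common "relevance minus mean redundancy" difference form on discrete features. Relief and the genetic algorithm are not shown, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); equals the information gain of X about Y."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def rank_positions(alignment, labels, top_k):
    """Rank alignment columns by information gain about the class label."""
    length = len(alignment[0])
    scores = []
    for j in range(length):
        column = [seq[j] for seq in alignment]
        scores.append((mutual_information(column, labels), j))
    # Highest score first; ties broken by position index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:top_k]]


def mrmr(features, labels, k):
    """Greedy mRMR (difference form) over discrete features.

    features: dict mapping feature name -> list of discrete values per sample.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    remaining = sorted(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        best = max(remaining, key=score)  # first name wins on ties
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy case where feature `f2` duplicates `f1`, ranking by relevance alone would keep both, whereas `mrmr` penalizes the duplicate and prefers a weaker but non-redundant feature. That redundancy penalty is the reason mRMR is used alongside plain information gain.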
Third, H3N2 antigenic variation data reported in the relevant literature were collected manually, nearly doubling the size of the datasets used in the three most recent H3N2 antigenic variation studies. Several scoring strategies (the odds ratio, mutual information, and the phi correlation coefficient) were then compared and combined with multiple linear regression to identify 18 key positions for H3N2 antigenic variation. All 18 positions lie within the five antigenic epitopes of the HA protein, and 8 of them coincide with previously identified positively selected sites, indicating that these 18 positions play an important role in H3N2 antigenic variation.
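Two of the per-position scoring strategies mentioned above, the odds ratio and the phi coefficient, can be computed from a 2x2 table of "mutated at this position" against "antigenically distinct" over pairs of strains. The sketch below assumes that input format (aligned HA sequence pairs with a boolean antigenic label) and a Haldane-style 0.5 pseudocount for the odds ratio; neither detail comes from the dissertation, the function names are hypothetical, and the multiple linear regression step is not shown.

```python
import math


def contingency(mutated, variant):
    """2x2 counts for one position over a set of strain pairs.

    mutated[i]: True if the two strains in pair i differ at this position.
    variant[i]: True if pair i is labeled antigenically distinct.
    Returns (a, b, c, d):
      a = mutated & variant,   b = mutated & similar,
      c = conserved & variant, d = conserved & similar.
    """
    a = b = c = d = 0
    for m, v in zip(mutated, variant):
        if m and v:
            a += 1
        elif m:
            b += 1
        elif v:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d, pseudocount=0.5):
    """Odds ratio; the pseudocount avoids division by zero (0 disables it)."""
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return (a * d) / (b * c)


def phi_coefficient(a, b, c, d):
    """Phi correlation for a 2x2 table; 0.0 if any margin is empty."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def rank_sites(pairs, variant, positions, score=phi_coefficient):
    """Score each position and return (score, position) sorted high to low.

    pairs: list of (seq1, seq2) aligned HA sequences (hypothetical format).
    """
    results = []
    for j in positions:
        mutated = [s1[j] != s2[j] for s1, s2 in pairs]
        results.append((score(*contingency(mutated, variant)), j))
    return sorted(results, key=lambda t: (-t[0], t[1]))
```

For example, a position mutated in 3 of 4 antigenically distinct pairs and in 1 of 4 similar pairs gives the table (3, 1, 1, 3), a phi coefficient of 0.5, and an uncorrected odds ratio of 9.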
Finally, building on the previous part, we sought to improve the H3N2 antigenic relationship prediction model and reduce false positives. Prompted by the observation that some amino acid substitutions may not cause antigenic variation, which arises only when physicochemical properties change, we integrated changes in multiple physicochemical properties to improve prediction of H3N2 antigenic relationships. Candidate physicochemical properties were screened by mutual information and hierarchical clustering. The final experiments show that the resulting model performs considerably better than the model built in the previous part, and also outperforms three other current H3N2 antigenic relationship prediction models: the Hamming distance model, the grouped-scoring multiple linear regression model, and a decision tree. In addition, a web tool for predicting H3N2 antigenic relationships was built to provide an online service for researchers.
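The contrast drawn above, counting substitutions versus measuring how much physicochemical properties change, can be sketched with two small feature functions. This is a hypothetical illustration: the two property scales (Kyte-Doolittle hydropathy and a simplified side-chain charge with histidine treated as neutral) are assumptions rather than the properties the dissertation selected, and the mutual-information screening and hierarchical clustering of candidate properties are not shown.

```python
# Kyte-Doolittle hydropathy (one assumed property scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge at neutral pH (assumption: His treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def hamming_distance(seq1, seq2, positions=None):
    """Number of differing residues, optionally restricted to given positions."""
    if positions is None:
        positions = range(min(len(seq1), len(seq2)))
    return sum(seq1[j] != seq2[j] for j in positions)


def property_changes(seq1, seq2, positions, scales):
    """Total absolute change of each property scale over the given positions.

    Residues missing from a scale (gaps, X, or uncharged residues in CHARGE)
    contribute 0.0 for that scale.
    """
    return [
        sum(abs(scale.get(seq1[j], 0.0) - scale.get(seq2[j], 0.0)) for j in positions)
        for scale in scales
    ]
```

A K-to-D substitution and a K-to-R substitution each add 1 to the Hamming distance, but only the former changes the charge feature (by 2.0). Property-change vectors of this kind, computed over the key epitope positions, could be fed to a classifier in place of raw substitution counts.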
【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Doctorate
【Year awarded】: 2012
【Classification number】: R373; TP181
Article ID: 2085682
Article link: http://sikaile.net/xiyixuelunwen/2085682.html