

Deep-Learning-Based QSAR Classification of Compounds and Prediction of the Organic Carbon Adsorption Coefficient

Published: 2018-04-20 23:02

  Topics: machine learning + deep learning. Source: Xinjiang University, master's thesis, 2017


[Abstract]: With the rapid development and wide application of computer technology and the geometric growth of the big-data industry, the study of quantitative structure-activity/property relationships (QSAR/QSPR) of compounds has also advanced rapidly and reached a higher level. From its first applications in biology, it has gradually spread to pharmaceutical science, environmental science, medicinal chemistry, drug design, medicine, and many other fields. Its aim is to use computational and statistical methods to study the relationship between the structural parameters of compounds and their physicochemical properties and biological activities, and thereby to understand the microscopic structure of compounds at the molecular level. Because it spans so many fields, its objects of study include the biological activity of compounds, drug toxicity, and the rate at which drugs are absorbed in the human body. It is especially relevant in environmental chemistry, because the large number of organic compounds entering the environment are very harmful to natural ecosystems and to humans.

However, QSAR models have usually been built with shallow machine learning methods such as heuristic methods, multiple linear regression, radial basis function neural networks, back-propagation neural networks, and support vector machines. What these methods have in common is that they suit settings with few samples and problems of modest complexity, which limits their generalization ability on complex problems and massive data sets. In recent years deep learning, a branch of machine learning, has been widely applied in many fields and has produced a series of satisfactory results. In the big-data era in particular, deep learning is needed to handle many problems that shallow machine learning models cannot solve.

This thesis takes oral bioavailability, inhibition of the CYP450 1A2 enzyme, and logKoc (the organic carbon adsorption coefficient) as its objects of study and, using deep learning algorithms, builds deep-learning-based QSAR classification models and a logKoc prediction model. The work consists of three parts.

The first part studies oral bioavailability. 2D and 3D molecular features generated by molecular computation software are fed to a stacked autoencoder, which learns the molecular features automatically, and a softmax layer performs the oral bioavailability classification. The model is compared with shallow models (a support vector machine and an artificial neural network) to verify that the stacked autoencoder is effective for oral bioavailability classification.

The second part is a CYP450 1A2 inhibition classification model based on a deep belief network (DBN). The experiments use 13000 compounds as the data set and represent molecular structure with PubChem and MACCS molecular fingerprints. The semi-supervised learning of the DBN extracts more essential feature representations from the preprocessed features, which avoids manual feature extraction, and the model then classifies compounds by CYP450 1A2 inhibition.

The third part is a deep learning method based on the undirected graph recursive neural network (UGRNN). The molecular structure of each compound is first represented as an undirected graph, and a recursive neural network then extracts features from the molecular graph to predict logKoc. In addition, the Pearson correlation coefficient is used to identify the lipid-water partition coefficient (logP) as a further input to the model (abbreviated UGRNN+logP), which further improves prediction accuracy.
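The abstract's final step, using the Pearson correlation coefficient to pick logP as an extra input, can be illustrated with a minimal sketch. This is not the thesis's code: the function names `pearson` and `best_descriptor`, the candidate descriptor names, and every numeric value below are made-up assumptions for illustration, and the selection rule (largest absolute correlation with the target) is one plausible reading of "find logP by the Pearson method".

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def best_descriptor(descriptors, target):
    """Return (name, r) for the descriptor with the largest |r| against target."""
    scores = {name: pearson(vals, target) for name, vals in descriptors.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    return name, scores[name]


# Hypothetical toy values, NOT data from the thesis.
toy_descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "molecular_weight": [120.0, 95.0, 180.0, 110.0, 150.0],
    "h_bond_donors": [3.0, 1.0, 2.0, 0.0, 2.0],
}
toy_log_koc = [1.4, 2.1, 2.9, 4.2, 4.8]

name, r = best_descriptor(toy_descriptors, toy_log_koc)
print(name, round(r, 3))
```

On this toy set logP is selected by construction. In a setup like the one the abstract describes, the selected descriptor would then be supplied to the model alongside the features learned from the molecular graph.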
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: O621.1;TP18


Article ID: 1779815


Article link: http://sikaile.net/kejilunwen/huaxue/1779815.html

