
Research and Implementation of an SVM-Based Classification and Prediction System for Compound Mutagenicity

Published: 2018-12-25 15:09

[Abstract]: As science and technology advance, more and more drugs are being developed to combat disease, yet drug development consumes substantial material and human resources and has a long development cycle. Throughout development, a drug's five ADMET properties (absorption, distribution, metabolism, excretion, and toxicity) must be considered; among the toxicity endpoints, mutagenicity is closely linked to cancer. Mutagenicity in humans is tested in the final stage of development, the animal and human trial stage, and many candidate drugs are abandoned at this point because their mutagenicity results show too much harm to animals or humans, wasting the investment made in every earlier stage. In recent years, pattern recognition has developed rapidly and been applied across many fields, and bioinformatics and drug development have become important research directions for it. The main function of the system presented here is to classify and predict the mutagenicity of compounds with machine learning algorithms, and to use the classification model to analyze the compound features related to mutagenicity. The system supplies a large number of compounds and their feature attributes as the training set for the classification model, including the mutagenicity findings reported for those compounds by various research institutions. It offers users compound feature calculation, feature selection, data cleaning, classification model construction, mutagenicity prediction, result analysis, and saving of result files. Researchers can use the prediction results to analyze the key features that influence a compound's mutagenicity.
The system is developed in Java, uses the Spring MVC framework for its architecture, and stores compound features, personal information, and other data in a MySQL database. It implements five modules: data processing, prediction and classification, result analysis, system management, and personal information. In the data processing module, the system computes a 1,446-dimensional set of feature descriptors from each compound's SMILES string, applies missing-value handling and normalization to the feature data, and then reduces the feature dimensionality with feature selection algorithms such as information gain, CFS, and Relief. In the prediction and classification module, the system uses a support vector machine (SVM) model and boosts it iteratively with the AdaBoost algorithm to improve prediction accuracy. Across various cross-validation runs and validation on an independent test set, the system predicts compound mutagenicity fairly accurately, reaching an accuracy of 83.5%. It meets users' needs in both functionality and performance and achieves the intended research goals.
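The data processing steps named in the abstract (missing-value handling, normalization, and information-gain feature ranking) can be illustrated with a small sketch. The abstract does not show the thesis's Java implementation, so everything below is an assumption made for illustration: the function names, mean imputation, min-max scaling, equal-width binning, and the toy descriptor values are all hypothetical. CFS, Relief, and the AdaBoost-boosted SVM are not covered here.

```python
import math
from collections import Counter


def impute_and_normalize(rows):
    """Fill missing values (None) with the column mean, then min-max scale to [0, 1].

    Mean imputation and min-max scaling are assumed choices; the abstract only
    says that missing values are handled and the data is normalized.
    """
    cleaned = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        mean = sum(present) / len(present)
        col = [mean if r[j] is None else r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i, v in enumerate(col):
            cleaned[i][j] = 0.0 if span == 0 else (v - lo) / span
    return cleaned


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, bins=2):
    """Information gain of one feature scaled to [0, 1], after equal-width binning."""
    binned = [min(int(v * bins), bins - 1) for v in values]
    gain = entropy(labels)
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def rank_features(rows, labels, bins=2):
    """Return feature indices sorted by descending information gain, plus the gains."""
    data = impute_and_normalize(rows)
    gains = [
        information_gain([r[j] for r in data], labels, bins)
        for j in range(len(data[0]))
    ]
    order = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return order, gains


# Toy data: 4 compounds x 3 made-up descriptors; label 1 = mutagenic, 0 = not.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, None, 0.9],
    [8.0, 5.5, 0.2],
    [9.0, 4.5, 0.8],
]
labels = [0, 0, 1, 1]

order, gains = rank_features(rows, labels)
print(order)  # feature 0 ranks first: its two bins separate the classes perfectly
```

In a pipeline like the one the abstract describes, only the top-ranked descriptors would be passed on to the classifier, which is how a 1,446-dimensional descriptor set gets reduced before model training.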
[Degree-granting institution]: Liaoning University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TQ460; TP181



Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2391291.html
