Research on Drug Target Prediction Methods Based on Imbalanced Data Mining

Published: 2018-04-29 16:11

Topic: drug targets + data mining. Reference: Harbin University of Science and Technology, 2017 master's thesis

[Abstract]: The discovery and localization of drug targets are key to successful new drug research. In the post-genomic era, rapid advances in chemical genomics and pharmacology have produced a vast number of potential targets and massive amounts of bioactivity data. Nevertheless, the number of clinically validated drug targets remains small: only about 500 to date. Part of the reason is that, as redundant data accumulate, simple analytical methods can no longer meet the demands of high-throughput, large-scale data analysis, while traditional experimental approaches are hard to apply widely because of limits on throughput, accuracy, and cost. As a fast, low-cost way of handling large volumes of data, drug target prediction based on data mining is receiving increasing attention. Against this background, this thesis investigates drug target prediction based on imbalanced data mining, with the aim of speeding up drug target discovery and reducing its cost.

Predicting drug targets among a large number of proteins is a typical class-imbalance problem, and classifier accuracy degrades to varying degrees on such data. At the data level, the thesis therefore first preprocesses the data with a genetic-algorithm-improved version of SMOTE (synthetic minority oversampling technique), which increases the number of minority-class samples and balances the ratio of drug targets to non-drug targets. On this basis, at the algorithm level, an SVM classifier combined with ensemble learning is used to predict drug targets; compared with a single SVM classifier, this improves the generalization performance of the prediction model.

To demonstrate the effectiveness of the proposed method, two datasets are constructed: one consisting of all human protein data, and one consisting of human G protein-coupled receptor data, a family that accounts for a high proportion of drug targets. For each protein in the datasets, primary-structure features, peptide features, and basic physicochemical properties are extracted to form the feature space for training the classifier, and feature selection is applied to reduce the classifier's learning burden. The data are then preprocessed, and the optimal classifier is built by tuning the model parameters. In the experiments, an SVM classifier and an Adaboost-SVM classifier are each used to classify the datasets, and the results of both classifiers on both datasets, before and after data preprocessing, are analyzed and compared. The two sets of results corroborate each other, which increases confidence in the classification results. The experiments verify the effectiveness of the proposed method and show that it can effectively predict drug targets, providing a preliminary reference for drug researchers.
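The data-level step described in the abstract can be illustrated with a minimal sketch of plain SMOTE. This is not the genetic-algorithm-improved variant the thesis proposes, whose details are not given here; it only shows the basic interpolation idea that variant builds on. The function name `smote`, its parameters, and the toy feature vectors are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE sketch: create n_synthetic points by interpolating between
    minority samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors: 3 "targets" (minority) vs 8 "non-targets".
targets = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
non_targets = [[float(x), float(y)] for x in range(4, 6) for y in range(4, 8)]

# Oversample the minority class until both classes have the same size.
new_points = smote(targets, n_synthetic=len(non_targets) - len(targets), k=2)
balanced_targets = targets + new_points
print(len(balanced_targets), len(non_targets))
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original drug-target samples already occupy. In practice the balanced set would then be passed to the classifier stage (SVM or Adaboost-SVM in the abstract).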
[Degree-granting institution]: Harbin University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: R91;TP311.13

Cai Lige; Research on Drug Target Prediction Methods Based on Imbalanced Data Mining [D]; Harbin University of Science and Technology; 2017

Article link: http://sikaile.net/yixuelunwen/yiyaoxuelunwen/1820649.html
