Research and Implementation of GPU Parallelization of Machine Learning Algorithms on the CUDA Platform
Published: 2018-12-21 15:45
【Abstract】: A central task of machine learning today is analyzing large volumes of user data to support decision-making. Because current datasets have high dimensionality and large sample counts, serial processing on the CPU takes too long. Meanwhile, the GPU (Graphics Processing Unit) has developed rapidly and offers powerful parallel processing capability. Because GPUs are efficient, inexpensive, and inherently parallel, researchers have begun using them for general-purpose computing. CUDA (Compute Unified Device Architecture) is the programming environment released by NVIDIA for exploiting the general-purpose computing power of NVIDIA GPUs; with the CUDA programming model, machine learning algorithms can be parallelized on the GPU simply and effectively. This thesis studies the feasibility and implementation of GPU parallelization for basic machine learning algorithms, aiming to identify a general scheme for porting from the CPU platform to the CUDA platform. The main work is as follows. For classification algorithms, taking KNN and decision trees as examples, the performance hotspots of the original algorithms are first profiled, those hotspots are then accelerated with CUDA, and parallelization schemes for KNN and decision trees suited to CUDA are designed; KNN is selected for experiments comparing performance before and after parallelization, and a general CUDA parallelization scheme for classification algorithms is summarized. For clustering algorithms, taking k-means and DBScan as examples, the same procedure is followed: the hotspots of the original algorithms are profiled and accelerated with CUDA, parallelization schemes for k-means and DBScan suited to CUDA are designed, k-means is selected for experiments comparing performance before and after parallelization, and a general CUDA parallelization scheme for clustering algorithms is summarized. Finally, the CUDA-based machine learning parallelization scheme is successfully applied in a real engineering project.
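The abstract identifies the main performance hotspot of KNN (and of the k-means assignment step) as the pairwise distance computation, which is the part mapped onto GPU threads. As a minimal illustration of why this step parallelizes well, the NumPy sketch below expresses it as independent per-pair operations; the function names and data are hypothetical examples, not code from the thesis.

```python
import numpy as np

def pairwise_sq_dist(queries, samples):
    """Squared Euclidean distance for every (query, sample) pair.

    Each entry of the (n_queries, n_samples) result is independent of
    the others -- on CUDA, one thread per entry; here, broadcasting.
    """
    diff = queries[:, None, :] - samples[None, :, :]
    return np.sum(diff * diff, axis=2)

def knn_predict(queries, samples, labels, k=3):
    """Classify each query by majority vote among its k nearest samples."""
    d = pairwise_sq_dist(queries, samples)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of k nearest samples
    votes = labels[nearest]                  # shape (n_queries, k)
    return np.array([np.bincount(v).argmax() for v in votes])
```

On a GPU, each (query, sample) pair would typically be handled by one CUDA thread; the broadcasted expression above has the same dependency-free structure, which is what makes the hotspot a good candidate for CUDA acceleration.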
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP181
Article ID: 2389115
【References】
Related journal articles (4)
1 Wang Ning. A Design of a General-Purpose Cluster-Based Parallel Computing Framework [J]. Modern Computer (Professional Edition), 2016, No. 35.
2 Gao Rongwei. "Astonishing the World": The Exascale Supercomputer Is Coming [J]. Shanghai Enterprise, 2016, No. 12.
3 Yao Wang, Hu Xin, Liu Fei, Wang Hongxia, Liu Wenwen. GPU-Based High-Performance Parallel Computing Technology [J]. Computer Measurement & Control, 2014, No. 12.
4 Suiang-Shyan Lee, Ja-Chen Lin. An accelerated K-means clustering algorithm using selection and erasure rules [J]. Journal of Zhejiang University-Science C (Computers & Electronics), 2012, No. 10.
Related master's theses (3)
1 Zhang Miao. Development and Optimization of a GPU-Based Face Localization Algorithm [D]. Zhejiang University, 2016.
2 Zhang Weiwei. Research and Application of GPU-Based High-Performance Computing [D]. Nanjing University of Aeronautics and Astronautics, 2015.
3 Lin Sen. Research on the C4.5 Algorithm Based on the CUDA Platform [D]. Xidian University, 2011.
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2389115.html