Accelerated Implementation of Clustering Algorithms on CUDA Graphics Processors

Published: 2018-07-27 19:03
[Abstract]: In recent years, rapid advances in information processing and communication technology have greatly increased the capacity of many fields to produce and collect data, leaving us with massive volumes of data awaiting processing. Datasets on the order of gigabytes or terabytes are now commonplace. Traditional serial data mining techniques can no longer handle such data effectively and are giving way to parallel data mining techniques.

Multi-core CPUs (Central Processing Units) are already a common platform for parallel data mining. However, as data volumes grow and the algorithms involved become more complex, the resulting compute-intensive workloads consume a great deal of processor time, which hurts the performance and power consumption of the whole system. To relieve the system of this load, we need a processing unit better suited to this kind of computation. The graphics processing unit (GPU), by virtue of its architecture, is well suited to large-scale, compute-intensive parallel workloads. From the outset, GPU designers equipped it with a large number of arithmetic units and high memory bandwidth to meet the increasingly demanding requirements of graphics processing and video games. In particular, since NVIDIA introduced GPUs based on CUDA (Compute Unified Device Architecture) and the accompanying development environment in 2007, developing for the GPU has become very similar to developing for the CPU, so developers can get started with CUDA GPUs quickly. Developers in many fields, such as scientific computing, financial engineering, and data mining, are now trying to use CUDA GPUs to raise the computing power of their systems. Beyond GPU computing in the traditional single-machine setting, there is also a growing body of research on using GPUs in distributed environments.

This thesis selects two clustering algorithms widely used in data mining, k-means clustering and single-linkage agglomerative hierarchical clustering, and implements a parallel version of each on an NVIDIA GTX260-series CUDA GPU, thereby demonstrating that using a GPU to improve clustering performance is both effective and feasible. Finally, driven by the practical requirements of an enterprise customer relationship management (CRM) system, it implements GPU clustering under the Hadoop framework.

Clustering is a very common operation in data mining. Its main purpose is to group scattered objects into one or more clusters according to some agreed measure of similarity or relatedness. The k-means algorithm implemented in this thesis assigns each of N objects (also called nodes) to the nearest cluster by Euclidean distance and, after a number of iterations, produces K clusters. As a classic clustering algorithm, it is widely used in data mining, bioinformatics, image recognition, artificial intelligence, and other fields. The well-known Apache Mahout, for example, uses k-means in its clustering computations.
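The k-means procedure described in the abstract can be sketched on the CPU as follows. This is a minimal pure-Python illustration, not the thesis's CUDA code: the function names (`assign_points`, `update_centroids`, `kmeans`), the convergence test, and the handling of empty clusters are all assumptions made for the example. The point of the sketch is that the assignment step treats every point independently, which is the property that would let a GPU version give each point its own thread.

```python
import math


def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    labels = []
    for p in points:
        # Each point's search is independent of every other point's, so this
        # loop is the natural candidate for one-thread-per-point execution.
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
        labels.append(nearest)
    return labels


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    dim = len(points[0])
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            # Assumption for this sketch: an empty cluster keeps its centroid.
            new_centroids.append(tuple(old))
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = assign_points(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

For instance, calling `kmeans` on the four points `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)` with starting centroids `(0, 0)` and `(10, 10)` yields the labels `[0, 0, 1, 1]` and the centroids `(0.0, 0.5)` and `(10.0, 10.5)`. The starting centroids are supplied by the caller here; how the thesis initializes them is not stated in the abstract.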
The other algorithm implemented in the experiments, agglomerative hierarchical clustering, first initializes the N data objects as N separate sub-clusters. Using Euclidean distance as the measure of similarity between sub-clusters, it then merges the two nearest sub-clusters, and it repeats this step until every object in the dataset belongs to a single cluster. Although its operations and final result differ from those of k-means, both algorithms involve large-scale numerical computation in which the individual calculations are largely independent of one another. This shared property is what allows us to parallelize the clustering algorithms on multi-core CPUs and GPUs, substantially reducing running time and increasing data throughput.

Because the enterprise CRM system must run in a distributed computing environment, the open-source software Hadoop was chosen. The Hadoop framework lets users assemble commodity computers into a cluster for distributed computation over large-scale data while providing strong fault tolerance and reliability. Companies such as Google and Facebook currently use distributed computing based on the Hadoop framework.

In the experiments, three sets of input data of different sizes, each with its own target number of clusters, were designed for k-means. Owing to hardware limits, somewhat smaller datasets were designed for hierarchical clustering. Following the CUDA programming model, each program is split into two parts: the parts that are not compute-intensive run on the CPU, while the large-scale floating-point arithmetic involved in clustering runs on the GPU. This model is also known as CPU+GPU heterogeneous computing. The same experiments were also run on a multi-core CPU.
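The agglomerative merge loop described in the abstract can be illustrated with a naive single-linkage sketch. It is a CPU-only pure-Python example under stated assumptions, not the thesis's GPU implementation: the function name `single_linkage`, the cluster-id scheme (new clusters get ids N, N+1, ...), and the merge-history return value are all hypothetical choices for the illustration. Every pairwise distance inside `link` is independent of the others, which is the part a parallel version would spread across threads.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Merge the two nearest clusters until one remains; return the merges.

    Each entry of the result is (cluster_a, cluster_b, distance). Ids
    0..N-1 are the original points; merged clusters get fresh ids from N up.
    """
    clusters = {i: [i] for i in range(len(points))}  # id -> member indices
    next_id = len(points)
    merges = []

    def link(a, b):
        # Single linkage: the distance between two clusters is the smallest
        # Euclidean distance between any member of one and any of the other.
        return min(math.dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        # Naive search over all cluster pairs; fine for a small illustration.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: link(*pair))
        merges.append((a, b, link(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

With the three points `(0, 0)`, `(0, 1)`, and `(5, 0)`, the sketch first merges points 0 and 1 at distance 1.0, then merges point 2 with the new cluster 3 at distance 5.0. Recomputing every linkage on each pass is deliberately simple and scales poorly, which is consistent with the abstract's remark that hierarchical clustering was run on smaller datasets than k-means.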
Finally, the running times of the two implementations are compared to obtain the speedup of the GPU implementation. The thesis also implements GPU clustering on the Hadoop framework; the method and results can serve as a prototype and reference for enterprise CRM design.
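The speedup mentioned above is the ratio of the baseline running time to the accelerated running time. A small sketch of how such a ratio might be measured is shown below; the helper names `time_call` and `speedup` and the best-of-several-runs policy are assumptions for illustration, and the abstract does not say how the thesis timed its runs.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Speedup ratio: baseline time divided by accelerated time."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_seconds / accelerated_seconds
```

As a purely hypothetical example, `speedup(12.0, 3.0)` returns `4.0`; these numbers are made up and are not results from the thesis. When timing real GPU code, the host generally has to wait for the device to finish before reading the clock, since kernel launches return before the work completes.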
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.13

[Citation]: Ma Renjun; Accelerated Implementation of Clustering Algorithms on CUDA Graphics Processors [D]; Shanghai Jiao Tong University; 2013

Article ID: 2148850

Article link: http://sikaile.net/guanlilunwen/kehuguanxiguanli/2148850.html

