
Research on Performance Acceleration of the DEFLATE Algorithm under Cooperative CPU-GPU Computing

Published: 2018-04-07 16:29

Topic: GPU; Focus: OpenCL; Source: master's thesis, Jilin University, 2013

[Abstract]: Heterogeneous computing is the main future trend in high-performance computing, and, following this trend, general-purpose computing on GPUs faces unprecedented opportunities. In hardware terms, graphics processors offer strong floating-point throughput and are well suited to large-scale, compute-intensive data-parallel processing; general-purpose GPU computing exploits these characteristics to carry out computing tasks outside the graphics domain. As GPU programming technology develops, pairing GPUs with CPUs to build high-performance computing platforms that are both powerful and low-cost has broad application prospects.

The information age has brought an "information explosion". With the growth of the Internet and of mobile communications, storing and transmitting massive amounts of information has become an increasingly pressing problem, and the social and economic benefits of data compression will become ever more evident. Without compression, both data storage and data transmission would be difficult to make practical. Compression not only saves users storage space but also lets information be transmitted faster, reducing communication latency, and it helps conserve communication bandwidth and the resources consumed in transmission. Many lossless compression algorithms are in common use, such as DEFLATE, BZIP2, LZMA and LZMA2, of which DEFLATE compresses fastest.

However, actual performance tests showed that DEFLATE's performance when compressing large data files is not as satisfactory as expected. This thesis therefore takes improving the performance of DEFLATE as its practical starting point and studies how general-purpose GPU programming can improve the execution efficiency of everyday application software. Among the various implementations of DEFLATE, we chose the GZIP implementation because it is the closest to the description of DEFLATE in RFC 1951. As the optimization approach, we chose cooperative CPU-GPU acceleration: we redesigned a parallel pipeline mechanism for running GZIP, improved the CPU implementation of some of its algorithms, and used the OpenCL programming framework to implement, as kernels, those parts of DEFLATE best suited to GPU programming. Tests in several hardware environments show that the final implementation achieves good speedups on some of the test cases.

Alongside the CPU-GPU co-acceleration of DEFLATE, this thesis also analyzes the following in depth:
1. GPU architecture and GPU programming. The architectural design characteristics of GPUs from the two major graphics card vendors, NVIDIA and AMD, are systematically analyzed; the history of GPU programming technology is outlined; and the method of writing general-purpose GPU programs with OpenCL is introduced.
2. Related data compression techniques. Taking BZIP2 and DEFLATE as examples, the principles of compression algorithms and common compression techniques are analyzed, with a detailed analysis of DEFLATE's LZ77 coding stage and its Huffman entropy coding stage.
3. The GZIP source code. Its source structure, the implementation details of key functions, and the opportunities for optimization are analyzed. From this analysis, a reasonable optimization scheme is constructed and implemented in the final improved code.
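The abstract mentions a redesigned parallel pipeline for GZIP but does not describe it. As a hedged illustration only, the Python sketch below shows one well-known way to parallelize DEFLATE while keeping the output a standard gzip file, similar in spirit to the block-splitting used by the pigz tool; it is not the thesis's CPU/OpenCL implementation. The input is cut into fixed-size chunks, each chunk is compressed to raw DEFLATE (RFC 1951) on its own worker thread with the preceding 32 KiB as a priming dictionary, and the byte-aligned pieces are concatenated and wrapped in a gzip (RFC 1952) header and trailer. The names `parallel_gzip` and `_deflate_chunk` and the 128 KiB `CHUNK_SIZE` are hypothetical choices; the threads can overlap because CPython's `zlib` releases the GIL while deflating.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 128 * 1024  # hypothetical block size; a tuning parameter
WINDOW = 32 * 1024       # DEFLATE's maximum back-reference distance (RFC 1951)


def _deflate_chunk(chunk: bytes, prev_tail: bytes, is_last: bool, level: int) -> bytes:
    """Compress one chunk as raw DEFLATE (wbits=-15: no zlib/gzip wrapper)."""
    if prev_tail:
        # Prime the LZ77 window with the preceding bytes so matches can reach
        # back across the chunk boundary; the decompressor already holds those
        # bytes in its own window, so decoding needs no extra dictionary.
        comp = zlib.compressobj(level, zlib.DEFLATED, -15, zdict=prev_tail)
    else:
        comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = comp.compress(chunk)
    # Z_SYNC_FLUSH ends a non-final chunk on a byte boundary (empty stored
    # block, BFINAL=0); Z_FINISH closes the last chunk with BFINAL=1.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out


def parallel_gzip(data: bytes, level: int = 6, workers: int = 4,
                  chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress `data` into one standard gzip member using parallel workers."""
    starts = list(range(0, len(data), chunk_size)) or [0]  # empty input -> one empty chunk
    jobs = [
        (data[s:s + chunk_size], data[max(0, s - WINDOW):s],
         s + chunk_size >= len(data), level)
        for s in starts
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the stream stays ordered.
        body = b"".join(pool.map(lambda job: _deflate_chunk(*job), jobs))
    # RFC 1952 header: magic, CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=255 (unknown).
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, 255)
    # Trailer: CRC-32 and length mod 2**32 of the uncompressed input.
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer
```

Used this way, `gzip.decompress(parallel_gzip(data))` returns `data` unchanged, so the output stays compatible with ordinary GZIP tools. The CRC-32 here is computed serially over the whole input for simplicity; a fuller pipeline could checksum chunks in parallel and merge the values with `crc32_combine()` from zlib's C API.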
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP338.6; TP391.41
[Author]: Li Jing (李晶)