面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣算法研究

發(fā)布時(shí)間：2018-09-10 20:16

【摘要】：面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣算法已經(jīng)成為高性能計(jì)算領(lǐng)域的關(guān)鍵問題之一。稀疏矩陣算法是自然科學(xué)和社會(huì)科學(xué)中許多領(lǐng)域進(jìn)行數(shù)值模擬計(jì)算時(shí)的關(guān)鍵技術(shù)和性能瓶頸,為了提高稀疏矩陣算法的計(jì)算性能,需要提高相應(yīng)算法在特定平臺(tái)上的計(jì)算效率。然而,一方面,稀疏矩陣算法的計(jì)算與訪存行為與矩陣的稀疏結(jié)構(gòu)相關(guān),是典型的不規(guī)則類算法,很難發(fā)掘時(shí)間與空間的局部性;另一方面,隨著包含算法加速器的異構(gòu)并行體系結(jié)構(gòu)成為高性能計(jì)算機(jī)體系結(jié)構(gòu)的主流,并行程序執(zhí)行效率的影響因素日益復(fù)雜多樣,傳統(tǒng)的面向具體平臺(tái)的程序并行與優(yōu)化方法已經(jīng)無法滿足高效率并行程序的開發(fā)需求。為了應(yīng)對(duì)這些問題和挑戰(zhàn),本文對(duì)面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣算法進(jìn)行了深入的研究,主要工作與創(chuàng)新點(diǎn)如下:(1)提出了面向CPU-GPU異構(gòu)平臺(tái)的搜索方向優(yōu)化的寬度優(yōu)先搜索算法。本文實(shí)現(xiàn)了基于GPU的自底向上的寬度優(yōu)先搜索算法,提高了GPU線程訪存的連續(xù)性并降低了線程間負(fù)載的不均衡,并進(jìn)一步實(shí)現(xiàn)了CPU-GPU協(xié)同的自底向上的寬度優(yōu)先搜索算法,充分利用了CPU-GPU異構(gòu)并行計(jì)算平臺(tái)上所有的計(jì)算資源,并有效提高了寬度優(yōu)先搜索算法對(duì)大規(guī)模節(jié)點(diǎn)前沿的處理速度。在此基礎(chǔ)上,設(shè)計(jì)了面向CPU-GPU異構(gòu)體系結(jié)構(gòu)的優(yōu)化搜索方向的寬度優(yōu)先搜索算法,通過結(jié)合基于多核CPU的自頂向下的串行寬度優(yōu)先搜索算法、自頂向下的并行寬度優(yōu)先搜索算法以及CPU-GPU協(xié)同的自底向上的寬度優(yōu)先搜索算法,提高了寬度優(yōu)先搜索算法對(duì)不同規(guī)模節(jié)點(diǎn)前沿的適應(yīng)性。(2)提出了面向CPU-GPU異構(gòu)平臺(tái)的稀疏矩陣向量乘算法。本文提出了基于索引信息壓縮的稀疏矩陣分塊存儲(chǔ)數(shù)據(jù)結(jié)構(gòu),通過合并位置相近的同行非零元,減少了矩陣非零元素對(duì)應(yīng)的索引信息的數(shù)據(jù)量,從而一方面提高了稀疏矩陣訪存的規(guī)則性和局部性,另一方面控制了填零元所引入的額外的計(jì)算和訪存開銷,并通過采用二級(jí)混合存儲(chǔ)數(shù)據(jù)結(jié)構(gòu)提高了對(duì)于矩陣不同稀疏結(jié)構(gòu)的適應(yīng)性。在此基礎(chǔ)上,實(shí)現(xiàn)了基于計(jì)算訪存量的負(fù)載均衡算法,分別設(shè)計(jì)了針對(duì)多核CPU和GPU的優(yōu)化的Sp MV算法,有效的提高了稀疏矩陣向量乘算法的計(jì)算效率,并進(jìn)一步實(shí)現(xiàn)了CPU-GPU協(xié)同的稀疏矩陣向量乘算法,充分發(fā)掘了異構(gòu)體系結(jié)構(gòu)計(jì)算平臺(tái)的計(jì)算能力。(3)提出了面向異構(gòu)平臺(tái)的基于超節(jié)點(diǎn)數(shù)據(jù)結(jié)構(gòu)的稀疏矩陣分解算法。本文以稀疏矩陣Cholesky分解算法為研究對(duì)象,在數(shù)據(jù)組織方面,改進(jìn)了稀疏矩陣超節(jié)點(diǎn)數(shù)據(jù)結(jié)構(gòu),通過超節(jié)點(diǎn)合并和分塊控制計(jì)算粒度;在計(jì)算調(diào)度方面,將分解過程映射為一系列的數(shù)據(jù)塊任務(wù),并設(shè)計(jì)了相應(yīng)的任務(wù)生成與調(diào)度算法,在滿足數(shù)據(jù)依賴性的前提下提高任務(wù)的并行性,從而顯著提高了稀疏矩陣Cholesky分解算法在GPU上的實(shí)現(xiàn)效率。通過將控制任務(wù)和計(jì)算任務(wù)分別映射到CPU和GPU上,有效提高了CPU-GPU異構(gòu)平臺(tái)上稀疏矩陣分解算法的計(jì)算性能。(4)提出了面向CPU-GPU異構(gòu)平臺(tái)的稀疏三角方程組求解算法。本文提出了面向稀疏結(jié)構(gòu)的分塊處理策略,根據(jù)稀疏三角矩陣非零元密度的不同將計(jì)算過程分解,并設(shè)計(jì)了基于分塊數(shù)據(jù)結(jié)構(gòu)的稀疏三角方程組求解算法。在此基礎(chǔ)上,設(shè)計(jì)了面向負(fù)載均衡的線程映射算法,并針對(duì)GPU設(shè)計(jì)了基于線程warp的計(jì)算組織策略,降低了GPU線程的控制分支所造成的性能損失,并進(jìn)一步實(shí)現(xiàn)了CPU-GPU協(xié)同的稀疏三角方程組求解并行算法,有效提高了CPU-GPU異構(gòu)平臺(tái)上稀疏三角方程組求解算法的計(jì)算效率。
[Abstract]:Sparse matrix algorithm for heterogeneous architecture has become one of the key problems in high performance computing. Sparse matrix algorithm is the key technology and performance bottleneck in many fields of natural science and social sciences. In order to improve the performance of sparse matrix algorithm, it is necessary to improve the corresponding algorithm in particular. However, on the one hand, the computation and memory access behavior of the sparse matrix algorithm is related to the sparse structure of the matrix, and it is a typical irregular algorithm. It is difficult to discover the locality of time and space; on the other hand, with the heterogeneous parallel architecture including the algorithm accelerator, it becomes the main architecture of the high performance computer architecture. In order to meet these problems and challenges, the sparse matrix algorithm for heterogeneous architecture is studied in this paper. The main work is as follows:1. The main innovations are as follows: (1) A breadth-first search algorithm for CPU-GPU heterogeneous platform search direction optimization is proposed. This paper implements a GPU-based bottom-up breadth-first search algorithm, which improves the continuity of GPU thread access and reduces the load imbalance between threads, and further realizes the bottom-up CPU-GPU cooperation. Width-first search algorithm makes full use of all computing resources on CPU-GPU heterogeneous parallel computing platform, and effectively improves the processing speed of width-first search algorithm for large-scale node frontier. On this basis, an optimized search direction search algorithm for CPU-GPU heterogeneous architecture is designed. The top-down serial width-first search algorithm based on multi-core CPU, the top-down parallel width-first search algorithm and the CPU-GPU cooperative bottom-up width-first search algorithm improve the adaptability of the width-first search algorithm to different scale node frontiers. (2) The sparse matrix orientation for CPU-GPU heterogeneous platform is proposed. In this paper, a sparse matrix block storage data structure based on index information compression is proposed. By merging peer-to-peer non-zero elements with similar positions, the amount of data of index information corresponding to non-zero elements of the matrix is reduced. Thus, the regularity and locality of sparse matrix access are improved on the one hand, and the zero-filling elements are controlled on the other hand. Additional computation and memory access overhead are added, and the adaptability to different sparse structures of the matrix is improved by using a two-level hybrid storage data structure. On this basis, load balancing algorithm based on computational memory access is implemented, and optimized Sp MV algorithm for multi-core CPU and GPU is designed, which effectively improves the sparse matrix vector. The computational efficiency of the multiplication algorithm is improved, and the CPU-GPU cooperative sparse matrix vector multiplication algorithm is further realized, which fully exploits the computing capability of heterogeneous architecture computing platform. (3) A sparse matrix decomposition algorithm based on super-node data structure for heterogeneous platform is proposed. In the aspect of data organization, the sparse matrix super-node data structure is improved, and the computing granularity is controlled by super-node merging and partitioning; in the aspect of computing scheduling, the decomposition process is mapped into a series of data block tasks, and the corresponding task generation and scheduling algorithm is designed to improve the parallelism of tasks under the premise of satisfying the data dependence. By mapping control tasks and computing tasks to CPU and GPU respectively, the performance of the sparse matrix decomposition algorithm on CPU-GPU heterogeneous platform is effectively improved. (4) A sparse triangular equations solving algorithm for CPU-GPU heterogeneous platform is proposed. A block processing strategy for sparse structure is proposed, and the calculation process is decomposed according to the non-zero element density of sparse triangular matrix, and a sparse triangular equations solving algorithm based on block data structure is designed. On this basis, a load balancing oriented thread mapping algorithm is designed, and a thread warp-based meter is designed for GPU. Computational organization strategy reduces the performance loss caused by the control branches of GPU threads, and further realizes the CPU-GPU cooperative parallel algorithm for sparse trigonometric equations, which effectively improves the computational efficiency of sparse trigonometric equations solving algorithm on CPU-GPU heterogeneous platform.
【學(xué)位授予單位】：國(guó)防科學(xué)技術(shù)大學(xué)
【學(xué)位級(jí)別】：博士
【學(xué)位授予年份】：2013
【分類號(hào)】：TP301.6;TP332

【相似文獻(xiàn)】

相關(guān)期刊論文前10條

1 徐衛(wèi)克;;體系結(jié)構(gòu)評(píng)估方法的研究與實(shí)現(xiàn)[J];計(jì)算機(jī)與現(xiàn)代化;2009年08期

2 范玉順;;面向服務(wù)的企業(yè)的體系結(jié)構(gòu)與關(guān)鍵技術(shù)[J];航空制造技術(shù);2010年03期

3 姜志平;丁峰;易侃;羅晨;;綜合電子信息系統(tǒng)綜合級(jí)體系結(jié)構(gòu)概念及框架[J];指揮信息系統(tǒng)與技術(shù);2012年03期

4 張佳南;葛健;潘海俠;;論體系結(jié)構(gòu)[J];計(jì)算機(jī)科學(xué);2012年S2期

5 李斌;64位體系結(jié)構(gòu)的研制[J];管理科學(xué)文摘;1997年05期

6 ;新一代32位RISC體系結(jié)構(gòu)[J];電子產(chǎn)品世界;1998年Z1期

7 於丹;體系結(jié)構(gòu),來自低潮時(shí)期的回顧與思考[J];微電腦世界;1999年09期

8 彭濤,蔣凡,孟憲海,李曦,趙振西;RISC體系結(jié)構(gòu)計(jì)算機(jī)的中斷檢測(cè)[J];計(jì)算機(jī)工程;2000年07期

9 周俊,葉酉蓀;一種戰(zhàn)區(qū)綜合電子信息系統(tǒng)互通體系結(jié)構(gòu)[J];電子工程師;2001年03期

10 ;全球商定包括3G的無線體系結(jié)構(gòu)[J];廣東通信技術(shù);2001年11期

相關(guān)會(huì)議論文前10條

1 董永貴;董恩生;賈惠波;;生物啟發(fā)儀器的體系結(jié)構(gòu)及實(shí)現(xiàn)技術(shù)[A];第二屆全國(guó)信息獲取與處理學(xué)術(shù)會(huì)議論文集[C];2004年

2 王翠茹;高麗鮮;;元數(shù)據(jù)集成體系結(jié)構(gòu)的研究[A];2009全國(guó)計(jì)算機(jī)網(wǎng)絡(luò)與通信學(xué)術(shù)會(huì)議論文集[C];2009年

3 魏晨曦;房鴻瑞;;NASA未來深空測(cè)控新概念研究[A];中國(guó)空間科學(xué)學(xué)會(huì)第七次學(xué)術(shù)年會(huì)會(huì)議手冊(cè)及文集[C];2009年

4 甘仞初;謝瑩;曹炳文;;需求驅(qū)動(dòng)的自適應(yīng)體系結(jié)構(gòu)的知識(shí)體系研究[A];第八屆中國(guó)管理科學(xué)學(xué)術(shù)年會(huì)論文集[C];2006年

5 李俊超;張占月;甘朝虹;楊欣;;C~4ISR體系結(jié)構(gòu)設(shè)計(jì)方法研究[A];2013第一屆中國(guó)指揮控制大會(huì)論文集[C];2013年

6 余亞平;馬秀琴;;國(guó)內(nèi)PACS系統(tǒng)淺析[A];中華醫(yī)學(xué)會(huì)第十三屆全國(guó)放射學(xué)大會(huì)論文匯編（下冊(cè)）[C];2006年

7 李海靈;;信息化礦山建設(shè)體系結(jié)構(gòu)在嘉樂泉煤礦的應(yīng)用[A];煤礦自動(dòng)化與信息化——第21屆全國(guó)煤礦自動(dòng)化與信息化學(xué)術(shù)會(huì)議暨第3屆中國(guó)煤礦信息化與自動(dòng)化高層論壇論文集（下冊(cè)）[C];2011年

8 余毅敏;何川;楊青彬;;淺析移動(dòng)Agent技術(shù)及其在TMN管理中的應(yīng)用優(yōu)勢(shì)[A];2008通信理論與技術(shù)新進(jìn)展——第十三屆全國(guó)青年通信學(xué)術(shù)會(huì)議論文集（上）[C];2008年

9 陳英武;葛冰峰;熊健;姜江;楊克巍;;基于可執(zhí)行體系結(jié)構(gòu)的體系優(yōu)化設(shè)計(jì)過程[A];經(jīng)濟(jì)全球化與系統(tǒng)工程——中國(guó)系統(tǒng)工程學(xué)會(huì)第16屆學(xué)術(shù)年會(huì)論文集[C];2010年

10 楚旺;錢德沛;;基于體系結(jié)構(gòu)的軟件生產(chǎn)線開發(fā)方法的形式化框架[A];2005年全國(guó)理論計(jì)算機(jī)科學(xué)學(xué)術(shù)年會(huì)論文集[C];2005年

相關(guān)重要報(bào)紙文章前10條

1 劉群峰邋李德彪;重視體系結(jié)構(gòu)力的再生[N];中國(guó)國(guó)防報(bào);2007年

2 ;Power Architecture：不斷滿足新興市場(chǎng)需求[N];中國(guó)電子報(bào);2006年

3 ;電聯(lián)關(guān)注面向用戶基于業(yè)務(wù)的體系結(jié)構(gòu)[N];人民郵電;2001年

4 ;EA的新回報(bào)[N];網(wǎng)絡(luò)世界;2005年

5 中科院計(jì)算所孫凝暉;體系結(jié)構(gòu)—?jiǎng)?chuàng)新的步伐不斷[N];計(jì)算機(jī)世界;2003年

6 ;王鋼：發(fā)展通用CPU走自主知識(shí)產(chǎn)權(quán)之路[N];人民政協(xié)報(bào);2004年

7 ;網(wǎng)絡(luò)體系：促進(jìn)電信與IP的融合[N];人民郵電;2000年

8 梁懿嫻;美國(guó)F5公司推出互聯(lián)網(wǎng)控制體系結(jié)構(gòu)[N];國(guó)際商報(bào);2001年

9 安燁;企業(yè)門戶的特點(diǎn)及體系結(jié)構(gòu)[N];網(wǎng)絡(luò)世界;2001年

10 本報(bào)記者李良玉;“雙贏”的戰(zhàn)略決策[N];計(jì)算機(jī)世界;2000年

相關(guān)博士學(xué)位論文前10條

1 朱玄;基于憶阻器的存儲(chǔ)加密體系結(jié)構(gòu)技術(shù)[D];國(guó)防科學(xué)技術(shù)大學(xué);2014年

2 鄒丹;面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣算法研究[D];國(guó)防科學(xué)技術(shù)大學(xué);2013年

3 文梅;流體系結(jié)構(gòu)關(guān)鍵技術(shù)研究[D];國(guó)防科學(xué)技術(shù)大學(xué);2006年

4 李嘉欣;基三體系結(jié)構(gòu)中并行運(yùn)算的關(guān)鍵機(jī)制研究[D];北京理工大學(xué);2010年

5 李長(zhǎng)云;基于體系結(jié)構(gòu)的軟件動(dòng)態(tài)演化研究[D];浙江大學(xué);2005年

6 姜軍;可執(zhí)行體系結(jié)構(gòu)及DoDAF的可執(zhí)行化方法研究[D];國(guó)防科學(xué)技術(shù)大學(xué);2008年

7 伍楠;高效能流體系結(jié)構(gòu)關(guān)鍵技術(shù)研究[D];國(guó)防科學(xué)技術(shù)大學(xué);2008年

8 樊金斗;高性能路由器中存儲(chǔ)體系結(jié)構(gòu)的研究[D];清華大學(xué);2013年

9 高妍妍;ASIP體系結(jié)構(gòu)形式化建模與驗(yàn)證方法研究[D];中國(guó)科學(xué)技術(shù)大學(xué);2009年

10 何義;流體系結(jié)構(gòu)指令管理及系統(tǒng)虛擬化仿真技術(shù)研究[D];國(guó)防科學(xué)技術(shù)大學(xué);2010年

相關(guān)碩士學(xué)位論文前10條

1 桂軍;板—墻體系結(jié)構(gòu)抗震性能分析[D];昆明理工大學(xué);2015年

2 祝家意;一種產(chǎn)品線體系結(jié)構(gòu)可變性設(shè)計(jì)方法[D];復(fù)旦大學(xué);2010年

3 徐斌;基于體系結(jié)構(gòu)方法的建模工具擴(kuò)展研究[D];電子科技大學(xué);2010年

4 常武;三層分布式PACS體系結(jié)構(gòu)的研究與實(shí)現(xiàn)[D];北京工業(yè)大學(xué);2001年

5 姚知力;產(chǎn)品生命周期管理系統(tǒng)的體系結(jié)構(gòu)及關(guān)鍵技術(shù)研究[D];西北工業(yè)大學(xué);2005年

6 李軒;基于一體化衛(wèi)星體系結(jié)構(gòu)的星載軟件快速開發(fā)環(huán)境的研究與實(shí)現(xiàn)[D];國(guó)防科學(xué)技術(shù)大學(xué);2010年

7 徐紅梅;基于體系結(jié)構(gòu)的第三方物流信息系統(tǒng)建模研究[D];大連海事大學(xué);2008年

8 張夏;體系結(jié)構(gòu)在線演化技術(shù)及其變更影響分析研究[D];復(fù)旦大學(xué);2009年

9 姜文峰;基于體系結(jié)構(gòu)的復(fù)用技術(shù)研究及其在用電營(yíng)業(yè)系統(tǒng)中的應(yīng)用[D];河海大學(xué);2003年

10 郝永春;基于排隊(duì)理論的軟件體系結(jié)構(gòu)性能研究[D];太原理工大學(xué);2003年

，

本文編號(hào)：2235500

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會(huì)員下載

Download by Member

本文鏈接：http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2235500.html

上一篇：基于預(yù)拷貝的虛擬機(jī)內(nèi)存狀態(tài)動(dòng)態(tài)遷移優(yōu)化
下一篇：實(shí)時(shí)系統(tǒng)內(nèi)存管理方案的設(shè)計(jì)與實(shí)現(xiàn)

論文發(fā)表

·知網(wǎng)|萬方|維普|龍?jiān)磡省級(jí)|國(guó)家級(jí)|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣算法研究