Shared Memory Design for Multiple OpenCL Devices Based on Automatic Memory Access Pattern Analysis
Published: 2018-11-04 20:37
[Abstract]: OpenCL offers good functional portability and is an ideal programming model for master-slave heterogeneous multi-device systems. However, to fully exploit the computing power of the whole heterogeneous system, programmers must explicitly assign the workload of each device, control data transfers between devices, and so on, which adds to their burden. This thesis proposes shared memory for multiple OpenCL devices (OMSM): because the Runtime supports shared memory, programmers no longer need to control data transfers explicitly. OMSM has two main tasks: task partitioning and memory management. Both can be automated because workgroups in the OpenCL programming model are independent. The independence of workgroups in the index space reduces task partitioning to assigning a different number of workgroups to each device; it also means that the regions written by different workgroups cannot overlap, so the regions that workgroups access are fairly regular. Automating the memory access analysis is the key to automating the whole system. The thesis first analyzes the memory access patterns of workgroups and, drawing on the characteristics of kernel programs, proposes a constrained linear abstract description that characterizes the memory access pattern of a kernel's workgroups. To manipulate this abstract description efficiently, it defines intersection, normalization, independent-variable elimination, merging, and solving operations, and implements an automatic memory access pattern analysis tool on top of the open-source LLVM compiler framework. Once the access information is available, the OMSM Runtime works in two stages at execution time: it profiles each device in the system to balance the load, and it uses a segment table to describe how data is distributed across the devices so that data transfers are controlled automatically. Experimental results show that OMSM is highly applicable to kernels without indirect accesses and achieves substantial performance improvements on both homogeneous and heterogeneous multi-device platforms.
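The two Runtime stages described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. All names here (`split_workgroups`, `write_region`, `SegmentTable`) are hypothetical. The sketch assumes a one-dimensional index space, a kernel in which each workgroup writes one contiguous block of bytes (a very simple linear access pattern), and a segment table that records a single owning device per byte range.

```python
from dataclasses import dataclass


def split_workgroups(total_groups, throughputs):
    """Split workgroups across devices in proportion to profiled throughput.

    `throughputs` holds one measured rate per device (hypothetical profiling
    result). Returns one contiguous [start, end) workgroup range per device.
    """
    total = sum(throughputs)
    counts = [total_groups * t // total for t in throughputs]
    leftover = total_groups - sum(counts)
    # Hand the groups lost to rounding to the fastest devices first.
    fastest_first = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for i in fastest_first[:leftover]:
        counts[i] += 1
    ranges, start = [], 0
    for count in counts:
        ranges.append((start, start + count))
        start += count
    return ranges


def write_region(group_range, bytes_per_group):
    """Byte range written by a workgroup range, assuming group g writes
    bytes [g * bytes_per_group, (g + 1) * bytes_per_group)."""
    first, last = group_range
    return first * bytes_per_group, last * bytes_per_group


@dataclass
class Segment:
    start: int  # first byte offset (inclusive)
    end: int    # end byte offset (exclusive)
    owner: int  # id of the device holding the valid copy


class SegmentTable:
    """Tracks which device holds the valid copy of each byte range of a buffer."""

    def __init__(self, size, owner):
        self.segments = [Segment(0, size, owner)]

    def record_write(self, start, end, device):
        """Mark [start, end) as most recently written by `device`."""
        updated = []
        for seg in self.segments:
            if seg.end <= start or seg.start >= end:
                updated.append(seg)  # no overlap, keep unchanged
                continue
            if seg.start < start:    # keep the part left of the write
                updated.append(Segment(seg.start, start, seg.owner))
            if seg.end > end:        # keep the part right of the write
                updated.append(Segment(end, seg.end, seg.owner))
        updated.append(Segment(start, end, device))
        self.segments = sorted(updated, key=lambda seg: seg.start)

    def transfers_for_read(self, start, end, device):
        """List (source_device, start, end) copies needed before `device`
        can read [start, end)."""
        return [
            (seg.owner, max(seg.start, start), min(seg.end, end))
            for seg in self.segments
            if seg.owner != device and seg.start < end and seg.end > start
        ]
```

For example, with 100 workgroups and profiled throughputs of 3 and 1, `split_workgroups(100, [3, 1])` assigns groups 0 to 74 to the first device and 75 to 99 to the second. After each device's write region is recorded, `transfers_for_read` lists the ranges a device must fetch before the next kernel launch. A real runtime would also need to track read-only copies shared by several devices, which this sketch leaves out.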
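[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 2311067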
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2311067.html