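A Survey of Parallel In-Memory Sorting Methods on Modern Hardware
Published: 2018-05-14 23:39
Topics: modern hardware processors + sorting algorithms. Source: Chinese Journal of Computers (《计算机学报》), 2017, No. 09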
[Abstract]: This paper surveys parallel in-memory sorting methods on modern hardware and reviews the state of research and recent progress. It first briefly describes the strengths and weaknesses of classical sorting algorithms and sorting networks and analyzes how well they lend themselves to parallel optimization. It then presents existing results on sorting methods for newer processor devices: modern CPUs (multi-core, equipped with large memory), graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). Because these devices differ in architecture, the optimization strategies for sorting differ as well. On modern CPUs, the main approach is to exploit the thread-local storage hierarchy to optimize how data is laid out in memory, reducing the number of memory accesses and access misses, while using single instruction, multiple data (SIMD) techniques to raise the data-level parallelism of the algorithm. On GPUs, threads must be organized into thread blocks, with shared memory used to speed up memory access for each block and single instruction, multiple threads (SIMT) execution used within a block to improve thread efficiency. FPGAs sit closer to the hardware and are constrained by their own limited resources, so optimization relies mainly on hardware description languages or high-level synthesis languages to optimize the circuit design, improving resource utilization while increasing throughput. Existing results indicate that parallel in-memory sorting on GPUs outperforms parallel in-memory sorting on CPUs. The paper concludes with an outlook on future research directions.
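To make the mention of sorting networks concrete, the following is a minimal sketch of one well-known sorting network, bitonic sort. The abstract does not say which networks the survey covers, so the choice of bitonic sort and the function name `bitonic_sort` are assumptions made here for illustration only. The point it shows is that the sequence of compare-exchange steps depends only on the input length, never on the data, which is the property that makes such networks a natural fit for SIMD lanes, GPU thread blocks, and FPGA circuits.

```python
def bitonic_sort(values):
    """Sort a list with a bitonic sorting network (illustrative sketch).

    Assumption: the input length is a power of two; real implementations
    usually pad the input or fall back to another algorithm.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)

    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compare-exchange partners
            # Every iteration of this loop is independent of the others,
            # so on parallel hardware each index could be its own lane/thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The inner `for` loop is written sequentially here for readability; in a parallel setting each stage would be executed as one step across all indices, with a synchronization point between stages.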
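[Author affiliations]: Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Renmin University of China; School of Information, Renmin University of China
[Funding]: Supported by the National Natural Science Foundation of China (61532021, 61272137, 61202114) and the Huawei Innovation Research Program (HIRP 20140507)
[Classification number]: TP333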
Article ID: 1890010
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1890010.html