
A Survey of Parallel In-Memory Sorting Methods on Modern Hardware

Published: 2018-05-14 23:39

Topics: modern hardware processors; sorting algorithms. Source: Chinese Journal of Computers (《计算机学报》), 2017, No. 09.


[Abstract]: This paper studies parallel in-memory sorting methods on modern hardware and surveys their current state and recent progress. It first outlines the strengths and weaknesses of classical sorting algorithms and of sorting networks, and analyzes how well each lends itself to parallel optimization. It then reviews existing sorting research on newer processor devices: modern CPUs (multi-core, with large main memory), graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). Because these devices have different architectures, the strategies for optimizing sorting also differ. Modern CPUs mainly exploit each thread's local memory hierarchy to optimize how data is laid out in storage. This reduces both the number of memory accesses and the number of cache misses. CPUs also use single-instruction multiple-data (SIMD) instructions to increase data-level parallelism. GPUs organize threads into thread blocks and rely on shared memory to speed up memory access within each block. Inside a block, they use single-instruction multiple-thread (SIMT) execution to improve thread efficiency. FPGAs sit closer to the underlying hardware and are constrained by their limited on-chip resources. FPGA optimization therefore relies mainly on circuit design in a hardware description language or a high-level synthesis language, improving resource utilization while increasing throughput. Existing results show that parallel in-memory sorting on GPUs outperforms parallel in-memory sorting on CPUs. Finally, the authors discuss directions for future research.
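The abstract contrasts classical comparison sorts with sorting networks. A sorting network fixes its sequence of compare-exchange operations in advance, independent of the input data. As a result, every comparison within one stage can run at the same time, whether in SIMD lanes, as threads in a GPU thread block, or as parallel comparators in an FPGA pipeline. The sketch below illustrates this with one well-known network, Batcher's bitonic sort. It is a generic sequential illustration, not code from the survey. It assumes the input length is a power of two, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Compare-exchange: afterwards a[i] <= a[j] if ascending, else a[i] >= a[j]. */
static void compare_exchange(int *a, size_t i, size_t j, int ascending) {
    if ((a[i] > a[j]) == ascending) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

/* Iterative bitonic sort (hypothetical helper); n must be a power of two. */
void bitonic_sort(int *a, size_t n) {
    assert(n == 0 || (n & (n - 1)) == 0);
    /* k: length of the bitonic sequences being merged in this phase */
    for (size_t k = 2; k <= n; k <<= 1) {
        /* j: distance between the two elements of each compare-exchange */
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            /* All iterations of this inner loop are independent of each
               other, so one (k, j) step is a single parallel stage. */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    int ascending = (i & k) == 0;
                    compare_exchange(a, i, partner, ascending);
                }
            }
        }
    }
}
```

As a usage example, calling `bitonic_sort` on an 8-element array such as {5, 1, 7, 3, 8, 2, 6, 4} leaves it sorted in ascending order. The network performs O(n log² n) comparisons, more than the O(n log n) of merge sort or quicksort. Its appeal on parallel hardware comes from its fixed, branch-free structure, not from its operation count.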
[Affiliations]: Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education; School of Information, Renmin University of China.
[Funding]: Supported by the National Natural Science Foundation of China (61532021, 61272137, 61202114) and the Huawei Innovation Research Program (HIRP 20140507).
[CLC number]: TP333

Article ID: 1890010


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1890010.html

