An Energy-Efficient FFT Accelerator in a DSP Chip
Published: 2018-12-31 11:25
[Abstract]: The fast Fourier transform (FFT) is the most time-consuming core algorithm in digital signal processing (DSP), and its computational performance and efficiency affect the execution efficiency of the entire application. An energy-efficient, variable-length FFT accelerator based on matrix transposition is therefore designed and implemented on a DSP chip. Several parallelization strategies exploit instruction-level and task-level parallelism in both batched small-scale FFTs and the large-scale Cooley-Tukey FFT. A "ping-pong" multi-bank data memory overlaps data movement with FFT computation, improving the accelerator's computational efficiency. Building on this memory, a fast block-based matrix transpose algorithm is proposed that avoids column accesses to the data matrix. A hybrid twiddle-factor generation strategy, combining table lookup with on-the-fly computation based on the CORDIC algorithm, minimizes the hardware cost of generating twiddle factors. Experimental results show that the prototype FFT accelerator reaches a peak energy efficiency of 146 GFLOPS/W, two orders of magnitude higher than a multithreaded FFTW implementation on an Intel Xeon CPU.
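The abstract does not spell out how the large Cooley-Tukey FFT is decomposed. A common way to build a length-N FFT from batches of small FFTs using only row accesses and matrix transposes is Bailey's six-step formulation, sketched below. The sketch assumes N = n1 * n2, uses NumPy's `np.fft.fft` as a stand-in for the accelerator's small-FFT kernel, and uses a hypothetical function name `six_step_fft`. It is not claimed to be the paper's exact dataflow.

```python
import numpy as np

def six_step_fft(x, n1, n2):
    """Length-(n1*n2) DFT via transposes and row-wise small FFTs (Bailey's six-step)."""
    n = n1 * n2
    assert x.shape == (n,)
    # View x as an n1 x n2 matrix: a[i1, i2] = x[n2*i1 + i2].
    a = x.reshape(n1, n2)
    # Step 1: transpose so the length-n1 FFTs run along contiguous rows.
    b = a.T                                   # b[i2, i1]
    # Step 2: batch of n2 row FFTs of length n1.
    b = np.fft.fft(b, axis=1)                 # b[i2, k1]
    # Step 3: multiply by twiddle factors W_N^(i2*k1).
    b = b * np.exp(-2j * np.pi * np.outer(np.arange(n2), np.arange(n1)) / n)
    # Step 4: transpose back.
    c = b.T                                   # c[k1, i2]
    # Step 5: batch of n1 row FFTs of length n2.
    c = np.fft.fft(c, axis=1)                 # c[k1, k2]
    # Step 6: final transpose yields natural order X[k1 + n1*k2].
    return c.T.reshape(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)
    print(np.allclose(six_step_fft(x, 32, 32), np.fft.fft(x)))  # True
```

Every FFT in this sketch runs along rows, so the only non-contiguous data movement is concentrated in the transposes. That is where a fast transpose pays off.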
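The block-based transpose is described only at a high level. In the illustrative sketch below, the matrix is stored as a row-major flat list and split into square tiles of size `b`. Each tile is fetched with contiguous row-segment reads, transposed in a small local buffer, and written back as contiguous row segments, so neither matrix is walked column by column. The flat-list layout, the tile size `b`, and the name `blocked_transpose` are all assumptions for illustration. In hardware the local buffer would correspond to banked on-chip storage rather than a Python list.

```python
def blocked_transpose(src, rows, cols, b):
    """Transpose a row-major rows x cols matrix (flat list) tile by tile.

    Returns the cols x rows result, also row-major. Every read and write
    touches a contiguous run of at most b elements.
    """
    dst = [None] * (rows * cols)
    for i0 in range(0, rows, b):
        for j0 in range(0, cols, b):
            h = min(b, rows - i0)          # tile height (edge tiles may be smaller)
            w = min(b, cols - j0)          # tile width
            # Fetch the tile as h contiguous row segments of the source.
            tile = [src[(i0 + i) * cols + j0:(i0 + i) * cols + j0 + w] for i in range(h)]
            # Each column of the tile becomes one contiguous row segment of dst.
            for j in range(w):
                start = (j0 + j) * rows + i0
                dst[start:start + h] = [tile[i][j] for i in range(h)]
    return dst


if __name__ == "__main__":
    rows, cols = 3, 4
    m = list(range(rows * cols))           # [[0,1,2,3],[4,5,6,7],[8,9,10,11]]
    print(blocked_transpose(m, rows, cols, 2))
    # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```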
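A simple timing model shows why a ping-pong (double-buffered) memory helps. The model assumes one loader (DMA engine) and one compute unit sharing two buffers, uses fixed per-batch load and compute times, and ignores write-back. `ping_pong_schedule` is a hypothetical helper, not a description of the chip's actual memory controller.

```python
def ping_pong_schedule(num_batches, load_time, compute_time):
    """Simulate two buffers shared by one loader (DMA) and one compute unit.

    Batch i lives in buffer i % 2. Loading batch i+1 into the idle buffer
    overlaps with computing batch i. Returns (events, makespan).
    """
    assert num_batches >= 1
    load_done = [0.0] * num_batches
    compute_done = [0.0] * num_batches
    events = []
    for i in range(num_batches):
        # One loader: loads are issued back to back...
        load_start = load_done[i - 1] if i > 0 else 0.0
        # ...but buffer i % 2 is free only after batch i-2 finished computing in it.
        if i >= 2:
            load_start = max(load_start, compute_done[i - 2])
        load_done[i] = load_start + load_time
        # Compute waits for its data and for the previous batch to finish.
        compute_start = max(load_done[i], compute_done[i - 1] if i > 0 else 0.0)
        compute_done[i] = compute_start + compute_time
        events.append((i, i % 2, load_start, compute_start, compute_done[i]))
    return events, compute_done[-1]


if __name__ == "__main__":
    n, load, comp = 8, 1.0, 1.0
    _, overlapped = ping_pong_schedule(n, load, comp)
    serial = n * (load + comp)
    print(overlapped, serial)  # 9.0 16.0
```

With load time L and compute time C per batch, the overlapped makespan in this model is L + (n - 1) * max(L, C) + C. The serial makespan is n * (L + C), so when L is close to C the overlap nearly halves the total time.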
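The abstract states only that the twiddle-factor unit combines table lookup with CORDIC-based on-the-fly computation. The sketch below assumes one simple way to split the work: any FFT size N that divides a fixed table length is served from a stored table of W^k = exp(-2*pi*i*k/N), and all other sizes fall back to a rotation-mode CORDIC. The table size `TABLE_SIZE`, the iteration count, and the function names are hypothetical. Floating point stands in for the fixed-point arithmetic the hardware would use.

```python
import cmath
import math

CORDIC_ITERS = 24                 # hypothetical; more iterations -> more accuracy
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(CORDIC_ITERS)]
# Pre-scale by the inverse CORDIC gain so the rotated vector has unit length.
CORDIC_SCALE = 1.0
for _i in range(CORDIC_ITERS):
    CORDIC_SCALE /= math.sqrt(1.0 + 2.0 ** (-2 * _i))

def cordic_cos_sin(theta):
    """Return (cos(theta), sin(theta)) using rotation-mode CORDIC."""
    theta = math.remainder(theta, 2.0 * math.pi)   # now in [-pi, pi]
    sign = 1.0
    # CORDIC converges only for |theta| <~ 1.74 rad, so fold into [-pi/2, pi/2].
    if theta > math.pi / 2:
        theta, sign = theta - math.pi, -1.0
    elif theta < -math.pi / 2:
        theta, sign = theta + math.pi, -1.0
    x, y, z = CORDIC_SCALE, 0.0, theta
    for i in range(CORDIC_ITERS):
        d = 1.0 if z >= 0.0 else -1.0
        # Shift-and-add micro-rotation by +/- atan(2^-i).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ATAN_TABLE[i]
    return sign * x, sign * y

TABLE_SIZE = 256                  # hypothetical table length
TWIDDLE_TABLE = [cmath.exp(-2j * math.pi * k / TABLE_SIZE) for k in range(TABLE_SIZE)]

def twiddle(k, n):
    """W_n^k = exp(-2*pi*i*k/n): table lookup if n divides TABLE_SIZE, else CORDIC."""
    if TABLE_SIZE % n == 0:
        # Stride through the table: W_n^k = W_TABLE_SIZE^(k * TABLE_SIZE / n).
        return TWIDDLE_TABLE[(k % n) * (TABLE_SIZE // n)]
    c, s = cordic_cos_sin(-2.0 * math.pi * k / n)
    return complex(c, s)
```

The table costs storage proportional to `TABLE_SIZE` but answers in one access. CORDIC needs only a small arctangent table and shift-and-add logic, trading latency for area. The division of work between the two paths here is an assumption, not the paper's documented policy.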
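[Author affiliation]: College of Computer, National University of Defense Technology
[Funding]: National Natural Science Foundation of China (61402499, 61502508); Natural Science Foundation of Hunan Province (2015JJ3017)
[CLC number]: TP332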
Article ID: 2396502
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2396502.html