
FFT Algorithm Design and Implementation for Multi-Core Vector Processors

Published: 2018-01-01 06:09

  Article: FFT Algorithm Design and Implementation for Multi-Core Vector Processors. Source: National University of Defense Technology, 2014 master's thesis. Document type: degree thesis


  Related topics: FFT, fused multiply-add, multi-core processor, vectorization, software pipelining


[Abstract]: As a principal tool of digital signal processing, the FFT algorithm plays an important role in high-performance computing and is an important benchmark of processor performance. Given the architectural characteristics of the multi-core vector processor X-DSP, research on efficient vectorized FFT design and implementation has both theoretical significance and practical value. This thesis analyzes the characteristics of the FFT algorithm in depth and designs and implements radix-2 FFT, radix-4 FFT, large-point FFT, and mixed-radix FFT programs. The main work is as follows.

(1) A radix-2 FFT program for X-DSP is designed and implemented. To exploit the architecture's fused multiply-add (FMA) support, the butterfly units of the DIT and DIF radix-2 FFT are analyzed and optimized so that they make full use of FMA instructions, which raises FLOPS throughput. Shuffle requests are combined with memory-access requests, and software pipelining is applied, which improves execution efficiency. Experiments show that, relative to the CUFFT library, the average performance of the single-precision and double-precision radix-2 FFT improves by 3.12 times and 22.97 times, respectively; relative to the FFTW library, it improves by 3.52 times and 25.29 times, respectively.

(2) A radix-4 FFT program for X-DSP is designed and implemented. The butterfly units of the DIT and DIF radix-4 FFT are optimized to make full use of FMA instructions, and shuffle requests are again combined with memory-access requests. Experiments show that the DIT radix-4 FFT outperforms the DIT radix-2 FFT by 11.46%-21.34%, and the DIF radix-4 FFT outperforms the DIF radix-2 FFT by 9.1% on average.

(3) A large-point FFT program for X-DSP is designed and implemented. The large-point FFT algorithm MFA and the iterative FFT are analyzed in detail, and a single-core program based on DMA double buffering is designed and optimized. A method of storing the twiddle factors in compressed form is proposed to save memory. The parallel blocked MFA algorithm is mapped onto multiple cores and the load balance among cores is optimized, yielding an efficient multi-core parallel large-point FFT with an average speedup of 6.43.

(4) The butterfly units of the DIT radix-3 and radix-5 FFT are analyzed and optimized, and a mixed-radix FFT program for X-DSP is designed and implemented. Experiments show that the single-precision floating-point mixed-radix FFTs of 1536 points and 2400 points take 0.00247 ms and 0.00348 ms, respectively, and that mixed-radix FFT performance improves markedly as the number of points grows.
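The abstract does not give the thesis's butterfly formulation, so the following is only an illustrative sketch of one well-known way to shape a radix-2 DIT butterfly around fused multiply-add. The conventional form (x = a + w·b, y = a - w·b) costs 4 multiplications and 6 additions on real values; rewriting y as 2a - x lets the whole butterfly be expressed as 6 operations of the form a·b + c. The helper `fma`, and the function names `butterfly_dit2`, `fft_radix2`, and `dft_naive`, are hypothetical and introduced here for illustration; `fma` is a plain Python stand-in, not a real fused instruction, and nothing here models X-DSP's vector or shuffle hardware.

```python
import cmath


def fma(a, b, c):
    """Stand-in for a hardware fused multiply-add: returns a*b + c.

    Assumption: plain Python rounds twice here, whereas a real FMA
    instruction rounds once. Only the operation shape is illustrated.
    """
    return a * b + c


def butterfly_dit2(a, b, w):
    """Radix-2 DIT butterfly (x = a + w*b, y = a - w*b) as 6 FMA-shaped ops."""
    # Real and imaginary parts of x = a + w*b: two chained FMAs each.
    x_re = fma(-w.imag, b.imag, fma(w.real, b.real, a.real))
    x_im = fma(w.imag, b.real, fma(w.real, b.imag, a.imag))
    # y = 2a - x reuses x instead of recomputing the product w*b.
    y_re = fma(2.0, a.real, -x_re)
    y_im = fma(2.0, a.imag, -x_im)
    return complex(x_re, x_im), complex(y_re, y_im)


def fft_radix2(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly_dit2(even[k], odd[k], w)
    return out


def dft_naive(x):
    """O(n^2) reference DFT used only to check the FFT result."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Calling `fft_radix2` on a power-of-two-length list of complex samples should agree with `dft_naive` up to floating-point rounding. On hardware with a true FMA unit, each `fma` call would map to a single instruction, which is where the throughput gain described in item (1) would come from.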

[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP332


Article ID: 1363248


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1363248.html

