STREAM Parallel Computing and Optimization for the FT1000 Microprocessor
Published: 2018-09-12 13:48
[Abstract]: STREAM is a benchmark for measuring memory performance on microprocessors, and achieving high performance with it on the multi-core, multi-threaded FT1000 microprocessor is challenging. Based on the multi-level cache structure, the instruction pipelines of the four STREAM kernels were optimized: a multi-level loop unrolling method was designed according to the number of registers, the number of data prefetches was determined from the instruction latency and the cache line size, and the optimized subroutines were written in assembly language. Based on the OpenMP parallel environment, a parallel STREAM program was designed and its localized data allocation scheme was optimized. Test results show that the optimized STREAM outperforms the original serial program by 19.2% to 64.2%. After optimization, the peak memory access performance of the parallel program reaches 8.5 GB/s, up to 22.7% higher than the peak before optimization.
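As a rough illustration of the techniques named in the abstract, the C sketch below shows one of the four STREAM kernels (Triad) with 4-way loop unrolling, an OpenMP static schedule, and a first-touch initialization loop that approximates localized data allocation. It is not the authors' code: the paper's subroutines are written in FT1000 assembly with tuned prefetching, which is not reproduced here. The function names `stream_init` and `stream_triad` and the unroll factor of 4 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* First-touch initialization (illustrative): each thread writes the part of
 * the arrays it will later process, so on systems with a first-touch page
 * placement policy the pages tend to be allocated near that thread. */
static void stream_init(double *a, double *b, double *c, size_t n)
{
    assert(a && b && c);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        a[i] = 0.0;
        b[i] = 1.0;
        c[i] = 2.0;
    }
}

/* STREAM Triad: a[i] = b[i] + scalar * c[i].
 * The main loop is unrolled by 4 (a hypothetical factor; the paper derives
 * its factors from the register count). The static schedule gives each
 * thread roughly the same index range it touched in stream_init. */
static void stream_triad(double *a, const double *b, const double *c,
                         double scalar, size_t n)
{
    assert(a && b && c);
    long blocks = (long)(n / 4);

    #pragma omp parallel for schedule(static)
    for (long k = 0; k < blocks; k++) {
        long i = 4 * k;
        a[i]     = b[i]     + scalar * c[i];
        a[i + 1] = b[i + 1] + scalar * c[i + 1];
        a[i + 2] = b[i + 2] + scalar * c[i + 2];
        a[i + 3] = b[i + 3] + scalar * c[i + 3];
    }

    /* Remainder elements when n is not a multiple of 4. */
    for (long i = 4 * blocks; i < (long)n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the same code runs serially, which makes it easy to compare serial and parallel bandwidth on the same kernel.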
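[Affiliation]: Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology
[Funding]: National 863 Program (2012AA01A301); National Natural Science Foundation of China (60970033, 91430218)
[Classification number]: TP332
Article ID: 2239191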
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2239191.html