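Design, Optimization and Implementation of a SIMD Arithmetic Unit Supporting Floating-Point Fused Multiply-Add
Published: 2018-02-02 23:52
Keywords: SIMD unit; fused multiply-add; address misalignment; data reorganization; mask. Source: National University of Defense Technology, master's thesis, 2013. Thesis type: degree thesis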
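【Abstract】: SIMD (Single Instruction, Multiple Data) is an important means of improving data-parallel processing capability. As very-large-scale integration (VLSI) technology has advanced, mainstream microprocessor vendors have kept adding SIMD functionality and widening SIMD datapaths. SIMD nevertheless still faces several performance bottlenecks, such as misaligned addresses, data reorganization, and the vectorization of control-dependent code (control flow).
This thesis designs a SIMD arithmetic unit that supports floating-point fused multiply-add for a high-performance microprocessor, optimizes it for scientific computing, and carries out synthesis, verification, and performance analysis. The main contributions are:
1. A 7-stage pipelined double-precision floating-point fused multiply-add (FMA) unit is designed and used to build a baseline SIMD module. After analyzing the performance bottlenecks of SIMD across various applications, the thesis proposes a configurable, improved SIMD structure that targets address misalignment, data reorganization, and the vectorization of control-dependent code.
2. The SIMD arithmetic unit is verified by simulation and analyzed through synthesis. Verification shows that the floating-point results conform to the IEEE 754-2008 standard and that the SIMD functions are correct. Synthesis shows that, relative to the baseline SIMD, the configurable SIMD increases area by 2.04% and power by 0.46%. The overall evaluation indicates that the SIMD unit reaches a frequency of 2 GHz.
3. Using DAXPY (double-precision multiply-add, y = a·x + y) with a vector length of 66 and sparse matrix computation as case studies, the thesis analyzes the performance gain of the configurable SIMD. Compared with the baseline SIMD, the configurable SIMD achieves a speedup of 1.17× to 1.50×.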
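Item 1 centres on a fused multiply-add unit. The defining property of FMA in IEEE 754-2008 is that a·b + c is rounded once, not twice (once after the multiply and again after the add). The following Python sketch is only a behavioural reference model of that single-rounding property, not the thesis's 7-stage hardware design; `fma_reference` and `unfused_multiply_add` are hypothetical names, and the model assumes finite inputs (infinities, NaNs and the sign of an exact zero are not modeled).

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Hypothetical reference model of fusedMultiplyAdd for finite inputs.

    a*b + c is formed exactly with rational arithmetic and rounded to
    double only once: float(Fraction) ends in a correctly rounded
    int/int division (round-to-nearest-even).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: two roundings."""
    product = a * b      # first rounding: product rounded to double
    return product + c   # second rounding: sum rounded to double


# Operands chosen so the low-order bits of the product matter:
# a*a = 1 + 2**-29 + 2**-60 exactly, but 2**-60 is below half an ulp
# of 1.0 (2**-53), so it is lost when the product is rounded on its own.
a = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

fused = fma_reference(a, a, c)          # keeps the 2**-60 term
unfused = unfused_multiply_add(a, a, c)  # cancels to zero
print(fused, unfused)
```

Here `fused` equals 2**-60 while `unfused` is 0.0: the single rounding preserves the low-order part of the product that the two-step sequence discards, which is why an FMA unit is valued in scientific kernels.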
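The DAXPY case study uses a vector length of 66, which is not a multiple of 4 or 8 doubles, and the keywords list masking. A minimal sketch, assuming a hypothetical width of 4 doubles per vector (the abstract does not state the SIMD width), shows how a lane mask lets the last partial chunk run as one more vector operation instead of a scalar remainder loop. `LANES`, `daxpy_masked` and `daxpy_unmasked` are illustrative names, not the thesis's design; the model only counts operations and does not reproduce the reported 1.17× to 1.50× speedup.

```python
from typing import List, Tuple

LANES = 4  # hypothetical SIMD width in doubles; not stated in the abstract


def daxpy_masked(a: float, x: List[float], y: List[float]) -> Tuple[List[float], int]:
    """y <- a*x + y, LANES elements per vector operation.

    The final partial chunk is handled by a lane mask that disables
    out-of-range lanes. Returns the new y and the vector-op count.
    """
    n = len(x)
    out = list(y)  # do not modify the caller's list
    vector_ops = 0
    for base in range(0, n, LANES):
        # Lane i is active only if base + i is a valid index.
        mask = [base + i < n for i in range(LANES)]
        for i, active in enumerate(mask):
            if active:
                # Plain a*x + y (two roundings) stands in for one FMA lane.
                out[base + i] = a * x[base + i] + out[base + i]
        vector_ops += 1
    return out, vector_ops


def daxpy_unmasked(a: float, x: List[float], y: List[float]) -> Tuple[List[float], int, int]:
    """Baseline: full-width vector chunks plus a scalar remainder loop."""
    n = len(x)
    out = list(y)
    full = n - n % LANES  # elements covered by whole vectors
    vector_ops = 0
    scalar_ops = 0
    for base in range(0, full, LANES):
        for i in range(LANES):
            out[base + i] = a * x[base + i] + out[base + i]
        vector_ops += 1
    for j in range(full, n):  # leftover elements, one at a time
        out[j] = a * x[j] + out[j]
        scalar_ops += 1
    return out, vector_ops, scalar_ops


n = 66  # vector length used in the thesis's DAXPY case study
x = [float(i) for i in range(n)]
y = [1.0] * n
masked_y, masked_ops = daxpy_masked(2.0, x, y)
base_y, base_vec, base_scalar = daxpy_unmasked(2.0, x, y)
assert masked_y == base_y
print(masked_ops, base_vec, base_scalar)  # 17 16 2
```

With 66 elements and four lanes, the masked loop issues 17 vector operations, while the baseline issues 16 vector operations plus 2 scalar iterations. Which variant is faster in real hardware depends on the relative cost of a masked vector operation versus the scalar tail and its loop control.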
【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year of degree】: 2013
【Classification number】: TP332.2
Article ID: 1485812
Link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1485812.html