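Design and Implementation of the BP Unit and Shuffle Unit of the YHFT-Matrix Processor
Published: 2018-01-07 10:20
Source: National University of Defense Technology, 2012 master's thesis. Document type: degree thesis
Keywords: YHFT-Matrix, shifter, bit processing, shuffle, SIMD, pack/unpack, simulation-based verification, synthesis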
[Abstract]: A digital signal processor (DSP) is a processor specialized for digital signal processing and is widely used in wireless communication systems and many other areas of everyday life. Developing DSP chips with China's own independent intellectual property rights not only has great economic value but also provides a foundation for building secure communication infrastructure.
YHFT-Matrix DSP is a high-performance 32-bit floating-point DSP developed independently by the National University of Defense Technology. It uses VLIW technology and can issue 10 instructions per cycle. Building on an in-depth study of the architectures and instruction sets of current mainstream DSPs, this thesis designs and implements the bit processing (BP) unit and the shuffle unit of the YHFT-Matrix DSP.
The BP unit is one of the three major functional units in the YHFT-Matrix DSP core. It mainly executes shift, bit-processing, and pack/unpack instructions, and it is implemented with SIMD techniques so that it can fully exploit a program's data-level parallelism. The shuffle unit exchanges data among the VPEs of the vector processing unit. It stores shuffle patterns in a dedicated SRAM, so that shuffling during program execution is decoupled from critical system resources such as the register file and memory bandwidth, which improves the efficiency of the shuffle unit.
The BP unit and the shuffle unit were verified by simulation at every design stage: RTL simulation, post-synthesis simulation, and post-layout simulation in turn. Coverage verification and back-annotated simulation were carried out with the NC_Verilog tool, ensuring the correctness of the design. A performance evaluation of the shuffle unit shows that it improves application performance by 14.3% to 27.6% at an additional area overhead of only 0.6%.
The BP unit and the shuffle unit were also each synthesized with Synopsys Design Compiler in a TSMC 65 nm process. The BP unit has a total area of 581856 μm², 3.7% of the single-core area, and a critical-path delay of 0.8 ns. The shuffle unit has a total area of 352326 μm², only 2.2% of the single-core area, and a critical-path delay of 1.59 ns. Both delays fit within the 2 ns clock period implied by the YHFT-Matrix DSP's 500 MHz target frequency.
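The idea of keeping shuffle patterns in their own memory, separate from the data being shuffled, can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not give the number of VPEs, the pattern encoding, or any interface, so the lane count, the class `ShuffleUnit`, and its methods `store_pattern` and `shuffle` are hypothetical names chosen for illustration, not the thesis design.

```python
from typing import List, Sequence

NUM_VPES = 16  # assumed lane count; the abstract does not state it


class ShuffleUnit:
    """Toy model of a shuffle unit with a dedicated pattern table.

    The list `pattern_mem` stands in for the independent pattern SRAM:
    patterns are written once and later selected by index, so applying
    a shuffle needs only the data lanes and a small pattern id.
    """

    def __init__(self, num_lanes: int = NUM_VPES) -> None:
        self.num_lanes = num_lanes
        self.pattern_mem: List[List[int]] = []

    def store_pattern(self, pattern: Sequence[int]) -> int:
        """Store a pattern and return its index in the pattern table.

        pattern[i] is the source lane whose value lands in output lane i.
        """
        if len(pattern) != self.num_lanes:
            raise ValueError("pattern length must equal the lane count")
        if any(not 0 <= src < self.num_lanes for src in pattern):
            raise ValueError("source lane index out of range")
        self.pattern_mem.append(list(pattern))
        return len(self.pattern_mem) - 1

    def shuffle(self, lanes: Sequence[int], pattern_id: int) -> List[int]:
        """Permute one value per lane according to a stored pattern."""
        if len(lanes) != self.num_lanes:
            raise ValueError("input length must equal the lane count")
        pattern = self.pattern_mem[pattern_id]
        return [lanes[src] for src in pattern]


if __name__ == "__main__":
    unit = ShuffleUnit(num_lanes=4)
    rotate_left = unit.store_pattern([1, 2, 3, 0])
    print(unit.shuffle([10, 20, 30, 40], rotate_left))
```

In this model a pattern is loaded once and reused by index, which mirrors the stated motivation: the pattern itself does not have to travel through the register file or consume memory bandwidth each time a shuffle is issued.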
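[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP332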