Design and Implementation of a Software-Pipelined Loop Buffer Unit in the Independently Developed X DSP
Published: 2018-07-22 14:25
[Abstract]: DSP algorithms contain a large number of loop operations, and exploiting instruction-level parallelism across loop bodies is one of the important ways to improve processor performance. Loop scheduling techniques include loop unrolling and software pipelining. Targeting the independently developed X DSP, this thesis studies how software pipelining can improve the execution efficiency of loop programs on the X DSP, and designs and implements a software-pipelined loop buffer unit.
The thesis analyzes loop unrolling and software pipelining in detail and, based on the requirements and characteristics of the X DSP, designs a loop buffer built on the modulo scheduling algorithm for software pipelining. The unit sits in the instruction dispatch stage of the pipeline, where it stores and dispatches loop-body instructions. This reduces the number of memory accesses made while a loop program executes and therefore reduces the impact of memory access latency on DSP performance. The main work of the thesis is as follows:
1) Based on an analysis of the execution characteristics of for loops and while loops, the overall structure of the loop buffer was designed, and the detailed design of the loop buffer control module and the storage-and-dispatch module was completed.
2) A tracking-and-comparison mechanism for loop instructions was designed. It handles the loading, draining, and reloading of loop instructions so that they are stored and dispatched accurately.
3) A counter comparison mechanism and an interrupt drain mechanism were designed to provide precise interrupts for loop programs.
4) Simulation-based verification methods were studied, a simulation verification platform for the loop buffer was built, and comprehensive system-level verification of the loop buffer was carried out.
5) The performance of the loop buffer was evaluated with a matrix multiply-accumulate program and three typical DSP image-processing algorithms. In simulation, the utilization of the loop buffer in these four programs reached 95.34%, 90.61%, 88.85%, and 89.94% respectively, which greatly reduces the frequency of instruction memory accesses and lowers memory access power consumption.
6) Logic synthesis of the loop buffer was completed on a 45 nm process. The unit can operate at up to 1 GHz, with an area of 76778.69 square micrometers, dynamic power of 28.99 mW, and static power of 1.83 mW.
The loop buffer can hold 112 32-bit loop-body instructions and stores and dispatches them under the control of dedicated loop instructions, which significantly improves the execution efficiency of loop programs.
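To make the memory-access saving described in the abstract concrete, the following is a minimal behavioral sketch in Python, not the thesis's hardware design. It assumes a simplified model in which the first iteration of a loop is fetched from instruction memory while the buffer is loaded, and every later iteration is dispatched from the buffer. The 112-instruction capacity comes from the abstract; the names `DispatchStats`, `run_program`, and `buffer_share` are hypothetical, and measuring "utilization" as the fraction of dispatched instructions served from the buffer is an assumption, since the abstract does not define the metric. Prologue and epilogue handling, the dedicated loop instructions, and interrupt draining are not modeled.

```python
from dataclasses import dataclass

# Capacity in loop-body instructions, taken from the abstract.
BUFFER_CAPACITY = 112


@dataclass
class DispatchStats:
    from_memory: int = 0  # instructions fetched from instruction memory
    from_buffer: int = 0  # instructions replayed out of the loop buffer

    @property
    def buffer_share(self) -> float:
        """Fraction of all dispatched instructions served by the buffer."""
        total = self.from_memory + self.from_buffer
        return self.from_buffer / total if total else 0.0


def run_program(segments, capacity=BUFFER_CAPACITY):
    """Count where each dispatched instruction comes from.

    segments: list of (body_len, iterations) pairs; iterations == 1
    stands for straight-line code (an assumption of this sketch).
    """
    stats = DispatchStats()
    for body_len, iterations in segments:
        if iterations > 1 and body_len <= capacity:
            # First iteration: fetch from memory while loading the buffer.
            stats.from_memory += body_len
            # Later iterations: dispatch from the buffer, no memory access.
            stats.from_buffer += body_len * (iterations - 1)
        else:
            # Straight-line code, or a loop body too large to buffer.
            stats.from_memory += body_len * iterations
    return stats
```

For example, `run_program([(20, 1), (8, 50), (10, 1)])` models 20 setup instructions, an 8-instruction loop run 50 times, and 10 trailing instructions. Only 38 of the 430 dispatched instructions come from memory, giving a buffer share of about 0.91. A loop body longer than 112 instructions falls back to memory fetches on every iteration in this model.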
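[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP332
Article ID: 2137754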
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2137754.html