Research and Implementation of Key GCC-Based Compilation Optimization Techniques for the Matrix2 DSP
Published: 2018-08-18 13:58
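【Abstract】: The Matrix2 DSP is a high-performance 64-bit floating-point digital signal processor with independent intellectual property rights, designed by the Institute of Microelectronics, College of Computer, National University of Defense Technology. It has strong computing capability, high operating speed, and powerful parallel processing capability, and is used mainly in digital signal processing fields such as weather forecasting and graphics and image processing. To support high-level-language application development for the Matrix2 DSP, the research group developed a Matrix2 DSP compiler based on the open-source compiler GCC-4.7.0. The Matrix2 DSP uses a VLIW architecture, so how much of its computing power is realized depends largely on the performance of compiler optimization. Drawing on the architectural features and instruction-set characteristics of the Matrix2 DSP, this thesis improves the Matrix2 compiler in three areas, namely candidate functional-unit allocation, branch delay slot scheduling, and irregular instruction mapping, which considerably improves the performance of the Matrix2 DSP compiler. The main work and contributions are as follows. (1) A candidate functional-unit allocation algorithm is designed and implemented for the Matrix2 DSP compiler. The Matrix2 DSP hardware does not allocate functional units; the compiler must instead choose a suitable execution unit for each instruction from its candidate functional units. Building on GCC's instruction-constraint matching mechanism, the thesis proposes an allocation scheme that uses the instruction word as the basic unit of allocation and jointly considers the current instruction's candidate functional units and the free resources, and implements it in the Matrix2 DSP compiler. The algorithm remedies a shortcoming of GCC, helps the compiler exploit instruction-level parallelism more effectively, and improves the hardware utilization and program execution performance of the Matrix2 DSP. (2) An optimized branch delay slot scheduling algorithm is designed and implemented for the Matrix2 DSP compiler. In the Matrix2 DSP instruction set, conditional branches, unconditional branches, function calls, and function returns each have six delay slots, so filling as many delay slots as possible is very important for processor performance. Building on GCC's branch delay slot scheduling, the thesis proposes an optimized algorithm whose main elements are a modified search region for candidate fill instructions, relaxed restrictions on delay-slot fill instructions, and added scheduling functions, and implements it in the Matrix2 DSP compiler. The algorithm raises the fill rate of branch delay slots and effectively reduces the delay overhead caused by branches. (3) Support for mapping irregular instructions is designed and implemented in the Matrix2 DSP compiler. The Matrix2 DSP instruction set contains many irregular instructions with non-uniform operand types, which the existing GCC cannot map. Building on GCC's instruction mapping mechanism and the characteristics of irregular instructions, the thesis modifies the C-standard rules for type-consistency checking and conversion in arithmetic operations and adds support for irregular instruction mapping to the RTL instruction expander, so that the Matrix2 DSP compiler maps irregular instructions correctly and efficiently.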
[Abstract]: The Matrix2 DSP is a high-performance 64-bit floating-point digital signal processor with independent intellectual property rights, designed by the Institute of Microelectronics, College of Computer, National University of Defense Technology. It offers strong computing capability, high speed, and powerful parallel processing, and is used mainly in digital signal processing fields such as weather forecasting and graphics and image processing. To support high-level-language application development on the Matrix2 DSP, we developed a Matrix2 DSP compiler based on the open-source compiler GCC-4.7.0. Because the Matrix2 DSP uses a VLIW architecture, how much of its computing power is realized depends largely on how well the compiler optimizes. Based on the architectural and instruction-set characteristics of the Matrix2 DSP, this thesis improves the Matrix2 compiler in three areas: candidate functional-unit allocation, branch delay slot scheduling, and irregular instruction mapping, considerably improving the compiler's performance. The main contents and contributions of this thesis are as follows. A candidate functional-unit allocation algorithm is designed and implemented for the Matrix2 DSP compiler. The Matrix2 DSP hardware does not allocate functional units; instead, the compiler must assign each instruction a suitable execution unit from among its candidate functional units. Building on GCC's instruction-constraint matching mechanism, this thesis proposes an allocation scheme that takes the instruction word as the basic unit of allocation and considers both the current instruction's candidate functional units and the free resources, and implements it in the Matrix2 DSP compiler.
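The abstract does not give the allocation algorithm itself. One standard way to realize "allocate per instruction word, considering each instruction's candidate units and the free units" is bipartite matching between the instructions of a word and the free functional units. The sketch below uses Kuhn's augmenting-path algorithm and assumes each unit executes at most one instruction per word; the unit names (`M1`, `M2`, `A1`) and the function `allocate_word` are hypothetical, not taken from the thesis or the Matrix2 ISA.

```python
from typing import AbstractSet, Dict, List, Optional, Set


def allocate_word(candidates: List[AbstractSet[str]],
                  busy: AbstractSet[str] = frozenset()) -> Optional[List[str]]:
    """Give every instruction of one VLIW word a distinct, free functional unit.

    candidates[i] is the set of units instruction i may execute on; busy holds
    units already taken in this word. Returns the unit chosen for each
    instruction, or None when no conflict-free assignment exists.
    """
    owner: Dict[str, int] = {}  # functional unit -> instruction index

    def try_assign(i: int, seen: Set[str]) -> bool:
        # Kuhn's augmenting-path step: take a free candidate unit, or take an
        # occupied one if its current owner can be moved to another unit.
        for unit in sorted(set(candidates[i]) - set(busy)):
            if unit in seen:
                continue
            seen.add(unit)
            if unit not in owner or try_assign(owner[unit], seen):
                owner[unit] = i
                return True
        return False

    for i in range(len(candidates)):
        if not try_assign(i, set()):
            return None

    result = [""] * len(candidates)
    for unit, i in owner.items():
        result[i] = unit
    return result


# Hypothetical word: instruction 1 can only use M1, so instruction 0 must
# end up on M2 even though M1 is its first (alphabetical) choice.
word = [{"M1", "M2"}, {"M1"}, {"A1", "M2"}]
assignment = allocate_word(word)
print(assignment)  # ['M2', 'M1', 'A1']
```

A greedy first-fit choice would give `M1` to instruction 0 and then fail on instruction 1; the augmenting path instead moves instruction 0 to `M2`. A `None` result signals that the word cannot be issued as is and would have to be split across cycles.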
The candidate functional-unit allocation algorithm makes up for a deficiency of GCC, helps the compiler exploit instruction-level parallelism more fully, and improves both the hardware utilization of the Matrix2 DSP and program execution performance. This thesis also designs and implements an optimized branch delay slot scheduling algorithm for the Matrix2 DSP compiler. In the Matrix2 DSP instruction set, conditional branches, unconditional branches, function calls, and function returns each have six delay slots, so filling as many of these slots as possible is very important for processor performance. Building on GCC's branch delay slot scheduling, the proposed algorithm modifies the search region for candidate fill instructions, relaxes the restrictions on which instructions may fill a delay slot, and adds new scheduling functions; it is implemented in the Matrix2 DSP compiler. The algorithm raises the fill rate of branch delay slots and effectively reduces the overhead caused by branch delays. Finally, the thesis adds support for mapping irregular instructions to the Matrix2 DSP compiler. The Matrix2 DSP instruction set contains many irregular instructions whose operand types are not uniform, and the existing GCC cannot map them. Building on GCC's instruction mapping mechanism and the characteristics of these instructions, this thesis modifies the C-standard rules for type-consistency checking and conversion in arithmetic operations and adds support for irregular instructions to the RTL instruction expander, so that the Matrix2 DSP compiler maps irregular instructions correctly and efficiently.
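The basic idea of delay-slot filling can be sketched as a backward search from the branch: an instruction may be moved into a delay slot if it has no register or memory dependence on anything it would be moved past, including the branch itself. This is a minimal, conservative same-block version in the spirit of GCC's delay-slot filling (the reorg pass), not the thesis's optimized algorithm, whose modified search region and relaxed rules the abstract does not detail. It assumes slot instructions execute whether or not the branch is taken (no annulling) and that each slot holds one instruction, ignoring VLIW packing; the `Insn` class and the assembly strings are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

DELAY_SLOTS = 6  # branches, calls and returns on Matrix2 have six delay slots


@dataclass(frozen=True)
class Insn:
    text: str
    defs: FrozenSet[str] = frozenset()  # registers written ("mem" for stores)
    uses: FrozenSet[str] = frozenset()  # registers read ("mem" for loads)
    movable: bool = True                # False for calls, volatile ops, etc.


NOP = Insn("nop")


def depends(a: Insn, b: Insn) -> bool:
    """True if a and b cannot be reordered (RAW, WAR or WAW dependence)."""
    return bool(a.defs & b.uses or a.uses & b.defs or a.defs & b.defs)


def fill_delay_slots(block: List[Insn]) -> Tuple[List[Insn], List[Insn]]:
    """Move independent instructions from before a block's final branch into
    its delay slots; return (remaining block ending in the branch, slots)."""
    *body, branch = block
    stay: List[Insn] = [branch]  # instructions a candidate would move past
    moved: List[Insn] = []
    for insn in reversed(body):
        if (len(moved) < DELAY_SLOTS and insn.movable
                and not any(depends(insn, j) for j in stay)):
            moved.append(insn)
        else:
            stay.append(insn)
    stay.reverse()
    moved.reverse()  # keep original program order inside the slots
    return stay, moved + [NOP] * (DELAY_SLOTS - len(moved))


s = frozenset  # shorthand for the example below
block = [
    Insn("add r1, r2, r3", defs=s({"r1"}), uses=s({"r2", "r3"})),
    Insn("mul r4, r5, r6", defs=s({"r4"}), uses=s({"r5", "r6"})),
    Insn("cmp p0, r1, 0", defs=s({"p0"}), uses=s({"r1"})),
    Insn("br p0, L1", uses=s({"p0"})),
]
new_block, slots = fill_delay_slots(block)
filled = sum(slot is not NOP for slot in slots)
print(f"delay slots filled: {filled}/{DELAY_SLOTS}")  # delay slots filled: 1/6
```

Only `mul` moves: `cmp` produces the branch condition and `add` feeds `cmp`, so five of the six slots stay as NOPs. With six slots per branch, this is exactly the overhead that a wider candidate search and relaxed fill rules aim to reduce.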
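In standard C, `d + f` with `d` a `double` and `f` a `float` first converts `f` to `double` (the usual arithmetic conversions), so the expression reaches the back end as a same-typed addition; the thesis reports that stock GCC cannot map such code onto Matrix2's irregular, mixed-operand instructions. A minimal sketch of a modified rule, assuming a table of irregular instructions keyed by operator and operand types: if the table has an entry, the operands are left unconverted; otherwise the usual conversions apply. The mnemonics (`FADD.DS`, `MUL.WL`, `add.float`, ...) and the four-type rank list are hypothetical simplifications; in the real compiler this decision is split between the C front end's conversion rules and the RTL expander, and the sketch models only the decision itself.

```python
from typing import Dict, Tuple

# Simplified usual arithmetic conversions over a small subset of C types
# (no unsigned types or integer promotions): the higher-ranked operand
# type becomes the common type.
RANK = ["int", "long long", "float", "double"]

# Hypothetical irregular instructions: (operator, lhs type, rhs type) ->
# (mnemonic, result type). Illustrative only, not real Matrix2 mnemonics.
IRREGULAR: Dict[Tuple[str, str, str], Tuple[str, str]] = {
    ("add", "double", "float"): ("FADD.DS", "double"),
    ("mul", "int", "long long"): ("MUL.WL", "long long"),
}


def common_type(t1: str, t2: str) -> str:
    return max(t1, t2, key=RANK.index)


def map_binary(op: str, t1: str, t2: str) -> Tuple[str, str, str, str]:
    """Return (mnemonic, lhs type, rhs type, result type) for `t1 op t2`."""
    if (op, t1, t2) in IRREGULAR:
        # Modified rule: an irregular instruction accepts these operand
        # types directly, so no conversion is inserted.
        mnemonic, result = IRREGULAR[(op, t1, t2)]
        return mnemonic, t1, t2, result
    # Standard rule: convert both operands to the common type and use a
    # regular, uniformly typed instruction.
    ct = common_type(t1, t2)
    return f"{op}.{ct.replace(' ', '_')}", ct, ct, ct


print(map_binary("add", "double", "float"))  # ('FADD.DS', 'double', 'float', 'double')
print(map_binary("add", "int", "float"))     # ('add.float', 'float', 'float', 'float')
```

The sketch matches operand order exactly, so `mul` with `long long` on the left and `int` on the right falls back to the usual conversions; a real implementation would also have to handle commuted operands and keep the result type consistent with what the C standard requires.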
【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year degree conferred】: 2014
【Classification number】: TP314
Article ID: 2189683
Article link: http://sikaile.net/falvlunwen/zhishichanquanfa/2189683.html