Research and Implementation of Key Compiler Optimization Techniques for the Matrix2 DSP Based on GCC

Posted: 2018-08-18 13:58
[Abstract]: The Matrix2 DSP is a high-performance 64-bit floating-point digital signal processor with independent intellectual property rights. It was designed by the Institute of Microelectronics, College of Computer, National University of Defense Technology. It offers strong arithmetic capability, high operating speed, and powerful parallel processing, and is mainly used in digital signal processing fields such as weather forecasting and graphics and image processing. To support high-level-language application development for the Matrix2 DSP, our research group developed a Matrix2 DSP compiler based on the open-source compiler GCC 4.7.0. Because the Matrix2 DSP uses a VLIW architecture, how much of its computing power is actually realized depends largely on the quality of compiler optimization. Drawing on the architectural features and instruction set of the Matrix2 DSP, this thesis improves the Matrix2 compiler in three areas: candidate functional unit allocation, branch delay slot scheduling, and irregular instruction mapping. Together these changes considerably improve the performance of the code generated by the Matrix2 DSP compiler.

The main research contents and contributions of this thesis are as follows.

This thesis designs and implements a candidate functional unit allocation algorithm for the Matrix2 DSP compiler. The Matrix2 DSP hardware does not allocate functional units itself. Instead, the compiler must assign each instruction a suitable execution unit chosen from its candidate functional units. Building on GCC's instruction constraint matching mechanism, the thesis proposes an allocation scheme that takes the instruction word as the basic unit of allocation and jointly considers the candidate functional units of the current instruction and the resources that are still free. The scheme is implemented in the Matrix2 DSP compiler.
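Within one instruction word, the allocation problem described above amounts to giving every instruction a distinct functional unit from its own candidate set, skipping units that are already busy. The following is a minimal Python sketch under stated assumptions and is not the thesis's algorithm. The unit names (`ALU0`, `ALU1`, `MUL0`) and the function `allocate_units` are hypothetical, and the candidate lists stand in for what GCC's constraint matching would produce. Kuhn's augmenting-path bipartite matching is used here as a standard way to guarantee that a conflict-free assignment is found whenever one exists.

```python
from typing import AbstractSet, Dict, List, Optional, Set


def allocate_units(word: Dict[str, List[str]],
                   busy: AbstractSet[str] = frozenset()) -> Optional[Dict[str, str]]:
    """Assign each instruction in one VLIW instruction word a distinct
    functional unit taken from its candidate list.

    word -- maps an instruction label to its candidate units (hypothetical names)
    busy -- units already occupied in this cycle, i.e. not free
    Returns {instruction: unit}, or None if no conflict-free assignment exists.
    """
    owner: Dict[str, str] = {}  # unit -> instruction currently holding it

    def try_assign(insn: str, seen: Set[str]) -> bool:
        # Augmenting-path step: take a free candidate unit, or move the
        # current holder of a candidate to one of its other candidates.
        for unit in word[insn]:
            if unit in busy or unit in seen:
                continue
            seen.add(unit)
            if unit not in owner or try_assign(owner[unit], seen):
                owner[unit] = insn
                return True
        return False

    # Most-constrained instructions (fewest free candidates) first. The order
    # only affects which valid assignment is returned, not whether one is found.
    order = sorted(word, key=lambda i: len([u for u in word[i] if u not in busy]))
    for insn in order:
        if not try_assign(insn, set()):
            return None  # some instruction cannot get a unit this cycle
    return {insn: unit for unit, insn in owner.items()}


if __name__ == "__main__":
    # Hypothetical instruction word: three instructions and their candidate units.
    word = {"add": ["ALU0", "ALU1"], "sub": ["ALU0"], "mul": ["MUL0", "ALU1"]}
    print(allocate_units(word))  # {'sub': 'ALU0', 'add': 'ALU1', 'mul': 'MUL0'}
    print(allocate_units(word, busy=frozenset({"MUL0"})))  # None
```

A naive in-order pass would give `add` its first candidate `ALU0` and then find nothing left for `sub`. Handling the most constrained instruction first, and allowing earlier choices to be revised, avoids this. When `MUL0` is already busy, no valid assignment exists for this word, and a scheduler would have to move one instruction to a different cycle.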
The candidate functional unit allocation algorithm makes up for a deficiency in GCC and helps the compiler exploit instruction-level parallelism more effectively. It also improves both hardware utilization and program execution performance on the Matrix2 DSP.

The thesis also designs and implements an optimized branch delay slot scheduling algorithm for the Matrix2 DSP compiler. In the Matrix2 DSP instruction set, conditional branches, unconditional branches, function calls, and function returns each have six delay slots, so filling as many of these slots as possible matters greatly for processor performance. Building on GCC's branch delay slot scheduling, the thesis proposes an optimized algorithm with three main elements: a modified search region for candidate fill instructions, relaxed restrictions on which instructions may fill a delay slot, and additional scheduling implementation functions. The algorithm is implemented in the Matrix2 DSP compiler. It raises the fill rate of branch delay slots and effectively reduces the delay overhead caused by branches.

Finally, the thesis adds support for irregular instruction mapping to the Matrix2 DSP compiler. The Matrix2 DSP instruction set contains many irregular instructions whose operand types are not uniform, and stock GCC cannot map such instructions. Building on GCC's instruction mapping mechanism and the characteristics of irregular instructions, the thesis modifies the type-consistency checking and conversion rules for C standard arithmetic operations. It also adds support for irregular instructions to GCC's RTL expander. With these changes, the Matrix2 DSP compiler maps irregular instructions correctly and efficiently.
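The simplest delay-slot filling strategy moves independent instructions from earlier in the same basic block into the slots after the branch. This is a minimal Python sketch of that baseline only, not the thesis's optimized algorithm. The `Insn` representation, the register names, and the pseudo-assembly text are hypothetical. The sketch also assumes that each slot holds one instruction, that operation latencies can be ignored, and that slot instructions execute on both paths (no annulling). The value of six slots comes from the abstract.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Per the abstract, Matrix2 branches, calls and returns have six delay slots.
# Assumption: one instruction per slot; latencies are ignored.
DELAY_SLOTS = 6


@dataclass(frozen=True)
class Insn:
    text: str                            # hypothetical pseudo-assembly
    defs: FrozenSet[str] = frozenset()   # registers/resources written
    uses: FrozenSet[str] = frozenset()   # registers/resources read


NOP = Insn("nop")


def conflicts(a: Insn, b: Insn) -> bool:
    """True if a and b may not be reordered (RAW, WAR or WAW dependence)."""
    return bool(a.defs & b.uses or a.uses & b.defs or a.defs & b.defs)


def fill_from_before(block: List[Insn]) -> Tuple[List[Insn], int]:
    """Fill the delay slots of the branch ending `block` with instructions
    moved down from earlier in the same basic block ("fill from before").

    Returns the rewritten block and the number of slots holding useful work."""
    *body, branch = block
    stay: List[Insn] = [branch]  # later instructions a candidate must move past
    moved: List[Insn] = []
    kept: List[Insn] = []
    for insn in reversed(body):  # scan backwards from the branch
        if len(moved) < DELAY_SLOTS and not any(conflicts(insn, s) for s in stay):
            moved.append(insn)   # independent: safe to place after the branch
        else:
            kept.append(insn)    # stays before the branch
            stay.append(insn)
    kept.reverse()               # restore original program order
    moved.reverse()
    padding = [NOP] * (DELAY_SLOTS - len(moved))
    return kept + [branch] + moved + padding, len(moved)


if __name__ == "__main__":
    block = [
        Insn("r1 = load [r2]", frozenset({"r1"}), frozenset({"r2", "mem"})),
        Insn("r3 = r4 + 1", frozenset({"r3"}), frozenset({"r4"})),
        Insn("r5 = r1 * 2", frozenset({"r5"}), frozenset({"r1"})),
        Insn("p0 = r3 > 0", frozenset({"p0"}), frozenset({"r3"})),
        Insn("br p0, L1", uses=frozenset({"p0"})),
    ]
    new_block, filled = fill_from_before(block)
    print(filled)  # 2
    for insn in new_block:
        print(insn.text)
```

In this example, the compare that feeds the branch and the add that feeds the compare must stay above the branch. The load and the multiply that consumes it can both move into the delay slots, which leaves four slots padded with NOPs. Filling slots from the branch target or the fall-through path needs additional safety conditions and is not shown.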
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP314

Article No.: 2189683

Article link: http://sikaile.net/falvlunwen/zhishichanquanfa/2189683.html
