
Research and Implementation of Global Instruction Scheduling Techniques for the YHFT-Matrix Compiler

Published: 2018-03-19 10:57

  Topic: compiler. Focus: global instruction scheduling. Source: National University of Defense Technology, 2013 master's thesis. Document type: degree thesis.


[Abstract]: The Matrix DSP is a high-performance DSP processor with independent intellectual property rights, developed by the Institute of Microelectronics, College of Computer, National University of Defense Technology. It has strong data-computing capability and can therefore be applied in fields such as soft-base-station wireless communication and underwater acoustic computing. Promoting this processor requires a correct, high-performing compiler system. To make the Matrix compiler perform better, its optimizations must be done well, and optimizations tailored to the Matrix architecture are especially effective.

Based on the characteristics of the Matrix architecture, this thesis proposes several optimizations suited to the Matrix compiler. Some have already been implemented in the compiler and adapted to the architecture, which substantially improves the compiler's optimization capability. The main optimizations presented and implemented are as follows:

(1) Global instruction scheduling based on selective scheduling. The Matrix processor is a VLIW DSP that can issue 10 instructions simultaneously, so instruction-level parallelism can fully exploit its performance. Global instruction scheduling lets the compiler achieve instruction-level parallelism more effectively. Building on GCC's selective scheduling, a correct selective scheduling algorithm is implemented in the Matrix compiler, and the version adapted to the Matrix architecture is more effective.

(2) If-conversion. If-conversion transforms a control flow graph into a data flow graph, which benefits subsequent optimizations, especially those related to instruction scheduling. The Matrix processor supports full predicated execution, so developing if-conversion for the Matrix compiler makes better use of the architecture to exploit the processor's performance. Building on GCC's if-conversion implementation, the Matrix compiler implements the same if-conversion cases as GCC and adds some new convertible cases motivated by specific applications. With if-conversion added, the compiler's performance improves further. After the new convertible cases are added in particular, the execution efficiency of certain applications improves considerably.

(3) Branch delay scheduling. All branch instructions, jump instructions, and function call instructions in the Matrix instruction set have four delay slots. If a program leaves these slots unfilled, the pipeline idles and hardware resources are wasted. Building on GCC's branch delay scheduling implementation, the Matrix compiler correctly implements branch delay scheduling, and the algorithm adapted to the Matrix architecture schedules better and fills the delay slots more fully.
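As an illustration of the delay-slot filling idea in the abstract, here is a minimal Python sketch. It is not the thesis's algorithm and not GCC's delay-slot pass. It assumes a single basic block that ends in a branch, a hypothetical `Instr` record listing the registers each instruction reads and writes, and made-up instruction mnemonics. The only fact taken from the abstract is that each branch has four delay slots. The sketch moves instructions from before the branch into the slots when doing so does not violate a register dependence, and pads the rest with `nop`.

```python
from dataclasses import dataclass

# From the abstract: every branch, jump, and call has four delay slots.
DELAY_SLOTS = 4


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (an assumption for this sketch)."""
    text: str
    writes: frozenset = frozenset()
    reads: frozenset = frozenset()


NOP = Instr("nop")


def conflicts(earlier, later):
    """True if 'earlier' cannot be moved to execute after 'later'."""
    return bool(
        earlier.writes & later.reads      # read-after-write
        or earlier.reads & later.writes   # write-after-read
        or earlier.writes & later.writes  # write-after-write
    )


def fill_delay_slots(block):
    """Return the block with the branch's delay slots filled.

    'block' is a list of Instr whose last element is the branch.
    Instructions are considered from the one nearest the branch
    backwards; one is moved into a slot only if it is independent of
    the branch and of every later instruction that stays in place.
    """
    *body, branch = block
    stay, moved = [], []  # both collected in reverse program order
    for ins in reversed(body):
        blockers = stay + [branch]
        if len(moved) < DELAY_SLOTS and not any(conflicts(ins, b) for b in blockers):
            moved.append(ins)
        else:
            stay.append(ins)
    stay.reverse()
    moved.reverse()
    padding = [NOP] * (DELAY_SLOTS - len(moved))
    return stay + [branch] + moved + padding


# Example block with made-up mnemonics and register names.
example = [
    Instr("add r1, r2, r3", frozenset({"r1"}), frozenset({"r2", "r3"})),
    Instr("cmp p0, r1, r4", frozenset({"p0"}), frozenset({"r1", "r4"})),
    Instr("ld r5, [r6]", frozenset({"r5"}), frozenset({"r6"})),
    Instr("mul r7, r5, r5", frozenset({"r7"}), frozenset({"r5"})),
    Instr("br p0, target", frozenset(), frozenset({"p0"})),
]
scheduled = [ins.text for ins in fill_delay_slots(example)]
```

In this example the `ld` and `mul` instructions are independent of the branch condition, so they can occupy two of the four slots. The `cmp` must stay ahead of the branch because it produces `p0`, and the `add` must stay ahead of the `cmp`. A production pass would also draw candidates from the branch target or fall-through path, which this sketch does not attempt.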
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP314;TP332

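[References]

Related journal articles (top 2)

1 Wu Chengyong, Lian Ruiqi, Zhang Zhaoqing, Qiao Ruliang; Cooperative Global Instruction Scheduling and Register Allocation [J]; Chinese Journal of Computers; 2000, No. 05

2 Tian Zuwei, Zhao Kejia, Wang Xiaofei; Research on GCC IF-Conversion Based on IA-64 Predicated Execution [J]; Microelectronics & Computer; 2005, No. 06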



Article ID: 1633968


Article link: http://sikaile.net/falvlunwen/zhishichanquanfa/1633968.html

