Research on Static Instruction Mapping Algorithms for the EDGE Architecture
Published: 2018-04-05 07:11
Topic: EDGE; Focus: static mapping; Source: Harbin Institute of Technology, master's thesis, 2012
[Abstract]: As the semiconductor industry advances and chip integration keeps rising, processor design is moving toward tiled organizations. The pressing demand for processor performance has made fully exploiting a program's instruction-level parallelism (ILP) a trend. Against this background, an explicit dataflow execution model emerged, known in the field as the EDGE (Explicit Data Graph Execution) architecture. EDGE is characterized by block-atomic execution and by static placement with dynamic issue. A tiled design needs a mechanism for mapping instructions onto the hardware, and designing that mapping so that performance is as high as possible is of great importance to EDGE.
This thesis summarizes the strengths and weaknesses of existing mapping algorithms and analyzes the factors that affect performance. Based on the principle of reducing communication latency by making more use of the bypass paths on each node, it proposes and implements a dependence-first placement algorithm, DF (Dependence First). Test results show that DF improves performance over the best existing algorithm by up to 13% and by 2% on average, speeding up application execution. The thesis also refines DF into a second algorithm, DF2. Analysis shows that DF has the same complexity as the spatial path scheduling (SPS) algorithm: both are O(i^2). DF therefore improves program performance without increasing either algorithmic complexity or hardware overhead.
The thesis also applies DF to different hardware configurations, to examine how the hardware structure interacts with DF and to locate the processor's performance bottleneck under DF. Code generated by DF is run on hardware with twice the bypass bandwidth and on hardware with twice the network bandwidth. The results show that bypass bandwidth strongly affects DF's performance. The thesis concludes that hardware bypass bandwidth limits DF's performance gains and that, compared with network bandwidth, bypass bandwidth is the key factor. Running the same binary generated by DF, doubling the bypass bandwidth alone yields an additional 10% performance improvement.
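The abstract does not spell out DF's placement rules, so the following is only an illustrative sketch of the general idea it describes: place an instruction on the same node as one of its producers, so the operand can travel over the local bypass instead of the on-chip network. Everything in it is an assumption made for illustration, not the thesis's method: the function names (`topological_order`, `place_dependence_first`, `total_hops`), the rectangular grid, the fixed number of slots per tile, and the Manhattan-distance cost in which a same-tile operand costs zero hops.

```python
from collections import defaultdict


def topological_order(deps):
    """Return instructions so that every producer precedes its consumers.

    deps maps instruction id -> list of producer ids. Every producer is
    assumed to appear as a key of deps.
    """
    remaining = {inst: len(set(prods)) for inst, prods in deps.items()}
    consumers = defaultdict(list)
    for inst, prods in deps.items():
        for p in set(prods):
            consumers[p].append(inst)
    ready = sorted(inst for inst, n in remaining.items() if n == 0)
    order = []
    while ready:
        inst = ready.pop(0)
        order.append(inst)
        for c in consumers[inst]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("dependence graph has a cycle or an unknown producer")
    return order


def place_dependence_first(deps, rows, cols, slots_per_tile):
    """Greedy sketch: pick the free tile with the fewest operand hops.

    A tile that already holds a producer contributes zero hops for that
    operand (hypothetical stand-in for the local bypass), so producers'
    tiles win whenever they still have a free slot.
    """
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    if len(deps) > len(tiles) * slots_per_tile:
        raise ValueError("block does not fit on the grid")
    used = defaultdict(int)
    placement = {}

    def cost(tile, producers):
        return sum(abs(tile[0] - placement[p][0]) + abs(tile[1] - placement[p][1])
                   for p in producers)

    for inst in topological_order(deps):
        free = [t for t in tiles if used[t] < slots_per_tile]
        # Ties are broken by tile coordinates to keep the result deterministic.
        best = min(free, key=lambda t: (cost(t, deps[inst]), t))
        placement[inst] = best
        used[best] += 1
    return placement


def total_hops(deps, placement):
    """Sum of Manhattan distances over all producer-to-consumer edges."""
    return sum(abs(placement[i][0] - placement[p][0]) + abs(placement[i][1] - placement[p][1])
               for i, prods in deps.items() for p in prods)


# Hypothetical four-instruction block on a 2x2 grid with two slots per tile.
example_deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
example_placement = place_dependence_first(example_deps, rows=2, cols=2, slots_per_tile=2)
print(example_placement, total_hops(example_deps, example_placement))
```

The sketch ignores issue order, network contention, and bypass bandwidth, which the abstract identifies as the factor that limits DF's gains, so it should be read as a picture of the placement preference rather than as a performance model.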
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP301.6; TP332
Article ID: 1713712
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1713712.html