Research on Dynamic Instruction Mapping Algorithms for the EDGE Architecture
[Abstract]: The centralized structures that pervade out-of-order superscalar processors have seriously limited further improvement in microprocessor performance. EDGE (Explicit Data Graph Execution) is one of the models proposed to address this bottleneck; its structural model abandons the energy-hungry centralized structures of superscalar designs. In a distributed EDGE architecture, instructions are mapped onto multiple tiles and executed concurrently. Passing operands between tiles incurs delay, which degrades performance. An instruction mapping algorithm tries to offset the performance loss caused by this partitioning by carefully trading off program parallelism against inter-tile communication delay. The TRIPS microprocessor adopts a topology in which critical resources are distributed asymmetrically, together with a static mapping algorithm (SPDI, Static Placement Dynamic Issue). This leads to substantial load imbalance and operand-network communication hot spots on the ETs (Execute Tiles), and thus to lower IPC. In this thesis, an EDGE structure similar to TRIPS is implemented in the M5-EDGE simulator to study the dynamic instruction mapping algorithm Deep. Without compiler scheduling, the Deep algorithm with cyclic mapping reaches 85% and 98.3% of SPDI's performance at issue widths of 1 and 2, respectively. Based on the topological positions of the RTs (Register Tiles) and DTs (Data-cache Tiles), three optimizations of Deep mapping are examined, each choosing ETs preferentially: by ET number, in a zigzag ("之"-shaped) order, and by the total number of hops in a block's global communication. At an issue width of 1, the average hop counts reported for these optimizations are 2.63% and 4.70% lower than those of the basic Deep algorithm, while IPC rises by 1.07% and 2.11%, respectively.
This shows that reducing the hop count of inter-instruction communication under Deep mapping can improve IPC. In the Deep mapping algorithm, more than 90% of operands are transferred through the local bypass, which greatly reduces the load on the operand network. When the bypass width is twice the issue width, the local operand transfer delay drops almost to 0, so increasing the local bypass width effectively reduces operand transfer delay. When RTs are assigned to ETs by number, IPC increases by 1.77 over the basic Deep mapping algorithm. For DT position optimization, two strategies are used: selecting the ETs near the DTs first, and selecting ETs by the computed sum of VBS hops. These two optimizations raise IPC by 1.17% and 1.89% over basic Deep mapping, respectively. When the RTs and DTs are tiled into the ET array to form a 4x4 topology, the IPC of Deep mapping is 97.18% and 113.42% of SPDI's at issue widths of 1 and 2, respectively. The ratio of ETs was 97.32% and 114.06%, respectively. When the topological distance shrinks, or when the Deep mapping algorithm optimizes the number of communication hops, system IPC can be improved significantly.
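The relationship between ET selection order and communication hops can be illustrated with a small sketch. It is not the thesis's Deep algorithm, whose details the abstract does not give. It assumes a 4x4 mesh of ETs, hop counts equal to Manhattan distance, and a block modeled as a simple dependence chain; all function names are hypothetical.

```python
# Illustrative sketch only: cyclic (round-robin) placement of a block's
# instructions onto a grid of execute tiles, comparing two tile orders.
# Assumptions (not from the thesis): hop count = Manhattan distance on a mesh,
# and the block is a linear dependence chain.

def row_major_order(rows, cols):
    """Tiles visited by number: left to right, top to bottom."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def zigzag_order(rows, cols):
    """Tiles visited in a zigzag: even rows left to right, odd rows reversed."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order

def cyclic_map(num_instructions, tile_order):
    """Place instruction i on tile_order[i mod number_of_tiles]."""
    return [tile_order[i % len(tile_order)] for i in range(num_instructions)]

def hops(a, b):
    """Manhattan distance between two tiles (assumed mesh routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def total_hops(placement, edges):
    """Sum of hops over all producer -> consumer dependence edges."""
    return sum(hops(placement[p], placement[c]) for p, c in edges)

# A chain of 8 instructions: 0 -> 1 -> ... -> 7.
chain = [(i, i + 1) for i in range(7)]
by_number = cyclic_map(8, row_major_order(4, 4))
by_zigzag = cyclic_map(8, zigzag_order(4, 4))

print(total_hops(by_number, chain))  # 10: the row wrap (0,3) -> (1,0) costs 4 hops
print(total_hops(by_zigzag, chain))  # 7: every consecutive pair is adjacent
```

In this toy case the zigzag order keeps consecutive instructions on adjacent tiles, which is the kind of hop reduction the optimizations above aim for; real blocks have more general dependence graphs, so the benefit will differ.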
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP332; TP301.6