
Research on Sampling-Based Acceleration Strategies for Multi-core Microarchitecture Simulation

Published: 2018-03-14 16:40

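  Topic: microarchitecture simulation. Focus: multi-core processors. Source: Huazhong University of Science and Technology, 2016 doctoral dissertation. Document type: degree thesis.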


[Abstract]: Microarchitecture simulation plays an important role in computer architecture design. In both industry and academia it is an indispensable technique, because designers rely on it to explore a vast design space and evaluate large numbers of design options in order to approach or reach an optimal design. Unfortunately, slow simulation speed has been the bottleneck of this technique for decades. In the multi-core/many-core era the problem has become more acute, for two main reasons: (1) multi-core systems, with more components and finer-grained designs, bring a much larger design space to explore; (2) to better evaluate, verify, and stress-test multi-core/many-core systems, larger and more complex multi-threaded benchmarks must be simulated. Research on accelerating multi-core microarchitecture simulation therefore has significant academic and practical value.

Sampled simulation is a popular, widely used, and effective acceleration strategy. It infers the performance of a whole program on a system by simulating a small number of carefully chosen program samples, which greatly shortens the evaluation cycle and speeds up the validation of design options. Sampled simulation for single-core systems is already fairly mature. It selects samples according to the dynamic instruction count of the running program; for example, a sample is usually defined as a fixed number of instructions. This class of techniques is therefore called Instruction-Based Sampling (IBS). When applied to multi-core simulation, however, IBS performs poorly and can even lead to incorrect evaluations, because synchronization between the threads of a multi-threaded benchmark makes the number of dynamic instructions executed at run time non-deterministic, which removes the basic premise on which IBS rests. Time-Based Sampling (TBS), which samples on program execution time, emerged in response. Unlike IBS, TBS takes fixed-length intervals of execution time as samples and is better suited to evaluating the performance of multi-threaded benchmarks on multi-core systems. Compared with traditional IBS, however, TBS is far from mature and faces several challenging problems: samples are hard to select precisely, any single sampling strategy performs poorly, and functional warmup is costly. To address these problems, this dissertation studies TBS for multi-core microarchitecture simulation in depth.

First, to address the difficulty of selecting samples precisely in TBS, the dissertation proposes PCantorSim, a sampling strategy that uses the fractal behavior of multi-threaded benchmarks to guide sample selection. PCantorSim avoids the complex preprocessing required by traditional sample-selection strategies, which improves sampling efficiency and makes it broadly applicable. Specifically, PCantorSim builds on the observation that, in addition to phase-based periodic behavior, multi-threaded benchmarks exhibit self-similar fractal behavior during execution: the run-time behavior of a program looks self-similar when observed at different time scales. Based on this observation, PCantorSim can select representative sample segments quickly and accurately and greatly shorten sampled simulation time. In the evaluation, programs from the PARSEC multi-core benchmark suite were run on a simulated 8-core system. Compared with unsampled, fully detailed simulation, PCantorSim increased simulation speed by 20 times, with an average execution-time prediction error of only 5.3%.

Second, because a single sampling strategy cannot fully exploit the advantages of TBS, the dissertation proposes THS (Two-level Hybrid Sampling), a multi-level sampling strategy that combines segment-based and fractal-based sampling. A detailed comparison of the individual sampling strategies used in TBS reveals several previously unreported phenomena: (1) accurately predicting the IPC (Instructions Per Cycle) of the fast-forward phases matters more than predicting the IPC of the detailed simulation phases; (2) the accuracy of the fast-forward IPC prediction is determined jointly by the sample-selection strategy and the fast-forward IPC prediction algorithm; (3) when the selected sample segments are short, fractal-based sampling (Cantor Sampling) is more accurate, whereas when they are long, segment-based periodic sampling (Periodic Sampling) is more accurate; (4) random sampling (Random Sampling) is not suitable for TBS. Based on these findings, THS uses a multi-level, segment-fractal sampling strategy that exploits the strengths of the individual strategies while avoiding their weaknesses, and so makes better use of the evaluation accuracy and simulation speedup that TBS offers. Experimental results show that THS achieves an average execution-time prediction error of 4% and a simulation speedup of 40 times. Further evaluation shows that THS also has high cross-microarchitecture accuracy and can effectively guide the choice among multi-core microarchitecture design options.

Finally, to address the high cost of functional warmup in TBS, the dissertation proposes SOL (Shorter On-Line Warmup), an online mechanism for accelerating functional warmup. SOL uses a two-stage design: in the first stage, the Prime strategy selects a functional-warmup segment of appropriate length; within that segment, an extended and optimized NSL (No-State-Loss) warmup strategy is then applied. This reduces the cost of functional warmup while keeping it effective. By exploring and tuning the SOL parameters, a reasonable combination of warmup parameters is determined that balances evaluation accuracy against simulation speedup. Experimental results show that SOL is broadly applicable: it can be integrated into several existing TBS strategies, warms up the functional components in sampled simulation quickly, and increases simulation speedup while maintaining simulation accuracy.
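
The abstract describes time-based sampling and Cantor-style sample selection only at a high level, so the following Python sketch is an illustration under stated assumptions rather than the dissertation's actual algorithms. `cantor_intervals` uses the textbook middle-third Cantor construction as a stand-in for fractal sample placement, and `estimate_cycles` assumes the simplest possible fast-forward predictor: each fast-forward region is assumed to run at the IPC measured in the detailed sample immediately before it. Both function names, the single aggregate IPC, and the abstract time units are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in abstract time units


def cantor_intervals(start: int, end: int, depth: int) -> List[Interval]:
    """Intervals kept after `depth` rounds of middle-third removal.

    Assumes (end - start) is divisible by 3**depth so that integer
    division is exact. This is the classic Cantor construction, used
    here only as a stand-in for fractal sample placement.
    """
    if depth == 0:
        return [(start, end)]
    third = (end - start) // 3
    # Keep the first and last thirds, drop the middle third, and recurse.
    return (cantor_intervals(start, start + third, depth - 1)
            + cantor_intervals(end - third, end, depth - 1))


def estimate_cycles(detailed: List[Tuple[int, int]],
                    skipped_instructions: List[int]) -> float:
    """Estimate total cycles for a sampled run.

    detailed[i] is (instructions, cycles) measured in the i-th detailed
    sample; skipped_instructions[i] is the instruction count of the
    fast-forward region that follows it. Assumption: each fast-forward
    region runs at the IPC of the preceding detailed sample.
    """
    total = 0.0
    for (insns, cycles), skipped in zip(detailed, skipped_instructions):
        ipc = insns / cycles
        total += cycles + skipped / ipc  # measured cycles + predicted cycles
    return total


if __name__ == "__main__":
    print(cantor_intervals(0, 27, 2))
    print(estimate_cycles([(2000, 1000), (1000, 1000)], [4000, 3000]))
```

The second function makes finding (1) above concrete: when most instructions fall in fast-forward regions, the predicted term `skipped / ipc` dominates the total, so errors in the fast-forward IPC prediction affect the estimate more than errors inside the detailed samples.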

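[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctorate
[Year degree conferred]: 2016
[Classification number]: TP303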


Article ID: 1612054


Article link: http://sikaile.net/shoufeilunwen/xxkjbs/1612054.html

