Research on Data Layout Methods for Hybrid On-Chip High-Speed Memory
Topics: SPM data allocation + Cache behavior analysis. Source: Shandong University, master's thesis, 2014
[Abstract]: In recent years, embedded systems have developed rapidly alongside the growth of Internet of Things technology and embedded smart devices. Embedded technology is increasingly applied in fields that deeply affect daily life, such as wireless communication, smartphones, medical technology, and intelligent buildings. Today's embedded devices place higher demands on operating efficiency, continuous running time, and stability, so optimizing computing performance and energy consumption is an important concern in embedded system design.

To ease the mismatch between CPU speed and memory read/write speed, computer systems introduced on-chip caching. The on-chip static random-access memories (SRAM) in common use today, namely the on-chip cache (Cache) and scratchpad memory (SPM), are widely deployed in embedded systems. In data-intensive programs the memory subsystem is the performance and energy bottleneck of the whole system, so optimizing it is a key consideration in the design of high-performance, energy-efficient embedded systems. Although many embedded systems already use a hybrid design with both Cache and SPM as on-chip RAM, many existing SPM data optimization algorithms target pure-SPM architectures only and do not apply to hybrid SPM and Cache architectures. Taking the hybrid on-chip SPM and Cache architecture as its background and focusing on performance and energy optimization of hybrid on-chip memory, this thesis proposes a data allocation optimization algorithm for hybrid on-chip SPM and Cache memory based on Cache behavior analysis.

The main work of this thesis is as follows:
(1) System performance and energy consumption are optimized by studying the SPM data allocation problem under a hybrid SRAM architecture. The thesis proposes an optimal solution based on integer linear programming (ILP). The scheme considers not only how frequently data is accessed in the Cache but also the conflict behavior of memory blocks when they miss in the Cache, and then uses ILP to find the SPM allocation with the highest performance or the lowest energy consumption. Compared with a pure-SPM architecture, experimental results show that the hybrid-memory optimization algorithm makes better use of on-chip memory.
(2) A Cache behavior analysis model based on data Cache traces is proposed. The thesis adopts and extends the theory of the temporal conflict set (TCS) as a model for precise analysis of Cache behavior. Unlike analysis models based on a Cache conflict graph, this model uses TCS as the basis of Cache analysis: the algorithm computes a detailed conflict sequence for every Cache miss, and the ILP formulation then calculates precisely how different SPM allocations of memory blocks affect Cache behavior.
(3) To make the most of the SPM, the thesis finally proposes a fine-grained array partitioning algorithm based on memory blocks. Each array can be split into several parts, some mapped to the SPM and others assigned to external memory. This fine-grained partitioning improves system performance and reduces system energy consumption to a greater degree.
(4) The optimization scheme is integrated into a unified compilation framework. The results produced by the ILP optimizer are converted into a linker script, which the compiler then uses to build an optimized executable. The work is based on a modified SimpleScalar simulation tool.
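The reason contribution (1) looks at Cache conflict behavior rather than access frequency alone can be illustrated with a small sketch. This is not the thesis's ILP formulation: it uses exhaustive search in place of an ILP solver, a direct-mapped cache in which a block maps to set `block % num_sets`, and made-up access costs. The names `simulate_cost` and `best_allocation` are hypothetical.

```python
from itertools import combinations

def simulate_cost(trace, spm_blocks, num_sets, spm_cost=1, hit_cost=1, miss_cost=10):
    """Total access cost of a block trace; the cost values are illustrative assumptions."""
    cache = {}  # set index -> block currently resident (direct-mapped)
    cost = 0
    for block in trace:
        if block in spm_blocks:
            cost += spm_cost  # SPM accesses bypass the cache entirely
            continue
        index = block % num_sets
        if cache.get(index) == block:
            cost += hit_cost
        else:
            cost += miss_cost  # miss: fetch the block and evict the previous occupant
            cache[index] = block
    return cost

def best_allocation(trace, spm_capacity, num_sets):
    """Exhaustive search (a stand-in for ILP) over SPM contents of at most spm_capacity blocks."""
    blocks = sorted(set(trace))
    best = (simulate_cost(trace, frozenset(), num_sets), frozenset())
    for k in range(1, spm_capacity + 1):
        for subset in combinations(blocks, k):
            chosen = frozenset(subset)
            cost = simulate_cost(trace, chosen, num_sets)
            if cost < best[0]:
                best = (cost, chosen)
    return best

# Blocks 0 and 4 share a cache set and evict each other; block 1 is the most
# frequently accessed block but hits in the cache after its first miss.
trace = [0, 4] * 3 + [1] * 8
print(best_allocation(trace, spm_capacity=1, num_sets=4))
```

In this toy trace, a frequency-only policy would move block 1 into the SPM, yet moving one of the two conflicting blocks removes the repeated misses and gives a lower total cost. Exhaustive search only scales to a handful of blocks, which is one reason an ILP formulation is attractive for realistic programs.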
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP333