Optimized Design of a High-Performance CPU Memory Controller
Published: 2018-06-17 00:07
Topic: memory controller + address mapping; Source: National University of Defense Technology, 2012 master's thesis
[Abstract]: Memory access speed has a significant effect on processor performance, especially in multi-core, multithreaded processors, and it is constrained by the memory controller. The memory controller determines key system parameters such as the maximum usable memory capacity, the number of banks, the memory type and speed, and the data depth and data width of the DRAM chips. The quality of the memory controller design therefore directly affects processor performance. This thesis studies the optimized design of the memory controller in the X processor. The X processor is a high-performance processor that supports multithreading and SIMD. It integrates 16 cores, each with 4 threads; the execution units comprise two integer units, one vector unit, one floating-point unit, and one load/store unit. Four dual-channel memory controllers are integrated on chip to support parallel memory access. When the working set is very large, the volume of data being processed grows accordingly and increases memory access pressure. Running several memory controllers in parallel relieves this pressure to some extent, but the memory access address stream becomes more scattered, so the memory controllers cannot be used to their full potential.

Building on an in-depth study of the X processor and DDR3 SDRAM, and with the goal of reducing memory access latency, this thesis analyzes the basic structure of the existing memory controller and makes the following optimizations. To improve program locality, bank-level parallelism, and row locality, it designs a full-XOR address mapping scheme. To raise the row hit rate of memory access commands and reduce read/write switching delay, it designs a hierarchical memory access scheduler that reorders requests at two levels, intra-bank scheduling and inter-bank scheduling, and includes a starvation-prevention mechanism, improving memory bandwidth utilization as far as possible. To reduce the delay caused by frequently opening and closing active pages, it adds a virtual row buffer module between the on-chip buffer and the memory controller, which increases the number of active pages.

The optimized memory controller design is described in Verilog, and the optimized overall structure is comprehensively functionally verified to ensure that the memory controller operates correctly. Finally, the structures before and after optimization are tested and compared in detail: bandwidth rises from 5.88 GB/s before optimization to 18.55 GB/s after, demonstrating the advantage of the optimized design.
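The abstract does not spell out the XOR address mapping, so the following is only a minimal Python sketch of the general idea behind XOR-based (permutation) bank mapping, not the scheme defined in the thesis. The field widths, the bit layout, and the names `split_address`, `xor_map`, and `banks_touched` are all assumptions made for illustration. The bank index is XORed with the low-order row bits, so addresses that would conflict in one bank under a plain mapping are spread across banks, while addresses within the same row stay together.

```python
# Illustrative sketch only: the field widths and the XOR rule below are
# assumptions, not the mapping defined in the thesis.
COL_BITS = 10   # assumed column-address width
BANK_BITS = 3   # DDR3 devices have 8 banks
ROW_BITS = 14   # assumed row-address width


def split_address(addr: int) -> tuple[int, int, int]:
    """Plain mapping: split a linear address into (row, bank, column)."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col


def xor_map(addr: int) -> tuple[int, int, int]:
    """XOR the bank index with the low-order row bits; row and column are unchanged."""
    row, bank, col = split_address(addr)
    return row, bank ^ (row & ((1 << BANK_BITS) - 1)), col


def banks_touched(mapper, addrs) -> int:
    """Count the distinct banks that a sequence of addresses lands in."""
    return len({mapper(a)[1] for a in addrs})


# Eight addresses that differ only in the row field. The plain mapping sends
# them all to one bank (a row conflict on every access); the XOR mapping
# spreads them over all eight banks.
stride = 1 << (COL_BITS + BANK_BITS)
conflicting = [i * stride for i in range(8)]
print(banks_touched(split_address, conflicting))  # prints 1
print(banks_touched(xor_map, conflicting))        # prints 8
```

Because XOR with a fixed row value is a permutation of the bank indices, the mapping stays one-to-one, and accesses that differ only in the column bits still hit the same row of the same bank.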
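The two-level scheduling idea can be sketched in the same hedged way. The Python model below is an assumption-laden illustration, not the scheduler from the thesis: it has no timing model, and the `Request` fields, the `STARVE_LIMIT` threshold, and the priority rules (row hits first within a bank, same read/write direction first across banks, starving requests ahead of both) are hypothetical choices in the spirit of first-ready scheduling.

```python
from dataclasses import dataclass

STARVE_LIMIT = 4  # assumed cap on how often a request may be bypassed


@dataclass
class Request:
    arrival: int        # arrival order; smaller means older
    bank: int
    row: int
    is_write: bool
    bypassed: int = 0   # times a younger request was issued ahead of this one


def pick_in_bank(queue, open_row):
    """Intra-bank level: a starving oldest request wins, else the oldest row hit, else the oldest."""
    oldest = min(queue, key=lambda r: r.arrival)
    if oldest.bypassed >= STARVE_LIMIT:
        return oldest
    hits = [r for r in queue if r.row == open_row]
    return min(hits, key=lambda r: r.arrival) if hits else oldest


def pick_across_banks(candidates, last_was_write):
    """Inter-bank level: starving requests first, then the same direction as the last command."""
    starving = [r for r in candidates if r.bypassed >= STARVE_LIMIT]
    if starving:
        return min(starving, key=lambda r: r.arrival)
    same_dir = [r for r in candidates if r.is_write == last_was_write]
    return min(same_dir or candidates, key=lambda r: r.arrival)


def schedule(requests):
    """Return the issue order (by arrival number) for a batch of requests."""
    queues = {}
    for r in requests:
        queues.setdefault(r.bank, []).append(r)
    open_rows = {}            # bank -> row currently open in that bank
    last_was_write = False
    order = []
    while any(queues.values()):
        candidates = [pick_in_bank(q, open_rows.get(b))
                      for b, q in queues.items() if q]
        chosen = pick_across_banks(candidates, last_was_write)
        queues[chosen.bank].remove(chosen)
        # Every older request still waiting has just been bypassed once more.
        for q in queues.values():
            for r in q:
                if r.arrival < chosen.arrival:
                    r.bypassed += 1
        open_rows[chosen.bank] = chosen.row
        last_was_write = chosen.is_write
        order.append(chosen.arrival)
    return order


# Request 2 hits the row opened by request 0, so it is issued before request 1.
print(schedule([Request(0, 0, 1, False),
                Request(1, 0, 2, False),
                Request(2, 0, 1, False)]))  # prints [0, 2, 1]
```

In this sketch the starvation counter bounds how long a row-miss request can wait behind a stream of row hits: once it has been bypassed `STARVE_LIMIT` times, it is issued next regardless of the open row or the current read/write direction.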
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP332
Article ID: 2028650
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2028650.html