Optimized Design of a High-Performance CPU Memory Controller
Published: 2018-06-17 00:07
Topics: memory controller + address mapping; Source: National University of Defense Technology, master's thesis, 2012
[Abstract]: Memory access speed has a significant effect on processor performance, especially in multi-core, multithreaded processors, and that speed is constrained by the memory controller. The memory controller determines key system parameters such as the maximum usable memory capacity, the number of banks, the memory type and speed, and the data depth and data width of the memory chips. The quality of the memory controller design therefore directly affects processor performance. This thesis studies the optimized design of the memory controller in the X processor. The X processor is a high-performance processor that supports multithreading and SIMD. It integrates 16 cores with 4 threads each; the execution units of each core consist of two integer units, one vector unit, one floating-point unit, and one load/store unit. Four dual-channel memory controllers are integrated on chip to support parallel memory access. When the working set is very large, the volume of data being processed grows accordingly and increases the pressure on memory. Running several memory controllers in parallel relieves this pressure to some extent, but the stream of access addresses becomes scattered, so the memory controllers cannot be used to their full potential.

Building on an in-depth study of the X processor and DDR3 SDRAM, and with the goal of reducing memory access latency, this thesis analyzes the basic structure of the existing memory controller and optimizes it. To improve program locality, bank-level parallelism, and row locality, a full-XOR address mapping scheme is designed. To raise the row hit rate of memory commands and reduce read/write switching delay, a hierarchical memory access scheduler is designed; it reorders requests at two levels, intra-bank scheduling and inter-bank scheduling, and includes a starvation-prevention mechanism, raising memory bandwidth utilization as far as possible. To reduce the latency caused by frequently opening and closing active pages, a virtual row buffer module is added between the on-chip buffer and the memory controller, which increases the number of active pages.

The optimized memory controller design is described in Verilog, and the complete optimized structure is functionally verified to ensure that the memory controller operates correctly. Finally, the structures before and after optimization are tested and compared in detail: bandwidth rises from 5.88 GB/s to 18.55 GB/s, roughly a 3.15-fold increase, which demonstrates the advantage of the optimized design.
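The abstract names an XOR-based address mapping but does not give its bit layout. As a rough illustration of the general idea only, the sketch below XORs the bank index with the low bits of the row index, a common way to spread same-bank row conflicts across banks. The field widths, the row | bank | column ordering, and the names `split`, `map_linear`, and `map_xor` are all assumptions made for this example; this is not the thesis's actual mapping.

```python
# Hypothetical address layout (row | bank | column); the abstract does not
# specify the real field widths or ordering used in the X processor.
COL_BITS = 10   # assumed number of column bits within a row
BANK_BITS = 3   # DDR3 devices have 8 banks
ROW_BITS = 14   # assumed number of row bits


def split(addr):
    """Split a flat address into (row, bank, col) using the assumed layout."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col


def map_linear(addr):
    """Conventional mapping: the bank index is taken directly from the address."""
    return split(addr)


def map_xor(addr):
    """XOR mapping: the bank index is XORed with the low bits of the row index."""
    row, bank, col = split(addr)
    return row, bank ^ (row & ((1 << BANK_BITS) - 1)), col


# A stride that walks through consecutive rows of the same bank under the
# linear mapping, which would cause a row conflict on every access.
stride = 1 << (COL_BITS + BANK_BITS)
addrs = [i * stride for i in range(8)]

linear_banks = [map_linear(a)[1] for a in addrs]
xor_banks = [map_xor(a)[1] for a in addrs]

print("linear:", linear_banks)
print("xor:   ", xor_banks)
```

Under these assumptions the linear mapping sends all eight accesses to one bank, while the XOR mapping places each access in a different bank, so the row activations can overlap. Because XOR with a fixed row value is a permutation of the bank indices, the mapping stays one-to-one.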
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP332
Article ID: 2028650
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2028650.html