Design and Implementation of the X-DSP Level-1 Data Cache
Published: 2018-04-14 18:26
Topic: DSP + Cache. Source: master's thesis, National University of Defense Technology, 2013
[Abstract]: As integrated-circuit technology advances, further gains in DSP performance are increasingly limited by the "memory wall" problem. A "Cache + RAM" memory organization is one of the main ways to address it, and an efficient, flexible level-1 data cache (L1D Cache) plays an important role in improving a DSP's memory-access efficiency and overall performance.
X-DSP is a 32-bit high-performance DSP developed independently by the Institute of Microelectronics at the National University of Defense Technology. It uses a very long instruction word (VLIW) architecture and supports two parallel Load/Store memory requests. X-DSP has a two-level on-chip memory hierarchy: the first level consists of a level-1 instruction cache and a level-1 data cache, and the second level is a shared memory that can be configured as Cache or SRAM. This thesis studies the design and implementation of the L1D Cache. The main work is as follows:
1. Based on an analysis of the overall X-DSP architecture and the design requirements of its memory hierarchy, an L1D Cache whose capacity can be configured flexibly to suit the application was designed and implemented. It uses two-way set-associative mapping, a pseudo-LRU replacement algorithm, and a write-back, no-write-allocate policy, and it supports two parallel memory accesses.
2. A combined hardware and software mechanism for maintaining L1D Cache data coherence was designed and implemented. The L1D Cache supports snoop operations from the level-2 memory (L2SRAM), which keeps data coherent when DMA reads or writes L2SRAM, and it gives programmers a rich set of control registers for global or partial write-back or invalidation of the L1D Cache. A protection mechanism for the level-1 data memory and the control registers was also designed and implemented, so that only requests that satisfy the configured permissions can access the memory space or read and write the registers.
3. Based on the characteristics of X-DSP memory-access instructions, a scheme that supports boundary-crossing accesses is proposed: an unaligned access that crosses a boundary is split into two aligned accesses. The scheme is efficient, has low hardware overhead, and places no additional burden on the compiler.
4. Based on the hit and miss behavior of memory-access instructions in the L1D Cache, a memory-access pipeline and a miss pipeline were designed and implemented, together with a write-miss buffer queue that is 128 bits wide, 4 entries deep, and supports write merging, which effectively reduces the time spent waiting on write misses.
Finally, module-level functional verification and logic synthesis were carried out. The results show that the L1D Cache functions correctly and reaches a clock frequency of 1 GHz, which meets the design requirements of X-DSP.
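The boundary-crossing scheme in item 3 can be illustrated with a short Python model. This is a minimal sketch rather than the thesis implementation: the abstract does not say which boundary is meant, so the 16-byte block size, the function name `split_access`, and the (block base, byte-enable mask) representation are all assumptions made for illustration.

```python
def split_access(addr: int, size: int, block: int = 16) -> list[tuple[int, int]]:
    """Split one access into aligned block accesses.

    Returns a list of (block_base, byte_enable_mask) pairs. Bit i of the
    mask selects byte i of the aligned block starting at block_base.
    The 16-byte default block size is an assumption, not a figure from
    the thesis.
    """
    if block <= 0 or block & (block - 1):
        raise ValueError("block size must be a power of two")
    if not 0 < size <= block:
        raise ValueError("access size must be between 1 and the block size")

    offset = addr % block          # position of the first byte in its block
    base = addr - offset           # aligned address of the first block
    first_len = min(size, block - offset)

    # First aligned access: bytes from offset up to the end of the access
    # or the end of the block, whichever comes first.
    pieces = [(base, ((1 << first_len) - 1) << offset)]

    # Second aligned access, needed only when the access crosses the boundary.
    rest = size - first_len
    if rest:
        pieces.append((base + block, (1 << rest) - 1))
    return pieces


if __name__ == "__main__":
    for base, mask in split_access(0x1E, 4):
        print(f"block 0x{base:X}, byte enables 0x{mask:04X}")
```

Under these assumptions, a 4-byte access at address 0x1E becomes one access to block 0x10 covering its last two bytes and one access to block 0x20 covering its first two bytes, while an access that stays inside a single block produces only one piece.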
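The write-miss buffer queue in item 4 can likewise be modelled in a few lines of Python. Only the 128-bit width, the depth of 4, and the support for write merging come from the abstract. The class name `WriteMissBuffer`, the merge rule (a write merges into any pending entry for the same 128-bit aligned block), the oldest-first draining order, and the stall-when-full behavior are hypothetical choices for this sketch.

```python
from collections import OrderedDict

WIDTH_BYTES = 16  # 128-bit entries, as stated in the abstract
DEPTH = 4         # four entries, as stated in the abstract


class WriteMissBuffer:
    """Behavioral model of a write-merging write-miss buffer (assumed policy)."""

    def __init__(self) -> None:
        # Maps aligned block base -> [data bytes, byte-valid mask],
        # kept in allocation order so the oldest entry drains first.
        self._entries: OrderedDict[int, list] = OrderedDict()

    def __len__(self) -> int:
        return len(self._entries)

    def write(self, addr: int, data: bytes) -> bool:
        """Buffer a missed write. Returns False if the caller must stall."""
        offset = addr % WIDTH_BYTES
        if not data or offset + len(data) > WIDTH_BYTES:
            raise ValueError("write must fit inside one 128-bit entry")
        base = addr - offset

        entry = self._entries.get(base)
        if entry is None:
            if len(self._entries) >= DEPTH:
                return False  # no free entry and nothing to merge with
            entry = [bytearray(WIDTH_BYTES), 0]
            self._entries[base] = entry

        # Write merging: overlay the new bytes and extend the valid mask.
        entry[0][offset:offset + len(data)] = data
        entry[1] |= ((1 << len(data)) - 1) << offset
        return True

    def drain(self):
        """Pop the oldest entry as (base, data, mask), or None if empty."""
        if not self._entries:
            return None
        base, (data, mask) = self._entries.popitem(last=False)
        return base, bytes(data), mask
```

In this model, two writes to 0x100 and 0x104 occupy a single entry, so a burst of small stores to neighboring addresses uses one slot and later drains as one 128-bit transfer instead of several.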
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP333
Article ID: 1750490
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1750490.html