
Research and Implementation of a Low-Power Two-Level Cache for a DSP

Published: 2018-01-18 01:34

Keywords: research and implementation of a low-power DSP-based two-level cache; Source: Nanchang University, master's thesis, 2012; Type: degree thesis

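【Abstract】: A DSP (digital signal processor) is a microprocessor designed to process digital signals at high speed. It converts a received analog signal into a digital signal and applies a series of operations to it (such as attenuation, amplification, or removal of components). Finally, it converts the result back into an analog signal or delivers it to the target application. DSPs are widely used in transportation, aviation, networking, medicine, and many other fields.

However, integrated circuits keep advancing, and processor speed, process technology, and integration density all continue to improve. Memory read/write speed improves only slowly. The growing speed gap between the two makes slow memory a serious bottleneck that limits overall system performance. Placing a small but fast cache between the processor and memory effectively addresses this problem.

The main work of this thesis is the design and implementation of a low-power two-level cache for a DSP chip. The author studied the G1000 architecture and its on-chip two-level memory hierarchy in depth, together with modern cache design techniques and low-power theory. On that basis, a two-level low-power cache was designed and implemented.

The level-1 cache uses a Harvard architecture that keeps instructions and data separate: a level-1 instruction cache (L1P) and a level-1 data cache (L1D). The CPU can only read L1P and has no permission to modify it. The CPU accesses L1D through two read/write paths. L1D is two-way set-associative and uses a pseudo-LRU replacement policy with a write-back write policy, which effectively improves the cache hit rate and read/write speed.

The level-2 cache (L2) uses a Princeton (unified) architecture in which instructions and data are stored together. Storage is therefore allocated dynamically and efficiently, and the hit rate can be raised without increasing capacity. Snoop requests are used to keep data coherent among L1D, L1P, and L2.

To reduce cache power, the design adopts two algorithms:
- a way-prediction ("group prediction") algorithm based on pseudo-LRU state and valid bits;
- a reconfiguration algorithm based on timestamp monitoring.

Finally, the design was synthesized and optimized, simulated at system level, and debugged on a board. The two-level cache controller performs its intended functions well within the chip.

The innovations of this thesis are:

1. Replacement policy: building on the replacement algorithms commonly used in cache design, a pseudo-LRU replacement algorithm is proposed. It improves on least-recently-used (LRU) replacement and avoids per-line counters. A single 8-bit register achieves the effect of the counters that would otherwise record access counts.

2. Introduction of a write buffer: L1D allocates a line on a read miss but not on a write miss. Writing write-miss data directly into L2 would seriously slow the CPU, because L2 transfers data slowly and handles many long-latency requests. A write buffer temporarily holds write-miss data, so write-miss handling is decoupled from the CPU, which improves its processing speed.

3. Way prediction: making full use of the temporal and spatial locality on which caches rely, a way-prediction algorithm based on pseudo-LRU and valid bits is proposed. It effectively raises the prediction hit rate, reducing power without reducing performance.

4. Reconfiguration: timestamps are used to monitor the cache hit rate, and the SRAM/cache capacity is configured dynamically on that basis. This reduces power while maintaining the hit rate.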
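The abstract argues that a small fast cache hides slow memory and that a second level helps further. The standard average memory access time (AMAT) formula for a two-level hierarchy makes this concrete. Here t denotes the access time of each level in cycles, and m denotes its local miss rate. The numbers in the worked lines are purely illustrative and are not measurements from the thesis.

```latex
\text{AMAT} = t_{L1} + m_{L1}\left(t_{L2} + m_{L2}\, t_{\mathrm{mem}}\right)

% Illustrative values only: t_{L1}=1, m_{L1}=0.05, t_{L2}=10, m_{L2}=0.2, t_{mem}=100
\text{AMAT}_{\text{L1+L2}} = 1 + 0.05\,(10 + 0.2 \times 100) = 2.5 \text{ cycles}
\qquad
\text{AMAT}_{\text{L1 only}} = 1 + 0.05 \times 100 = 6 \text{ cycles}
```

The gain comes from L2 absorbing most L1 misses. Only the fraction m_L1 × m_L2 of accesses (1% with these values) pays the full memory latency.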
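The abstract says the pseudo-LRU scheme replaces counters with an 8-bit register, but it does not give the encoding. Tree-based pseudo-LRU is one common scheme that fits this description. It is sketched below as an assumption and is not necessarily the thesis's design. A tree over N ways needs N - 1 bits, so an 8-way set fits in 7 bits. For the two-way L1D described above, the tree collapses to a single bit per set. The class name `TreePLRU` and the bit convention are illustrative.

```python
class TreePLRU:
    """Tree-based pseudo-LRU state for one cache set with `ways` ways.

    The tree has ways - 1 internal nodes, one bit each, packed into an int
    (7 bits for 8 ways). Node i has children 2i+1 and 2i+2; node 0 is the root.
    Assumed convention: bit = 0 means "victim is in the left half",
    bit = 1 means "victim is in the right half".
    """

    def __init__(self, ways: int) -> None:
        if ways < 2 or ways & (ways - 1):
            raise ValueError("ways must be a power of two >= 2")
        self.ways = ways
        self.bits = 0

    def touch(self, way: int) -> None:
        """On an access, point every node on the path to `way` away from it."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:                    # accessed left half -> victim goes right
                self.bits |= 1 << node
                node, hi = 2 * node + 1, mid
            else:                            # accessed right half -> victim goes left
                self.bits &= ~(1 << node)
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Follow the bits from the root to the pseudo-least-recently-used way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if (self.bits >> node) & 1:      # victim is in the right half
                node, lo = 2 * node + 2, mid
            else:                            # victim is in the left half
                node, hi = 2 * node + 1, mid
        return lo
```

After touching ways 0, 1, 2, 3 of a 4-way set in order, `victim()` returns 0, which matches true LRU. Touching way 0 again then makes `victim()` return 2, whereas true LRU would pick 1. That approximation is the price of replacing per-line counters with a few bits.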
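The abstract describes the L1D organization as two-way set-associative, with pseudo-LRU replacement and write-back. It allocates on a read miss but not on a write miss. This organization can be modelled at the metadata level as below. With two ways, pseudo-LRU needs only one bit per set, pointing at the way to replace next. The set count, line size, and names such as `TwoWayL1D` are assumptions. The model tracks only tags, valid bits, and dirty bits, not data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0


class TwoWayL1D:
    """Metadata-only model of a two-way set-associative, write-back data cache.

    Policies modelled:
      * read miss  -> allocate a line, evicting the pseudo-LRU way (write back if dirty)
      * write hit  -> mark the line dirty (write-back: L2 is not updated yet)
      * write miss -> no allocation; the write is forwarded downstream
    """

    def __init__(self, num_sets: int = 64, line_bytes: int = 32) -> None:
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.sets = [[Line(), Line()] for _ in range(num_sets)]
        self.plru = [0] * num_sets             # 1 bit per set: the way to replace next
        self.writebacks: list[int] = []        # byte addresses of dirty lines written back
        self.forwarded_writes: list[int] = []  # write-miss addresses sent downstream

    def _split(self, addr: int) -> tuple[int, int]:
        line_addr = addr // self.line_bytes
        return line_addr % self.num_sets, line_addr // self.num_sets  # (set index, tag)

    def _lookup(self, index: int, tag: int) -> int | None:
        for way, line in enumerate(self.sets[index]):
            if line.valid and line.tag == tag:
                return way
        return None

    def _touch(self, index: int, way: int) -> None:
        self.plru[index] = 1 - way             # the other way becomes the replacement candidate

    def read(self, addr: int) -> bool:
        """Return True on a hit; on a miss, allocate the line."""
        index, tag = self._split(addr)
        way = self._lookup(index, tag)
        if way is None:
            lines = self.sets[index]
            # Fill an invalid way first; otherwise evict the way the PLRU bit points at.
            way = next((w for w, ln in enumerate(lines) if not ln.valid), self.plru[index])
            victim = lines[way]
            if victim.valid and victim.dirty:
                self.writebacks.append((victim.tag * self.num_sets + index) * self.line_bytes)
            lines[way] = Line(valid=True, dirty=False, tag=tag)
            self._touch(index, way)
            return False
        self._touch(index, way)
        return True

    def write(self, addr: int) -> bool:
        """Return True on a hit (line marked dirty); a miss is forwarded, not allocated."""
        index, tag = self._split(addr)
        way = self._lookup(index, tag)
        if way is None:
            self.forwarded_writes.append(addr)
            return False
        self.sets[index][way].dirty = True
        self._touch(index, way)
        return True
```

Write misses end up in `forwarded_writes`. In the design described above, these are the writes that the write buffer absorbs before they reach L2.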
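The write-buffer innovation can be illustrated with a small cycle-level model. Write misses from L1D are queued instead of being sent straight to the slower L2, so the CPU stalls only when the queue is full. The following are assumptions for illustration, since the abstract does not describe the buffer's internals:
- the buffer depth;
- an L2 model that accepts one request every `l2_period` cycles;
- read forwarding in `forward()`;
- all names (`WriteBuffer`, `cycles_to_issue`).

```python
from __future__ import annotations

from collections import deque


class WriteBuffer:
    """FIFO holding L1D write-miss data until L2 can accept it (illustrative)."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.entries: deque[tuple[int, int]] = deque()   # (address, data), oldest first

    def push(self, addr: int, data: int) -> bool:
        """CPU side: queue a write miss. False means the buffer is full and the CPU stalls."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((addr, data))
        return True

    def drain(self, l2_ready: bool) -> tuple[int, int] | None:
        """L2 side, once per cycle: retire the oldest entry if L2 can take a request."""
        if l2_ready and self.entries:
            return self.entries.popleft()
        return None

    def forward(self, addr: int) -> int | None:
        """Newest buffered data for `addr`, so a later read does not see stale L2 data."""
        for a, d in reversed(self.entries):
            if a == addr:
                return d
        return None


def cycles_to_issue(n_writes: int, depth: int, l2_period: int) -> int:
    """Cycles until the CPU has handed off n back-to-back write misses,
    assuming L2 accepts one request every `l2_period` cycles."""
    wb = WriteBuffer(depth)
    issued = cycle = 0
    while issued < n_writes:
        cycle += 1
        wb.drain(l2_ready=(cycle % l2_period == 0))
        if wb.push(issued, 0):
            issued += 1
    return cycle
```

As an example, consider four back-to-back write misses and an L2 that accepts a request every fifth cycle. A four-entry buffer lets the CPU hand off all the writes in 4 cycles (`cycles_to_issue(4, 4, 5)`). A one-entry buffer stretches this to 15 cycles (`cycles_to_issue(4, 1, 5)`).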
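The abstract does not give the exact rule of the way-prediction algorithm built on pseudo-LRU and valid bits. The sketch below is a hypothetical reading of it for the two-way L1D:
- If no way is valid, the access must miss and no data array is read.
- If exactly one way is valid, probe that way.
- If both ways are valid, first probe the most recently used way (the one the pseudo-LRU bit does not point at), and probe the other way only on a mispredict.

A conventional two-way cache reads both ways in parallel on every access, so each correct prediction saves one data-array read. The function names are illustrative.

```python
from __future__ import annotations


def predict_way(valid: tuple[bool, bool], plru_victim: int) -> int | None:
    """Pick the way of a two-way set to probe first (hypothetical rule).

    valid:       valid bits of way 0 and way 1
    plru_victim: the set's 1-bit pseudo-LRU state (way to be replaced next)
    Returns None when no way is valid, so the access is a guaranteed miss.
    """
    if not valid[0] and not valid[1]:
        return None
    if valid[0] != valid[1]:
        return 0 if valid[0] else 1        # only one way can hold the data
    return 1 - plru_victim                 # both valid: guess the most recently used way


def probe(valid: tuple[bool, bool], tags: tuple[int, int],
          plru_victim: int, tag: int) -> tuple[int, bool]:
    """Return (data-array reads, hit) when probing the predicted way first."""
    first = predict_way(valid, plru_victim)
    if first is None:
        return 0, False
    if tags[first] == tag:
        return 1, True                     # correct prediction: one read instead of two
    other = 1 - first
    if valid[other]:
        return 2, tags[other] == tag       # mispredict: the second probe costs an extra cycle
    return 1, False
```

Because of temporal locality, the most recently used way is the likely hit. Guessing it first keeps the second probe, and its extra cycle, infrequent.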
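The timestamp-based reconfiguration can be pictured as an interval controller:
- A counter marks fixed windows of accesses.
- At the end of each window, the hit rate is compared with two thresholds.
- Ways are moved between cache use and SRAM use (or power-gated) accordingly.

The window length, the thresholds, the one-way step, and the class name `ReconfigController` are assumptions. The abstract states only that timestamps are used to monitor the hit rate so that the SRAM/cache capacity can be configured dynamically.

```python
class ReconfigController:
    """Interval-based cache/SRAM reconfiguration controller (illustrative).

    A counter (the "timestamp") marks windows of `interval` accesses. At the end of
    each window the hit rate is compared with two thresholds and at most one way is
    moved between cache use and SRAM use.
    """

    def __init__(self, max_ways: int = 4, min_ways: int = 1, interval: int = 1000,
                 low: float = 0.90, high: float = 0.98) -> None:
        self.max_ways, self.min_ways = max_ways, min_ways
        self.active_ways = max_ways        # ways currently configured as cache
        self.interval, self.low, self.high = interval, low, high
        self.timestamp = 0                 # accesses seen in the current window
        self.hits = 0

    def record(self, hit: bool) -> int:
        """Call once per cache access; returns the number of ways configured as cache."""
        self.timestamp += 1
        self.hits += int(hit)
        if self.timestamp == self.interval:
            rate = self.hits / self.interval
            if rate >= self.high and self.active_ways > self.min_ways:
                self.active_ways -= 1      # hit rate is high: give one way to SRAM / power it down
            elif rate < self.low and self.active_ways < self.max_ways:
                self.active_ways += 1      # hit rate too low: return one way to cache use
            self.timestamp = 0
            self.hits = 0
        return self.active_ways
```

In hardware, a way would have to be written back (if dirty) and invalidated before it is handed over to SRAM use; the sketch tracks only the way count. The gap between the two thresholds provides hysteresis, so the configuration does not oscillate from one window to the next.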
【Degree-granting institution】: Nanchang University
【Degree level】: Master's
【Year awarded】: 2012
【Classification number】: TP332

