Research on the Hierarchical Memory System of Multi-core Processors
Topic: multi-core processors + embedded applications; Source: Fudan University, master's thesis, 2012
[Abstract]: In recent years, handheld consumer electronics such as tablets and smartphones have developed at an unprecedented pace. As products are upgraded, ever-higher hardware configurations keep driving power demand up. For the processor, the core component of these products, the technical requirement has gradually shifted from high performance alone to high performance combined with low power consumption. Meanwhile, as process scaling slows, raising the clock frequency to gain performance has proved unsustainable, and multi-core architectures, with their inherent parallelism and flexibility, have become the mainstream processor architecture. For embedded applications, which are power-sensitive and highly varied, the parallel processing capability, scalability, and potential for low power consumption of multi-core processors are especially well suited.
This thesis studies the hierarchical memory system of multi-core processors for embedded applications and, building on existing typical processor memory architecture designs, proposes a memory architecture better suited to embedded multi-core processors. The research goal is to balance the high-performance and low-power requirements of embedded applications through an innovative design of the hierarchical memory architecture.
The contributions of this thesis can be summarized as follows:
(1) Cluster-based hierarchical memory system
This thesis proposes a hierarchical memory system based on a cluster structure. The system rebalances the weight of each level of the memory hierarchy according to the requirements of embedded applications: an extended register file increases data locality, a cache-less design reduces the hardware overhead of the memory system, and the division of data memory into private and shared memories further improves data locality and strengthens the hierarchy of the memory system.
(2) Extended register file design
Within the cluster-based hierarchical memory system, this thesis proposes a register file extension scheme that remains compatible with the 32-bit instruction width and doubles the number of registers to 64, which enhances data locality and improves overall processor performance. The thesis also uses the address-mapping space provided by the extended register file to improve and optimize the message-passing inter-core communication mechanism. Verification results show that the scheme reduces the number of instructions related to inter-core communication by up to 50%, effectively improving inter-core communication efficiency.
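One way to picture register-mapped message passing is the toy model below. It assumes, hypothetically, that one of the 64 registers aliases an outgoing FIFO and another aliases an incoming FIFO, so that an ordinary ALU instruction sends or receives a word as a side effect of a register access. The register indices `SEND_REG` and `RECV_REG` and the class `MappedRegisterFile` are illustrative assumptions, not the encoding used in the thesis.

```python
from collections import deque

NUM_REGS = 64   # register count after extension, per the abstract
SEND_REG = 63   # hypothetical: writes to this register enqueue a message
RECV_REG = 62   # hypothetical: reads from this register dequeue a message


class MappedRegisterFile:
    """Toy register file in which two registers alias communication FIFOs."""

    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.tx_fifo = deque()  # words leaving this core
        self.rx_fifo = deque()  # words arriving at this core

    def write(self, idx, value):
        if idx == SEND_REG:
            self.tx_fifo.append(value)  # the write itself is the send
        else:
            self.regs[idx] = value

    def read(self, idx):
        if idx == RECV_REG:
            return self.rx_fifo.popleft()  # the read itself is the receive
        return self.regs[idx]

    def add(self, dst, src_a, src_b):
        """Model a single ADD instruction: dst = src_a + src_b."""
        self.write(dst, self.read(src_a) + self.read(src_b))


rf = MappedRegisterFile()
rf.write(1, 10)
rf.write(2, 32)
rf.add(SEND_REG, 1, 2)   # compute and send in one instruction
rf.rx_fifo.append(5)     # pretend a neighbouring core delivered a word
rf.add(3, RECV_REG, 1)   # receive and compute in one instruction
```

In a model without mapped registers, each transfer would need a separate send or receive instruction next to the ADD. This illustrates how such a mapping could roughly halve the communication-related instruction count; the accounting used in the thesis may differ.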
(3) Cache-less design
Within the cluster-based hierarchical memory system, the processor omits the cache and uses a private memory unit in its place, which saves chip area and reduces system power consumption. The thesis also proposes a direct inter-core communication strategy based on the private memory units: by specifying the packet header format, message-passing inter-core communication can proceed without involving the processor core, further improving inter-core communication efficiency and the computational efficiency of the processor.
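The idea of header-directed delivery into a private memory can be sketched as follows. The abstract does not give the header format, so the 32-bit layout here (4-bit destination core, 16-bit destination address, 12-bit length) and the names `Header` and `deliver` are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical 32-bit header layout (not taken from the thesis):
#   bits 31..28  destination core id (0-15)
#   bits 27..12  word address in the destination private memory
#   bits 11..0   payload length in words
CORE_BITS, ADDR_BITS, LEN_BITS = 4, 16, 12


def _check(value, bits, name):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


@dataclass
class Header:
    dest_core: int
    dest_addr: int
    length: int

    def pack(self) -> int:
        _check(self.dest_core, CORE_BITS, "dest_core")
        _check(self.dest_addr, ADDR_BITS, "dest_addr")
        _check(self.length, LEN_BITS, "length")
        return ((self.dest_core << (ADDR_BITS + LEN_BITS))
                | (self.dest_addr << LEN_BITS)
                | self.length)

    @staticmethod
    def unpack(word: int) -> "Header":
        length = word & ((1 << LEN_BITS) - 1)
        dest_addr = (word >> LEN_BITS) & ((1 << ADDR_BITS) - 1)
        dest_core = (word >> (ADDR_BITS + LEN_BITS)) & ((1 << CORE_BITS) - 1)
        return Header(dest_core, dest_addr, length)


def deliver(packet, private_mems):
    """Model the receiving interface: decode the header and write the payload
    straight into the target private memory, with no core instructions."""
    hdr = Header.unpack(packet[0])
    payload = packet[1:1 + hdr.length]
    mem = private_mems[hdr.dest_core]
    mem[hdr.dest_addr:hdr.dest_addr + len(payload)] = payload


private_mems = {core: [0] * 64 for core in range(16)}
packet = [Header(dest_core=3, dest_addr=8, length=2).pack(), 0xAA, 0xBB]
deliver(packet, private_mems)
```

Because the header carries the destination address, the receiving side in this model needs no software to decide where the data goes, which is the property the abstract attributes to the direct communication strategy.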
(4) Intra-cluster shared memory unit
Within the cluster-based hierarchical memory system, this thesis designs a memory unit that can be shared by all processor nodes in a cluster and, on that basis, proposes a shared-memory inter-core communication mechanism and a corresponding mailbox synchronization mechanism. Dividing memory into private and shared units improves data locality and reduces processor memory-access latency.
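A mailbox-style handshake over shared memory can be sketched as below. This is a generic one-slot mailbox with a full flag, offered as an assumption about what such a mechanism might look like; the class `Mailbox` and its methods are hypothetical, and the thesis's actual mailbox design is not described in the abstract.

```python
class Mailbox:
    """Toy one-slot mailbox in cluster-shared memory, guarded by a full flag."""

    def __init__(self):
        self.full = False   # synchronization flag
        self.data = None    # shared-memory slot

    def try_post(self, value):
        """Producer side: succeed only if the slot is empty."""
        if self.full:
            return False
        self.data = value   # write the data first...
        self.full = True    # ...then publish it by setting the flag
        return True

    def try_fetch(self):
        """Consumer side: return (True, value) if a message is waiting."""
        if not self.full:
            return False, None
        value = self.data
        self.full = False   # release the slot for the next message
        return True, value


box = Mailbox()
first = box.try_post(1)
second = box.try_post(2)      # rejected: the consumer has not fetched yet
ok, value = box.try_fetch()
third = box.try_post(2)       # accepted now that the slot is free
```

The flag lets producer and consumer cores coordinate through shared memory alone: the producer never overwrites an unread message, and the consumer never reads an empty slot.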
(5) Chip implementation and application example
A 16-core processor adopting this cluster-based hierarchical memory system was fabricated in a TSMC 65 nm low-power CMOS process. The chip contains two clusters, each with eight processor units and one intra-cluster shared memory unit. The chip area is 9.1 mm², of which a single processor core occupies 0.43 mm², and the maximum clock frequency is 750 MHz at a 1.2 V supply voltage. On this multi-core processor, a 3780-point fast Fourier transform module was implemented to evaluate the performance gain from the hierarchical memory system and the actual power consumption. Test results show that the typical power consumption of a single processor core is 34 mW, significantly lower than that of other multi-core processors of the same type.
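The reported figures can be combined with some quick arithmetic. The sketch below uses only numbers from the abstract; the aggregate power figure assumes all 16 cores run at the typical per-core power and ignores the shared memories and interconnect, which the abstract does not quantify.

```python
CORES = 16
CHIP_AREA_MM2 = 9.1
CORE_AREA_MM2 = 0.43
CORE_POWER_MW = 34
FFT_POINTS = 3780

total_core_area = CORES * CORE_AREA_MM2            # area taken by the cores
core_area_share = total_core_area / CHIP_AREA_MM2  # fraction of the die
total_core_power = CORES * CORE_POWER_MW           # assumes all cores at typical power


def prime_factors(n):
    """Return the prime factorization of n as a sorted list."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


print(f"core area: {total_core_area:.2f} mm2 ({core_area_share:.1%} of the die)")
print(f"core power: {total_core_power} mW")
print(f"{FFT_POINTS} = {prime_factors(FFT_POINTS)}")
```

The 16 cores account for about 6.88 mm², roughly three quarters of the 9.1 mm² die, and about 544 mW in total under the stated assumption. Since 3780 factors as 2² · 3³ · 5 · 7 rather than a power of two, a plain radix-2 FFT does not apply directly; the abstract does not say which FFT algorithm was used.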
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year conferred]: 2012
[CLC classification number]: TP332
Article ID: 1853343
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1853343.html