Research on Key Technologies of the Memory Components of a Homogeneous General-Purpose Stream Multicore Processor
Published: 2018-10-18 19:22
[Abstract]: Ever-growing application demands drive the continuous evolution of processor architectures and give rise to new ones. The multicore stream processor is a new type of multicore processor aimed at streaming data processing and stream applications, built from a large number of simple cores. For compute-intensive applications it delivers high data throughput and high resource utilization, but it performs poorly on memory-intensive and sparse applications. Traditional multicore architectures suit memory-intensive and sparse applications, yet their Cache structure cannot efficiently capture the data locality of stream applications. To meet the combined needs of stream applications and traditional multicore applications, and to merge the multicore stream processor with the traditional multicore processor, we propose a homogeneous general-purpose stream multicore processor architecture: multiple homogeneous stream multicores are integrated on a chip, and each stream multicore can be configured, according to the application, as part of a traditional multicore or a stream multicore. The main difference between the two lies in the memory access components: the former uses an on-chip buffering structure built mainly around Caches, while the latter consists of register files and on-chip scratchpad memory (SPM). By configuring the shared on-chip storage resources inside a stream multicore and adjusting the proportions allotted to scratchpad memory and Cache, the homogeneous general-purpose stream multicore processor adapts to a variety of application requirements. The Cache serves traditional multicore applications by exploiting the temporal and spatial locality of their data, while the scratchpad memory mainly captures the producer-consumer locality of data in stream applications.
This thesis studies in depth the key technologies of the memory access components of the stream multicore architecture. The main work and contributions are as follows:
1. A configurable on-chip shared SPM/L2Cache structure is proposed. The homogeneous general-purpose stream processor targets both traditional applications and stream applications; its basic building block, the stream multicore, runs in on-chip SMP execution mode or SIMT execution mode depending on the application. The on-chip shared storage structure is configured appropriately for each mode to meet the processor's demands on its memory components.
2. A data coherence protocol tailored to the stream multicore's on-chip cache structure is designed. The private L1 Cache of the stream multicore uses a write-through policy and the shared L2 Cache uses a write-back policy; on this basis, coherence between the two cache levels is maintained by invalidating the copies of a modified Cache line.
3. A private L1 data cache for the stream core is designed. Building on the Cache module of the MicroBlaze soft core, the data width is extended to 64 bits and logic supporting coherence maintenance is added, completing the design of the innermost cache level in the stream multicore architecture.
4. Using Xilinx's software development platform, behavioral simulation of the key logic designs of the stream multicore memory components is carried out, together with some performance analysis. The verification results show that all designs implement their intended functions, and the performance analysis demonstrates the effectiveness of the design.
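The write-through L1 / write-back L2 scheme described above can be illustrated with a minimal Python sketch. This is not the thesis's hardware design: the class names `PrivateL1` and `SharedL2`, the one-word line granularity, the unbounded capacity, the choice to update the writer's own L1 copy on a write, and the broadcast invalidation of every other L1 are all assumptions made purely for illustration.

```python
class SharedL2:
    """Shared L2 model with a write-back policy (hypothetical, illustration only)."""

    def __init__(self, memory):
        self.memory = memory   # backing store: address -> value
        self.lines = {}        # cached lines: address -> value
        self.dirty = set()     # addresses modified but not yet written to memory
        self.l1_caches = []    # private L1 caches attached to this L2

    def attach(self, l1):
        self.l1_caches.append(l1)

    def read(self, addr):
        # On a miss, fetch the line from memory (missing addresses read as 0).
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value, writer):
        # Write-back: update only the L2 copy and mark it dirty.
        self.lines[addr] = value
        self.dirty.add(addr)
        # Invalidate stale copies of the modified line in all other L1 caches.
        for l1 in self.l1_caches:
            if l1 is not writer:
                l1.invalidate(addr)

    def flush(self):
        # Write every dirty line back to memory.
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


class PrivateL1:
    """Private L1 model with a write-through policy (hypothetical, illustration only)."""

    def __init__(self, l2):
        self.l2 = l2
        self.lines = {}        # cached lines: address -> value
        l2.attach(self)

    def read(self, addr):
        # On a miss, refill from the shared L2.
        if addr not in self.lines:
            self.lines[addr] = self.l2.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through: update the local copy and forward the write to L2.
        self.lines[addr] = value
        self.l2.write(addr, value, writer=self)

    def invalidate(self, addr):
        # Drop the local copy so the next read refills from L2.
        self.lines.pop(addr, None)


memory = {0x40: 1}
l2 = SharedL2(memory)
core_a, core_b = PrivateL1(l2), PrivateL1(l2)
core_a.read(0x40)
core_b.read(0x40)          # both cores now hold a copy of the line
core_a.write(0x40, 7)      # core_b's copy is invalidated
value_seen_by_b = core_b.read(0x40)   # refilled from L2
```

Because every L1 write reaches the L2 immediately, the L2 always holds the latest value, so in this model invalidating the other L1 copies is enough to keep readers from seeing stale data; main memory is only updated when `flush` is called.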
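[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP332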
Article ID: 2280127
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2280127.html