Research on Cache Subsystem Optimization for Chip Multi-core Processors
[Abstract]: Modern chip multi-core processors require a large-capacity cache system to bridge the performance gap between fast processor cores and slow off-chip memory. The performance and power consumption of this cache subsystem can be optimized by exploiting the characteristics of the chip multi-core processor. This thesis studies several mechanisms for optimizing the performance of the cache subsystem of chip multi-core processors. Specifically, the research covers three topics: 1) designing an efficient multicast routing algorithm to improve the performance of the on-chip network; 2) using an emerging non-volatile memory to design a low-power cache system for chip multi-core processors; and 3) exploiting thread progress information to design a more efficient cache coherence protocol. For the first topic, we propose an efficient multicast routing mechanism for the on-chip network. As the number of cores keeps growing, the on-chip network provides an efficient and scalable communication infrastructure for multi-core processors. In on-chip networks under multi-core architectures, multicast communication patterns are common; without support from an effective multicast routing mechanism, conventional unicast-based on-chip networks handle such multicast traffic inefficiently. This thesis presents a network-based multicast routing mechanism called DPM.
DPM effectively reduces the average transmission latency of packets in the network and lowers the power consumption of the on-chip network. In particular, DPM dynamically selects routes according to the current load-balance level of the network and the link-sharing characteristics of multicast communication. The second topic is to use an emerging non-volatile memory, spin-transfer torque random access memory (STT-RAM), to design a low-power cache for chip multi-core processors. STT-RAM offers fast access speed, high storage density, and negligible leakage power; however, its large-scale adoption as the cache of a multi-core processor is limited by its long write latency and high write energy. Recent studies have shown that reducing the data retention time of the STT-RAM storage cell (the magnetic tunnel junction, MTJ) can effectively improve its write performance. However, STT-RAM with reduced retention time loses data easily, so its storage cells must be refreshed periodically to avoid data loss. When such STT-RAM is used as the last-level cache (LLC) of a multi-core processor, frequent refresh operations increase energy consumption and also degrade system performance. This thesis proposes an efficient refresh scheme, CCear, that minimizes refresh operations on this class of STT-RAM. CCear eliminates unnecessary refresh operations by interacting with the cache coherence protocol and the cache management algorithm. Finally, we propose an efficient coherence-protocol adjustment mechanism to optimize the performance of parallel programs running on chip multi-core processors. One of the main goals of chip multi-core processors is to keep improving application performance by exploiting thread-level parallelism.
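The load-aware branching idea behind DPM can be illustrated with a small sketch. This is not the thesis's actual algorithm; the mesh model, the function `next_hops`, and the `link_load` table are illustrative assumptions. At each router, destinations of a multicast packet are grouped by a first hop, preferring the less-loaded admissible direction so that replication adapts to the current load balance while destinations sharing a hop share one packet copy:

```python
# Illustrative sketch only: a greedy, load-aware multicast branching step
# on a 2D mesh. next_hops and link_load are hypothetical names, not DPM's API.

def next_hops(src, dests, link_load):
    """Group multicast destinations by their chosen first hop from src.

    For each destination, the admissible X/Y direction with the lower
    current link load is chosen; destinations that pick the same hop
    share a single packet copy (link sharing).
    """
    groups = {}
    for d in dests:
        dx = (1, 0) if d[0] > src[0] else (-1, 0) if d[0] < src[0] else None
        dy = (0, 1) if d[1] > src[1] else (0, -1) if d[1] < src[1] else None
        candidates = [v for v in (dx, dy) if v is not None]
        if not candidates:
            continue  # destination is the current node
        # pick the admissible direction whose outgoing link is least loaded
        step = min(candidates, key=lambda v: link_load.get((src, v), 0))
        groups.setdefault(step, []).append(d)
    return groups
```

With an idle network, destinations east and north of the router split into two branches; when the eastward link is congested, a destination reachable both ways is steered north instead.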
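The core observation CCear exploits can also be sketched: a cache line whose coherence state says it holds no live data never needs a refresh. The class names, the MESI-style states, and the retention constants below are illustrative assumptions, not CCear's actual design:

```python
# Illustrative sketch only: refresh elimination for reduced-retention
# STT-RAM guided by coherence state. Names and constants are hypothetical.

RETENTION = 100   # cycles an MTJ cell reliably holds data after a write
URGENT = 10       # refresh lines within this many cycles of expiring

class Line:
    def __init__(self):
        self.state = 'I'    # MESI-style coherence state ('I' = invalid)
        self.expires = 0    # logical cycle at which the data decays

def refresh_pass(cache, now):
    """Refresh only lines that are both live (non-invalid coherence
    state) and close to their retention deadline; invalid lines are
    skipped entirely, saving refresh energy."""
    refreshed = []
    for idx, line in enumerate(cache):
        if line.state == 'I':
            continue                    # coherence says the data is dead
        if line.expires - now <= URGENT:
            line.expires = now + RETENTION   # rewrite the MTJ cells
            refreshed.append(idx)
    return refreshed
```

A write (or an eviction followed by a fill) would reset `expires`, so recently written lines are also skipped; the real scheme additionally coordinates with the cache management algorithm.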
However, for a multi-threaded program running on such a system, different threads typically exhibit different execution progress, due to non-uniform task assignment and contention for shared resources. This progress non-uniformity is one of the main bottlenecks limiting multi-threaded program performance: at synchronization points such as memory barriers and locks, a core running a faster thread must stall and wait for the slower cores to catch up. Such idle waiting not only degrades system performance but also wastes energy. This thesis presents a thread-progress-aware coherence adjustment mechanism called TEACA. TEACA dynamically adjusts each thread's coherence requests using the thread's progress information, with the goal of improving the utilization efficiency of on-chip network bandwidth resources. Specifically, TEACA divides threads into two classes: leader threads and laggard threads. TEACA then applies class-specific handling to each thread's coherence requests according to this classification.
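The leader/laggard idea can be sketched in a few lines. The progress metric (here, a simple comparison against the mean) and the priority policy (laggards' coherence requests served first) are illustrative assumptions standing in for TEACA's actual mechanisms:

```python
# Illustrative sketch only: progress-based thread classification and
# request prioritization. The metric and policy are hypothetical.

def classify_threads(progress):
    """Label each thread 'leader' or 'laggard' by comparing its progress
    (e.g., retired instructions) to the mean across threads."""
    avg = sum(progress.values()) / len(progress)
    return {t: ('leader' if p >= avg else 'laggard')
            for t, p in progress.items()}

def schedule_requests(requests, classes):
    """Order coherence requests so laggard threads go first, granting
    slow threads a larger share of on-chip network bandwidth.
    Each request is a (thread_id, request_type) pair."""
    return sorted(requests,
                  key=lambda r: 0 if classes[r[0]] == 'laggard' else 1)
```

Because `sorted` is stable, requests within a class keep their arrival order; only the laggard class as a whole is promoted ahead of the leaders.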
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctoral (Ph.D.)
【Year conferred】: 2013
【Classification number】: TP332