Research on Cache Optimization Strategies and Parallel Simulation for Multithreaded Applications
Published: 2019-06-24 14:55
[Abstract]: Compared with traditional single-core processors, chip multiprocessors (Chip Multi-Processor, CMP) offer lower design complexity, better scalability, and a better price-performance ratio. Driven by process technology and application demands, CMP has become the mainstream direction for high-performance microprocessors. Most of the design complexity and performance bottlenecks of a multi-core processor are concentrated in the on-chip memory system. Improving the cache hit rate and avoiding high-latency off-chip memory accesses are critical to overall system performance, so the on-chip cache hierarchy has become one of the research priorities for multi-core processors. Academia has done a great deal of work on cache optimization for CMP, but most of it targets multiprogrammed workloads. For multithreaded applications, whether existing cache optimization techniques improve performance, and how they can be made to, remains an open question. This thesis studies cache performance optimization and parallel simulation for multi-core processors. Its contributions are as follows:
1. Cache optimization for tiled multi-core processors. In a tiled CMP, both the communication traffic between tiles and the capacity utilization of the second-level (L2) caches are unbalanced. To address this, the thesis proposes ARP, an adaptive replication strategy for multithreaded applications. ARP combines the advantages of private L2 caches and a shared cache: it periodically weighs the benefit of replicating cache data against its cost and dynamically controls how much data is replicated among the L2 caches. Experiments show that, in a 16-core configuration, ARP in the best case reduces network traffic by 52% and raises capacity utilization to 58%. It also performs well at reducing the average access distance.
2. Utility-based cache optimization for multithreaded applications. Traditional cache partitioning schemes mostly target multiprogrammed workloads and ignore the difference between the access patterns of shared data and private data in multithreaded workloads, which lowers the efficiency with which shared data is used. Based on the access characteristics of the different kinds of data in multithreaded programs, the thesis proposes UPP, a cache management mechanism for multithreaded programs. UPP monitors the utility of shared and private data in the shared cache and allocates cache space to each thread and to the shared data accordingly. Combined with improved data insertion and promotion policies, it aims to maximize the overall utility of cached data and to filter out data with low reuse. Experiments show that UPP improves performance by about 4.5% over a purely shared cache managed with LRU and by about 5.2% over a fairness-based static cache partitioning.
3. Parallel simulation of multi-core processors. As the number of cores in a CMP and the complexity of the interconnect between them grow, multi-core simulators become larger, more complex, and slower. To address this, the thesis uses multithreading to build ParaNSim, a modular and extensible parallel simulation module. ParaNSim can be used as a standalone network-on-chip simulator, extended with other modules to form a tiled CMP simulator, or embedded in another simulator as a submodule. Experiments show that ParaNSim achieves maximum speedups of 1.44x with 4 child threads and 2.42x with 8 child threads.
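The abstract does not say how UPP turns the monitored utility information into a space allocation. As a rough illustration of what utility-based partitioning can look like in general, the Python sketch below greedily hands out cache ways one at a time to whichever owner (a thread's private data, or the shared data) gains the most hits from one more way. The greedy rule, the function name `partition_ways`, and the hit curves are all assumptions made for this example. They are not taken from the thesis.

```python
from typing import Dict, List


def partition_ways(hit_curves: Dict[str, List[int]], total_ways: int) -> Dict[str, int]:
    """Greedily assign cache ways by marginal utility (illustrative only).

    hit_curves[owner][w] is a hypothetical estimate of the hits that owner
    would see if it were given w ways, as a utility monitor might report.
    """
    alloc = {owner: 0 for owner in hit_curves}
    for _ in range(total_ways):
        best_owner, best_gain = None, -1
        for owner, curve in hit_curves.items():
            w = alloc[owner]
            if w + 1 >= len(curve):
                continue  # no estimate beyond this allocation
            gain = curve[w + 1] - curve[w]  # extra hits from one more way
            if gain > best_gain:
                best_owner, best_gain = owner, gain
        if best_owner is None:
            break  # every curve is exhausted
        alloc[best_owner] += 1
    return alloc


# Made-up hit curves: two threads' private data plus the shared data.
curves = {
    "thread0": [0, 50, 80, 90, 95],
    "thread1": [0, 20, 30, 35, 38],
    "shared": [0, 60, 110, 150, 160],
}
allocation = partition_ways(curves, total_ways=8)
print(allocation)
```

Treating the shared data as its own "owner" alongside the threads mirrors the abstract's statement that space is allocated to each thread and to the shared data. Everything else here is a simplification.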
Article ID: 2505146
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2505146.html