A Directory-Less Shared Cache Coherence Protocol
Published: 2018-08-24 20:31
[Abstract]: This paper studies the performance degradation that directories cause in shared cache coherence protocols for multi-core and many-core parallel systems, where a directory records which cores hold private copies of each shared cache block. The study finds that practical multi-core and many-core systems need not store this sharing information at all: most of them adopt a weak consistency model, under which a write by one core does not have to be observed by other cores immediately, and its visibility can be deferred until the next synchronization point. Based on this finding, the paper proposes a directory-less (DirectoryLess) shared cache (Shared cache) coherence protocol that records no sharing information, called the DLS protocol. At each synchronization point, the protocol proactively invalidates the cache blocks that may have been modified by other cores, and thereby keeps the multi-core system weakly consistent without any directory for sharing information. A 16-core processor was evaluated with the SPLASH-2 parallel benchmark suite. Compared with a directory-based MESI protocol, DLS not only eliminates the directory and its circuit area entirely, but also improves program performance by 11.08% on average, reduces on-chip network traffic by 28.83%, and reduces power consumption by 15.65%. All of this requires changing only the processor design, not the programming language or the compiler, so the protocol is compatible with existing code without modification or recompilation.
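The self-invalidation idea in the abstract can be illustrated with a toy software model. This is only a minimal sketch, not the paper's hardware design: the class names (`SharedCache`, `Core`), the `sync` method, the write-back-at-sync policy, and the choice to invalidate every private block at a synchronization point are all assumptions made for illustration. The abstract states only that blocks possibly modified by other cores are invalidated at synchronization points and that no directory is kept.

```python
class SharedCache:
    """Shared cache: holds the globally visible value of each address."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)  # unwritten addresses read as 0

    def write(self, addr, value):
        self.data[addr] = value


class Core:
    """A core with a private cache. No component tracks who shares what."""

    def __init__(self, shared):
        self.shared = shared
        self.private = {}   # addr -> locally cached value
        self.dirty = set()  # addresses written locally since the last sync

    def load(self, addr):
        # A hit returns the private copy, which may be stale until the next sync.
        if addr not in self.private:
            self.private[addr] = self.shared.read(addr)
        return self.private[addr]

    def store(self, addr, value):
        # Assumption: writes stay private until the next synchronization point.
        self.private[addr] = value
        self.dirty.add(addr)

    def sync(self):
        # Synchronization point: publish local writes to the shared cache ...
        for addr in self.dirty:
            self.shared.write(addr, self.private[addr])
        self.dirty.clear()
        # ... then self-invalidate. Without a directory the core cannot know
        # which blocks others changed, so this sketch drops all of them.
        self.private.clear()


def demo():
    shared = SharedCache()
    a, b = Core(shared), Core(shared)
    b.load(0x10)           # b caches the initial value 0
    a.store(0x10, 42)      # a writes; nothing is sent to b
    stale = b.load(0x10)   # b still sees its old copy before synchronizing
    a.sync()               # a reaches a synchronization point
    b.sync()               # b reaches one too and self-invalidates
    fresh = b.load(0x10)   # b misses and refetches from the shared cache
    return stale, fresh


if __name__ == "__main__":
    print(demo())
```

In this model, core `b` may legitimately read the old value before it synchronizes, which is the delay that weak consistency permits; after both cores pass a synchronization point, `b` observes the write without any invalidation message having been sent to it.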
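[Affiliations]: State Key Laboratory of Computer Architecture; Institute of Computing Technology, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences
[Funding]: National Natural Science Foundation of China (61100163, 61133004, 61222204, 61221062, 61303158, 61432016, 61472396, 61473275); 863 Program (2012AA012202); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06010403); International Cooperation Program of the Chinese Academy of Sciences (171111KYSB20130002)
[Classification number]: TP333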
Article ID: 2201957
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2201957.html