An Arbitrary-Step-Size NUCA Data Promotion Technique Combined with Bank Coherence on CMP

Published: 2019-03-11 16:03

[Abstract]: Computers have become indispensable tools in daily life and work, and users keep demanding more of them: higher processing speed, larger storage capacity, and easier, friendlier use. To make processors faster, manufacturers kept raising the clock frequency, but the resulting growth in power consumption became the bottleneck for further speed-ups. The chip multiprocessor (CMP) emerged in response: it integrates multiple processor cores on a single chip to increase computing capability. CMPs are now the market mainstream, which makes research on CMP chips necessary. Meanwhile, rapid progress in integrated-circuit manufacturing has allowed on-chip caches to grow ever larger, but as a cache grows, so does its wire delay, and that growing wire delay significantly limits CPU processing speed. Kim C et al. therefore proposed the non-uniform cache architecture (NUCA), which allows different banks of the cache to have different access latencies and thus achieves a lower average access latency than the earlier uniform cache architecture (UCA). A dynamic NUCA (DNUCA) additionally supports migration of cache lines (data blocks): a line that hits can be moved to a bank closer to the requesting processor, which reduces the latency when the CPU accesses the same data again. This movement of data within the cache is called data promotion, or block migration.

Data promotion requires a target bank to hold the promoted data. Some existing promotion techniques, however, ignore the actual state of the target bank and use a fixed promotion step. During promotion they may evict more useful data in the target bank from the cache, or displace it to a bank farther from the CPU, causing cache pollution and preventing promotion from delivering its benefit. Improving promotion on a CMP also raises another important issue: shared data. When multiple cores on one chip share an L2 or L3 cache, some shared data will inevitably be accessed by several cores at once. Promotion moves the data a CPU accesses into a bank closer to that CPU so that the next access is faster. When several CPUs access the same shared data, the data is therefore "pulled" into the middle region of the NUCA, which limits the benefit of promotion. For this reason the improved promotion technique is combined with bank coherence: shared data may have multiple replicas in the NUCA, each belonging to a different CPU, and bank coherence keeps the replicas consistent. This resolves the contention for shared data and speeds up CPU access to it. Maintaining coherence requires recording the state of each piece of data, and the promotion policy proposed in this thesis uses exactly these cache-line states to choose the target bank for migration. The result is an arbitrary-step-size data promotion technique for CMPs combined with bank coherence.

The thesis first briefly introduces the research background and related techniques, then surveys several basic simulation tools for computer architecture research and describes Simics, the simulator used in this work, in detail. It then reviews existing fixed-step data promotion techniques and their problems, and introduces the bank coherence technique adopted here. Next, it describes the proposed technique in detail. Finally, the technique is evaluated by full-system simulation with the NAS Parallel Benchmark (NPB) programs, with satisfactory results. The technique effectively reduces the latency of processor accesses to the shared cache: compared with the design of Kim C et al., it raises IPC by 8.19% on average, reduces the number of promotions, and improves system performance.
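
The abstract does not spell out how the target bank is chosen, so the following Python sketch is only an illustration under stated assumptions, not the thesis's algorithm. It contrasts a fixed one-bank promotion step with a state-aware choice that may jump any number of banks. The names `Slot`, `fixed_step_target`, `state_aware_target`, and `promote`, the state letters `I`/`S`/`M`, and the rule "take the closest invalid slot" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

INVALID = "I"  # hypothetical state: the slot holds no useful data


@dataclass
class Slot:
    """One cache-line slot per bank (a deliberately tiny model)."""
    tag: Optional[str] = None
    state: str = INVALID  # hypothetical states: "I", "S", "M"


def fixed_step_target(current: int, step: int = 1) -> int:
    """Baseline: always move `step` banks closer, ignoring what is there."""
    return max(current - step, 0)


def state_aware_target(banks: List[Slot], current: int) -> int:
    """Assumed policy: pick the closest bank whose slot is invalid.

    If no closer slot is free, the line stays where it is, so no
    useful line is displaced.
    """
    for distance in range(current):
        if banks[distance].state == INVALID:
            return distance
    return current


def promote(banks: List[Slot], current: int, target: int) -> None:
    """Swap the hit line with whatever occupies the target slot."""
    banks[current], banks[target] = banks[target], banks[current]


# Banks ordered by distance from the requesting core: index 0 is closest.
banks = [Slot("A", "M"), Slot(), Slot("B", "S"), Slot("X", "S")]
hit = 3  # line X hits in the farthest bank

print(fixed_step_target(hit))   # 2: a swap here would push line B farther away
target = state_aware_target(banks, hit)
print(target)                   # 1: a free slot, so nothing useful is displaced
promote(banks, hit, target)
print([s.tag for s in banks])   # ['A', 'X', 'B', None]
```

In this toy run the state-aware choice moves line X two banks closer in a single promotion and leaves line B in place, which is the kind of pollution-free, variable-length step the abstract argues for.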
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP332


Article ID: 2438420


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2438420.html
