

An Arbitrary-Stride Data Promotion Technique for NUCA Combined with Bank Coherence on CMP

Published: 2019-03-11 16:03
[Abstract]: Computers have become indispensable tools in daily life and work, and users' expectations keep rising: higher processing speed, larger storage capacity, and more convenient, friendlier ways of use. To make processors faster, manufacturers kept raising clock frequencies, but the accompanying growth in power consumption became the bottleneck for further speedup. The chip multiprocessor (CMP) emerged in response: it integrates multiple processor cores on a single chip to increase computing power. CMPs are now mainstream in the market, which makes research on CMP chips necessary.

Meanwhile, integrated-circuit manufacturing has advanced rapidly and on-chip caches have grown ever larger. As a cache grows, so does its wire delay, and the increasing wire delay significantly affects CPU processing speed. Kim C et al. therefore proposed the non-uniform cache architecture (NUCA), which allows different banks of the cache to have different access latencies and thus achieves a lower average access latency than the earlier uniform cache architecture (UCA).

In dynamic NUCA (DNUCA), the cache supports migration of cache lines (data blocks): a line that is hit can be moved to a bank closer to the accessing processor, which reduces the latency when the CPU accesses the same data again. This movement of data within the cache is called data promotion or block migration.

Data promotion must find a target bank to hold the promoted data. However, some existing promotion techniques ignore the actual state of the target bank and use a fixed promotion stride. During promotion they may evict more useful data in the target bank from the cache, or displace it to a bank farther from the CPU. This causes cache pollution and keeps promotion from being effective.

When improving promotion on a CMP, an important issue must also be considered: shared data. Multiple cores on one chip share an L2 or L3 cache, so cases where several cores access the same shared data are bound to occur. Data promotion moves the data a CPU accesses to a bank closer to that CPU so that the next access is faster. When multiple CPUs access the same shared data, the data is "pulled" toward the middle region of the NUCA, which limits the benefit of promotion.

The improved promotion technique therefore incorporates bank coherence: shared data may have multiple replicas in the NUCA, each belonging to a different CPU, and the bank coherence technique keeps the replicas consistent. This resolves the problems caused by contention for data and speeds up CPU access to shared data. Maintaining coherence requires recording the state of each piece of data, and the promotion strategy proposed in this thesis uses exactly these cache line states to choose the target bank for migration. The result is an arbitrary-stride data promotion technique combined with bank coherence on CMP.

The thesis first briefly introduces the research background and related techniques, then surveys several basic simulation tools used in computer architecture research and describes in detail Simics, the simulator used in this work. It next presents existing fixed-stride data promotion techniques and their problems, along with the bank coherence technique that this work builds on. It then describes the proposed technique in detail.

Finally, the technique is evaluated by full-system simulation using the NAS Parallel Benchmark (NPB) programs, with favorable results. It effectively reduces the latency of processor accesses to the shared cache. Compared with the design proposed by Kim C et al., it raises IPC by 8.19% on average, reduces the number of promotions, and improves system performance.
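The contrast between fixed-stride promotion and a state-aware choice of target bank can be illustrated with a toy model. The sketch below is not the thesis's algorithm, which the abstract does not specify. It assumes a single column of banks where bank 0 is nearest the CPU, a hypothetical three-state line model (`State`), and a hypothetical policy that promotes only into the nearest slot that holds no useful data. All names are introduced here for illustration.

```python
from enum import Enum
from typing import Sequence


class State(Enum):
    """Hypothetical per-slot states; the thesis's real state set is not given."""
    INVALID = 0   # slot holds no useful data
    SHARED = 1    # a replica that other CPUs may also hold
    MODIFIED = 2  # the only up-to-date copy


def fixed_step_target(hit_bank: int, step: int = 1) -> int:
    """Fixed-stride promotion: always move `step` banks closer to the CPU,
    regardless of what the destination slot currently holds."""
    return max(hit_bank - step, 0)


def state_aware_target(states: Sequence[State], hit_bank: int) -> int:
    """Assumed policy: jump to the closest bank whose slot is INVALID.
    If no closer slot is free, stay put, so no useful line is displaced."""
    for bank in range(hit_bank):
        if states[bank] is State.INVALID:
            return bank
    return hit_bank


if __name__ == "__main__":
    column = [State.MODIFIED, State.INVALID, State.SHARED, State.MODIFIED]
    print("fixed stride  ->", fixed_step_target(3))
    print("state aware   ->", state_aware_target(column, 3))
```

In this toy model the fixed-stride rule would overwrite or swap with whatever occupies the adjacent bank, while the state-aware rule may move several banks at once or skip the promotion entirely. Skipping is one plausible way a scheme could reduce the number of promotions, but that link to the reported result is an assumption.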
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP332


Article ID: 2438420


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2438420.html

