Research on In-Memory Computing Based on 2.5D Packaged Systems
Published: 2018-11-06 09:05
[Abstract]: In data-intensive applications, a large share of energy and latency is spent moving data between compute and storage units, which creates the von Neumann bottleneck. The problem persists in systems integrated with 2.5D packaging. This paper therefore proposes a new hardware acceleration scheme that brings in-memory computing into the 2.5D system, so that off-chip memory can perform computation. The memory is divided into several banks that support parallel access across banks, and configurable acceleration units are designed into the memory array to exploit the array's bandwidth for parallel computation, reducing the latency and energy of data transfer. The architecture is implemented using inverse quantization and inverse transform in H.264 decoding as the example. Simulation results show that, compared with a conventional software implementation, the scheme achieves a 7.1x performance improvement and saves 80.5% of the energy while adding only 2% area overhead.
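The abstract gives no implementation details, so the following is only a minimal software sketch of the workload it names: inverse quantization followed by the 4x4 inverse integer transform of H.264, with blocks spread over several "banks" that are processed independently. The function names, the interleaved block-to-bank mapping, and the use of a thread pool to stand in for bank-level parallelism are all assumptions made for illustration, not the authors' design; the dequantization uses the simplified flat-scaling form (no custom scaling matrices).

```python
from concurrent.futures import ThreadPoolExecutor

# Flat-scaling dequantization factors, one row per (qp % 6).
# Column 0: both indices even; column 1: both odd; column 2: all other positions.
LEVEL_SCALE = [(10, 16, 13), (11, 18, 14), (13, 20, 16),
               (14, 23, 18), (16, 25, 20), (18, 29, 23)]


def dequantize(block, qp):
    """Scale a 4x4 block of quantized coefficients (simplified, flat scaling)."""
    scale = LEVEL_SCALE[qp % 6]
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                k = 0
            elif i % 2 == 1 and j % 2 == 1:
                k = 1
            else:
                k = 2
            row.append((block[i][j] * scale[k]) << (qp // 6))
        out.append(row)
    return out


def butterfly(d):
    """1-D inverse core transform on four values."""
    e0, e1 = d[0] + d[2], d[0] - d[2]
    e2, e3 = (d[1] >> 1) - d[3], d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform(block):
    """Rows first, then columns, then rounding by (x + 32) >> 6."""
    rows = [butterfly(r) for r in block]
    cols = [butterfly([rows[i][j] for i in range(4)]) for j in range(4)]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def process_bank(blocks, qp):
    """Work done locally by one (hypothetical) bank accelerator."""
    return [inverse_transform(dequantize(b, qp)) for b in blocks]


def decode_in_banks(blocks, qp, num_banks=4):
    """Interleave blocks across banks, process each bank independently, reassemble."""
    banks = [blocks[b::num_banks] for b in range(num_banks)]
    with ThreadPoolExecutor(max_workers=num_banks) as pool:
        results = list(pool.map(lambda bank: process_bank(bank, qp), banks))
    out = [None] * len(blocks)
    for b, res in enumerate(results):
        out[b::num_banks] = res
    return out
```

The thread pool only models the partitioning: each bank touches its own blocks and needs no data from the others, which is the property that lets a banked memory array run the units side by side. It says nothing about the speedup or energy figures reported above.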
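[Affiliations]: State Key Laboratory of ASIC and System, Fudan University; SYSU-CMU Joint Institute of Engineering, Sun Yat-sen University; SYSU-CMU Shunde International Joint Research Institute, Shunde, Guangdong
[Funding]: SYSU-CMU Shunde International Joint Research Institute project (20150303); Samsung Electronics industry-sponsored project (SLSI-201403DD013)
[Classification number]: TN405
Article ID: 2313838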
Article link: http://sikaile.net/kejilunwen/dianzigongchenglunwen/2313838.html