A High-Speed GPU Decoding Architecture for LDPC Codes in Space Communication

Posted: 2018-03-05 15:30

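Topic: low-density parity-check codes; Focus: graphics processing unit; Source: 《航空學報》 (Acta Aeronautica et Astronautica Sinica), 2017, No. 01; Type: journal article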

[Abstract]: To meet the demand in space communication for high-speed, reconfigurable channel decoders, this paper exploits the parallel computing capability of graphics processing units (GPUs) and proposes a high-speed software decoding architecture for low-density parity-check (LDPC) codes. The execution efficiency of the decoding kernel is improved in four ways: optimizing the intra-block and inter-block thread parallelism of the node-update operations in the turbo-decoding message-passing (TDMP) algorithm, reducing the thread divergence caused by irregular row weights, lowering the latency of thread accesses to the memory that stores node-update messages, and choosing a suitable quantization for the data stored by the decoder. On this basis, the asynchronous stream mechanism of the Compute Unified Device Architecture (CUDA) is introduced, and both the scheduling of decoder input/output data transfers against kernel execution and the allocation of decoding thread resources across CUDA streams are optimized, so that decoding throughput is maximized while decoding latency is reduced. TDMP decoding experiments on the LDPC codes of the Consultative Committee for Space Data Systems (CCSDS) telemetry standard, run on Nvidia's latest Tesla K20 and GTX980 platforms, show that with 10 decoding iterations the architecture reaches a peak throughput of about 500 Mbps with an average decoding latency of about 2 ms. Compared with existing results, the architecture retains the configuration flexibility of a software implementation while balancing throughput and latency more effectively.
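For readers unfamiliar with TDMP, the layered (row-by-row) schedule it refers to can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the paper's GPU kernel: the check-node update shown is normalized min-sum with a hypothetical scaling factor `scale=0.75`, the function name `layered_min_sum` and the tiny (7,4) Hamming parity-check matrix are chosen for illustration, and nothing here reflects the paper's quantization, thread mapping, or CCSDS code parameters.

```python
def layered_min_sum(h_rows, llr, max_iters=10, scale=0.75):
    """Layered min-sum decoding sketch.

    h_rows : list of checks, each a list of the bit indices it involves
             (every check is assumed to involve at least two bits).
    llr    : channel log-likelihood ratios, positive meaning bit 0.
    Returns (hard_decisions, iterations_used).
    """
    post = list(llr)                              # running posterior LLR per bit
    c2v = [[0.0] * len(row) for row in h_rows]    # check-to-bit message per edge
    hard = [1 if p < 0 else 0 for p in post]

    for it in range(max_iters):
        for r, row in enumerate(h_rows):
            # Remove this check's previous contribution (extrinsic input).
            v2c = [post[c] - c2v[r][k] for k, c in enumerate(row)]

            # Min-sum needs the overall sign and the two smallest magnitudes.
            sign = 1
            for v in v2c:
                if v < 0:
                    sign = -sign
            mags = [abs(v) for v in v2c]
            i_min = min(range(len(mags)), key=mags.__getitem__)
            min1 = mags[i_min]
            min2 = min(m for k, m in enumerate(mags) if k != i_min)

            # Update messages and posteriors immediately, so later checks
            # in the same iteration already see the refreshed values.
            for k, c in enumerate(row):
                mag = min2 if k == i_min else min1
                s = sign * (-1 if v2c[k] < 0 else 1)
                c2v[r][k] = scale * s * mag
                post[c] = v2c[k] + c2v[r][k]

        hard = [1 if p < 0 else 0 for p in post]
        # Stop early once every parity check is satisfied.
        if all(sum(hard[c] for c in row) % 2 == 0 for row in h_rows):
            return hard, it + 1
    return hard, max_iters


# Illustrative use: (7,4) Hamming checks, all-zero codeword, bit 0 received wrongly.
H_ROWS = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
decoded, iters = layered_min_sum(H_ROWS, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
print(decoded, iters)
```

Because each check updates the posteriors in place, a layered schedule typically converges in fewer iterations than a flooding schedule. The rows within a layer that share no bits can be processed in parallel, which is the kind of parallelism a GPU implementation would map onto threads.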
[Affiliation]: School of Electronic and Information Engineering, Beihang University
[Funding]: National Natural Science Foundation of China (91438116)
[Classification]: V443.1; TN911.22

Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1570828.html
