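A GPU-Based High-Speed Decoding Architecture for LDPC Codes in Space Communications
Published: 2018-03-05 15:30
Topic: low-density parity-check codes; Focus: graphics processing unit; Source: Acta Aeronautica et Astronautica Sinica (《航空學報》), 2017, Issue 01; Paper type: journal article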
[Abstract]: To meet the demand of space communications for high-speed, reconfigurable channel decoders, this paper proposes a high-speed software decoding architecture for low-density parity-check (LDPC) codes that exploits the parallelism of graphics processing units (GPUs). The execution efficiency of the decoding kernel is improved in four ways: optimizing the intra-block and inter-block thread parallelism of the node-update operations in the turbo-decoding message-passing (TDMP) algorithm, reducing the thread divergence caused by irregular row weights, lowering the latency of thread accesses to the storage that holds node-update messages, and quantizing the decoder's stored messages appropriately. On this basis, asynchronous Compute Unified Device Architecture (CUDA) stream processing is introduced. The scheduling between decoder input/output data transfers and kernel execution, and the allocation of decoding thread resources across CUDA streams, are designed to maximize decoding throughput while reducing decoding latency. TDMP decoding experiments on the Consultative Committee for Space Data Systems (CCSDS) telemetry-standard LDPC codes, run on the latest Nvidia Tesla K20 and GTX980 platforms, show that with 10 decoding iterations the architecture reaches a peak throughput of about 500 Mbps at an average decoding latency of about 2 ms. Compared with existing results, the architecture balances decoding throughput and latency more effectively while retaining the configuration flexibility of a software architecture.
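The layer-by-layer node update that the abstract refers to can be illustrated with a small CPU-side sketch. This is not the paper's GPU kernel: it assumes a scaled min-sum check-node rule in place of whatever update the authors use, and the parity-check rows, the scaling factor 0.75, and the function name `layered_min_sum_decode` are all hypothetical choices made for illustration. Only the default of 10 iterations is taken from the abstract.

```python
# Hypothetical toy parity-check matrix: each row lists the bit positions that
# take part in one check (one "layer"). This is NOT a CCSDS code.
CHECK_ROWS = [
    [0, 1, 2, 4],
    [1, 2, 3, 5],
    [0, 2, 3, 6],
]


def layered_min_sum_decode(channel_llr, check_rows, max_iterations=10, scale=0.75):
    """Layered (TDMP-style) decoding with a scaled min-sum check update.

    channel_llr: log-likelihood ratios, positive meaning "bit is 0".
    Every check row must contain at least two bit positions.
    Returns (hard_decision_bits, iterations_used).
    """
    posterior = list(channel_llr)
    # One stored check-to-bit message per edge, initialised to zero.
    check_msgs = [[0.0] * len(row) for row in check_rows]
    bits = [1 if value < 0 else 0 for value in posterior]

    for iteration in range(1, max_iterations + 1):
        # Layers are processed in sequence; each one sees the posteriors
        # already refined by the layers before it in the same iteration.
        for row, msgs in zip(check_rows, check_msgs):
            # Remove this layer's previous contribution from the posteriors.
            extrinsic = [posterior[n] - r for n, r in zip(row, msgs)]
            magnitudes = [abs(q) for q in extrinsic]

            # The two smallest magnitudes are enough to form every outgoing
            # message, which keeps the per-edge work uniform.
            min1 = min(magnitudes)
            min1_pos = magnitudes.index(min1)
            min2 = min(m for k, m in enumerate(magnitudes) if k != min1_pos)
            negative_count = sum(1 for q in extrinsic if q < 0)

            for k, n in enumerate(row):
                magnitude = min2 if k == min1_pos else min1
                others_negative = negative_count - (1 if extrinsic[k] < 0 else 0)
                sign = -1.0 if others_negative % 2 else 1.0
                msgs[k] = scale * sign * magnitude
                posterior[n] = extrinsic[k] + msgs[k]

        bits = [1 if value < 0 else 0 for value in posterior]
        # Stop early once every parity check is satisfied.
        if all(sum(bits[n] for n in row) % 2 == 0 for row in check_rows):
            return bits, iteration

    return bits, max_iterations
```

For example, calling `layered_min_sum_decode([2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0], CHECK_ROWS)` starts from an all-zero codeword whose third bit was received weakly in error and returns the corrected bits along with the number of iterations used. In a GPU version, the inner loop over the edges of a row is the kind of work that would be spread across threads, which is why irregular row weights can cause the thread divergence the abstract mentions.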
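[Author affiliation]: School of Electronic and Information Engineering, Beihang University
[Funding]: National Natural Science Foundation of China (91438116)
[Classification number]: V443.1; TN911.22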
Article ID: 1570828
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1570828.html