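Design of a High-Performance Double-Precision Floating-Point Divider Based on the Goldschmidt Algorithm
Published: 2018-04-28 18:42
Topic: floating-point divider + Goldschmidt algorithm. Reference: Journal of Computer Applications (《计算机应用》), 2015, No. 07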
[Abstract]: Double-precision floating-point division usually involves a complex computation and a long delay. To address this, a method is proposed for designing a high-performance double-precision floating-point divider, based on the Goldschmidt algorithm, that supports the IEEE-754 standard. First, the division procedure of the Goldschmidt algorithm and the error introduced by the iterations are analyzed. Then, a method for controlling this error is proposed. Next, an area-efficient double lookup-table method is used to determine the initial iteration value, and the iteration unit adopts a parallel multiplier structure to speed up the iterations. Finally, the pipeline stages are partitioned appropriately and the iteration process is controlled so that floating-point divisions can execute in a pipelined fashion, which further raises the operating rate of the divider. Experimental results show that, in a 40 nm process, the double-precision floating-point divider with a pipelined structure using a 14-bit initial iteration value has a synthesized cell area of 84 902.261 8 μm² and can run at up to 2.2 GHz. Compared with a pipelined structure using an 8-bit initial iteration value, the operation speed is 32.73% higher and the area is 5.05% larger. A single double-precision floating-point division has a latency of 12 clock cycles; in pipelined execution, the average delay per division is 3 clock cycles. Compared with double-precision floating-point dividers based on the SRT algorithm in other processors, data throughput is improved by a factor of 3 to 7; compared with double-precision floating-point dividers based on the Goldschmidt algorithm in other processors, it is improved by a factor of 2 to 3.
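The Goldschmidt iteration named in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware design: the function names `seed_reciprocal` and `goldschmidt_divide` are hypothetical, a single truncated reciprocal stands in for the double lookup table (whose details the abstract does not give), the divisor is assumed to be already normalized to [1, 2), and exact rational arithmetic is used so that only the algorithmic error is visible, with no multiplier truncation.

```python
import math
from fractions import Fraction


def seed_reciprocal(d: Fraction, bits: int) -> Fraction:
    """Hypothetical stand-in for a lookup table: 1/d truncated to `bits` fractional bits."""
    scale = 1 << bits
    return Fraction(math.floor(scale / d), scale)


def goldschmidt_divide(n: Fraction, d: Fraction,
                       seed_bits: int = 14, iterations: int = 2) -> Fraction:
    """Approximate n / d by driving the denominator toward 1.

    Assumes d is a normalized significand in [1, 2). Numerator and
    denominator are multiplied by the same factor at every step, so
    their ratio never changes while the denominator converges to 1.
    """
    if not (1 <= d < 2):
        raise ValueError("d must be normalized to [1, 2)")
    f = seed_reciprocal(d, seed_bits)   # initial factor, roughly 1/d
    n, d = n * f, d * f                 # now d = 1 - e with a small e
    for _ in range(iterations):
        f = 2 - d                       # f = 1 + e
        n, d = n * f, d * f             # d becomes 1 - e**2
    return n                            # n approximates the quotient


if __name__ == "__main__":
    num, den = Fraction(3, 2), Fraction(7, 4)
    q = goldschmidt_divide(num, den)
    print(float(q), float(num / den))
```

Because the error term is squared on each pass, a more accurate seed needs fewer iterations to reach a given precision: in this idealized model a 14-bit seed needs two iterations to push the relative error below 2^-50, while an 8-bit seed needs three. That is the general trade-off between table size and iteration count that the 14-bit versus 8-bit comparison in the abstract reflects.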
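[Author affiliation]: College of Computer, National University of Defense Technology
[Funding]: Hunan Provincial Key Discipline Construction Project (434515000008); Aeronautical Science Foundation Project (2013zc88003); National Natural Science Foundation of China Project (61402499)
[Classification number]: TP332.22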
Article ID: 1816424
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1816424.html