Design and Implementation of a Floating-Point Fused Multiply-Add Unit in a High-Performance Microprocessor
Published: 2018-09-08 09:28
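[Abstract]: As one of the core arithmetic units of a high-performance microprocessor, the floating-point fused multiply-add (FMA) unit has a large influence on the processor's overall floating-point performance. The FMA algorithm is complex, its logic is large and has a long execution time, and it is hard to verify, which makes for a long design cycle. Research on high-performance FMA units therefore has broad application value and real practical significance. This thesis studies design and optimization techniques for a high-performance FMA unit. The work is part of the national major project "High-Performance X Processor", and its results are applied directly in engineering practice. Based on a single-data-path FMA algorithm, with no exception interrupts and no software-assisted handling (SWA) mechanism, and targeting high frequency, small area, and IEEE 754 compatibility, the thesis designs an FMA unit that accepts and produces denormalized numbers, signed zeros, infinities, and NaNs. The main work and results are as follows:
1. High-performance FMA units and their key techniques are surveyed extensively, and on that basis the FMA unit of the High-Performance X Processor is designed and implemented.
2. A carry-correction structure for the multiplier array is proposed, and a main adder based on an EAC structure is designed; together they reduce the number of logic levels in the FMA and increase execution speed.
3. A simple LZA structure that supports denormalized numbers is designed using maximum-normalization-shift control and a flexible one-bit normalization correction technique. Exact infinity handling and the NaN data path are merged into the aligned-addend data path, and denormalized operand handling is folded into the normal normalized data flow, so that the mantissa data path is shared as much as possible.
4. The whole design is modeled as a pipelined RTL implementation in the Verilog hardware description language. It passed test sets that include IEEE 754 standard test vectors, special operands, corner-case data, and a large number of random vectors, which establishes the correctness of the design.
Finally, the FMA unit was synthesized, optimized, and tuned. In a 40 nm bulk-silicon CMOS process under worst-case process conditions, it reaches a frequency of 2.5 GHz with an area of 56735.9 µm², meeting the design requirements of the X processor.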
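The defining property of a fused multiply-add, as specified by IEEE 754-2008, is that a*b + c is rounded only once, after the exact product and sum have been formed. The following is a minimal software sketch of that property, not the thesis's hardware algorithm. The function name `fma_reference` and the operand values are hypothetical and chosen only for illustration; the model covers finite double-precision inputs and does not reproduce signed-zero, infinity, or NaN behavior.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Hypothetical reference model: a*b + c with a single final rounding.

    Fraction represents every finite double exactly, so the product and
    the sum are computed without error; float() then rounds the exact
    result once to the nearest double. Finite inputs only.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Operands chosen so that the exact product 1 - 2**-60 is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c              # product is rounded, then the sum is rounded
fused = fma_reference(a, b, c)   # only the final result is rounded

print("unfused:", unfused)
print("fused:  ", fused)
```

With these operands the separately rounded computation loses the low-order term of the product and returns zero, while the single-rounding model keeps it. A hardware FMA unit has to deliver the second behavior, which is one reason it needs a wide internal mantissa data path.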
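[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP332
Article ID: 2230126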
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2230126.html