Research on Key Technologies for FPGA Acceleration of Floating-Point Matrix Multiplication in Embedded Environments
Keywords: floating-point matrix multiplication; multiply-accumulator; FPGA acceleration; embedded system; PCI-E. Source: Hunan University, master's thesis, 2013. Document type: degree thesis
[Abstract]: Floating-point matrix multiplication is a basic algorithm of digital signal processing and is widely used in communications, networking, industrial control, medical equipment, and other fields. As embedded systems are applied more deeply in these fields, floating-point matrix multiplication, with its high computational complexity and low processing efficiency, often becomes the bottleneck that limits their computing speed. Field Programmable Gate Array (FPGA) coprocessors are fast, programmable, and flexible to use, so they have become an effective way to raise the computing speed of embedded systems and have attracted wide attention from researchers at home and abroad. Studying FPGA acceleration of floating-point matrix multiplication in embedded environments is therefore of considerable importance.
This thesis addresses the floating-point matrix multiplication that arises in a three-dimensional fluorescence mathematical separation algorithm. Based on an analysis of the matrix multiplication algorithm and of the FPGA hardware structure, it studies a pipelined floating-point matrix multiplier built on a parallel structure, together with the communication mechanism in a heterogeneous multiprocessor system, in order to improve the FPGA computing performance of floating-point matrix multiplication in embedded environments. The main work is as follows.
For the multiply-accumulator, the core computing unit of matrix multiplication, the thesis analyzes the multiply-accumulate computation performed in each clock cycle and, building on intellectual property (IP) cores for the floating-point multiplier and adder, proposes a pipelined floating-point multiply-accumulator structure. In this structure, once the data have passed through the pipelined multiplier and adder, the accumulated sum is obtained simply by adding the results held in the last N pipeline stages of the adder. The structure is also flexible and widely applicable: the number of pipeline stages can be adjusted to meet the performance requirements of different applications.
Based on this multiply-accumulator, the thesis designs a floating-point matrix multiplier with a parallel architecture, which reduces the computational complexity and increases the computing speed. The row and column dimensions of the two matrices to be multiplied are configurable, and the number of processing units can be set according to the FPGA resources actually available. Adjacent processing units exchange no data, so the design scales well.
For the communication between the FPGA coprocessor that performs the floating-point matrix multiplication and the embedded CPU, the thesis designs two communication structures, one based on a UART serial port and one based on the PCI-E bus. In the PCI-E structure, the FPGA-side design, which is based on a system-on-a-programmable-chip (SOPC) architecture, is combined with a driver on the embedded host so that hardware and software work together.
The floating-point multiply-accumulator and the matrix multiplier are implemented in the Verilog hardware description language, and their performance is analyzed through simulation and synthesis. To further verify the performance in an embedded environment, the floating-point matrix multiplier is connected over both UART and PCI-E to the Intel E6x5C embedded platform used in the project that supports this thesis. The experimental results show that accelerating floating-point matrix multiplication over the high-speed PCI-E bus raises the computing rate by about 8 times relative to the mainstream Cortex A9 embedded platform and by about 200 times relative to the ARM9 embedded platform. This acceleration approach can therefore effectively improve the floating-point computing performance of embedded platforms.
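The accumulator and processing-unit ideas summarized in the abstract can be sketched in software. The Python model below is only a behavioural illustration under stated assumptions, not the thesis's Verilog design: the names `pipelined_mac` and `matmul_with_pes`, the parameter `adder_latency` (standing in for the N adder pipeline stages), and the row-interleaved assignment of work to processing units are all hypothetical. It assumes that one product enters the adder per cycle and that the adder output is fed back as the second operand, so the adder pipeline ends up holding N independent partial sums that are added once at the end.

```python
from collections import deque


def pipelined_mac(a, b, adder_latency=4):
    """Behavioural model of a multiply-accumulate with a pipelined adder.

    adder_latency is an assumed stand-in for the N adder pipeline stages.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if adder_latency < 1:
        raise ValueError("adder_latency must be at least 1")

    # Each slot models one in-flight partial sum inside the adder pipeline.
    pipeline = deque([0.0] * adder_latency)
    for x, y in zip(a, b):
        product = x * y
        # The value leaving the adder is fed back and added to the new product,
        # so slot k accumulates products k, k + N, k + 2N, ...
        feedback = pipeline.popleft()
        pipeline.append(feedback + product)

    # After the last product, the accumulated sum is the sum of the N
    # partial sums still held in the pipeline.
    return sum(pipeline)


def matmul_with_pes(A, B, num_pes=2, adder_latency=4):
    """Multiply non-empty matrices A (rows x inner) and B (inner x cols).

    Rows of the result are dealt out to num_pes independent processing units
    (an assumed scheduling); no unit reads another unit's results.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions do not match")
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")

    columns_of_b = list(zip(*B))
    C = [[0.0] * cols for _ in range(rows)]
    for pe in range(num_pes):
        # Processing unit `pe` handles rows pe, pe + num_pes, pe + 2*num_pes, ...
        for i in range(pe, rows, num_pes):
            for j in range(cols):
                C[i][j] = pipelined_mac(A[i], columns_of_b[j], adder_latency)
    return C
```

Because the model adds the products in an interleaved order, its result can differ from a strictly sequential sum in the last few bits for general floating-point inputs. The loop over `pe` runs serially here, whereas in hardware the units would operate concurrently.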
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TN791