
Research on Key Technologies for FPGA Acceleration of Floating-Point Matrix Multiplication in Embedded Environments

Published: 2018-02-02 21:07

Keywords: floating-point matrix multiplication; multiply-accumulator; FPGA acceleration; embedded system; PCI-E. Source: Hunan University, master's thesis, 2013. Type: degree thesis.

[Abstract]: Floating-point matrix multiplication is a fundamental algorithm in digital signal processing and is widely used in communications, networking, industrial control, medical applications, and other fields. As embedded systems are adopted more deeply in these fields, floating-point matrix multiplication, with its high computational complexity and low processing efficiency, often becomes the bottleneck that limits their computing speed. Field Programmable Gate Array (FPGA) coprocessors are fast, programmable, and flexible to use, which makes them an effective way to raise the computing speed of embedded systems, and they have attracted wide attention from researchers in China and abroad. Studying FPGA acceleration of floating-point matrix multiplication in embedded environments is therefore of considerable importance.

This thesis addresses the floating-point matrix multiplication required by a three-dimensional fluorescence mathematical separation algorithm. Based on an analysis of the floating-point matrix multiplication algorithm and of FPGA hardware structure, it studies a pipelined floating-point matrix multiplier built on a parallel architecture, together with a communication mechanism for a heterogeneous multiprocessor system, in order to improve the FPGA computing performance of floating-point matrix multiplication in embedded environments. The main work is as follows.

For the multiply-accumulator, the core computing unit of matrix multiplication, the thesis analyzes the multiply-accumulate computation performed in each clock cycle and proposes a pipelined floating-point multiply-accumulator structure built on floating-point multiplier and adder intellectual property (IP) cores. In this structure, once the data has passed through the pipelined multiplier and adder, the accumulated sum is obtained simply by adding up the results held in the last N pipeline stages of the adder. The structure is also flexible and broadly applicable: the number of pipeline stages can be adjusted to meet the performance requirements of different applications.
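The accumulation idea above can be illustrated with a small software model. This is only a sketch of one plausible reading of the abstract, not the thesis's Verilog design: it assumes an adder with a latency of `n_stages` cycles whose output is fed back to its input, so that `n_stages` independent partial sums circulate in the pipeline and are added together at the end. The names `pipelined_accumulate`, `dot`, and `n_stages` are hypothetical.

```python
from collections import deque


def pipelined_accumulate(products, n_stages):
    """Software model of accumulating through an adder with n_stages latency.

    Assumption (not taken from the thesis): one product arrives per cycle and
    is added to whatever value is leaving the adder pipeline in that cycle.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    # Each slot models one in-flight addition inside the adder pipeline.
    pipeline = deque([0.0] * n_stages)
    for p in products:
        emerging = pipeline.popleft()   # result leaving the adder this cycle
        pipeline.append(emerging + p)   # new addition enters the adder
    # The pipeline now holds n_stages partial sums; their total is the result.
    return sum(pipeline)


def dot(row, col, n_stages=4):
    """Multiply-accumulate of two equal-length vectors using the model above."""
    return pipelined_accumulate((a * b for a, b in zip(row, col)), n_stages)
```

Because the partial sums are combined in a different order than a plain left-to-right sum, the floating-point result of such a scheme can differ from a sequential sum in the last bits for general inputs.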
Building on the multiply-accumulator described above, the thesis designs a floating-point matrix multiplier with a parallel architecture, which reduces computational complexity and increases computing speed. The row and column dimensions of the two operand matrices are configurable, and the number of processing elements can be set according to the FPGA resources actually available. Adjacent processing elements exchange no data, so the design scales well.

For communication between the FPGA coprocessor that performs floating-point matrix multiplication and the embedded CPU, the thesis designs two communication structures, one based on the UART serial port and one based on the PCI-E bus. In the PCI-E structure, an FPGA-side design based on a system-on-a-programmable-chip (SOPC) architecture is combined with a driver on the embedded host so that hardware and software work together.

The floating-point multiply-accumulator and the matrix multiplier are implemented in the Verilog hardware description language, and their performance is analyzed through simulation and synthesis. To further verify performance in an embedded environment, the floating-point matrix multiplier is connected through both UART and PCI-E to the Intel E6x5C embedded platform used in the project behind this thesis. Experimental results show that accelerating floating-point matrix multiplication over the high-speed PCI-E bus raises the computation rate by about 8 times and about 200 times compared with the mainstream Cortex A9 and ARM9 embedded platforms, respectively. This acceleration approach can therefore effectively improve the floating-point computing performance of embedded platforms.
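The property that processing elements exchange no data can be sketched in software as follows. The abstract does not say how work is divided among processing elements, so this sketch assumes a simple row-interleaved assignment of output rows; the function name `matmul_with_pes` and the parameter `num_pe` are hypothetical, and the loop over processing elements runs sequentially here where hardware would run them in parallel.

```python
def matmul_with_pes(a, b, num_pe):
    """Multiply matrices a (m x k) and b (k x n) given as lists of rows.

    Assumption (not taken from the thesis): processing element `pe` owns
    output rows pe, pe + num_pe, pe + 2 * num_pe, ... and needs nothing
    from any other processing element.
    """
    if num_pe < 1:
        raise ValueError("num_pe must be at least 1")
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    cols = list(zip(*b))  # columns of b, so each dot product reads one tuple
    c = [[0.0] * n for _ in range(m)]
    for pe in range(num_pe):            # parallel in hardware, serial here
        for i in range(pe, m, num_pe):  # rows owned by this processing element
            for j in range(n):
                acc = 0.0
                for x, y in zip(a[i], cols[j]):
                    acc += x * y        # one multiply-accumulate step
                c[i][j] = acc
    return c
```

Under this assumption, changing `num_pe` changes only how rows are distributed, not the result, which is one way to read the abstract's statement that the processing-element count can be matched to the available FPGA resources.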
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TN791



Link to this article: http://sikaile.net/falvlunwen/zhishichanquanfa/1485481.html
