Design and Implementation of a Prototype FPGA-Based Parallel Acceleration Experiment Platform

Published: 2018-03-16 01:20

Topic: PCI  Angle: Express  Source: Shandong University, 2013 master's thesis  Type: degree thesis

[Abstract]: In recent years, with the emergence of new concepts such as the Internet of Things and advances in computer technology, embedded systems have been developing at an unprecedented pace, and new kinds of embedded devices keep appearing. These devices place ever higher demands on intelligence and real-time performance, so the amount of computation they require keeps growing. Traditional embedded processors, however, are limited in performance, clock frequency, and other respects, and a single processor can largely no longer meet these demands. Using multiple embedded processors to raise processing speed would greatly increase power consumption, which is unsuitable for energy-constrained embedded devices. Under these circumstances, a heterogeneous architecture that pairs a Field Programmable Gate Array (FPGA) with an embedded processor has become one of the most promising solutions.

Many FPGA-based parallel acceleration models exist, and using an FPGA as a coprocessor to accelerate specific algorithms in parallel is an active research topic. Usually, however, the speedup of an FPGA-accelerated algorithm is obtained through simulation and analysis rather than actual board-level testing. The main reason is that testing an algorithm requires large-volume, high-speed data exchange with the main controller, and no platform for such exchange has been available. A parallel acceleration experiment platform capable of high-speed data exchange is therefore urgently needed for board-level testing of acceleration results.

This thesis designs a prototype of such a platform. To meet the data exchange speed requirement, the platform exchanges data with the main controller over the PCI Express bus and uses DMA to speed up the transfers. The thesis presents the overall design of the platform together with the implementation steps and methods. Following a top-down modular design, the platform is divided into a PCI Express endpoint controller module, a PCI Express transaction layer packet processing and DMA control module, a memory controller module, a parallel acceleration experiment module, and an interface module between the parallel acceleration module and the memory controller. As the core of the platform, the PCI Express transaction layer packet processing and DMA controller module has complex logic and many submodules, so the thesis focuses on its detailed design and implementation. It is divided into the following submodules, each implemented separately: a transmit unit, a receive unit, a DMA controller, a read request wrapper, a transmit data arbitration and preparation module, a receive data distribution module, a DMA-to-memory-controller interface module, and a DMA-to-parallel-acceleration-module interface. The design and implementation of the other modules are also described.

Taking a sorting algorithm as an example, the thesis then describes the implementation of a parallel sorting accelerator and, on that basis, designs and implements the parallel acceleration module, completing the experiment platform. Finally, the platform is tested, and data are reported on its actual resource usage, maximum data exchange speed, and measured speedup. The experiments show that the platform meets the requirements of parallel acceleration experiments and can be used for board-level testing of parallel algorithm acceleration.
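The abstract does not say which sorting algorithm the parallel sorting accelerator uses. As a purely illustrative sketch, the Python model below uses odd-even transposition sort, an algorithm chosen here only as an assumption because each phase consists of independent compare-exchange steps that hardware could perform at the same time. The function name `odd_even_transposition_sort` is hypothetical and does not come from the thesis.

```python
def odd_even_transposition_sort(values):
    """Software model of odd-even transposition sort (illustrative only).

    Assumption: this is NOT taken from the thesis; it only shows the kind of
    phase-parallel structure that suits an FPGA implementation.
    """
    data = list(values)  # work on a copy so the input is left unchanged
    n = len(data)
    # n phases are sufficient to sort n elements with this algorithm.
    for phase in range(n):
        start = phase % 2  # even phases compare (0,1),(2,3)...; odd phases (1,2),(3,4)...
        # The pairs within one phase are disjoint, so in hardware every
        # compare-exchange of the phase could happen in the same clock cycle.
        # This software model simply visits them one after another.
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


if __name__ == "__main__":
    print(odd_even_transposition_sort([7, 3, 8, 1]))  # prints [1, 3, 7, 8]
```

In this model, the number of phases grows linearly with the input size, while the work inside each phase is what parallel hardware would perform concurrently. Whether the thesis accelerator follows this structure is not stated in the abstract.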
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP368.1; TP338.6

Article ID: 1617699

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1617699.html
