Research on Key Technologies for the Design and Implementation of Embedded Multi-core Processors

Published: 2018-05-24 00:31

Topic: embedded multi-core processor + network-on-chip. Source: Hefei University of Technology, 2012 doctoral dissertation

[Abstract]: Embedded applications have expanded from their early focus on industrial control to compute-intensive domains such as media processing and information processing, placing higher demands on the performance of embedded microprocessors. Meanwhile, as VLSI technology has advanced, raising the clock frequency alone is no longer a viable way to improve processor performance. Advanced processor architectures, typified by multi-core processors, have become the main route to higher performance and to meeting ever-growing application demands. With advances in process technology, embedded multi-core processors have developed rapidly, but a number of scientific and technical problems remain to be solved. Research on key technologies for the design and implementation of embedded multi-core processors is therefore of both theoretical and practical significance.
Synthetic Aperture Radar (SAR) is a typical compute-intensive embedded application with important uses in military, economic, and environmental fields. Taking real-time SAR imaging as an example, this thesis explores multi-core architecture design methods for high-performance computing, focusing on architecture design and implementation, application acceleration, and application mapping. To meet the demand of high-performance embedded applications for computing power, it proposes a processor architecture model based on "task clusters" and designs an embedded multi-core processor architecture from that model. By examining how the communication performance of single-layer and hierarchical networks-on-chip relates to the communication characteristics of applications, it also designs a two-layer hybrid multi-core communication architecture, and it studies the choice of router type and the router architecture within that communication architecture. FFT is the main computational task in SAR imaging; to accelerate it, the thesis proposes a high-performance parallel FFT processing architecture. For cooperation among multiple multi-core chips, it proposes a task mapping algorithm for multi-core chipsets and a general-purpose multi-core chip communication scheme. Finally, building on these results, a real-time SAR imaging embedded multi-core prototype system is designed, which validates the work.
The main contributions of this thesis are as follows:
1. A processor architecture model based on "task clusters" is proposed, and an embedded multi-core processor architecture is designed from it, with a two-layer hybrid communication architecture. To meet the demand of high-performance embedded applications for computing power, the task-cluster model raises the processor's computing capability by subdividing computing tasks and accelerating the regular ones. By examining how the communication performance of single-layer and hierarchical networks-on-chip relates to the communication characteristics of applications, the thesis designs a hybrid, hierarchical two-layer multi-core communication architecture. The new communication architecture gives the embedded multi-core processor ample on-chip communication bandwidth while accommodating the diversity of application communication characteristics.
2. Simulation is used to compare a circuit-switched router with a wormhole-switched router supporting virtual channels under different communication characteristics. The circuit-switched router establishes an end-to-end transmission link in advance; once the link is set up, the flits of a message arrive consecutively and in order, and the router area is small. Its communication performance is acceptable for long messages (several hundred flits) but poor for short messages (a dozen or so flits). The wormhole-switched router cannot guarantee that flits arrive consecutively and is slightly larger, but it shows excellent communication performance for both long and short messages. These conclusions can guide router selection in network-on-chip design.
3. A circuit-switched router that supports virtual circuits is proposed. To address the low link utilization of existing circuit-switched routers, the thesis studies a circuit-switched router supporting virtual circuits. Experiments show that the new router design effectively reduces message transmission latency and raises the saturation injection rate.
4. Using the constant-geometry (constant-structure) FFT flow graph, a radix-2×K parallel FFT architecture free of memory access conflicts is proposed. Through a parallel address generation algorithm, K radix-2 butterfly units simultaneously read or write the 2K operands they need, achieving an average throughput of K radix-2 butterfly operations per cycle. Compared with existing parallel FFT architectures, the address mapping algorithm is easy to implement in hardware. The parallel address generation unit consists of one counter and a total of 4K two-to-one multiplexers. It is structurally simple, and its structure is the same for every value of K, so FFT processors with different degrees of parallelism can readily be designed to match the required FFT speed, which gives good scalability. In terms of resource consumption, leaving twiddle factors aside, a constant-geometry FFT processor for an N-point FFT normally needs 2N storage units, whereas the FFT processor proposed in the thesis needs only 3N/2.
5. For cooperation among multiple multi-core chips, the thesis proposes a task mapping algorithm for multi-core chipsets and a general-purpose multi-core chip communication scheme. Board-level interconnect buses offer little communication bandwidth, and the number of board-level data links is limited by the chip pin count; the chipset-oriented task mapping algorithm effectively reduces the volume of task communication between chips. For the transmission of message data across the chipset, the thesis also proposes a multi-core chip communication scheme. The scheme is general: it is not constrained by the number of multi-core chips, the topology, or the routing algorithm, and it is easy to implement in hardware.
6. Building on the above results, the thesis designs a real-time SAR imaging multi-core prototype system. The prototype consists mainly of four Xilinx Virtex-6-550T FPGA chips plus memory, interface, and power management chips. All four FPGAs are designed with the embedded multi-core processor architecture proposed in the thesis. The prototype processes radar echo data in a pipelined fashion; at an operating frequency of 80 MHz it produces a 4096 × 2048-point, 256-gray-level SAR image within 18 seconds. The difference between the prototype's output image and the original image obtained on a PC is negligible, so the imaging quality is very good.
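
The constant-geometry idea behind contribution 4 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: it implements a generic constant-geometry radix-2 decimation-in-frequency FFT in which every stage reads operand pairs from the same addresses (i and i + N/2) and writes results to 2i and 2i + 1, and it groups the butterflies into batches of K to mimic K butterfly units working per cycle. The function names, the `k_parallel` parameter, and the cycle counter are hypothetical. The model uses two N-word buffers (the conventional 2N figure mentioned above) and does not model the banked memory, the conflict-free address mapping, or the 3N/2 storage scheme, none of which are described in the abstract.

```python
import cmath


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def constant_geometry_fft(samples, k_parallel=1):
    """Constant-geometry radix-2 DIF FFT (illustrative software model).

    Returns (spectrum, cycles). `cycles` counts batches of `k_parallel`
    butterflies, a hypothetical stand-in for hardware clock cycles.
    """
    n = len(samples)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two, at least 2")
    half = n // 2
    if k_parallel < 1 or half % k_parallel != 0:
        raise ValueError("k_parallel must be positive and divide N/2")

    src = [complex(v) for v in samples]
    dst = [0j] * n  # second buffer: 2N words in total (assumption)
    cycles = 0

    for stage in range(stages):
        for base in range(0, half, k_parallel):
            # One batch: K butterflies read 2K operands and write 2K results.
            for i in range(base, base + k_parallel):
                a, b = src[i], src[i + half]      # same read pattern every stage
                exponent = (i >> stage) << stage  # twiddle index for this stage
                twiddle = cmath.exp(-2j * cmath.pi * exponent / n)
                dst[2 * i] = a + b                # same write pattern every stage
                dst[2 * i + 1] = (a - b) * twiddle
            cycles += 1
        src, dst = dst, src  # swap buffers between stages

    # The constant-geometry DIF flow leaves results in bit-reversed order.
    spectrum = [src[bit_reverse(k, stages)] for k in range(n)]
    return spectrum, cycles
```

Because the read and write addresses depend only on the butterfly index and not on the stage, the address logic in such a design can stay the same across stages; only the twiddle index changes. In this model a 16-point transform with `k_parallel=4` takes (16/2/4) × log2(16) = 8 batches.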
[Degree-granting institution]: Hefei University of Technology
[Degree level]: Doctoral
[Year conferred]: 2012
[Classification number]: TN957.52; TP332



Article ID: 1926990

Link to this article: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1926990.html
