
Research on Hardware/Software Partitioning Techniques for Reconfigurable Systems

Published: 2018-04-21 10:07

Topic: reconfigurable computing systems + field-programmable gate arrays. Source: Harbin Engineering University, 2013 doctoral dissertation

[Abstract]: Reconfigurable computing systems based on field-programmable gate arrays (FPGAs) combine the flexibility of general-purpose processors with the efficiency of FPGAs, and are therefore widely used in high-performance computing. An efficient hardware/software partitioning algorithm can automatically and effectively distribute an application across the general-purpose processor and the FPGA so that each computing component exploits the strengths of its own computing model; hardware/software partitioning has consequently become a research focus in reconfigurable computing. Existing work has produced many results, but many problems remain open. Building on previous work, this thesis takes FPGA area as the constraint and overall system performance as the optimization objective, and designs a hardware/software partitioning framework for CPU/FPGA reconfigurable acceleration systems. The framework consists of three main functional modules, which study in depth the key techniques of estimating the cost of implementing application fragments on the CPU and on the FPGA, and the partitioning algorithm itself. The framework is intended not only to decide whether a program fragment runs on the CPU or on the FPGA, but also to choose among the possible hardware versions of each fragment (for example, a loop) selected for the FPGA, so as to obtain the best possible partitioning solution. The specific research content is as follows. In compute-intensive applications, loops usually account for most of the workload. Traditional loop-oriented static analysis cannot obtain dynamic information such as loop execution counts.
Dynamic analysis techniques such as edge profiling can obtain execution counts for program fragments, but cannot determine whether a fragment is a loop. To address this, the thesis combines dominance-based loop identification with edge profiling to design a hybrid static/dynamic runtime loop analysis algorithm, implemented on the LLVM platform. Experimental results show that the algorithm automatically identifies all loop structures and accurately analyzes each loop's average iteration count, number of invocations, software execution time, and the hardware/software communication overhead incurred when the loop is implemented on the FPGA, providing a comprehensive and accurate basis for selecting which loops to accelerate. In the high-level design of a reconfigurable computing system, estimation is a fast and feasible way to obtain the performance parameters of a hardware implementation and its execution. However, existing high-level hardware execution time/area estimation methods are usually tied to a specific implementation environment (for example, a particular FPGA architecture and the properties of its toolchain) and generalize poorly; they also provide insufficient support for estimating the cost of the multiple hardware versions in which a loop may be implemented. To address the generality problem, the thesis first derives, from the different operator expressions in the programming language and their usual circuit implementation patterns, a complete set of environment-independent execution time/area estimation formulas for each operation, and then corrects these formulas with real feedback information so that they apply to different implementation environments.
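
The hybrid loop analysis mentioned above (dominance-based loop identification combined with edge-profile counts) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's LLVM implementation: the control-flow graph is a plain dictionary, the edge counts are assumed to come from some profiler, and the names `dominators`, `natural_loop`, and `loop_stats` are hypothetical. An edge u -> h is treated as a back edge when h dominates u, and the average iteration count is approximated as back-edge traversals per loop entry.

```python
from collections import defaultdict

def dominators(succ, entry):
    """Iterative dataflow: dom[n] = {n} | intersection of dom[p] over predecessors p."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = defaultdict(set)
    for u, ss in succ.items():
        for v in ss:
            pred[v].add(u)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, pred

def natural_loop(header, tail, pred):
    """Blocks that can reach `tail` without passing through `header`, plus the header."""
    body, stack = {header, tail}, [tail]
    while stack:
        n = stack.pop()
        if n == header:
            continue
        for p in pred[n]:
            if p not in body:
                body.add(p)
                stack.append(p)
    return body

def loop_stats(succ, entry, edge_count):
    """Static step: find loops via back edges. Dynamic step: attach profile counts."""
    dom, pred = dominators(succ, entry)
    loops, back_edges = defaultdict(set), defaultdict(list)
    for u, ss in succ.items():
        for v in ss:
            if v in dom[u]:  # v dominates u, so u -> v is a back edge
                loops[v] |= natural_loop(v, u, pred)
                back_edges[v].append((u, v))
    stats = {}
    for h, body in loops.items():
        back = sum(edge_count.get(e, 0) for e in back_edges[h])
        entries = sum(edge_count.get((p, h), 0) for p in pred[h] if p not in body)
        stats[h] = {"blocks": body, "invocations": entries,
                    "avg_iterations": back / entries if entries else 0.0}
    return stats
```

For a CFG `entry -> head -> {body, exit}`, `body -> head`, with the loop entered twice and the back edge taken ten times, `loop_stats` reports one loop headed by `head` with 2 invocations and an average of 5 iterations.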
To address the insufficient support for multiple hardware versions, the thesis designs a unified, multi-version-oriented parameter input interface refined to the level of individual operations and, combined with the corrected per-operation estimation formulas, builds an estimation algorithm that captures the multi-version characteristics of loops implemented on an FPGA. The method quickly and accurately estimates the hardware execution time/area of different program fragments implemented on the FPGA, and in particular of each hardware version of a loop, providing accurate information for multi-version hardware design space exploration and for hardware/software partitioning. Many hardware/software partitioning algorithms have been proposed for reconfigurable computing systems (RCS), but they usually assume that a loop has only one hardware implementation on the FPGA, ignoring its multiple hardware versions and lowering the quality of the partitioning solution. Moreover, in CPU/FPGA reconfigurable acceleration systems, communication overhead is often the bottleneck of overall system performance. To address both issues, the thesis first builds a hardware/software partitioning model with multi-version hardware features, then clusters loops to minimize hardware/software communication overhead and updates the model's objective function according to the clustering result, and finally, from a global optimization perspective, solves the model with a floating-point-encoded genetic algorithm. The result is a hardware/software partitioning algorithm with multi-version hardware exploration and re-selection of an optimized partitioning granularity. With this algorithm, one can determine whether a given loop in the program should be implemented on the CPU or on the FPGA.
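
To make the multi-version partitioning model more concrete, here is a minimal sketch of how a floating-point-encoded chromosome could be decoded and evaluated under an area constraint. The abstract does not give the actual encoding or objective function, so everything here is an assumption for illustration: the `Loop` record, the `decode` and `evaluate` helpers, the mapping of each gene in [0, 1) to "software" or one of the hardware versions, and the folding of communication overhead into each hardware time. Loop clustering is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Loop:
    sw_time: float                  # estimated execution time on the CPU
    hw: List[Tuple[float, float]]   # (time incl. communication, area) per hardware version

def decode(genes, loops):
    """Map each float gene in [0, 1) to a choice: 0 = software, k >= 1 = hardware version k."""
    return [min(int(g * (len(l.hw) + 1)), len(l.hw)) for g, l in zip(genes, loops)]

def evaluate(genes, loops, area_limit):
    """Return (total_time, total_area, feasible) for one chromosome."""
    total_time = total_area = 0.0
    for choice, l in zip(decode(genes, loops), loops):
        if choice == 0:
            total_time += l.sw_time          # loop stays on the CPU
        else:
            t, a = l.hw[choice - 1]          # loop uses hardware version `choice`
            total_time += t
            total_area += a
    return total_time, total_area, total_area <= area_limit

# Two loops: the first has two hardware versions, the second has one.
loops = [Loop(10.0, [(4.0, 30.0), (2.0, 60.0)]), Loop(8.0, [(3.0, 50.0)])]
print(evaluate([0.5, 0.9], loops, area_limit=100.0))  # (7.0, 80.0, True)
print(evaluate([0.9, 0.9], loops, area_limit=100.0))  # (5.0, 110.0, False)
```

A genetic algorithm would use the total time as the fitness to minimize and reject or penalize chromosomes whose area exceeds the limit; the second chromosome above is faster but infeasible.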
The algorithm can also determine a good hardware version for each loop implemented on the FPGA, improving the quality of the partitioning solution from the standpoint of global performance. Experimental results show that solving this partitioning problem (with multi-version hardware exploration and granularity re-selection) with a genetic algorithm works well, but as the set to be partitioned grows, the genetic algorithm's weak local search ability degrades solution quality. Analysis shows that, among the selection, crossover, and mutation operators, local search ability depends largely on mutation, and the random mutation strategy traditionally used tends to destroy good chromosomes and produce poor individuals. The thesis therefore improves on the above genetic algorithm and designs a better-performing hardware/software partitioning algorithm for multi-version hardware exploration based on Q-learning and the genetic algorithm. Its distinguishing feature is that, exploiting the conflict between performance and area across hardware versions, it combines Q-learning with a greedy rule to choose a suitable mutation direction adaptively. Experimental results show that, compared with the standard genetic algorithm, the improved algorithm performs well in search quality and convergence, strengthens local search for multi-version hardware exploration, and further improves the quality of the partitioning solution.
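
One way to picture a Q-learning-guided mutation operator is sketched below. The abstract does not specify the states, actions, or rewards used in the thesis, so the design here is entirely hypothetical: two mutation directions ("faster" moves a gene toward a faster but larger hardware version, "smaller" moves it toward a smaller option or back to software), an epsilon-greedy selection rule, and the standard one-step Q-learning update. The names `ACTIONS`, `choose_action`, `q_update`, and `mutate` are illustrative only.

```python
import random
from collections import defaultdict

ACTIONS = ("faster", "smaller")  # hypothetical mutation directions

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update of the table entry for (state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def mutate(choice, n_versions, action):
    """Move one gene's choice a single step in the chosen direction.

    Assumes 0 means software and hardware versions 1..n_versions are sorted so that
    a higher index is faster but uses more area.
    """
    if action == "faster":
        return min(choice + 1, n_versions)
    return max(choice - 1, 0)

q = defaultdict(float)          # Q-table, all entries start at 0.0
rng = random.Random(0)
state = "feasible"              # e.g. the chromosome currently fits the area limit
action = choose_action(q, state, epsilon=0.0, rng=rng)
new_choice = mutate(1, n_versions=2, action=action)
q_update(q, state, action, reward=1.0, next_state="feasible")
```

In a full algorithm the reward would come from the change in fitness after the mutation, so that directions which tend to damage good chromosomes are chosen less often than under purely random mutation.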

[Degree-granting institution]: Harbin Engineering University
[Degree level]: Doctoral
[Year conferred]: 2013
[Classification number]: TP38

Niu Xiaoxia; Research on Hardware/Software Partitioning Techniques for Reconfigurable Systems [D]; Harbin Engineering University; 2013



Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1781973.html
