

Research on Hardware/Software Partitioning Techniques for Reconfigurable Systems

Published: 2018-04-21 10:07

  Topics: reconfigurable computing systems + field-programmable gate arrays; Source: Harbin Engineering University, 2013 doctoral dissertation


[Abstract]: Reconfigurable computing systems (RCS) based on field-programmable gate arrays (FPGAs) combine the flexibility of general-purpose processors with the efficiency of FPGAs, and are therefore being widely adopted in high-performance computing. An efficient hardware/software partitioning algorithm can automatically and effectively distribute an application across the general-purpose processor and the FPGA so that each computing unit plays to the strengths of its own computing model. As a result, hardware/software partitioning is becoming a research focus in the RCS field. Research at home and abroad has produced many results, but many problems remain to be solved. Building on previous work, this thesis takes FPGA area as the constraint and overall system performance as the optimization objective, and designs a hardware/software partitioning framework for CPU/FPGA reconfigurable acceleration systems. The framework consists of three main functional modules, and within them the key techniques are studied in depth: estimating the cost of implementing application fragments on the CPU and on the FPGA, and the partitioning algorithm itself. The framework is intended not only to decide whether a program fragment runs on the CPU or on the FPGA, but also to choose among the multiple possible hardware versions of each fragment (for example, a loop) selected to run on the FPGA, so as to obtain the best possible partitioning solution. The specific research content is as follows.

In compute-intensive applications, loops usually account for most of the workload. Traditional loop-oriented static analysis cannot obtain dynamic information such as loop execution counts, while dynamic analysis such as edge profiling yields the execution counts of program fragments but cannot determine whether a fragment is a loop. To address this, the thesis combines dominance-based loop identification with edge profiling to design a combined static and dynamic loop runtime analysis algorithm, implemented on the LLVM platform. Experimental results show that the algorithm automatically identifies all loop structures and accurately analyzes each loop's average iteration count, number of invocations, software running time, and the hardware/software communication overhead incurred when the loop is implemented on the FPGA, giving a fairly comprehensive and accurate basis for selecting the loops to accelerate in an RCS.

In high-level RCS design, using estimation to obtain the performance parameters of a hardware implementation and its execution is a fast and feasible approach. However, existing high-level methods for estimating hardware execution time and area are usually tied to a specific hardware implementation environment (for example, a particular FPGA architecture and the properties of its toolchain) and generalize poorly. They also give insufficient support for estimating the implementation cost of the multiple hardware versions a loop may have. To address poor generality, the thesis first derives, from the different operator expressions in the programming language and their usual circuit implementation patterns, a complete set of environment-independent formulas that estimate the hardware execution time and area of each operation, and then corrects these formulas with real feedback so that they apply to a variety of implementation environments. To address the insufficient support for multiple hardware versions, it designs a unified, multi-version parameter input interface refined to the level of individual operations and, combined with the corrected per-operation formulas, builds an estimation algorithm for the multi-version characteristics of loops implemented on an FPGA. The method quickly and accurately estimates the hardware execution time and area of different program fragments implemented on an FPGA, and in particular of each hardware version of a loop, providing accurate information for multi-version hardware design space exploration and for hardware/software partitioning.

Many hardware/software partitioning algorithms already exist in the RCS field, but they usually assume that a loop has only one hardware implementation on the FPGA. Ignoring the multi-version nature of loop hardware lowers the quality of the partitioning solution. In addition, in CPU/FPGA reconfigurable acceleration systems, communication overhead is often the bottleneck of overall system performance. To address these two issues, the thesis first builds a hardware/software partitioning model with multiple hardware versions, then clusters loops to minimize hardware/software communication overhead and updates the model's objective function according to the clustering result, and finally solves the model from a global optimization perspective with a genetic algorithm using floating-point encoding. The result is a hardware/software partitioning algorithm with multi-version hardware exploration and with optimization and reselection of the partitioning granularity. The algorithm determines not only whether a given loop should be implemented on the CPU or on the FPGA, but also which hardware version of the loop is preferable on the FPGA, improving the quality of the partitioning solution in terms of global performance.

Experimental results show that the genetic algorithm works well on this partitioning problem, but as the set of fragments to be partitioned grows, the algorithm's weak local search ability degrades the quality of the solution. Analysis shows that, among the selection, crossover, and mutation operators, the local search ability of a genetic algorithm depends largely on the mutation operator, and that the random mutation strategy traditionally used tends to damage good chromosomes and produce poor individuals. The thesis therefore improves the genetic algorithm above and designs a better-performing hardware/software partitioning algorithm for multi-version hardware exploration based on Q-learning and a genetic algorithm. Its distinguishing feature is that it exploits the trade-off between performance and area across hardware versions, combining Q-learning with a greedy rule to adaptively select a suitable mutation direction. Experimental results show that, compared with the standard genetic algorithm, the improved algorithm performs well in both search quality and convergence, strengthens local search in multi-version hardware exploration, and further improves the quality of the partitioning solution.
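As an illustration of the combined static and dynamic loop analysis described in the abstract, here is a minimal Python sketch. It is not the thesis's LLVM implementation. It assumes a hand-written control-flow graph (a dict from block name to successor list) and a dict of edge execution counts standing in for an edge profile. The function names `dominators` and `loop_profile`, and the choice to report the average number of back-edge traversals per loop entry as the iteration figure, are assumptions made for this example.

```python
def dominators(cfg, entry):
    """Iteratively compute the dominator set of every block in the CFG."""
    nodes = set(cfg) | {v for succs in cfg.values() for v in succs}
    preds = {n: set() for n in nodes}
    for u, succs in cfg.items():
        for v in succs:
            preds[v].add(u)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def loop_profile(cfg, entry, edge_counts):
    """Find loops statically (back edges) and annotate them with profile data."""
    dom = dominators(cfg, entry)
    loops = {}
    for u, succs in cfg.items():
        for h in succs:
            if h in dom[u]:  # u -> h is a back edge: its target dominates its source
                loops.setdefault(h, []).append((u, h))
    report = {}
    for header, back_edges in loops.items():
        back = sum(edge_counts.get(e, 0) for e in back_edges)
        # Entries into the loop: edges reaching the header that are not back edges.
        entries = sum(c for (u, v), c in edge_counts.items()
                      if v == header and (u, v) not in back_edges)
        report[header] = {
            "invocations": entries,
            "back_edge_traversals": back,
            "avg_iterations": back / entries if entries else 0.0,
        }
    return report


# Hypothetical CFG for a single loop, with made-up profile counts.
cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
edge_counts = {("entry", "header"): 2, ("header", "body"): 10,
               ("body", "header"): 10, ("header", "exit"): 2}
report = loop_profile(cfg, "entry", edge_counts)
```

Static dominance analysis alone finds the loop header, and the edge counts alone give execution frequencies. Joining the two is what attaches an invocation count and an average iteration figure to each identified loop.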

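The partitioning model with multiple hardware versions can be made concrete with a small sketch. The thesis solves the model with a floating-point-coded genetic algorithm. The code below instead uses brute-force enumeration, which is only practical for a handful of loops, so that the model itself stays visible. The data layout (`sw_time`, `comm_time`, `hw_versions` as `(time, area)` pairs), the function names, and all numbers are hypothetical.

```python
from itertools import product


def total_cost(choice, loops):
    """Return (total time, total FPGA area) for one assignment.

    choice[i] == 0 keeps loop i in software; choice[i] == k > 0 selects
    hardware version k of loop i and adds its communication overhead.
    """
    time = area = 0
    for c, loop in zip(choice, loops):
        if c == 0:
            time += loop["sw_time"]
        else:
            hw_time, hw_area = loop["hw_versions"][c - 1]
            time += hw_time + loop["comm_time"]
            area += hw_area
    return time, area


def best_partition(loops, area_budget):
    """Enumerate every assignment and keep the fastest one that fits the area."""
    best = None
    options = [range(len(loop["hw_versions"]) + 1) for loop in loops]
    for choice in product(*options):
        time, area = total_cost(choice, loops)
        if area <= area_budget and (best is None or time < best[0]):
            best = (time, area, choice)
    return best


# Two hypothetical loops, each with a small and a large hardware version.
loops = [
    {"sw_time": 100, "comm_time": 5, "hw_versions": [(40, 30), (20, 70)]},
    {"sw_time": 80, "comm_time": 10, "hw_versions": [(30, 50), (15, 90)]},
]
result = best_partition(loops, area_budget=100)
```

With an area budget of 100, the fastest feasible assignment puts both loops on the FPGA in their smaller versions. A model that allowed only the largest version of each loop could move at most one loop to hardware under the same budget, which is the loss in solution quality the abstract attributes to ignoring multiple versions.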
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Doctorate
[Year degree granted]: 2013
[Classification number]: TP38

[Citation]: Niu Xiaoxia (牛曉霞). Research on Hardware/Software Partitioning Techniques for Reconfigurable Systems [D]. Harbin Engineering University, 2013.


Article ID: 1781973


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1781973.html

