Research on Modeling and Parallelization of Dynamic Binary Translation
Published: 2018-04-08 13:24
Topic: dynamic binary translation; Focus: indirect branches; Source: University of Science and Technology of China, 2013 doctoral dissertation
【Abstract】: With the development of domestic (Chinese-designed) processors, and of domestic multi-core processors in particular, porting software efficiently has become a key factor in whether a new processor can win a place in the market. Binary code compatibility is the central problem limiting software porting, and it is also a major barrier to the development of new architectures. Dynamic binary translation (DBT) is a cross-platform dynamic compilation technique. It makes it possible to solve binary compatibility between different architectures in software, and it also opens new directions for dynamic program optimization and computer virtualization.
Modern hardware architectures are highly complex, and different architectures differ greatly. To bridge these hardware differences, a DBT system must spend substantial extra overhead on emulation. As a direct result, its performance is far lower than that of native programs, which has hindered wide adoption of the technique. Improving DBT performance is therefore the core research problem in the field. Because multi-core platforms offer abundant computing resources, parallelizing traditional single-threaded DBT systems is currently an active research topic.
This thesis draws on work on a DBT system for the Loongson (Godson) processor and on an analysis of many related dynamic runtime systems. From these, it builds a "translate-execute-lookup" dynamic model of the entire execution process of a DBT system. The model concisely divides a DBT system, and similar runtime systems, into a translation module, an execution module, and a lookup module. The research content and optimization methods of this thesis are organized around these three modules. The main contributions are:
1. The "translate-execute-lookup" model of DBT systems: based on the analysis of many runtime systems, the thesis abstracts the execution of a DBT system into a "translate-execute-lookup" dynamic model. The model divides the system into translation, execution, and lookup modules, which gives further optimization work a precise target.
2. An indirect-branch target-address lookup algorithm with a private cache: in the lookup module, handling indirect branch instructions is a performance bottleneck of DBT systems. Based on an analysis of the locality of indirect-branch target addresses, the thesis proposes an algorithm that uses a private cache to find indirect-branch targets quickly. This effectively reduces the number of context switches caused by indirect branches.
3. An improved multi-threaded dynamic binary translation model: in the translation module, the thesis analyzes and compares the strengths and weaknesses of existing multi-threaded translation models. It proposes a stack-based prediction algorithm, uses a wait queue to manage multiple translation threads, and proposes a memory-copy method that improves the locality of the distributed code cache. Together these optimizations improve the performance of the multi-threaded translation model.
4. A full-register direct-mapping method: in the execution module, the way registers are emulated strongly affects both the code expansion rate of the translated code and the performance of the translated program. The thesis combines memory-based and direct-mapping register emulation and proposes directly mapping all registers in the DBT mechanism. On this basis, it simplifies many intermediate-code translation rules, which improves the quality of the code generated by the back end.
5. Design and implementation of atomic-instruction emulation on the Loongson platform: in the execution module, the goal is thread-level parallelism for independent computing units. To that end, the thesis builds on an existing parallel multi-core simulator and designs and implements a thread-level-parallel multi-core simulator on the Loongson platform. It also proposes Loongson-architecture-based solutions to the memory-emulation and atomic-instruction-emulation problems of the parallel simulator.
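The "translate-execute-lookup" model in item 1 can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for brevity:

- a toy guest ISA (`addi`, `bnez`, `jmp`, `halt`);
- Python closures standing in for generated host code;
- a dictionary standing in for the code cache.

A real DBT system emits native host instructions into its code cache instead.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy guest ISA: ("addi", reg, imm), ("bnez", reg, target), ("jmp", target), ("halt",)
Instr = Tuple
HostBlock = Callable[[Dict[str, int]], int]
BLOCK_ENDERS = {"bnez", "jmp", "halt"}


def translate_block(program: List[Instr], pc: int) -> HostBlock:
    """Translation module: collect the guest basic block at `pc` and build 'host code' for it."""
    body = []
    while True:
        ins = program[pc]
        body.append((pc, ins))
        if ins[0] in BLOCK_ENDERS:
            break
        pc += 1

    def host_code(regs: Dict[str, int]) -> int:
        # Execution module: run the block and return the next guest PC (-1 means halt).
        for ins_pc, ins in body:
            op = ins[0]
            if op == "addi":
                regs[ins[1]] = regs.get(ins[1], 0) + ins[2]
            elif op == "bnez":
                return ins[2] if regs.get(ins[1], 0) != 0 else ins_pc + 1
            elif op == "jmp":
                return ins[1]
            elif op == "halt":
                return -1
            else:
                raise ValueError(f"unknown guest opcode {op!r}")
        raise RuntimeError("basic block has no terminator")

    return host_code


def run(program: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Drive the translate-execute-lookup loop until the guest halts; return counters."""
    code_cache: Dict[int, HostBlock] = {}
    stats = {"lookups": 0, "translations": 0}
    pc = 0
    while pc != -1:
        stats["lookups"] += 1              # lookup module: guest PC -> translated block
        block = code_cache.get(pc)
        if block is None:                  # miss: invoke the translation module
            block = translate_block(program, pc)
            code_cache[pc] = block
            stats["translations"] += 1
        pc = block(regs)                   # execution module
    return stats
```

Blocks are keyed by their entry PC, so in a loop whose head is not the program entry, the loop body is translated once more as its own block. The `lookups` counter shows how often control returns to the lookup module between blocks.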
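For item 2, the abstract does not give the structure of the private cache. The sketch below assumes one plausible design:

- each indirect-branch site keeps a small LRU cache of its recent targets;
- only a miss falls back to the runtime's global lookup, which stands in for a context switch.

The class names, the capacity of 2, and `global_lookup` are all hypothetical.

```python
from collections import OrderedDict


class Runtime:
    """Stand-in for the DBT runtime: the global (shared) code-cache lookup."""

    def __init__(self):
        self.code_cache = {}        # guest target address -> translated block
        self.context_switches = 0   # each call models leaving translated code

    def global_lookup(self, target):
        self.context_switches += 1
        if target not in self.code_cache:
            self.code_cache[target] = f"host_block_{target:#x}"  # pretend translation
        return self.code_cache[target]


class PrivateTargetCache:
    """Hypothetical per-branch-site private cache of recent indirect-branch targets (LRU)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()  # guest target -> translated block

    def lookup(self, target, runtime):
        block = self.entries.get(target)
        if block is not None:
            self.entries.move_to_end(target)  # hit: stay inside translated code
            return block
        block = runtime.global_lookup(target)  # miss: fall back to the runtime
        self.entries[target] = block
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used target
        return block
```

Indirect-branch targets tend to repeat, which is the locality the thesis exploits. For a branch alternating between two targets six times, only the first two lookups reach the runtime. In a real system the hit path would be a few inline host instructions rather than a method call.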
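Item 3 mentions a stack-based prediction algorithm and a wait queue for translation threads, but gives no further detail. The sketch below is only one interpretation, and it rests on two assumptions:

- predicted successor blocks are pushed onto a LIFO stack (Python's `queue.LifoQueue`), so the most recently predicted block is translated first;
- idle translation threads block on that queue until work arrives.

The `cfg` successor map and all function names are hypothetical. The memory-copy optimization for the distributed code cache is not modeled.

```python
import queue
import threading


def speculative_translate(cfg, entry, num_workers=2):
    """Translate every block predicted reachable from `entry` using a pool of worker threads.

    `cfg` maps a guest block address to the successor addresses a predictor expects
    (hypothetical input; a real DBT discovers these while decoding branches).
    """
    work = queue.LifoQueue()  # stack order: the most recently predicted block goes first
    code_cache = {}
    claimed = set()
    lock = threading.Lock()

    def worker():
        while True:
            pc = work.get()        # idle translation threads wait here
            if pc is None:         # stop marker
                work.task_done()
                return
            with lock:
                fresh = pc not in claimed
                claimed.add(pc)
            if fresh:
                host_code = f"host code for {pc:#x}"  # stand-in for real translation
                with lock:
                    code_cache[pc] = host_code
                for succ in cfg.get(pc, ()):  # push predicted successors onto the stack
                    work.put(succ)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.put(entry)
    work.join()                    # returns once every predicted block has been processed
    for _ in threads:
        work.put(None)             # one stop marker per worker
    for t in threads:
        t.join()
    return code_cache
```

Successors are pushed before `task_done()` is called for their parent block, so `work.join()` cannot return while predicted work is still outstanding.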
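To make the trade-off in item 4 concrete, the sketch compares memory-based register emulation with direct mapping for a single guest instruction, x86 `add dst, src`. It assumes the following:

- the eight 32-bit x86 general-purpose registers are pinned to the MIPS callee-saved registers `$s0` to `$s7` on the Loongson host;
- `$env` (a hypothetical name) holds the address of the in-memory guest register file.

EFLAGS, which a real translator must also handle, are ignored. The output is MIPS-style assembly text, not code from the thesis.

```python
GUEST_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Hypothetical full direct mapping: each guest GPR is pinned to its own host register.
DIRECT_MAP = {g: f"$s{i}" for i, g in enumerate(GUEST_REGS)}
# Memory-based emulation: guest registers live in an in-memory state block at "$env".
ENV_OFFSET = {g: 4 * i for i, g in enumerate(GUEST_REGS)}


def translate_add(dst, src, direct):
    """Emit host text for guest `add dst, src` (dst += src); EFLAGS are ignored."""
    if direct:
        d, s = DIRECT_MAP[dst], DIRECT_MAP[src]
        return [f"addu {d}, {d}, {s}"]
    return [
        f"lw   $t0, {ENV_OFFSET[dst]}($env)",   # load guest dst from memory
        f"lw   $t1, {ENV_OFFSET[src]}($env)",   # load guest src from memory
        "addu $t0, $t0, $t1",
        f"sw   $t0, {ENV_OFFSET[dst]}($env)",   # write guest dst back
    ]


def expansion_rate(guest_block, direct):
    """Host instructions emitted per guest instruction."""
    host = [h for dst, src in guest_block for h in translate_add(dst, src, direct)]
    return len(host) / len(guest_block)
```

In this toy case, direct mapping removes the loads and stores that surround every memory-based operation. The expansion rate falls from 4 to 1 host instruction per guest `add`.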
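The abstract does not spell out the Loongson-based solution in item 5. A common approach on MIPS-family hosts such as Loongson, assumed here, is to emulate a guest atomic read-modify-write with a load-linked/store-conditional (`ll`/`sc`) retry loop; x86 `lock cmpxchg` is one such instruction.

The Python model below only mimics LL/SC semantics in a single thread: a store by another CPU cancels the reservation. `LLSCMemory`, `emulate_cmpxchg`, and the `interfere` test hook are hypothetical.

```python
class LLSCMemory:
    """Toy model of MIPS-style load-linked/store-conditional (not cycle-accurate)."""

    def __init__(self):
        self.data = {}
        self.reservation = {}  # cpu id -> reserved address

    def ll(self, cpu, addr):
        self.reservation[cpu] = addr
        return self.data.get(addr, 0)

    def store(self, cpu, addr, value):
        # A store breaks every other CPU's reservation on the same address.
        self.data[addr] = value
        for other, reserved in list(self.reservation.items()):
            if reserved == addr and other != cpu:
                del self.reservation[other]

    def sc(self, cpu, addr, value):
        if self.reservation.get(cpu) != addr:
            return False           # reservation lost: the store does not happen
        del self.reservation[cpu]
        self.store(cpu, addr, value)
        return True


def emulate_cmpxchg(mem, cpu, addr, expected, new, interfere=None):
    """Emulate guest `lock cmpxchg` with an LL/SC retry loop; returns (success, old, retries)."""
    retries = 0
    while True:
        old = mem.ll(cpu, addr)
        if old != expected:
            return False, old, retries
        if interfere is not None:  # test hook: another CPU writes between ll and sc
            interfere()
            interfere = None
        if mem.sc(cpu, addr, new):
            return True, old, retries
        retries += 1
```

In the retry case, another CPU writes the reserved address between `ll` and `sc`, even if it writes back the same value. The store-conditional then fails and the loop retries, so the emulated compare-and-exchange is never applied on top of a concurrent update.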
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctorate
【Year conferred】: 2013
【Classification number】: TP332; TP391.2
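【Citing literature】
Related journal papers (1 shown)
1 Jiang Liehui; Chen Huichao; Dong Weiyu; Zhang Yanwen. A co-optimization method for system simulation based on static register allocation [J]. Journal of Computer Applications, 2014, No. 05.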
Document ID: 1721822
Link to this article: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1721822.html