Research on Lock-Free Queues in Parallelized Dynamic Binary Translation on the Loongson Platform
Published: 2018-11-19 21:17
[Abstract]: In recent years, mainstream desktop and server software has been developed for the x86 platform, whereas Loongson is a processor based on the MIPS instruction set. How to make existing x86 software compatible with the MIPS architecture has therefore become an important problem in the development of domestic chips. Binary translation is an important way to make x86 software compatible with Loongson CPUs. At present, QEMU is the main full-system emulator used on the Loongson platform. Through binary translation it can already run the Windows XP operating system on Loongson, but its performance needs to be improved.

Since processor clock frequencies passed 2 GHz, the gains obtained by improving the efficiency of a single processor have become increasingly limited, and multi-core processors have become the trend in order to keep Moore's law in effect. Existing full-system emulation, however, is serial and uses only a single core of the host processor, so parallelizing full-system emulation is urgent. Truly parallel system-level emulation would greatly improve the speed and performance of the machine and ultimately enable the commercialization of domestic chips.

How to let the Loongson platform exploit its core count when emulating x86 has gradually become a research focus. Parallel full-system emulation based on QEMU is already being studied: PQEMU, HQEMU, and COREMU each parallelize QEMU from a different angle, but none of them uses the Loongson platform as the host machine. This thesis analyzes how QEMU works, how QEMU emulates SMP machines, and how existing parallel versions of QEMU are implemented. It then changes QEMU's emulation of SMP machines from serial to parallel: the emulation logic for each core of the SMP machine is encapsulated in its own thread, the operating system schedules these threads, and the threads run concurrently on multiple Loongson cores, so that a multi-core Loongson emulates a multi-core x86 machine. This parallelization method must solve two key problems: the translation of atomic instructions and the emulation of SMP machine interrupts. The author's research group previously proposed an atomic instruction translation scheme based on GCC's built-in atomic operation functions, but the author found two problems with it. First, the code expansion for simple instructions exceeds that for complex instructions. Second, it cannot handle every case that can arise with unaligned atomic instructions. This thesis proposes a new atomic instruction translation scheme that uses the MIPS ll/sc (load-linked/store-conditional) instruction pair directly. The scheme contains no redundant operations and fully solves the translation of unaligned atomic instructions. Linux real-time signals and a FIFO queue are used to emulate SMP machine interrupts. To keep interrupt emulation efficient, the FIFO queue needs to be implemented with lock-free techniques. Based on the characteristics of the MIPS ll/sc instruction pair and of the lock-free queue used in interrupt emulation, the thesis proposes a lock-free queue algorithm that avoids the ABA problem and greatly improves the efficiency of interrupt emulation. In the end, QEMU runs in parallel on the Loongson 3A platform and makes full use of the core count of the Loongson host.
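The abstract does not give the queue algorithm itself. As an illustration only, the sketch below shows one common way to build a mutex-free FIFO that sidesteps the ABA problem: a bounded array queue in which every slot carries a sequence counter, written in portable C11 atomics rather than hand-written ll/sc assembly. It is not the thesis's algorithm. All names (`irq_queue`, `irq_queue_push`, `irq_queue_pop`, `IRQ_QUEUE_SIZE`) are hypothetical, and the payload is assumed to be a plain interrupt number. On MIPS hosts a compiler typically lowers the compare-and-swap used here to an ll/sc retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define IRQ_QUEUE_SIZE 8u   /* hypothetical capacity; must be a power of two */

typedef struct {
    atomic_size_t seq;      /* per-slot sequence counter (guards against ABA) */
    int irq;                /* assumed payload: a pending interrupt number */
} irq_slot;

typedef struct {
    irq_slot slots[IRQ_QUEUE_SIZE];
    atomic_size_t enqueue_pos;   /* next position a producer will claim */
    atomic_size_t dequeue_pos;   /* next position a consumer will claim */
} irq_queue;

static void irq_queue_init(irq_queue *q)
{
    assert((IRQ_QUEUE_SIZE & (IRQ_QUEUE_SIZE - 1)) == 0);
    for (size_t i = 0; i < IRQ_QUEUE_SIZE; i++)
        atomic_init(&q->slots[i].seq, i);   /* slot i is free for position i */
    atomic_init(&q->enqueue_pos, 0);
    atomic_init(&q->dequeue_pos, 0);
}

/* Returns false when the queue is full. */
static bool irq_queue_push(irq_queue *q, int irq)
{
    size_t pos = atomic_load_explicit(&q->enqueue_pos, memory_order_relaxed);
    for (;;) {
        irq_slot *s = &q->slots[pos & (IRQ_QUEUE_SIZE - 1)];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        ptrdiff_t diff = (ptrdiff_t)seq - (ptrdiff_t)pos;
        if (diff == 0) {
            /* Slot is free for this position: try to claim it. On failure
             * the CAS reloads pos and the loop retries. */
            if (atomic_compare_exchange_weak_explicit(
                    &q->enqueue_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                s->irq = irq;
                /* Publish: seq == pos + 1 marks the slot as filled. */
                atomic_store_explicit(&s->seq, pos + 1, memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            return false;   /* slot still holds an unconsumed entry: full */
        } else {
            pos = atomic_load_explicit(&q->enqueue_pos, memory_order_relaxed);
        }
    }
}

/* Returns false when the queue is empty. */
static bool irq_queue_pop(irq_queue *q, int *irq)
{
    size_t pos = atomic_load_explicit(&q->dequeue_pos, memory_order_relaxed);
    for (;;) {
        irq_slot *s = &q->slots[pos & (IRQ_QUEUE_SIZE - 1)];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        ptrdiff_t diff = (ptrdiff_t)seq - (ptrdiff_t)(pos + 1);
        if (diff == 0) {
            if (atomic_compare_exchange_weak_explicit(
                    &q->dequeue_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                *irq = s->irq;
                /* Release the slot for the producer one lap ahead. */
                atomic_store_explicit(&s->seq, pos + IRQ_QUEUE_SIZE,
                                      memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            return false;   /* nothing published at this position: empty */
        } else {
            pos = atomic_load_explicit(&q->dequeue_pos, memory_order_relaxed);
        }
    }
}
```

Because the positions only ever increase and each slot's counter records which lap it belongs to, a stale compare-and-swap cannot succeed against a recycled slot, which is the situation the ABA problem describes. One caveat of this particular design: it takes no locks, but a producer that is preempted between claiming a slot and publishing it can delay consumers at that slot, so it is weaker than a strictly lock-free algorithm.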
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP332; TP391.2
Article ID: 2343452
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2343452.html