Research on Task Scheduling for Dynamic Heterogeneous Many-Core Processors
Published: 2018-07-21 21:18
[Abstract]: The demand for high-efficiency on-chip computing and the growing variation in chip manufacturing processes are together driving multi-core processors into the heterogeneous era. The basic design idea of a performance-heterogeneous multi-core processor is to place processor cores of different granularity on one chip: large out-of-order superscalar cores speed up serial code, while a large number of structurally simple small cores exploit thread-level parallelism. In essence, a performance-heterogeneous multi-core processor improves computing efficiency only when the configuration of the on-chip cores matches the parallel characteristics of the workload. However, the parallel characteristics and resource requirements of a workload change dynamically, so a heterogeneous processor must be able to adjust its on-chip computing resources according to workload characteristics. To this end, researchers have in recent years proposed the Dynamic Heterogeneous Chip Multiprocessor (DHCMP) architecture. It places a large number of homogeneous basic cores on the chip and provides microarchitectural support for combining several basic cores into a single logical processor core (a logical core for short). System software can therefore configure the on-chip computing resources (the basic cores) at run time, on demand, into multiple performance-heterogeneous logical cores.
A dynamic heterogeneous processor by itself provides only the ability to reconfigure logical cores. Whether the parallel characteristics and resource requirements of the system workload are judged accurately, and whether DHCMP computing resources are configured sensibly enough to achieve high-efficiency computing, depends decisively on the task scheduler. Yet research on task scheduling for dynamic heterogeneous many-core processors has barely begun. This thesis aims to build a task scheduling framework that efficiently supports fast adjustment of DHCMP logical cores, and to study both a logical core resource allocation algorithm that exploits the dynamic heterogeneity of DHCMP for high-efficiency computing and a process scheduling algorithm that provides priority-based fairness on DHCMP. The work and results of this thesis cover the following four aspects:
1. The hardware/operating system interface for dynamic heterogeneous processors is studied, and a concise, general logical core abstraction is presented to the operating system. The logical core reconfiguration operations of a dynamic heterogeneous processor are reduced to six functionally complete primitives; by combining calls to these primitives, the operating system can perform any reconfiguration of logical cores. The relationship between the trigger granularity of process scheduling and the trigger granularity of computing resource adjustment is also studied, leading to the conclusion that the process scheduling clock alone satisfies the frequency requirements of both program phase sampling and on-chip computing resource adjustment.
2. A task scheduling framework for dynamic heterogeneous processors is designed. The framework is based on a centralized task queue and efficiently supports fast adjustment of the number and granularity of logical cores. When a logical core is released or created, the task scheduler only needs to perform a dequeue or enqueue operation to update the corresponding data structures. A pipeline-like scheduling mechanism is also proposed to reduce the large decision-time overhead the scheduler incurs on a centralized queue, which makes a scheduling framework based on a centralized queue practical.
3. The relationship between program phase behavior and the common microarchitectural parameters that reflect a program's computation and memory access characteristics is studied, and an IPC-based algorithm for dynamically identifying program phases is proposed. Building on this, the logical core resource allocation algorithm PERA is designed. PERA dynamically detects the execution phase a program is in and, based on the program's execution efficiency, accurately determines the program's demand for computing resources within that phase. PERA is designed as a finite state machine that performs only one state transition each time it is triggered, which gives the algorithm O(1) time complexity.
4. The fairness scheduling algorithm EDP for dynamic heterogeneous processors is designed. EDP guarantees that each process obtains performance proportional to its priority, and that the concurrent execution of multiple processes affects the performance of processes with the same priority equally. Because it makes effective use of the dynamic heterogeneity of logical cores, EDP also improves the performance of workloads running on the dynamic heterogeneous processor. Experimental results show that, with an equal total amount of on-chip computing resources, a DHCMP scheduled by EDP outperforms a symmetric multi-core processor and a static heterogeneous multi-core processor by 26.2% and 11.8% respectively in average task turnaround time, and by 33.6% and 12.5% respectively in system throughput.
The task scheduling framework designed in this thesis can serve as a general supporting platform for later research on scheduling algorithms for dynamic heterogeneous many-core processors. The logical core resource allocation algorithm PERA, the fairness scheduling algorithm EDP, and the exploration of program phase behavior carried out during algorithm design can inform later work on task scheduling for heterogeneous multi-core and many-core processors. The pipeline-like scheduling optimization proposed for the centralized queue can also be generalized as a methodology and applied to other many-core architectures.
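The abstract states only that PERA is a finite state machine driven by IPC-based phase detection and that it makes one state transition per invocation. The Python sketch below illustrates that general pattern and is not the thesis's algorithm: the two states, the thresholds, the grow-by-one probing policy, and the names Allocator and tick are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    STABLE = "stable"  # keep the current logical-core size
    PROBE = "probe"    # a basic core was just added; judge its effect next tick


@dataclass
class Allocator:
    """Hypothetical per-task allocator; every field and threshold is assumed."""
    cores: int = 1                 # basic cores fused into the task's logical core
    max_cores: int = 4             # assumed upper bound on logical-core size
    phase_threshold: float = 0.20  # relative IPC change treated as a new phase
    gain_threshold: float = 0.10   # relative IPC gain that justifies an extra core
    state: State = State.STABLE
    last_ipc: Optional[float] = None

    def tick(self, ipc: float) -> int:
        """Consume one IPC sample, make at most one transition, return core count."""
        prev, self.last_ipc = self.last_ipc, ipc
        if prev is None or prev <= 0:
            return self.cores  # no usable baseline yet: only record the sample
        change = (ipc - prev) / prev
        if self.state is State.STABLE:
            # A large IPC swing is read as a phase change; start probing upward.
            if abs(change) > self.phase_threshold and self.cores < self.max_cores:
                self.cores += 1
                self.state = State.PROBE
        elif change >= self.gain_threshold and self.cores < self.max_cores:
            self.cores += 1  # the added core paid off; try one more
        else:
            if change < self.gain_threshold:
                self.cores -= 1  # the added core did not pay off; roll it back
            self.state = State.STABLE
            self.last_ipc = None  # re-baseline so the resize is not read as a phase
        return self.cores


alloc = Allocator()
for sample in [1.0, 1.0, 1.5, 1.8, 1.85, 1.8]:
    print(sample, alloc.tick(sample), alloc.state.value)
```

Each call does a constant amount of work regardless of history, which is the property the abstract attributes to PERA; a real allocator would also need a policy for shrinking a logical core when a phase needs fewer resources.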
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree granted]: 2013
[Classification number]: TP332
Article ID: 2136886
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2136886.html