Research on Task Scheduling for Dynamic Heterogeneous Many-Core Processors

Published: 2018-07-21 21:18

[Abstract]: The demand for high-efficiency on-chip computing and the growing variation in chip manufacturing processes are together driving multi-core processors into the heterogeneous era. The basic idea of a performance-heterogeneous multi-core processor is to place cores of different granularity on one chip: out-of-order superscalar big cores extract performance from serial code, while a large number of structurally simple small cores exploit thread-level parallelism. In essence, such a processor improves computational efficiency only when the on-chip core configuration matches the parallelism characteristics of the workload. Those characteristics, and the workload's resource demands, change dynamically, so a heterogeneous processor must be able to adjust its on-chip computing resources to the workload at run time. To this end, the Dynamic Heterogeneous Chip Multiprocessor (DHCMP) has been proposed in recent years: it places a large number of homogeneous basic cores on the chip and provides microarchitectural support for fusing several basic cores into a single logical processor core (logical core for short), so that system software can configure the basic cores on demand, at run time, into multiple performance-heterogeneous logical cores.
However, a dynamic heterogeneous processor by itself only provides the ability to reconfigure logical cores. Whether the system can accurately judge the parallelism characteristics and resource demands of the workload, and configure DHCMP resources sensibly to achieve high-efficiency computing, depends decisively on the task scheduler. Yet research on task scheduling for dynamic heterogeneous many-core processors has barely begun. This dissertation aims to build a task scheduling framework that effectively supports fast adjustment of DHCMP logical cores, and to study both a logical-core resource allocation algorithm that exploits DHCMP's dynamic heterogeneity for high-efficiency computing and a process scheduling algorithm that provides priority-based fairness on DHCMP. The work and results fall into four areas:
1. The hardware/operating-system interface for dynamic heterogeneous processors is studied, and a concise, general logical-core abstraction is presented to the operating system. Logical-core reconfiguration is reduced to six functionally complete primitives; by composing calls to these primitives, the operating system can carry out any reconfiguration of logical cores. The relationship between the trigger granularity of process scheduling and that of computing-resource adjustment is also studied, leading to the conclusion that the process scheduling clock alone satisfies the frequency requirements of both program-phase sampling and on-chip resource adjustment.
2. A task scheduling framework for dynamic heterogeneous processors is designed. The framework is based on a centralized task queue and efficiently supports fast adjustment of the number and granularity of logical cores. When a logical core is released or created, the scheduler only needs a dequeue or enqueue operation to update the corresponding data structures. A pipeline-like scheduling mechanism is also proposed to reduce the relatively large decision-time overhead that the scheduler incurs on a centralized queue, which makes the centralized-queue framework practical.
3. The relationship between program phase behavior and commonly used microarchitectural parameters that reflect a program's compute and memory-access characteristics is studied, and an IPC-based algorithm for dynamically identifying program phases is proposed. On that basis, the logical-core resource allocation algorithm PERA is designed. PERA dynamically detects the execution phase a program is in and, from the program's execution efficiency, accurately determines its demand for computing resources in that phase. PERA is designed as a finite state machine that performs only one state transition each time it is triggered, which gives it O(1) time complexity.
4. The fairness scheduling algorithm EDP for dynamic heterogeneous processors is designed. EDP not only guarantees that each process obtains performance proportional to its priority, but also ensures that the concurrent execution of multiple processes affects the performance of equal-priority processes equally. Thanks to its effective use of the dynamic heterogeneity of logical cores, workload performance on the dynamic heterogeneous processor also improves under EDP. Experimental results show that, with the same total on-chip computing resources, a DHCMP scheduled with EDP outperforms symmetric multi-core and static heterogeneous multi-core processors by 26.2% and 11.8%, respectively, in average task turnaround time, and by 33.6% and 12.5%, respectively, in system throughput.
The task scheduling framework designed in this dissertation can serve as a general platform for subsequent research on scheduling algorithms for dynamic heterogeneous many-core processors. The PERA resource allocation algorithm, the EDP fairness scheduling algorithm, and the exploration of program phase behavior carried out during their design can inform later task scheduling work for heterogeneous multi-core and many-core processors. The pipeline-like scheduling optimization proposed for the centralized queue can also be generalized, as a methodology, to other many-core architectures.
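To make the centralized-queue idea from item 2 more concrete, the following is a minimal sketch of one possible reading of it, not the dissertation's actual framework. The class names, the `width` field, and the policy of returning a released core's task to the shared queue are all assumptions made for illustration. The point it tries to show is that, with a single shared ready queue, creating or releasing a logical core touches only that queue and there are no per-core run queues to migrate or rebalance.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class LogicalCore:
    core_id: int
    width: int                     # number of fused basic cores (hypothetical field)
    running: Optional[str] = None  # task currently placed on this logical core


@dataclass
class CentralScheduler:
    # One shared ready queue for the whole chip (assumed design).
    ready: Deque[str] = field(default_factory=deque)
    cores: Dict[int, LogicalCore] = field(default_factory=dict)
    _next_id: int = 0

    def submit(self, task: str) -> None:
        """A new runnable task joins the shared queue."""
        self.ready.append(task)

    def create_core(self, width: int) -> LogicalCore:
        """Create a logical core; it pulls its first task with a single dequeue."""
        core = LogicalCore(self._next_id, width)
        self._next_id += 1
        self.cores[core.core_id] = core
        if self.ready:
            core.running = self.ready.popleft()
        return core

    def release_core(self, core_id: int) -> None:
        """Release a logical core; its task goes back with a single enqueue."""
        core = self.cores.pop(core_id)
        if core.running is not None:
            self.ready.append(core.running)
```

Under these assumptions, both `create_core` and `release_core` cost a constant number of queue operations regardless of how many logical cores exist.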
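Item 3 describes an allocator built as a finite state machine that consumes IPC samples and takes exactly one transition per invocation. The sketch below illustrates that general shape only. It is not PERA: the two states, the thresholds `PHASE_THRESHOLD` and `MIN_GAIN`, and the grow-then-revert policy are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto

PHASE_THRESHOLD = 0.20  # assumed: relative IPC change that signals a new phase
MIN_GAIN = 0.05         # assumed: relative IPC gain that justifies one more basic core


class State(Enum):
    STABLE = auto()   # phase unchanged, allocation held
    GROWING = auto()  # a basic core was just added; waiting to see whether it paid off


@dataclass
class Allocator:
    width: int = 1        # basic cores in this task's logical core
    max_width: int = 4
    state: State = State.STABLE
    ref_ipc: float = 0.0  # IPC that characterises the current phase

    def step(self, ipc: float) -> int:
        """Consume one IPC sample, take one transition, return the new width."""
        if ipc <= 0.0:
            return self.width  # ignore empty sampling intervals
        if self.state is State.STABLE:
            changed = (self.ref_ipc == 0.0
                       or abs(ipc - self.ref_ipc) / self.ref_ipc > PHASE_THRESHOLD)
            if changed:
                self.ref_ipc = ipc
                if self.width < self.max_width:
                    self.width += 1
                    self.state = State.GROWING
        else:  # State.GROWING
            gain = (ipc - self.ref_ipc) / self.ref_ipc
            if gain < MIN_GAIN:
                self.width -= 1  # the extra core did not help; give it back
                self.state = State.STABLE
            else:
                self.ref_ipc = ipc
                if self.width < self.max_width:
                    self.width += 1
                else:
                    self.state = State.STABLE
        return self.width
```

Each call does a fixed amount of work no matter how long the task has run, which is what gives this style of allocator constant time per invocation. One limitation of this simplified version is that it never shrinks the allocation below the width at which a phase began; a realistic policy would need a shrinking path as well.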
[Degree-granting institution]: University of Science and Technology of China
[Degree]: Doctoral
[Year conferred]: 2013
[Classification number]: TP332

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2136886.html