Research on Reliable Scheduling Methods for Primary/Backup-Version Tasks in Distributed Environments
[Abstract]: With advances in computing and networking, data centers and computing centers built on distributed and parallel computing are widely used in industry, commerce, science and technology, and the military. In these applications, a large, complex computation is decomposed into subtasks that are processed in parallel, and the partial results are combined into the final result. Task scheduling is therefore a key factor in the performance and efficiency of a distributed computing system: a poor scheduling method can severely reduce the system's computing power and parallel efficiency, and can even prevent the system from achieving the benefit that parallel computing should deliver. For this reason, task scheduling has long been a core topic in distributed systems, grid systems, and cloud computing systems. As distributed systems grow in scale and computing power, however, system stability and reliability have become critical to running parallel applications successfully. For example, in a supercomputer or large-scale cluster such as Chrome No. 2 or a Google data center, the complexity of the upper-layer applications and the very high power consumption make the system highly prone to failure, so a complete set of reliability guarantee mechanisms is particularly important. Designing a highly reliable scheduling algorithm at the scheduling stage is one of the most important of these mechanisms. With the goals of guaranteeing performance and improving reliability, this thesis studies how to ensure both the reliability of a distributed computing system and the efficient use of its computing resources.
The thesis divides tasks into real-time periodic tasks and non-real-time tasks, and uses primary/backup-version (main/sub-version) scheduling to achieve a scheduling strategy that is both highly reliable and high-performing. The specific work is as follows:
(1) To address reliable scheduling in distributed computing systems, a scheduling algorithm (DRCAMD) based on the reliability cost of computing nodes and communication links is proposed. By setting weight values, the method adjusts the system's weighted objective function to balance users' differing requirements for scheduling performance and reliability. In addition, for real-time tasks with dependency relationships, a schedulability analysis method is presented that does not take into account the overlapping states of the primary-version and backup-version tasks. Experimental results show that, for given failure probabilities of computing nodes and communication links, the algorithm has some advantages in both reliability and performance.
(2) Based on the primary/backup-version scheduling policy and the handling of task criticality, a two-stage reliable scheduling algorithm (MCRSS) and a schedulability analysis method are proposed. The first stage schedules the mixed-criticality tasks according to their priority level. The second stage performs schedulability analysis on the tasks assigned to the target processor and upgrades those that cannot meet the scheduling requirements until their deadlines can be met. Simulation experiments show that MCRSS deals effectively with reliable scheduling of tasks at different criticality levels, and that the distributed computing system retains good flexibility and performance.
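The weighted trade-off described in item (1) can be illustrated with a small sketch. This is not the DRCAMD algorithm itself, whose details the abstract does not give. It assumes a commonly used reliability-cost model (failure rate multiplied by the time a node or link is occupied) and a single hypothetical weight in [0, 1] that trades completion time against reliability cost. All names (`Candidate`, `reliability_cost`, `pick_node`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible placement of a task (all fields are hypothetical inputs)."""
    node: str
    finish_time: float        # estimated completion time on this node
    exec_time: float          # time the node is busy with the task
    comm_time: float          # time the incoming link is busy
    node_failure_rate: float  # assumed failures per time unit for the node
    link_failure_rate: float  # assumed failures per time unit for the link


def reliability_cost(c: Candidate) -> float:
    # Assumed model: cost grows with failure rate and with occupancy time.
    return c.node_failure_rate * c.exec_time + c.link_failure_rate * c.comm_time


def pick_node(candidates: list[Candidate], weight: float) -> Candidate:
    """Choose the candidate minimizing weight*time + (1-weight)*reliability cost.

    weight=1.0 favors performance only; weight=0.0 favors reliability only.
    Both terms are normalized by their maximum so they are comparable.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if not candidates:
        raise ValueError("no candidates given")
    max_finish = max(c.finish_time for c in candidates)
    max_cost = max(reliability_cost(c) for c in candidates)

    def score(c: Candidate) -> float:
        t = c.finish_time / max_finish if max_finish > 0 else 0.0
        r = reliability_cost(c) / max_cost if max_cost > 0 else 0.0
        return weight * t + (1.0 - weight) * r

    return min(candidates, key=score)
```

With a fast but failure-prone node and a slower but more dependable one, a weight near 1 selects the fast node and a weight near 0 selects the dependable node, which is the kind of user-controlled balance the abstract describes.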
(3) For DAG tasks with precedence dependencies, a scheduling algorithm (EFTBT) based on the earliest completion time of the backup-version task is proposed. By analyzing the scheduling state of the primary-version task, the method derives the earliest completion time of the backup-version task and the constraints on its target processor, and proves that these constraints are reasonable. The method achieves better scheduling performance while still guaranteeing reliable scheduling. In addition, when multiple DAG tasks in a scientific workflow application are scheduled simultaneously, unfair scheduling can leave later DAG tasks unschedulable; to address this, a multi-DAG scheduling strategy (MDDL) based on a layering approach is proposed. Experimental results show that both algorithms improve scheduling performance compared with classical algorithms.
(4) For large-scale distributed computing systems, which are heterogeneous and dynamic, a reliable scheduling strategy for DAG tasks with dependencies is proposed based on the fault characteristics of nodes and links. The strategy builds on the EFTBT algorithm, which uses the earliest completion time of the backup-version task. The thesis gives the communication model and the backup-version execution strategy, and establishes a method for analyzing the fault characteristics of a distributed computing system. On this basis, a fault-tolerant scheduling algorithm (RAPA) based on a communication contention model is proposed. Experimental results show that RAPA achieves better performance and reliability than HEFT and EFTBT.
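The idea in item (3), placing the backup (sub-version) copy where it can finish earliest subject to a constraint on its target processor, can be sketched as follows. This is not the EFTBT algorithm: the abstract does not state its constraints, so the sketch assumes only the standard primary/backup rule that the backup must run on a different processor from the primary. The function name, the `release_time` parameter, and the per-processor dictionaries are hypothetical.

```python
def earliest_backup_slot(
    primary_proc: str,
    release_time: float,
    backup_exec: dict[str, float],
    proc_free_at: dict[str, float],
) -> tuple[str, float, float]:
    """Return (processor, start, finish) giving the backup its earliest finish.

    primary_proc  -- processor already running the primary copy (excluded)
    release_time  -- assumed earliest time the backup is allowed to start
    backup_exec   -- execution time of the backup on each processor
                     (per-processor values model a heterogeneous system)
    proc_free_at  -- time at which each processor becomes idle
    """
    best = None
    # Sorted iteration makes tie-breaking deterministic (lowest name wins).
    for proc in sorted(proc_free_at):
        if proc == primary_proc:
            continue  # assumed constraint: never co-locate with the primary
        start = max(proc_free_at[proc], release_time)
        finish = start + backup_exec[proc]
        if best is None or finish < best[2]:
            best = (proc, start, finish)
    if best is None:
        raise ValueError("no processor other than the primary's is available")
    return best
```

A full scheduler would also account for communication delays from predecessor tasks and for the primary's own schedule; this sketch isolates only the processor-exclusion and earliest-finish selection steps.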
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP338.8
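Related doctoral dissertations (top 1)
1 Jing Weipeng; Research on Reliable Scheduling Methods for Primary/Backup-Version Tasks in Distributed Environments [D]; Harbin Institute of Technology; 2016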
Article No.: 2292323
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2292323.html