若干MapReduce排序問題的算法研究
發(fā)布時(shí)間:2019-07-07 20:59
【摘要】:MapReduce是Google提出的一種編程模型,一個(gè)處理和生成大數(shù)據(jù)集的相關(guān)實(shí)現(xiàn)。本文主要研究了Map Reduce平行機(jī)排序問題,對(duì)極小化最大完工時(shí)間(makespan),論文研究了m臺(tái)同類機(jī)離線和兩臺(tái)機(jī)在線排序模型。對(duì)極大化最小完工時(shí)間,文中研究了同型機(jī)離線排序模型。全文共分五章。第一章簡(jiǎn)要地介紹了排序問題的基本知識(shí)以及MapReduce的相關(guān)知識(shí)。第二章主要研究m臺(tái)同類機(jī)離線MapReduce排序問題,在這里map任務(wù)可以平行加工然而reduce任務(wù)不能平行加工。目標(biāo)是極小化最大完工時(shí)間,分別考慮reduce任務(wù)可中斷和reduce任務(wù)不可中斷兩種情況。對(duì)可中斷的reduce任務(wù),我們?cè)O(shè)計(jì)了一個(gè)近似比為2-(?)的算法,其中s1≥s2≥…≥2sm,是機(jī)器σi的加工速度;對(duì)不可中斷的reduce任務(wù),給出了一個(gè)近似比為2+(?)的算法。第三章主要研究?jī)膳_(tái)同類機(jī)在線MapReduce排序問題,目標(biāo)是極小化最大完工時(shí)間。首先對(duì)reduce任務(wù)可中斷和reduce任務(wù)不可中斷兩種情況,分別給出了問題的下界。對(duì)可中斷的reduce任務(wù),給出了競(jìng)爭(zhēng)比為(?)的最優(yōu)算法,其中s是兩臺(tái)機(jī)器的速度之比;對(duì)不中斷的reduce任務(wù),我們證明類似LS算法是最優(yōu)的且它的競(jìng)爭(zhēng)比為(?)和(?)。第四章主要研究離線MapReduce同型機(jī)排序問題,目標(biāo)是極大化最小完工時(shí)間。對(duì)可中斷的reduce任務(wù),給出了3臺(tái)機(jī)的最優(yōu)算法;對(duì)不可中斷的reduce任務(wù),考慮2臺(tái)機(jī)、3臺(tái)機(jī)、任意m臺(tái)機(jī)的情況,分別給出了最壞情況界為4/3、3/2、m的近似算法且都是緊的。最后,我們對(duì)相關(guān)問題的后續(xù)研究作了討論。第五章總結(jié)全文并提出相關(guān)問題進(jìn)一步的研究方向。
[Abstract]:MapReduce is a programming model proposed by Google, a related implementation of processing and generating large data sets. In this paper, the scheduling problem of Map Reduce parallel machines is studied, and the offline and two online sorting models of m similar machines are studied for minimizing the maximum completion time (makespan),. For maximizing the minimum completion time, the offline sorting model of the same machine is studied in this paper. The full text is divided into five chapters. The first chapter briefly introduces the basic knowledge of scheduling problem and the related knowledge of MapReduce. In the second chapter, we mainly study the off-line MapReduce scheduling problem of m similar machines, where map tasks can be processed in parallel, but reduce tasks can not be processed in parallel. The goal is to minimize the maximum completion time, considering the interruptible reduce task and the uninterruptible reduce task respectively. For interruptible reduce tasks, we design an approximate ratio of 2-(?) Where S1 鈮,
本文編號(hào):2511438
[Abstract]:MapReduce is a programming model proposed by Google, a related implementation of processing and generating large data sets. In this paper, the scheduling problem of Map Reduce parallel machines is studied, and the offline and two online sorting models of m similar machines are studied for minimizing the maximum completion time (makespan),. For maximizing the minimum completion time, the offline sorting model of the same machine is studied in this paper. The full text is divided into five chapters. The first chapter briefly introduces the basic knowledge of scheduling problem and the related knowledge of MapReduce. In the second chapter, we mainly study the off-line MapReduce scheduling problem of m similar machines, where map tasks can be processed in parallel, but reduce tasks can not be processed in parallel. The goal is to minimize the maximum completion time, considering the interruptible reduce task and the uninterruptible reduce task respectively. For interruptible reduce tasks, we design an approximate ratio of 2-(?) Where S1 鈮,
本文編號(hào):2511438
本文鏈接:http://sikaile.net/shoufeilunwen/benkebiyelunwen/2511438.html
最近更新
教材專著