Research on and Improvement of MapReduce in Scientific Computing
Published: 2018-06-23 17:24
Topic: MapReduce + cloud computing; Source: Anhui University, 2013 master's thesis
[Abstract]: With the rapid growth of heterogeneous data, cloud computing has emerged. MapReduce, as a programming model for cloud computing, has likewise attracted wide attention, especially in academia. To address problems such as overlay and the storage of intermediate data, researchers have proposed many improvements and built their own programming models, such as Hadoop, Twister and Haloop.
To support iterative algorithms, the Haloop model adds a Loop Control mechanism. In practice this mechanism mainly adds two functions, ADDMap and ADDReduce, whose purpose is to add further iterations. The Twister model has a corresponding loop-control mechanism. Similarly, to better execute iterative algorithms, this thesis keeps the original interfaces and functions and adds a parameter M to the Map, Reduce, ADDMap and ADDReduce functions. M distinguishes the four classes of algorithms in scientific computing: M = 1 denotes the first class, M = 2 the second, M = 3 the third, and M = 4 the fourth. Because the third and fourth classes are both iterative, the functions and interfaces that these two classes frequently use are written as adapters, and during experiments developers can fill in the function bodies as needed. To keep the data safe, variables are declared as protected in the experiments. Data that changes little is placed in a buffer pool, so that it can be read and written on the local system of the Slave node rather than through the Master node; this both reduces the Master node's workload and improves running efficiency.
Based on the shortcomings of various scheduling algorithms, an improved scheduling algorithm is proposed.
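
The role of the parameter M and of the adapter for the iterative classes can be illustrated with a minimal Python sketch. Every name here (`AlgorithmClass`, `IterativeAdapter`, `run_round`, `run_job`, `max_iterations`, the fixed-point convergence test, the `Halve` example job) is hypothetical and not taken from the thesis, which does not say what the four classes are. The sketch relies on the abstract only for the statement that classes 3 and 4 are iterative; treating classes 1 and 2 as single-pass is an assumption.

```python
from collections import defaultdict
from enum import IntEnum


class AlgorithmClass(IntEnum):
    """The parameter M: which of the four algorithm classes a job belongs to."""
    FIRST = 1
    SECOND = 2
    THIRD = 3   # iterative, per the abstract
    FOURTH = 4  # iterative, per the abstract


class IterativeAdapter:
    """Hypothetical adapter: default hooks that a developer overrides as needed."""

    max_iterations = 10  # assumed safety bound, not from the thesis

    def map(self, key, value, m):
        yield key, value  # identity map by default

    def reduce(self, key, values, m):
        return sum(values)  # sum by default

    def converged(self, previous, current):
        return previous == current  # assumed fixed-point test


def run_round(job, data, m):
    """One map -> shuffle -> reduce pass over a dict of key/value pairs."""
    groups = defaultdict(list)
    for key, value in data.items():
        for out_key, out_value in job.map(key, value, m):
            groups[out_key].append(out_value)
    return {key: job.reduce(key, values, m) for key, values in groups.items()}


def run_job(job, data, m):
    """Run once for classes 1-2 (assumed); loop until convergence for classes 3-4."""
    m = AlgorithmClass(m)  # raises ValueError for values outside 1..4
    result = run_round(job, data, m)
    if m in (AlgorithmClass.FIRST, AlgorithmClass.SECOND):
        return result, 1
    for iteration in range(2, job.max_iterations + 1):
        following = run_round(job, result, m)
        if job.converged(result, following):
            return following, iteration
        result = following
    return result, job.max_iterations


class Halve(IterativeAdapter):
    """Example job: integer-halve every value until nothing changes."""

    def map(self, key, value, m):
        yield key, value // 2
```

Calling `run_job(Halve(), data, 1)` performs a single pass, while the same job with M = 3 keeps iterating until two consecutive rounds agree. Only the overridden `map` had to be written; the rest comes from the adapter defaults.
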
The improved scheduling algorithm adds the following parameters: computation cost, task deadline and the processing speed of the client machine. It also sets up two queues: a computing resource queue and a deadline queue. The priority of tasks in the computing resource queue is determined by computation cost. When the computation cost is calculated, it is multiplied by a weight, Weight, whose value is determined by the parameter M added to the Map, Reduce, ADDMap and ADDReduce functions: Weight equals M, so M = 1 gives Weight = 1, M = 2 gives Weight = 2, M = 3 gives Weight = 3, and M = 4 gives Weight = 4. The priority of tasks in the deadline queue is determined by the deadline. Every task in the computing resource queue has higher priority than every task in the deadline queue; if the deadline queue contains a task whose deadline equals 0, that task is inserted directly at the head of the computing resource queue. This ensures efficient execution of large tasks while still accommodating small ones. The improved algorithm is reported to perform well. At the end of the thesis, related examples are given and corresponding experiments are carried out with Hadoop.
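
The two-queue policy described above can be sketched in Python as follows. The abstract does not give the cost formula, the ordering direction inside each queue, or how a task is assigned to a queue, so these are assumptions: the hypothetical `Task.priority_cost` multiplies a caller-supplied `base_cost` by Weight = M, tasks with a higher weighted cost run first, the deadline queue is ordered earliest deadline first, and the caller names the queue. Deadlines are treated as static values, and the client processing speed is left out because its role in the formula is not stated. `Task` and `TwoQueueScheduler` are illustrative names, not identifiers from the thesis or from Hadoop.

```python
import heapq
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    m: int            # algorithm class M, 1..4
    base_cost: float  # assumed raw estimate of computation cost
    deadline: int     # remaining time units; 0 means due now

    @property
    def priority_cost(self):
        # Weight equals M, per the abstract; the product form is an assumption.
        return self.m * self.base_cost


class TwoQueueScheduler:
    def __init__(self):
        self._urgent = deque()         # deadline-0 tasks, kept ahead of the resource heap
        self._resource = []            # heap keyed on -priority_cost (assumed order)
        self._deadline = []            # heap keyed on deadline, earliest first
        self._tie = itertools.count()  # tie-breaker so Task objects are never compared

    def submit(self, task, queue):
        if queue == "deadline" and task.deadline == 0:
            # Deadline 0: jump straight to the head of the resource queue.
            self._urgent.appendleft(task)
        elif queue == "deadline":
            heapq.heappush(self._deadline, (task.deadline, next(self._tie), task))
        elif queue == "resource":
            heapq.heappush(self._resource, (-task.priority_cost, next(self._tie), task))
        else:
            raise ValueError(f"unknown queue: {queue}")

    def next_task(self):
        """Return the next task to run, or None when both queues are empty."""
        if self._urgent:
            return self._urgent.popleft()
        if self._resource:
            return heapq.heappop(self._resource)[2]
        if self._deadline:
            return heapq.heappop(self._deadline)[2]
        return None
```

Under these assumptions a deadline-0 task is dispatched first, then resource-queue tasks in descending weighted cost, then the remaining deadline-queue tasks in order of deadline.
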
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.1; TP338.6
Article ID: 2057845
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2057845.html