Research and Improvement of MapReduce in Scientific Computing

Published: 2018-06-23 17:24

Topic: MapReduce + cloud computing; Reference: Anhui University, 2013 master's thesis

[Abstract]: With the rapid growth of heterogeneous data, cloud computing has emerged. MapReduce, as a programming model for cloud computing, has likewise attracted wide attention, especially in academia. To address problems such as overlay and the storage of intermediate data, researchers have proposed many improvements and developed their own programming models, such as Hadoop, Twister, and Haloop. To support iterative algorithms, the Haloop model adds a Loop Control mechanism, implemented mainly through two additional functions, ADDMap and ADDReduce, whose purpose is to increase the number of iterations. The Twister model has a corresponding loop-control mechanism. Similarly, to better execute iterative algorithms, this thesis keeps the original interfaces and functions and also adds a parameter M to the Map, Reduce, ADDMap, and ADDReduce functions. M distinguishes the four classes of algorithms in scientific computing: M = 1 denotes the first class, M = 2 the second, M = 3 the third, and M = 4 the fourth. Because the third and fourth classes are both iterative, the functions and interfaces that these two classes use most often are written as adapters, and during experiments developers can fill in the corresponding function bodies as needed. To keep the data safe, variables are declared as protected in the experiments. Data that changes little is placed in a buffer pool, so that it can be read and written on the Slave node's local system rather than on the Master node, which both reduces the Master node's workload and improves running efficiency. Based on the shortcomings of various scheduling algorithms, an improved scheduling algorithm is proposed.
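The M parameter and the adapter idea described above can be illustrated with a small Python sketch. This is only one possible reading of the abstract, not the thesis's code (its experiments used Hadoop) and not Haloop's API. The names `AlgorithmClass`, `IterativeJobAdapter`, `run_job`, `DoublingJob`, and the `extra_iterations` argument are all hypothetical, and the toy driver runs in a single process.

```python
from collections import defaultdict
from enum import IntEnum


class AlgorithmClass(IntEnum):
    """Values of the parameter M; the abstract only numbers the classes 1 to 4."""
    FIRST = 1
    SECOND = 2
    THIRD = 3   # iterative, per the abstract
    FOURTH = 4  # iterative, per the abstract


def is_iterative(m):
    """Classes 3 and 4 are the iterative ones; an M outside 1..4 raises ValueError."""
    return AlgorithmClass(m) in (AlgorithmClass.THIRD, AlgorithmClass.FOURTH)


class IterativeJobAdapter:
    """Hypothetical adapter: default bodies that a developer overrides as needed."""

    def map(self, key, value, m):
        return []  # list of (key, value) pairs

    def reduce(self, key, values, m):
        return []  # list of (key, value) pairs

    def add_map(self, key, value, m):
        # By default the loop body reuses the ordinary map function.
        return self.map(key, value, m)

    def add_reduce(self, key, values, m):
        # By default the loop body reuses the ordinary reduce function.
        return self.reduce(key, values, m)


def run_job(job, records, m, extra_iterations=0):
    """Toy driver: one map/reduce pass, then extra passes only if M is iterative."""

    def one_pass(pairs, map_fn, reduce_fn):
        groups = defaultdict(list)
        for key, value in pairs:
            for out_key, out_value in map_fn(key, value, m):
                groups[out_key].append(out_value)
        result = []
        for key, values in groups.items():
            result.extend(reduce_fn(key, values, m))
        return result

    pairs = one_pass(records, job.map, job.reduce)
    if is_iterative(m):
        for _ in range(extra_iterations):
            pairs = one_pass(pairs, job.add_map, job.add_reduce)
    return dict(pairs)


class DoublingJob(IterativeJobAdapter):
    """Example job: doubles each value in map, sums per key in reduce."""

    def map(self, key, value, m):
        return [(key, value * 2)]

    def reduce(self, key, values, m):
        return [(key, sum(values))]


records = [("a", 1), ("a", 2), ("b", 3)]
single_pass = run_job(DoublingJob(), records, m=1, extra_iterations=2)
iterated = run_job(DoublingJob(), records, m=3, extra_iterations=2)
```

With `m=1` the extra iterations are ignored because the first class is not iterative. With `m=3` the driver runs the `add_map`/`add_reduce` pair twice more. A subclass only overrides the bodies it needs, which is the point of the adapter.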
The algorithm adds the following parameters: computation cost, task deadline, and the processing speed of the client machine. It also sets up two queues: a computing resource queue and a deadline queue. The priority of tasks in the computing resource queue is determined by computation cost. When the computation cost is calculated, it is multiplied by a weight, Weight, whose value is determined by the parameter M added to the Map, Reduce, ADDMap, and ADDReduce functions: Weight equals M, so M = 1 gives Weight = 1, M = 2 gives Weight = 2, M = 3 gives Weight = 3, and M = 4 gives Weight = 4. The priority of tasks in the deadline queue is determined by the deadline. Every task in the computing resource queue has higher priority than every task in the deadline queue; if the deadline queue contains a task whose deadline equals 0, that task is inserted directly at the head of the computing resource queue. In this way, large tasks are executed efficiently while small tasks are also taken care of. The improved algorithm achieves good performance. The thesis closes with related examples and corresponding experiments run on Hadoop.
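The two-queue rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's scheduler. The abstract does not say whether a larger cost means higher priority, how tasks are assigned to a queue, or how client processing speed enters the cost. Here a larger weighted cost is served first, the earliest deadline is served first, the caller picks the queue, and processing speed is left out. The names `Task`, `TwoQueueScheduler`, `base_cost`, and `drain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    base_cost: float  # hypothetical raw computation cost
    m: int            # algorithm class M, 1..4
    deadline: int     # remaining time units; 0 means due now

    @property
    def weighted_cost(self):
        # The abstract states that Weight equals M.
        return self.base_cost * self.m


class TwoQueueScheduler:
    def __init__(self):
        self.resource_queue = []  # ordered by weighted cost
        self.deadline_queue = []  # ordered by deadline

    def submit_to_resource_queue(self, task):
        self.resource_queue.append(task)
        # Assumption: a larger weighted cost means a higher priority.
        self.resource_queue.sort(key=lambda t: t.weighted_cost, reverse=True)

    def submit_to_deadline_queue(self, task):
        self.deadline_queue.append(task)
        # Assumption: the earliest deadline has the highest priority.
        self.deadline_queue.sort(key=lambda t: t.deadline)

    def next_task(self):
        # A deadline-queue task with deadline 0 jumps to the head of the
        # computing resource queue, as the abstract describes.
        if self.deadline_queue and self.deadline_queue[0].deadline == 0:
            self.resource_queue.insert(0, self.deadline_queue.pop(0))
        # Every resource-queue task outranks every deadline-queue task.
        if self.resource_queue:
            return self.resource_queue.pop(0)
        if self.deadline_queue:
            return self.deadline_queue.pop(0)
        return None


def drain(scheduler):
    """Return task names in the order the scheduler would run them."""
    order = []
    task = scheduler.next_task()
    while task is not None:
        order.append(task.name)
        task = scheduler.next_task()
    return order


scheduler = TwoQueueScheduler()
scheduler.submit_to_resource_queue(Task("A", base_cost=10, m=1, deadline=9))
scheduler.submit_to_resource_queue(Task("B", base_cost=4, m=4, deadline=9))
scheduler.submit_to_deadline_queue(Task("C", base_cost=1, m=1, deadline=5))
scheduler.submit_to_deadline_queue(Task("D", base_cost=1, m=2, deadline=0))
order = drain(scheduler)
```

In this example task D is due immediately, so it is promoted ahead of the resource queue. B (weighted cost 16) then precedes A (weighted cost 10), and C runs last from the deadline queue.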
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.1;TP338.6
