Optimization of Scheduling Algorithms and the Download Mechanism on the Hadoop Platform
Published: 2019-05-10 00:16
[Abstract]: With the rapid development of Internet technology, data volumes are growing explosively. As the carrier of information, data plays a pivotal role in the progress of informatization. Massive data is difficult to manage, costly to store, and hard to keep reliable and secure, and these are major problems today. More and more enterprises are moving into cloud computing and using it for distributed computation and data management. Cloud computing services offer high reliability, easy scalability, large storage capacity and fast processing, so research on cloud computing service systems has become a trend in the further development of IT. With the goal of increasing data processing speed in Hadoop, a platform for implementing cloud computing, this thesis studies the internal operating mechanisms of MapReduce and HDFS in depth.
To address the heterogeneity of Hadoop runtime environments, and to let Hadoop assign tasks sensibly according to the computing power of each compute node, the thesis proposes an improved self-adaptive load-adjusting scheduling algorithm (SALS). The algorithm combines Hadoop scheduling with the current system load level to achieve adaptive scheduling, and it improves Hadoop's original speculative execution algorithm. With the new algorithm, the stragglers that affect system response time are identified more accurately and the hit rate for straggler tasks rises substantially, which improves the responsiveness of the whole system more effectively.
To address the low efficiency of internal data downloads in Hadoop's HDFS and the load imbalance that may occur, the thesis proposes a parallel download algorithm for distributed files. The algorithm provides optimizations for both whole-file download efficiency and data-block download efficiency, and on that basis introduces the multi-threading idea from P2P to raise the system's download efficiency. On top of the traditional parallel algorithm, it introduces a new speed prediction function that uses the average historical download speed and the current speed to predict future download speed more accurately. Experiments show that, compared with Hadoop's own download mechanism, the algorithm noticeably improves system performance and satisfies user download requests sooner.
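The abstract does not give the form of the speed prediction function, so the following is only a minimal sketch of the general idea: blend the average historical download speed with the current speed, then divide a data block among source nodes in proportion to the predicted speeds. The linear weighting, the weight `alpha`, the proportional split, and the names `predict_speed` and `split_block` are all assumptions made for illustration and are not taken from the thesis.

```python
from statistics import mean


def predict_speed(history, current, alpha=0.5):
    """Predict the next download speed for one source node.

    history: past measured speeds (any consistent unit, e.g. MB/s).
    current: the most recent measured speed.
    alpha:   hypothetical weight on the current speed, in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not history:
        # No history yet: the current speed is the only evidence.
        return float(current)
    return alpha * current + (1.0 - alpha) * mean(history)


def split_block(block_size, predicted_speeds):
    """Split block_size bytes across nodes in proportion to predicted speed.

    predicted_speeds: dict mapping node name -> predicted speed.
    Returns a dict mapping node name -> number of bytes to fetch.
    """
    total = sum(predicted_speeds.values())
    if total <= 0:
        raise ValueError("at least one node must have a positive speed")
    nodes = list(predicted_speeds.items())
    shares = {}
    assigned = 0
    for node, speed in nodes[:-1]:
        share = int(block_size * speed / total)
        shares[node] = share
        assigned += share
    # The last node takes the remainder so the shares sum to block_size.
    shares[nodes[-1][0]] = block_size - assigned
    return shares


if __name__ == "__main__":
    speeds = {
        "node-a": predict_speed([2.0, 4.0], 6.0),
        "node-b": predict_speed([1.0, 1.0], 2.0),
    }
    print(split_block(64 * 1024 * 1024, speeds))
```

Under these assumptions, a node whose speed has recently risen receives a larger share of the block than its history alone would suggest, while the historical average damps the effect of a single unusual measurement.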
[Degree-granting institution]: Central South University (中南大學)
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP3
Article ID: 2473201
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2473201.html