Research on Performance Optimization of Hadoop Cluster Systems

Published: 2019-03-17 18:20

[Abstract]: The value of cloud computing in business and scientific research has gradually been recognized by society. It can deliver substantial benefits in areas such as search engines, Internet application technology, and large-scale data computation. Hadoop, as an open-source implementation of cloud computing technology, has played a very important role in its development, and most enterprises and research institutions now use Hadoop as their cloud computing platform. With its simple parallel programming model, large data storage capacity, and efficient computing power, Hadoop offers users a good experience. However, because Hadoop has been under development for a relatively short time, many parts of the system can still be improved before its performance is fully realized. Research on Hadoop system performance is therefore both necessary and meaningful.

System performance parameters and task-level scheduling algorithms both strongly affect Hadoop's performance: the parameters govern how system resources are used at each stage of cluster operation, and the task-level scheduling algorithm is the key to task assignment while Hadoop is running. There is no unified model for determining parameter values or assigning tasks; both are complex problems, and research on them is still developing. This thesis therefore studies the optimization of Hadoop system performance from these two aspects, with a focus on analyzing the execution capability of cluster nodes.

To enable a Hadoop cluster to cope with the performance impact of varying tasks and of differences among cluster nodes, a TaskConfigure server is designed and a Hadoop cluster parameter information system is built to tune cluster parameters automatically. In addition, to address the uneven load distribution that can arise under the task-level scheduling algorithm Hadoop runs by default, an adaptive task allocation method based on node capability is proposed. The parameter information system uses node resource utilization efficiency to generate optimized configuration values for the cluster parameters, then assigns different configuration values to each class of nodes and tasks, so that nodes execute tasks under appropriate configuration parameters. To keep the load on worker nodes relatively balanced during execution, a weight ratio parameter is computed for each node from node performance, task characteristics, node failure rate, and similar factors, and serves as the basis for allocating task volume to nodes; each node's own load state is also evaluated, and the number of running tasks is adjusted adaptively according to the load state value. Experiments show that total task completion time for the cluster is markedly reduced, node loads are more balanced, node resources are used more reasonably, and the cluster exhibits good stability and scalability.
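The abstract does not give the weighting formula, so the following is only a minimal sketch of what a node-capability-based allocation with load-state adjustment might look like. Every name here (`Node`, `node_weight`, `load_factor`, `allocate`), the multiplicative weight, and the load thresholds are assumptions made for illustration, not the thesis's actual method.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    performance: float    # relative compute capability (hypothetical score)
    task_affinity: float  # fit between node and task type, 0..1 (hypothetical)
    failure_rate: float   # fraction of past tasks that failed, 0..1
    load: float           # current load, 0..1


def node_weight(node: Node) -> float:
    # Assumed form: capable, well-suited, reliable nodes get larger weights.
    return node.performance * node.task_affinity * (1.0 - node.failure_rate)


def load_factor(load: float, low: float = 0.3, high: float = 0.8) -> float:
    # Assumed thresholds: shrink the share of overloaded nodes,
    # grow the share of lightly loaded ones.
    if load >= high:
        return 0.5
    if load <= low:
        return 1.5
    return 1.0


def allocate(nodes: list[Node], total_tasks: int) -> dict[str, int]:
    """Split total_tasks across nodes in proportion to adjusted weights."""
    scores = {n.name: node_weight(n) * load_factor(n.load) for n in nodes}
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("no node has a positive weight")
    exact = {name: total_tasks * s / total for name, s in scores.items()}
    shares = {name: int(x) for name, x in exact.items()}
    # Largest-remainder rounding so the shares sum to total_tasks.
    leftover = total_tasks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - shares[k], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


nodes = [
    Node("A", performance=2.0, task_affinity=1.0, failure_rate=0.0, load=0.5),
    Node("B", performance=1.0, task_affinity=1.0, failure_rate=0.0, load=0.5),
    Node("C", performance=1.0, task_affinity=1.0, failure_rate=0.0, load=0.9),
]
print(allocate(nodes, 7))  # {'A': 4, 'B': 2, 'C': 1}
```

In this sketch, node C has the same raw capability as node B but is heavily loaded, so its share is halved. Re-running `allocate` periodically with fresh load values is one way the task volume could adapt over time.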
[Degree-granting institution]: Liaoning Normal University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.5


Article ID: 2442568


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2442568.html
