Research on Performance Optimization of Hadoop Cluster Systems
Published: 2019-03-17 18:20
[Abstract]: The value of cloud computing in business and scientific research has gradually gained recognition. It can deliver substantial benefits in search engines, Internet applications, and large-scale data computing. Hadoop, as an open-source implementation of cloud computing technology, has played a very important role in its development, and most enterprises and research institutions now use Hadoop as their cloud computing platform. With its simple parallel programming model, large data storage capacity, and efficient computing power, Hadoop offers users a good experience. However, because Hadoop is still relatively young, many parts of the system can be refined and improved before its performance is fully realized. Research on Hadoop system performance is therefore both necessary and worthwhile.
System performance parameters and the task-level scheduling algorithm strongly affect Hadoop's performance: the parameters govern how system resources are used at each stage of cluster operation, and the task-level scheduling algorithm is the key to how Hadoop assigns tasks. There is no unified model for determining parameter values or for assigning tasks; both are complex problems, and research on them is still developing. We therefore study the optimization of Hadoop system performance from these two angles.
This thesis focuses on analyzing the execution capability of cluster nodes. To help a Hadoop cluster cope with the effects of varying tasks and of differences among the cluster nodes themselves on system performance, we design a TaskConfigure server and build a Hadoop cluster parameter information system that tunes cluster parameters automatically. To address the uneven load distribution that can arise under the task-level scheduling algorithm Hadoop runs by default, we also propose an adaptive task allocation method based on node capability. The parameter information system derives optimized configuration values for the cluster parameters from node resource utilization efficiency, then assigns different configuration values to each class of node and task, so that every node runs its tasks under a suitable configuration. To keep the load on the worker nodes relatively balanced while the cluster executes tasks, node weight ratios computed from node performance, task characteristics, node failure rate, and similar factors serve as the basis for deciding how many tasks each node is assigned; each node also assesses its own load state and adaptively adjusts the number of tasks it runs according to that value. Experiments show that the total task completion time of the cluster is markedly reduced, the load across nodes is more balanced, node resources are used more reasonably, and the cluster shows good stability and scalability.
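The abstract describes the capability-based allocation only in words and gives no formulas, so the following is a minimal sketch under stated assumptions rather than the thesis's actual method. It assumes a node's weight is its performance score multiplied by its success rate (one minus the failure rate), splits a batch of tasks in proportion to those weights, and then nudges each node's share up or down according to its load state. The `Node` fields, the weight formula, and the `low`/`high` thresholds are all hypothetical; task characteristics, which the abstract also lists as an input, are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    perf: float          # relative performance score (hypothetical unit)
    failure_rate: float  # fraction of past tasks that failed, 0.0 to 1.0
    load: float          # current load state, 0.0 (idle) to 1.0 (saturated)


def node_weight(node):
    """Assumed weight: faster and more reliable nodes score higher."""
    return node.perf * (1.0 - node.failure_rate)


def allocate(nodes, num_tasks):
    """Split num_tasks across nodes in proportion to weight (largest remainder)."""
    weights = {n.name: node_weight(n) for n in nodes}
    total = sum(weights.values())
    exact = {name: w * num_tasks / total for name, w in weights.items()}
    shares = {name: int(x) for name, x in exact.items()}
    leftover = num_tasks - sum(shares.values())
    # Hand the remaining tasks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


def adjust_for_load(share, load, low=0.3, high=0.8):
    """Shrink a node's share by one when overloaded, grow it by one when idle."""
    if load > high:
        return max(share - 1, 0)
    if load < low:
        return share + 1
    return share


nodes = [
    Node("fast", perf=4.0, failure_rate=0.0, load=0.9),
    Node("mid", perf=2.0, failure_rate=0.0, load=0.5),
    Node("slow", perf=2.0, failure_rate=0.5, load=0.1),
]
shares = allocate(nodes, 14)
adjusted = {n.name: adjust_for_load(shares[n.name], n.load) for n in nodes}
print(shares)    # initial split by node weight
print(adjusted)  # split after each node reacts to its own load state
```

In this toy run the heavily loaded node gives up one task and the idle node takes one on, so the total stays the same; in general the per-node adjustment is local and a real scheduler would need to reconcile the totals.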
[Degree-granting institution]: Liaoning Normal University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP311.5
Article ID: 2442568
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2442568.html