Application Characteristic Analysis and System Performance Optimization for Hadoop
Published: 2018-09-16 21:48
[Abstract]: Hadoop is currently the most widely used big data processing system. Although Hadoop provides an efficient solution for large-scale distributed data processing, it still faces a series of challenges: 1) the abstract programming interface that Hadoop exposes hides the underlying implementation details, which makes performance analysis of applications difficult; 2) Hadoop configuration parameters have a significant impact on system performance, but the default configuration cannot guarantee the best performance for every application, so the parameters need to be tuned for each application; 3) frequent data movement severely limits the performance of big data systems, so new solutions are needed to reduce its adverse effect on performance. This thesis studies performance characterization and performance optimization of applications running on Hadoop. First, based on dynamic binary bytecode tracing, it designs and implements a lightweight, non-intrusive, distributed performance analysis framework for Hadoop applications. The framework dynamically captures the runtime state of an application and analyzes its performance, which helps users understand the application's performance characteristics on Hadoop and points out directions for optimization. Second, it proposes a performance model for Hadoop applications under dynamic resource allocation and, on the basis of this model, uses a genetic algorithm to search the global high-dimensional configuration parameter space, thereby addressing the problem of tuning Hadoop configuration parameters. The prediction error rate of the proposed performance model is below 6%; compared with the default configuration, the tuned configurations achieve an average performance improvement of 9.52 times and a maximum improvement of 18.76 times. Finally, targeting the data-parallel processing characteristics of MapReduce applications on Hadoop, it proposes a near-data processing system that provides complete hardware and software interfaces, a dynamic task migration mechanism, and a runtime environment, and implements a lightweight MapReduce framework that supports migrating Map and Reduce tasks to near-data processing units. Compared with a baseline system without near-data processing, the proposed system achieves a 4.83 times performance improvement and reduces system power consumption by 26%; compared with the SMC system, which uses near-data processing but does not support data-parallel processing, its power consumption is 37% higher, but it achieves a 2.32 times performance improvement.
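The abstract describes searching a high-dimensional configuration space with a genetic algorithm guided by a performance model. The following is a minimal sketch of that general idea, not the thesis's implementation. The four property names are real Hadoop settings, but selecting them, their value ranges, the GA operators, and the function `predicted_runtime` (a synthetic toy cost that stands in for the performance model) are all assumptions made for illustration.

```python
import random

# Illustrative search space: (low, high) integer bounds per parameter.
# The property names are real Hadoop settings, but choosing these four and
# these ranges is an assumption for this sketch, not taken from the thesis.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": (50, 1024),
    "mapreduce.job.reduces": (1, 64),
    "mapreduce.map.memory.mb": (512, 8192),
    "mapreduce.reduce.shuffle.parallelcopies": (1, 50),
}
NAMES = list(SEARCH_SPACE)


def predicted_runtime(config):
    """Toy stand-in for a performance model (hypothetical, not the thesis model).

    Returns a synthetic cost that is lowest near the middle of each range.
    """
    cost = 0.0
    for name in NAMES:
        low, high = SEARCH_SPACE[name]
        target = (low + high) / 2
        cost += ((config[name] - target) / (high - low)) ** 2
    return cost


def random_config(rng):
    # Draw every parameter uniformly from its integer range.
    return {n: rng.randint(*SEARCH_SPACE[n]) for n in NAMES}


def crossover(a, b, rng):
    # Uniform crossover: each parameter comes from one parent at random.
    return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in NAMES}


def mutate(config, rng, rate=0.2):
    # Resample each parameter within its bounds with probability `rate`.
    return {
        n: (rng.randint(*SEARCH_SPACE[n]) if rng.random() < rate else v)
        for n, v in config.items()
    }


def genetic_search(generations=40, pop_size=30, elite=5, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=predicted_runtime)
        parents = population[:elite]  # elitism: best configurations survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return min(population, key=predicted_runtime)


best = genetic_search()
```

Because the elite configurations are carried over unchanged, the best predicted cost never gets worse from one generation to the next. In a real tuner the toy cost would be replaced by a model fitted to measured job runs, which is the part the thesis contributes and this sketch does not attempt to reproduce.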
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
Article ID: 2244911
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2244911.html