Application Characteristic Analysis and System Performance Optimization for Hadoop

Published: 2018-09-16 21:48

[Abstract]: Hadoop is currently the most widely used big data processing system. Although Hadoop provides an efficient solution for large-scale distributed data processing, it still faces a series of challenges: 1) the abstract programming interface that Hadoop exposes hides the underlying implementation details, which makes it difficult to analyze application performance; 2) Hadoop configuration parameters have a significant impact on system performance, but the default configuration cannot guarantee the best performance for every application, so the parameters need to be tuned per application; 3) frequent data movement severely limits the performance of big data systems, and new solutions are needed to reduce its adverse impact. This thesis studies the performance characterization and performance optimization of applications in Hadoop systems. First, based on dynamic binary bytecode tracing, it designs and implements a lightweight, non-intrusive, distributed performance analysis framework for Hadoop applications. The framework dynamically captures an application's runtime state and analyzes its performance, helping users understand how the application behaves on Hadoop and indicating where to optimize it. Second, it proposes a performance model for Hadoop applications under dynamic resource allocation and, building on this model, uses a genetic algorithm to search the global high-dimensional configuration parameter space, thereby addressing the configuration tuning problem. The prediction error rate of the proposed performance model is below 6%; compared with the default configuration, the tuned configurations yield an average performance improvement of 9.52 times and a maximum of 18.76 times. Finally, targeting the data-parallel processing characteristics of MapReduce applications in Hadoop, it proposes a near-data processing system that provides complete hardware and software interfaces, a dynamic task migration mechanism, and a runtime environment, and it implements a lightweight MapReduce framework that supports migrating Map and Reduce tasks to near-data processing units. Compared with a baseline system without near-data processing, the proposed system achieves a 4.83 times performance improvement and reduces system power consumption by 26%. Compared with the SMC system, which uses near-data processing but does not support data-parallel processing, the proposed system consumes 37% more power but achieves a 2.32 times performance improvement.
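The configuration tuning step summarized in the abstract (a genetic algorithm searching the parameter space, scored by a performance model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the parameter names and ranges in `SPACE`, the toy `predicted_runtime` function standing in for the thesis's performance model, and the choice of truncation selection, uniform crossover, and elitism. None of these details come from the thesis itself.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer range.
# Names and ranges are illustrative only, not taken from the thesis.
SPACE = {
    "sort_buffer_mb": (50, 1000),
    "num_reducers": (1, 200),
    "map_memory_mb": (512, 8192),
}


def predicted_runtime(cfg):
    """Toy stand-in for a performance model; lower is better.

    A made-up convex function whose minimum lies at
    sort_buffer_mb=400, num_reducers=64, map_memory_mb=2048.
    """
    return (
        ((cfg["sort_buffer_mb"] - 400) / 100) ** 2
        + ((cfg["num_reducers"] - 64) / 16) ** 2
        + ((cfg["map_memory_mb"] - 2048) / 512) ** 2
    )


def random_config(rng):
    # Draw every parameter uniformly from its range.
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(cfg, rng, rate=0.2):
    # With probability `rate`, resample a gene uniformly from its range.
    out = dict(cfg)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def genetic_search(generations=60, pop_size=40, elite=4, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=predicted_runtime)
        parents = pop[: pop_size // 2]   # truncation selection
        children = pop[:elite]           # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return min(pop, key=predicted_runtime)


if __name__ == "__main__":
    best = genetic_search()
    print(best, predicted_runtime(best))
```

Because the best individuals are carried over unchanged each generation, the best predicted runtime never gets worse as the search proceeds. In a real setting the scoring function would be replaced by a model fitted to measured job runs, which is the expensive part that a model-based search is meant to avoid repeating.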
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13



Article ID: 2244911


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2244911.html
