Energy-Efficiency-Aware Merging of Small Files on the HDFS Platform
Published: 2018-10-18 09:40
[Abstract]: Small files on the Hadoop Distributed File System (HDFS) drive up the energy cost of running MapReduce programs. To address this, the paper builds and analyzes an energy consumption model of a Hadoop node cluster. The analysis proves that, on the Hadoop platform, there is an optimal file size that minimizes the energy cost of running a program. Building on this result and on marginal analysis theory from economics, the paper proposes a strategy for determining the optimal file size that weighs both energy cost and access cost. The strategy computes the benefit of merging small files stored on HDFS and merges them into files of the cost-optimal size to obtain the greatest return. Experiments confirm that an energy-efficiency-optimal data block size exists. They also show that using marginal analysis to weigh cost against benefit is a sound and effective way to determine the data block size.
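The abstract does not state the model's equations, so the following is only a minimal Python sketch of the general idea under assumed cost functions. `CostModel`, its four parameters, and both helper functions are hypothetical illustrations and not the authors' model. In this sketch, energy cost falls as merged files grow because there are fewer per-file overheads, while access cost rises with file size. The marginal-analysis loop keeps enlarging the size while the marginal energy saving still exceeds the marginal access cost. First-fit decreasing packing is a swapped-in technique for the merge step, not necessarily the one the paper uses.

```python
import math
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical cost model; all parameters are illustrative assumptions."""
    total_data_mb: float       # total volume of small-file data to process
    energy_per_file: float     # assumed fixed energy per file (task start-up, metadata)
    energy_per_mb: float       # assumed energy to process one MB of data
    access_cost_per_mb: float  # assumed access penalty per MB of merged-file size

    def energy_cost(self, size_mb: float) -> float:
        # Fewer, larger files mean fewer per-file overheads.
        files = math.ceil(self.total_data_mb / size_mb)
        return files * self.energy_per_file + self.total_data_mb * self.energy_per_mb

    def access_cost(self, size_mb: float) -> float:
        # Assumption: retrieving one original small file gets costlier as merged files grow.
        return self.access_cost_per_mb * size_mb


def optimal_size_by_marginal_analysis(model: CostModel, candidates: list[float]) -> float:
    """Step through candidate sizes in ascending order, moving to the next size only
    while its marginal energy saving exceeds its marginal access cost.
    Assumes total cost is roughly convex in size, so the first failing step is the stop."""
    sizes = sorted(candidates)
    best = sizes[0]
    for nxt in sizes[1:]:
        marginal_benefit = model.energy_cost(best) - model.energy_cost(nxt)
        marginal_cost = model.access_cost(nxt) - model.access_cost(best)
        if marginal_benefit <= marginal_cost:
            break
        best = nxt
    return best


def merge_small_files(file_sizes_mb: list[float], target_mb: float) -> list[list[float]]:
    """First-fit decreasing: pack small files into merged files of at most target_mb."""
    bins: list[list[float]] = []
    loads: list[float] = []
    for size in sorted(file_sizes_mb, reverse=True):
        for i, load in enumerate(loads):
            if load + size <= target_mb:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No existing merged file has room, so start a new one.
            bins.append([size])
            loads.append(size)
    return bins


if __name__ == "__main__":
    model = CostModel(total_data_mb=10240, energy_per_file=2.0,
                      energy_per_mb=0.01, access_cost_per_mb=1.0)
    target = optimal_size_by_marginal_analysis(model, [16, 32, 64, 128, 256, 512])
    print(target)  # 128
    print(merge_small_files([60, 50, 40, 30, 20, 10], target))  # [[60, 50, 10], [40, 30, 20]]
```

With these made-up parameters, the step from 128 MB to 256 MB saves 80 units of energy but adds 128 units of access cost, so the loop stops at 128 MB. The continuous optimum of this toy model, about 143 MB, lies between the two candidates.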
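[Author affiliations]: School of Software, Central South University; School of Software, Henan University; School of Computer Science, Beijing Information Science and Technology University
[Funding]: National Natural Science Foundation of China (61272148; 61301136); Specialized Research Fund for the Doctoral Program of Higher Education (20120162110061; 20120162120091)
[CLC number]: TP333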
Article ID: 2278736
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2278736.html