Multi-way Join Optimization Method Based on MapReduce
Published: 2018-07-21 17:27
[Abstract]: Multi-way join is one of the most commonly used operations in data analysis, and MapReduce is a programming model widely used for large-scale data analysis and processing. MapReduce poses new challenges for multi-way join optimization: traditional optimization methods cannot be applied to MapReduce directly, and existing MapReduce join execution algorithms still leave room for optimization. To address the former, considering that I/O cost is the dominant cost of join operations, a heuristic algorithm is first proposed to determine the execution order of a multi-way join with the goal of reducing I/O cost; the resulting order is then further optimized; finally, a parallel execution strategy designed for MapReduce improves the overall performance of the multi-way join. To address the latter, considering that load balancing can effectively mitigate the "bucket effect" in MapReduce (a job finishes only when its slowest task does), a fair task assignment algorithm is used to raise the degree of parallelism within a join, and on that basis a method for determining the number of Reduce tasks is given. Finally, experiments verify the optimization effect of the proposed execution plan determination method and load balancing algorithm. This work offers guidance for applying MapReduce multi-way joins in big data environments and can improve the performance of applications such as star joins in OLAP analysis and chain joins in community detection on social networks.
[Affiliations]: School of Computer Science and Engineering, Northeastern University; Software College, Northeastern University
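[Funding]: Major Program of the National Natural Science Foundation of China (61433008); Young Scientists Fund of the National Natural Science Foundation of China (61202088); General Program of the China Postdoctoral Science Foundation (2013M540232); Fundamental Research Funds for the Central Universities (N120817001); Doctoral Supervisor Fund of the Ministry of Education's Doctoral Program of Higher Education (20120042110028)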
[Classification number]: TP311.13
Article ID: 2136254
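The abstract names a fair task assignment algorithm for load balancing but does not describe it, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the size of each join-key group is already known (for example from sampling) and uses the classic greedy longest-processing-time-first rule to spread groups across Reduce tasks. The names `assign_groups_to_reducers`, `reducer_loads`, and `group_sizes` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_groups_to_reducers(group_sizes: Dict[str, int],
                              num_reducers: int) -> List[List[str]]:
    """Greedy longest-processing-time-first assignment (illustrative only).

    group_sizes maps a join-key group to its estimated record count;
    these estimates are an assumption of this sketch.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")

    # Min-heap of (current load, reducer index): the lightest reducer is on top.
    heap: List[Tuple[int, int]] = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_reducers)]

    # Place the largest groups first so that smaller groups can even out loads.
    ordered = sorted(group_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, size in ordered:
        load, r = heapq.heappop(heap)
        assignment[r].append(key)
        heapq.heappush(heap, (load + size, r))
    return assignment


def reducer_loads(group_sizes: Dict[str, int],
                  assignment: List[List[str]]) -> List[int]:
    """Total estimated records handled by each reducer."""
    return [sum(group_sizes[k] for k in keys) for keys in assignment]


if __name__ == "__main__":
    sizes = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    plan = assign_groups_to_reducers(sizes, 2)
    print(plan, reducer_loads(sizes, plan))
```

With the sample sizes above and two reducers, the sketch produces loads of 17 and 13 records. The slowest reducer bounds the job's completion time, which is why evening out these loads addresses the "bucket effect" the abstract mentions.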
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2136254.html