Optimizing Spark's Fault-Tolerance Mechanism Based on Compensation Functions

Published: 2018-10-14 18:38

[Abstract]: In the big data era, as data volumes grow and the value of data is increasingly exploited, distributed big data computing systems have been widely adopted and studied by enterprises and institutions. As the number of nodes in a distributed system grows, so does the failure rate, which makes fault tolerance a key technology that research on such systems cannot ignore. In big data applications, especially data mining and machine learning, iterative computation is a defining characteristic of the algorithms, which reach an optimal solution by iterating repeatedly. Spark, an emerging general-purpose big data processing framework built on in-memory computing, performs very well on iterative workloads and has quickly become the most popular distributed big data computing platform. However, Spark relies mainly on the Lineage mechanism for data fault tolerance. Lineage records how a dataset was derived from other datasets; when a partition is lost, Spark uses the recorded Lineage information to trace back the dependencies of the lost data and recompute it. In long-running jobs such as iterative computations, this recomputation can make recovery take too long.
This thesis analyzes the iterative computation process and its convergence, and concludes that iterative computation is stable in the sense that it can converge from different states. On that basis, it proposes an optimistic fault-tolerance mechanism based on compensation functions and uses it to optimize Spark's fault tolerance. Unlike traditional approaches that restore data by recomputation, when a failure causes data loss this mechanism uses a defined compensation function to quickly generate compensation values that stand in for the lost data, rather than recomputing the original data. It also keeps the dataset as a whole consistent, so the algorithm can continue to run; subsequent iterations then correct the data and converge to the correct result. When no failure occurs, the mechanism is optimistic: it adds no fault-tolerance measures and therefore incurs no extra overhead. Experimental results show that the optimistic fault-tolerance mechanism based on compensation functions effectively guarantees the reliability of iterative data and outperforms existing fault-tolerance mechanisms.
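The abstract does not give the thesis's actual compensation functions, so the following is only a minimal sketch of the idea under stated assumptions: the iterative algorithm is PageRank on a toy graph, and the compensation function (hypothetical, named `compensate` here) refills lost ranks by spreading the missing probability mass evenly, which restores the "ranks sum to 1" consistency condition. It uses plain Python rather than Spark, and all names and the graph are illustrative.

```python
# Illustrative sketch only: the graph, function names, and the choice of
# compensation rule are assumptions, not taken from the thesis.

def pagerank_step(ranks, links, damping=0.85):
    """One PageRank iteration; links maps each vertex to its out-neighbours."""
    n = len(ranks)
    new_ranks = {v: (1.0 - damping) / n for v in ranks}
    for v, outs in links.items():
        share = damping * ranks[v] / len(outs)
        for w in outs:
            new_ranks[w] += share
    return new_ranks


def compensate(ranks, lost):
    """Hypothetical compensation function: instead of recomputing the lost
    entries, give each one an equal share of the missing probability mass,
    so that all ranks sum to 1 again."""
    surviving_mass = sum(r for v, r in ranks.items() if v not in lost)
    fill = (1.0 - surviving_mass) / len(lost)
    return {v: (fill if v in lost else r) for v, r in ranks.items()}


def run(links, iterations, fail_at=None, lost=()):
    """Iterate PageRank; optionally simulate losing some ranks at one step."""
    n = len(links)
    ranks = {v: 1.0 / n for v in links}
    for i in range(iterations):
        ranks = pagerank_step(ranks, links)
        if i == fail_at:
            # Simulated failure: the values for `lost` are gone; compensate
            # and carry on rather than replaying earlier iterations.
            ranks = compensate(ranks, set(lost))
    return ranks


# Toy graph in which every vertex has at least one outgoing link.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}

baseline = run(links, iterations=200)
recovered = run(links, iterations=200, fail_at=10, lost=["b", "c"])
max_diff = max(abs(baseline[v] - recovered[v]) for v in links)
print("largest difference after recovery:", max_diff)
```

In this sketch the compensated run ends up at essentially the same ranks as the failure-free run, because later iterations correct the substituted values. The failure-free path never calls `compensate`, which mirrors the optimistic "no overhead without failures" claim.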
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2271289.html
