Optimization of Spark's Fault-Tolerance Mechanism Based on a Compensation Function
[Abstract]: In the big data era, as data volumes grow and the value of data is increasingly recognized, distributed big data computing systems have been widely adopted and studied by enterprises and research institutions. As the number of nodes in a distributed system grows, so does its failure rate, and fault tolerance has become a key technology in research on such systems. In big data applications, especially data mining and machine learning, iterative computation is one of the main characteristics of the algorithms involved. Spark, a newer general-purpose big data processing framework built on in-memory computing, performs very well on iterative workloads and has become the most popular platform for distributed big data computing. Spark relies mainly on the Lineage mechanism for data fault tolerance. Lineage records how a dataset was derived from other datasets; when a data block is lost, Spark traces the dependencies of the lost data through the recorded Lineage information and recomputes it. In long-running task scenarios such as iterative computation, however, this recomputation-based recovery takes too long. This thesis analyzes the process of iterative computation and its convergence, and concludes that iterative computation converges stably from different starting states. On that basis, it proposes an optimistic fault-tolerance mechanism based on a compensation function and uses it to optimize Spark's fault tolerance. Unlike traditional methods that recover data by recomputation, when a failure causes data loss the mechanism replaces the lost data with compensation values generated by a defined compensation function, rather than regenerating the original data.
This keeps the dataset as a whole consistent, so the algorithm can continue executing, correct the data over subsequent iterations, and converge to the correct result. When no failure occurs, the mechanism is optimistic and takes no additional fault-tolerance measures. Experimental results show that the optimistic fault-tolerance mechanism based on a compensation function effectively guarantees the reliability of iterative data and outperforms the existing fault-tolerance mechanism.
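The idea of replacing lost data with compensation values can be illustrated with a small sketch. The abstract does not give the compensation function or the benchmark algorithms, so everything below is an assumption made for illustration: the choice of PageRank as the iterative algorithm, the uniform redistribution rule in `compensate`, the toy graph `GRAPH`, and all function names. The sketch runs in plain Python rather than on Spark. It simulates the loss of part of the rank vector at one iteration, patches it so that the ranks again sum to 1 (the consistency condition for this algorithm), and lets later iterations correct the patched values.

```python
# Hypothetical toy graph: every vertex has at least one out-link,
# so each iteration preserves the total rank mass of 1.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a", "c"],
}


def pagerank_step(ranks, out_links, damping=0.85):
    """One power-iteration step of PageRank."""
    n = len(ranks)
    new_ranks = {v: (1.0 - damping) / n for v in ranks}
    for v, targets in out_links.items():
        share = damping * ranks[v] / len(targets)
        for t in targets:
            new_ranks[t] += share
    return new_ranks


def compensate(ranks, lost):
    """Assumed compensation function (not taken from the thesis).

    The ranks of the `lost` vertices are treated as unavailable. The
    probability mass they held is spread uniformly over them, so the
    patched vector sums to 1 again without any recomputation.
    """
    surviving_mass = sum(r for v, r in ranks.items() if v not in lost)
    fill = (1.0 - surviving_mass) / len(lost)
    return {v: (fill if v in lost else r) for v, r in ranks.items()}


def run(out_links, iterations, fail_at=None, lost=()):
    """Iterate PageRank, optionally simulating data loss at step `fail_at`."""
    n = len(out_links)
    ranks = {v: 1.0 / n for v in out_links}
    for i in range(iterations):
        ranks = pagerank_step(ranks, out_links)
        if i == fail_at:
            # Simulated failure: patch the lost entries and keep iterating.
            ranks = compensate(ranks, set(lost))
    return ranks


clean = run(GRAPH, 200)
recovered = run(GRAPH, 200, fail_at=5, lost=["b", "c"])
```

In this sketch `clean` and `recovered` end up at the same fixed point, because each PageRank step shrinks the difference between two rank vectors by at least the damping factor. Whether a compensation function exists, and what it must preserve, depends on the algorithm; the uniform rule here suits PageRank only.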
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
Link to this article: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2271289.html