

Optimizing Spark's Fault-Tolerance Mechanism Based on Compensation Functions

Published: 2018-10-14 18:38
[Abstract]: In the big data era, as data volumes grow and the value of data is increasingly exploited, distributed big data computing systems have been widely adopted and studied by enterprises and institutions. As the number of nodes in a distributed system grows, the failure rate rises with it, which makes fault tolerance a key technology that research on such systems cannot ignore. In big data applications, particularly data mining and machine learning, iterative computation is a defining characteristic of the algorithms: they reach an optimal solution through repeated iteration. Spark, an emerging general-purpose big data processing framework built on in-memory computing, performs very well on iterative workloads and has quickly become the most popular distributed big data computing platform. Spark, however, relies mainly on the Lineage mechanism for data fault tolerance. Lineage records how a dataset is derived from other datasets; when a partition is lost, Spark traces the dependencies of the lost data through the recorded Lineage information and recomputes it. In long-running jobs such as iterative computations, this recomputation can make recovery take too long.
This thesis analyzes the iterative computation process and its convergence, concludes that iterative computation is stable in the sense that it converges from different states, and proposes an optimistic fault-tolerance mechanism based on compensation functions, which it then uses to optimize Spark's fault-tolerance mechanism. Unlike traditional approaches that recover data by recomputation, when a failure causes data loss this mechanism uses a defined compensation function to quickly generate compensation values that replace the lost data, rather than recomputing the original data, while keeping the dataset as a whole consistent. The algorithm can then continue executing; subsequent iterations correct the data and converge to the correct result. In the failure-free case the mechanism is optimistic: it adds no fault-tolerance measures and therefore incurs no extra overhead. Experimental results show that the compensation-function-based optimistic fault-tolerance mechanism effectively ensures the reliability of iterative data and outperforms existing fault-tolerance mechanisms.
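
The recovery idea described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation and does not use Spark: the choice of PageRank as the iterative algorithm, the tiny graph, and the uniform-fill `compensate` function are all assumptions made for illustration. The sketch simulates losing part of the rank vector mid-run, replaces the lost entries with compensation values that keep the vector summing to 1, and checks that later iterations still converge to the failure-free result.

```python
def pagerank_step(ranks, out_links, damping=0.85):
    """One synchronous PageRank iteration over an adjacency list."""
    n = len(ranks)
    new_ranks = [(1.0 - damping) / n] * n
    for src, targets in enumerate(out_links):
        share = damping * ranks[src] / len(targets)
        for dst in targets:
            new_ranks[dst] += share
    return new_ranks


def compensate(ranks, lost):
    """Hypothetical compensation function (an assumption, not from the thesis).

    The values at the lost indices are treated as unknown. The missing
    probability mass is spread uniformly over the lost vertices so that
    the whole vector stays consistent (it still sums to 1).
    """
    surviving_mass = sum(r for i, r in enumerate(ranks) if i not in lost)
    fill = (1.0 - surviving_mass) / len(lost)
    return [fill if i in lost else r for i, r in enumerate(ranks)]


def run(out_links, iterations, fail_at=None, lost=frozenset()):
    """Iterate PageRank; optionally simulate a data loss after one iteration."""
    n = len(out_links)
    ranks = [1.0 / n] * n
    for it in range(iterations):
        ranks = pagerank_step(ranks, out_links)
        if it == fail_at:
            # Optimistic recovery: no recomputation, just compensation.
            ranks = compensate(ranks, lost)
    return ranks


# Illustrative graph: every vertex has at least one out-link.
OUT_LINKS = [[1, 2], [2], [0], [0, 2]]

baseline = run(OUT_LINKS, 200)
recovered = run(OUT_LINKS, 200, fail_at=5, lost=frozenset({1, 3}))
max_diff = max(abs(a - b) for a, b in zip(baseline, recovered))
```

In this sketch the failure-free path does no extra work, which mirrors the optimistic design described above; the cost of a failure is the additional iterations needed to correct the compensated values rather than a replay of earlier iterations.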
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13


Article ID: 2271289


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2271289.html

