Research and Implementation of Memory Optimization in the Cluster Computing Engine Spark

Published: 2018-12-16 15:46

[Abstract]: Parallel computing frameworks that use memory to transfer data between iterations are a current research focus. Compared with traditional computation based on disk and network, using memory reduces data transfer time; for data-intensive tasks, it can make jobs run more than ten times faster. As this new generation of frameworks develops rapidly, how to make full use of memory, which remains relatively scarce, while keeping tasks efficient has become a pressing problem. Based on the cluster computing engine Spark, this thesis studies how parallel computing clusters use memory. By modeling and analyzing memory behavior, it automates memory-usage decisions and optimizes the replacement policy, which improves task efficiency when resources are limited and makes task efficiency more stable across different cluster environments. The main contributions are as follows. By analyzing the semantics of the code, the memory policy is automated: the scheduler automatically identifies valuable datasets (RDDs) and places them in the cache, which avoids cache pollution and reduces the programmer's burden. Building on the detailed task information obtained from this semantic analysis, the memory replacement policy is optimized. This mainly includes computing RDD sizes and weights, reordering operations, forming a new replacement algorithm that combines a register allocation model with weight information to replace the original LRU algorithm, and making the multi-level cache model intelligent. In addition, a preliminary analysis is made of memory behavior on heterogeneous clusters.
Finally, experiments verify that the optimized scheme makes tasks more adaptable to different cluster environments and lets them run more efficiently when memory is relatively limited, so that the system as a whole becomes more practical. The results also offer a practical reference for memory usage in other parallel systems.
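
To make the idea of weight-based replacement concrete, here is a minimal Python sketch of an eviction policy that ranks cached datasets by a weight rather than by recency, as LRU does. It is not the thesis's algorithm: the register allocation model is not reproduced, and the `CachedDataset` fields, the `weight` formula, and the sample numbers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedDataset:
    name: str
    size_mb: float          # memory the dataset occupies
    recompute_cost: float   # hypothetical cost to rebuild it if evicted
    remaining_uses: int     # how many later stages still read it


def weight(ds: CachedDataset) -> float:
    """Hypothetical value density: recomputation saved per MB of cache held."""
    return ds.recompute_cost * ds.remaining_uses / ds.size_mb


def choose_victims(cache, needed_mb):
    """Greedily pick the lowest-weight datasets until needed_mb is freed."""
    victims, freed = [], 0.0
    for ds in sorted(cache, key=weight):
        if freed >= needed_mb:
            break
        victims.append(ds)
        freed += ds.size_mb
    return victims


# Illustrative data only; these figures do not come from the thesis.
cache = [
    CachedDataset("points", size_mb=400, recompute_cost=50, remaining_uses=9),
    CachedDataset("tmp_join", size_mb=300, recompute_cost=20, remaining_uses=0),
    CachedDataset("lookup", size_mb=100, recompute_cost=5, remaining_uses=2),
]
print([ds.name for ds in choose_victims(cache, needed_mb=350)])
```

In this sketch a dataset that no later stage reads gets weight zero and is evicted first, regardless of how recently it was touched. That is the kind of decision a recency-only policy such as LRU cannot make without knowledge of the remaining job.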
[Degree-granting institution]: Tsinghua University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333.1

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2382595.html
