

Research on Efficiency Optimization Mechanisms for In-Memory MapReduce Systems

Published: 2018-05-30 07:50

Topic: MapReduce + in-memory computing. Source: master's thesis, Huazhong University of Science and Technology, 2016


[Abstract]: In the big data era, data processing and analysis have become critically important. To meet the demand for timely data processing, big data processing systems based on in-memory computing have become a new research focus. Existing high-performance computing clusters are provisioned with markedly less memory relative to their CPU capacity, so when a MapReduce system running on such a cluster handles data-intensive applications, data is easily spilled to disk unnecessarily, and the resulting I/O is avoidable; memory efficiency urgently needs optimization. When large datasets are processed and the number of partitions grows too large, the hash-based Shuffle mechanism causes excessive file operations and poor memory usage. Conversely, when partition blocks are too large, each task consumes more memory, which easily produces a performance bottleneck caused by poorly coordinated CPU and memory usage. In addition, the intermediate data handled by each worker node is unevenly distributed, which easily leads to load imbalance and degrades system performance.

The memory-efficiency optimization system for big data processing presented in this thesis targets the problems that MapReduce systems encounter on high-performance computing clusters. Drawing on the characteristics of in-memory computing, it proposes and implements an optimization scheme for efficient use of memory resources, intended for building a fast, efficient big data processing platform. First, the system designs an object-reusing Shuffle mechanism: by reusing file write handles and their associated objects, it effectively solves the problem of memory being requested too quickly when there are too many partitions, and keeps memory usage steady. Second, the system establishes a task dispatch mechanism based on feedback, sampling, and decision, which effectively coordinates CPU and memory usage when partition blocks are too large and greatly reduces the I/O overhead of spilling intermediate data to disk. Finally, the system implements a task scheduling mechanism with an embedded load balancer, which ensures that each worker node handles nearly the same amount of intermediate data and minimizes the amount of data transferred over the network.

The proposed scheme is integrated into Spark, is transparent to users, and is fully compatible with existing Spark applications. Tests on typical workloads show that, compared with native Spark, the improved system uses memory more efficiently when processing large datasets, greatly reduces disk I/O, and achieves a 1.25x to 3.18x improvement in total execution time.
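The abstract names a task scheduler with an embedded load balancer but does not say how it assigns work. As one illustration of the stated goal (nearly equal intermediate data per worker node), the sketch below uses a greedy largest-first assignment. This is an assumption chosen for illustration, not the thesis's algorithm; the function name `assign_partitions` and the byte sizes are hypothetical, and data locality and network cost are not modeled.

```python
import heapq


def assign_partitions(partition_sizes, num_workers):
    """Greedy largest-first assignment of partitions to workers.

    Hypothetical illustration only: the thesis abstract does not specify
    its balancing algorithm. partition_sizes maps partition id -> size.
    """
    # Min-heap of (size assigned so far, worker id); the least-loaded
    # worker is always at the top.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}

    # Place the largest partitions first, each on the least-loaded worker.
    by_size = sorted(partition_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for pid, size in by_size:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(pid)
        heapq.heappush(heap, (load + size, worker))

    # Total size handled by each worker.
    loads = {w: sum(partition_sizes[p] for p in ps) for w, ps in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    sizes = {0: 70, 1: 50, 2: 40, 3: 30, 4: 10}  # made-up sizes
    assignment, loads = assign_partitions(sizes, num_workers=2)
    print(assignment, loads)
```

A real scheduler would also have to weigh where each partition's data already resides, since the abstract states that network transfer is minimized as well.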
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13


