
Optimizing the MapReduce Data Access Method for HBase

Published: 2018-11-01 16:28
[Abstract]: With the rapid development of information technology, the volume of data on the Internet is growing quickly and its types are increasingly varied; the world has shifted to a data-centered paradigm, the era of "big data". Traditional data processing relies mainly on database management systems, which, when faced with big data, are hard to scale in storage and slow to query, and are increasingly unable to meet the demand for efficient data processing. More and more enterprises are turning to the open-source Hadoop cloud platform and using HBase to store and manage data. Reads from HBase can be parallelized with the MapReduce framework, which makes processing considerably faster than traditional database management. Even so, HBase data reading under this framework still cannot keep up with data processing, mainly because the MapReduce data access method for HBase cannot fully guarantee data locality.

This thesis first introduces the background of big data, including big data storage and processing technologies, outlines the classification, characteristics, and main platforms of cloud computing, and studies in detail three key technologies of Hadoop, currently the most widely used cloud platform: HDFS, MapReduce, and HBase. This provides the theoretical basis for analyzing and improving the MapReduce process of HBase.

It then analyzes in depth the task assignment flow of the MapReduce framework over HBase, the data splitting process, and the workflow of the data reading interface (Scan), and identifies the bottlenecks of MapReduce computation on HBase: 1) tasks cannot be made local; 2) data within a Region is read serially; 3) data must go through a merge step to assemble each record. To address these problems, the thesis proposes an improved method that takes HBase's physical storage unit, the Block, rather than the original logical storage unit, the Region, as the basic unit of task assignment; redesigns the method for reading data splits; and adopts the local-task-first MapReduce scheduling strategy proposed by Hua Zhongjie.

Finally, comparative experiments show that the improved interface eliminates the extra processing performed by the Scan interface and strengthens data locality, reducing the time spent accessing data to 1/10 of that of the original interface and thereby improving efficiency.
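The idea of assigning work per Block and preferring local tasks can be illustrated with a small sketch. This is not the thesis's implementation: the function name `assign_blocks`, the host names, and the greedy least-loaded rule are all assumptions made for illustration, and a real scheduler would obtain replica locations from HDFS rather than from a hard-coded dictionary.

```python
from collections import defaultdict


def assign_blocks(block_hosts, workers):
    """Greedy, locality-first assignment of blocks to workers (illustrative only).

    block_hosts: dict mapping a block id to the hosts that hold a replica.
    workers: list of hosts able to run map tasks.
    Returns (assignment, local_count); assignment maps host -> list of block ids.
    """
    assignment = defaultdict(list)
    local_count = 0
    for block, hosts in block_hosts.items():
        # Prefer workers that already hold a replica of this block.
        local = [w for w in workers if w in hosts]
        # If no worker holds a replica, fall back to any worker (remote read).
        candidates = local or workers
        # Among the candidates, pick the one with the fewest blocks so far.
        target = min(candidates, key=lambda w: len(assignment[w]))
        assignment[target].append(block)
        local_count += bool(local)
    return dict(assignment), local_count


# Hypothetical cluster: three workers, four blocks, one block with no local replica.
blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"], "b4": ["n9"]}
workers = ["n1", "n2", "n3"]
plan, local = assign_blocks(blocks, workers)
print(plan, local)
```

In this toy input three of the four blocks are read locally; only `b4`, whose sole replica sits on a host that runs no tasks, requires a remote read. Splitting by Block instead of Region gives the scheduler more, smaller units to place, which is what makes a local placement more likely.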
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.13; TP333




Article ID: 2304468


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2304468.html

