
Research on Data Compression Technology Based on HBase

Published: 2018-07-24 08:30
[Abstract]: With the development of big data technology and the rapid spread of big data platforms such as Hadoop, the volume of data generated in daily life is growing explosively, data types are becoming more complex, and storage methods are becoming more diverse. Traditional row-oriented big data storage cannot store such data at low cost. At the same time, because data differ in how often they are accessed, data at different access levels place new requirements on the storage method used. To address these issues, and building on the HBase database of the big data platform, this thesis studies HBase-based compressed storage for large-scale data environments. The main contributions are as follows.

First, a data classification method based on access frequency is proposed. The access frequency of each database file is derived from the number of times it is accessed over a period of time, and the files are divided into hot and cold data and assigned a specific access level according to their access frequency and the associated thresholds. On this basis, a compression strategy selection method based on data access level is proposed. A sampling method for determining the data sample is defined. To address the weakness that the prior knowledge used by existing compression strategy selection methods may be unreliable, an evaluation layer is added to adjust the prior knowledge in time. An HBase data compression strategy selection method is then designed on top of the adjacent-reference-region and statistics-based column selection methods in order to optimize storage cost. Simulation experiments show that the proposed method not only stores big data effectively but also improves data access performance.

Second, from the perspective of data migration, a data migration method based on file value is proposed. The value of each data block file is computed from factors such as data access frequency, and the target device for migration is determined from that value. The migration technique itself is also improved: a data buffer and a double-buffer queue resolve the mismatch between the rates at which data are migrated in and out, which improves migration efficiency, reduces memory and time consumption, and ultimately optimizes the storage of data on the big data platform.

Finally, based on the methods and theory above, a prototype system for compressed data storage is built and an e-commerce application is given as a demonstration. The implementation follows requirements analysis, outline design, detailed design, and implementation, and completes functional modules such as compressed storage management and data migration. It verifies the feasibility of the proposed algorithms and shows how the HBase-based compression results apply in a dynamic scenario.
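The access-frequency classification and level-based compression choice described in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the thresholds, the three-level split, the `FileStats` type, and the level-to-codec mapping are all assumptions made for illustration, and the sampling and evaluation-layer steps are omitted. The codec names (`NONE`, `SNAPPY`, `GZ`) are values HBase accepts for a column family's compression setting.

```python
from dataclasses import dataclass

# Hypothetical thresholds in accesses per hour; the abstract gives no values.
HOT_THRESHOLD = 10.0
WARM_THRESHOLD = 1.0

# Assumed pairing of access level and HBase compression codec:
# frequently read data stays cheap to decode, rarely read data is packed tighter.
CODEC_BY_LEVEL = {"hot": "NONE", "warm": "SNAPPY", "cold": "GZ"}


@dataclass
class FileStats:
    """Access statistics for one data file (hypothetical structure)."""
    name: str
    access_count: int    # accesses observed during the window
    window_hours: float  # length of the observation window


def access_frequency(stats: FileStats) -> float:
    """Return accesses per hour over the observation window."""
    if stats.window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return stats.access_count / stats.window_hours


def access_level(freq: float) -> str:
    """Map an access frequency to a level by comparing it with the thresholds."""
    if freq >= HOT_THRESHOLD:
        return "hot"
    if freq >= WARM_THRESHOLD:
        return "warm"
    return "cold"


def choose_codec(stats: FileStats) -> str:
    """Pick a compression codec from the file's access level."""
    return CODEC_BY_LEVEL[access_level(access_frequency(stats))]


if __name__ == "__main__":
    files = [
        FileStats("orders", access_count=240, window_hours=24.0),
        FileStats("catalog", access_count=48, window_hours=24.0),
        FileStats("archive", access_count=3, window_hours=24.0),
    ]
    for f in files:
        level = access_level(access_frequency(f))
        print(f.name, level, choose_codec(f))
```

In a real deployment the chosen codec would be applied per column family, and the thresholds would need tuning against measured access patterns rather than being fixed constants.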
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13

[Similar literature]


Related master's theses (top 1)

1 Fu Caihang (伏彩航); Research on Data Compression Technology Based on HBase [D]; Nanjing University of Posts and Telecommunications; 2016


Article ID: 2140810


Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2140810.html

