
Research and Application of Distributed Storage Technology for Massive Data

Published: 2019-01-02 08:44

[Abstract]: In recent years, with the rapid development of information technology, Internet services have kept expanding, user numbers have kept growing, storage demand has kept increasing, and data volumes have grown at an unprecedented rate. Storage capacity and storage performance, however, tend to trade off against each other: traditional databases struggle with massive data and show low concurrency, poor scalability, and low efficiency. Massive data storage has therefore become a major research topic, and parallel distributed databases based on the MPP (Massively Parallel Processing) architecture are one research direction. This thesis is an exploratory study of massive data storage technology. Its topic comes from a National Key Technology Support Project of the 11th Five-Year Plan, "Research on Key Technologies of a Secure and Trusted Carrier-Grade Operation Support System for Reproductive Health Services". It mainly addresses the access performance problems caused by the project's growing data volume and provides the project with storage technology that offers high concurrency, high availability, and high scalability. The research work covers three aspects. 1. Drawing on the theory of massive data storage, relational and NoSQL data models, distributed database storage, and MPP-based parallel processing, the thesis summarizes the available schemes for massive data storage and the new technologies they apply. 2. It analyzes the characteristics of massive data storage technology, compares the strengths and weaknesses of the distributed massive data storage technologies commonly used in China and abroad, designs a distributed storage model for massive data, and describes in detail how the SQL parsing module, the data sharding module, the parallel query module, and the result module are implemented. 3. Building on the storage model design and the parallel query and storage techniques, it develops "DB Mapping", a storage system based on the MPP architecture, which delivers a massive data storage solution with good scalability and the advantages of large-scale parallel processing. The main contribution of the thesis is a massive data storage method based on MPP parallel processing. The method covers the whole path from the client's request to data persistence and incorporates the Map/Reduce idea by distributing work to the individual data nodes, which gives high scalability, high availability, and high concurrency. The feasibility of the method is verified by simulation tests on a set of distributed data nodes.
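The abstract names a data sharding step, a parallel query step, and a result step that follows the Map/Reduce idea, but the page does not show how DB Mapping implements them. The following is only a minimal sketch of that general pattern, not the thesis's code. Everything in it is an assumption made for illustration: the `ShardedStore` class, the `records` table, the CRC32-based hash routing, and the use of in-memory SQLite databases as stand-ins for data nodes.

```python
import sqlite3
import zlib
from concurrent.futures import ThreadPoolExecutor


class ShardedStore:
    """Toy coordinator over several independent SQLite 'nodes' (hypothetical)."""

    def __init__(self, node_count):
        # check_same_thread=False lets worker threads use a connection
        # that was created in the main thread.
        self.nodes = [
            sqlite3.connect(":memory:", check_same_thread=False)
            for _ in range(node_count)
        ]
        for db in self.nodes:
            db.execute("CREATE TABLE records (user_id TEXT, visits INTEGER)")

    def _node_for(self, user_id):
        # Data sharding: a stable hash of the key selects the node.
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def insert(self, user_id, visits):
        db = self._node_for(user_id)
        db.execute("INSERT INTO records VALUES (?, ?)", (user_id, visits))
        db.commit()

    def total_visits(self):
        # "Map": each node computes a partial aggregate, one thread per node.
        def partial(db):
            query = "SELECT COUNT(*), COALESCE(SUM(visits), 0) FROM records"
            return db.execute(query).fetchone()

        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(partial, self.nodes))

        # "Reduce": the coordinator merges the partial results.
        rows = sum(p[0] for p in partials)
        visits = sum(p[1] for p in partials)
        return rows, visits


store = ShardedStore(node_count=4)
for i in range(100):
    store.insert(f"user-{i}", i)

row_count, visit_sum = store.total_visits()
print(row_count, visit_sum)
```

Hash routing is used here because it needs no lookup table. A real system might instead shard by range or by a directory, and would also have to handle node failure and rebalancing, which this sketch ignores.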
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333; TP311.13

