Data Replication in Scalable Database Management Systems
Source: East China Normal University, master's thesis, 2017. Document type: degree thesis
Keywords: distributed database; log-structured storage; data replication; data export; log parsing
[Abstract]: As the Internet continues to grow and data volumes increase, the ability of a database system to scale its storage and computation horizontally is becoming increasingly important. Distributed database systems have therefore attracted wide attention from industry and academia for their scalability. Among them, distributed systems based on log-structured storage (Log-Structured Storage) have become a new trend, and this read/write-separated architecture has been applied in distributed database systems such as OceanBase, Alibaba's open-source relational database management system. Data export is one of the common techniques for data replication. It is widely used in enterprise applications to improve system availability and scalability and to ensure data reliability. In a distributed database system with a read/write-separated architecture, data is divided into static data and dynamic data, and the static data is stored on different physical nodes, so data replication becomes an operation that is both time-consuming and wasteful of system resources. This thesis analyzes the problems of data replication under a read/write-separated distributed database architecture and proposes effective solutions. The main contributions are as follows:
1. A static data export method that takes load balancing into account is designed and implemented. First, in keeping with the architecture of the distributed database, query requests are issued concurrently and directly to the different physical nodes, which reduces the number of network transfers and shortens response time. Second, a producer-consumer model is used to speed up writing data to disk and to avoid heavy memory use. Finally, because data is stored in multiple replicas, query requests are distributed evenly across the nodes, which balances the load on the nodes in the system and also improves overall export performance.
2. A dynamic data capture method based on log parsing is designed and implemented. On the one hand, log synchronization and log pulling are implemented to ensure data correctness. On the other hand, frequent operations on the same tuple are condensed during log parsing, which avoids redundant operations and reduces the cost of applying updates.
3. Test data sets are generated with the YCSB benchmark, and several groups of experiments are designed to verify the feasibility and efficiency of the proposed data export method. The method is implemented in the open-source database CEDAR. The experimental results show that it effectively reduces response time and system resource usage.
The test results in CEDAR show that the proposed data replication method greatly improves the efficiency of data export. The method can also serve as a reference for data replication in scalable database management systems of the same type, and for subsequent work on data replication techniques in such systems.
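The mechanisms summarized in contributions 1 and 2 can be illustrated with a minimal Python sketch. It is not the thesis's CEDAR implementation, which the abstract does not show. The tablet and node names, the `fetch_rows` stand-in, and the put/delete log format are all hypothetical, and the log compaction assumes full-row writes with idempotent deletes, so that only the last operation on each key has to be applied downstream.

```python
import queue
import threading
from collections import Counter

# Hypothetical layout: tablet id -> nodes that hold a replica of that tablet.
REPLICAS = {
    "t1": ["node-a", "node-b"],
    "t2": ["node-b", "node-c"],
    "t3": ["node-a", "node-c"],
    "t4": ["node-a", "node-b"],
}


def assign_replicas(replicas):
    """Pick one replica per tablet, preferring the node with the fewest scans so far."""
    load = Counter()
    plan = {}
    for tablet, nodes in sorted(replicas.items()):
        node = min(nodes, key=lambda n: (load[n], n))  # ties broken by node name
        plan[tablet] = node
        load[node] += 1
    return plan


def fetch_rows(node, tablet):
    """Stand-in for a scan request sent directly to `node` (assumption, no real I/O)."""
    return [f"{tablet},{node},{i}" for i in range(3)]


def export(plan, sink, max_buffered=8):
    """Producers scan nodes concurrently; one consumer drains a bounded buffer."""
    buf = queue.Queue(maxsize=max_buffered)  # bounded, so memory use stays capped
    done = object()  # sentinel telling the consumer to stop

    def producer(tablet, node):
        for row in fetch_rows(node, tablet):
            buf.put(row)  # blocks while the buffer is full

    def consumer():
        while True:
            row = buf.get()
            if row is done:
                break
            sink.append(row)  # a real exporter would write to disk here

    writer = threading.Thread(target=consumer)
    writer.start()
    producers = [threading.Thread(target=producer, args=item) for item in plan.items()]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    buf.put(done)  # all producers finished, so nothing follows the sentinel
    writer.join()


def compact_log(entries):
    """Keep only the last (op, key, value) entry per key, ordered by last write."""
    last = {}
    for op, key, value in entries:
        last.pop(key, None)  # re-insert so dict order follows the latest write
        last[key] = (op, key, value)
    return list(last.values())


if __name__ == "__main__":
    plan = assign_replicas(REPLICAS)
    rows = []
    export(plan, rows)
    print(plan)
    print(len(rows), "rows exported")
    print(compact_log([("put", "k1", "a"), ("put", "k1", "b"), ("delete", "k2", None)]))
```

The bounded queue is what keeps memory use flat: fast scanners block on `put` until the writer catches up, instead of buffering whole tablets in memory.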
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13