Data Replication in Scalable Database Management Systems
Source: East China Normal University, master's thesis, 2017. Type: degree thesis.
Keywords: distributed database; log-structured storage; data replication; data export; log parsing
[Abstract]: As the Internet continues to grow and data volumes increase, the ability of a database system to scale its storage and computation horizontally is becoming ever more important. Distributed database systems, with their good scalability, have therefore attracted wide attention from both industry and academia. Among them, distributed systems built on log-structured storage have become a new trend, and this read/write-separated architecture has already been adopted in distributed database systems such as OceanBase, Alibaba's open-source relational database management system. Data export is one of the common techniques for data replication; it is widely used in enterprise applications to improve system availability and scalability and to ensure data reliability. In a distributed database system with a read/write-separated architecture, data is divided into static data and dynamic data, and the static data is stored on different physical nodes, so data replication becomes an operation that both takes a long time and wastes system resources. This thesis analyzes the problems of data replication under the read/write-separated distributed database architecture and proposes effective solutions. The main contributions are as follows:
1. A load-balancing-aware static data export method is designed and implemented. First, exploiting the architecture of the distributed database, concurrent query requests are issued directly to the different physical nodes, which reduces the number of network transfers of the data and shortens response time. Second, a producer-consumer model is used to speed up writing data to disk and to avoid heavy memory usage. Finally, because the data is stored in multiple replicas, query requests are distributed evenly across the nodes, which balances the load on each node and also improves overall export performance.
2. A dynamic data capture method based on log parsing is designed and implemented. On the one hand, log synchronization and log pulling are implemented to guarantee data correctness. On the other hand, frequent operations on the same tuple are condensed during log parsing, which avoids redundant operations and lowers the cost of applying updates.
3. Test data sets are generated with the YCSB benchmark, and several groups of experiments are designed to verify the feasibility and efficiency of the proposed data export method. The method is implemented on the open-source database CEDAR.
The experimental results show that the proposed data export method effectively reduces response time and system resource usage, and the tests in CEDAR indicate that the proposed data replication method substantially improves the efficiency of data export. The approach can also serve as a reference for data replication in other scalable database management systems of the same type and for later work on their data replication techniques.
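The condensing of repeated operations on the same tuple, described in contribution 2, can be illustrated with a small sketch. This is not the thesis's implementation and does not reflect CEDAR's log format: the `LogEntry` record, the `compact` function, the operation names, and the `replace` net operation are all hypothetical. The sketch assumes the parsed log is valid and that operations on different keys are independent, so only the per-key order matters.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class LogEntry:
    # Hypothetical parsed log record; not CEDAR's actual log format.
    op: str                       # "insert", "update", "delete" (or "replace" in output)
    key: str                      # primary key of the tuple
    values: Optional[dict] = None # column values carried by the operation


def compact(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Collapse all operations on one key into at most one net operation."""
    net: Dict[str, LogEntry] = {}
    for e in entries:
        prev = net.get(e.key)
        if prev is None:
            net[e.key] = e
            continue
        step = (prev.op, e.op)
        if step in (("insert", "update"), ("update", "update"), ("replace", "update")):
            # Later column values overwrite earlier ones; the net op kind is kept.
            merged = {**(prev.values or {}), **(e.values or {})}
            net[e.key] = LogEntry(prev.op, e.key, merged)
        elif step == ("insert", "delete"):
            # The tuple was created and removed inside the batch: no net effect.
            del net[e.key]
        elif step in (("update", "delete"), ("replace", "delete")):
            net[e.key] = LogEntry("delete", e.key)
        elif step == ("delete", "insert"):
            # Existing tuple deleted and re-created: a full-row overwrite.
            net[e.key] = LogEntry("replace", e.key, dict(e.values or {}))
        else:
            raise ValueError(f"invalid sequence {step} for key {e.key!r}")
    return list(net.values())


if __name__ == "__main__":
    log = [
        LogEntry("insert", "k1", {"a": 1}),
        LogEntry("update", "k1", {"a": 2}),
        LogEntry("insert", "k3", {"c": 1}),
        LogEntry("delete", "k3"),
    ]
    for entry in compact(log):
        print(entry)
```

Keeping one net operation per key means the replica applies each tuple at most once per batch, which is the cost reduction the abstract refers to; a real system would also have to respect transaction boundaries, which this sketch ignores.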
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP311.13