基于列存儲(chǔ)的數(shù)據(jù)庫(kù)物理層優(yōu)化研究
發(fā)布時(shí)間:2018-01-22 07:05
本文關(guān)鍵詞: 列存儲(chǔ) 索引技術(shù) 樹(shù)索引 元輔音樹(shù) 出處:《華中科技大學(xué)》2013年碩士論文 論文類(lèi)型:學(xué)位論文
【摘要】:由于網(wǎng)絡(luò)數(shù)據(jù)的海量增長(zhǎng)、數(shù)據(jù)倉(cāng)庫(kù)和OLAP的飛速發(fā)展以及商務(wù)數(shù)據(jù)分析的需求,在海量數(shù)據(jù)存儲(chǔ)和分析方面占有優(yōu)勢(shì)的列存儲(chǔ)得到很快的成長(zhǎng)。但以列為導(dǎo)向的物理層存儲(chǔ)結(jié)構(gòu)意味著在設(shè)計(jì)列存儲(chǔ)模塊或列數(shù)據(jù)庫(kù)的物理層時(shí),需要采用不同于傳統(tǒng)行存儲(chǔ)的方式。同時(shí),傳統(tǒng)的許多優(yōu)化技術(shù)和方法在列存儲(chǔ)中的效率普遍不高,且存儲(chǔ)代價(jià)較大。其中比較典型的例子是索引技術(shù)。因此,研究列存儲(chǔ)的物理層架構(gòu)和索引技術(shù),對(duì)列數(shù)據(jù)庫(kù)的開(kāi)發(fā)和應(yīng)用具有重要的意義。 基于以上需求,研究了列存儲(chǔ)的物理層架構(gòu),對(duì)物理層各模塊進(jìn)行設(shè)計(jì),實(shí)現(xiàn)了一個(gè)列存儲(chǔ)的原型系統(tǒng)。在數(shù)據(jù)組織上采用固定記錄數(shù)據(jù)塊的方式和基于大內(nèi)存分配的內(nèi)存池管理方式。在壓縮算法上,采用基于字典編碼的LZW壓縮算法,并與基于統(tǒng)計(jì)編碼的PPM壓縮算法進(jìn)行性能對(duì)比。 針對(duì)英文單詞特征的長(zhǎng)字符串類(lèi)型,設(shè)計(jì)了一種旨在減少不相關(guān)檢索數(shù)據(jù)塊的元輔音樹(shù)。首先,針對(duì)列存儲(chǔ)索引的需求和字符串特性,,設(shè)計(jì)了一種精簡(jiǎn)的樹(shù)結(jié)構(gòu);基于該樹(shù)的結(jié)構(gòu),研究了字符串輸入過(guò)程的狀態(tài)變化,并基于此定義了有限自動(dòng)狀態(tài)機(jī)的各元組。之后,針對(duì)該樹(shù)結(jié)構(gòu)和有限自動(dòng)狀態(tài)機(jī)的各元組定義,設(shè)計(jì)了樹(shù)的初始化、存儲(chǔ)、字符串掃描等操作算法;在對(duì)有限自動(dòng)狀態(tài)機(jī)進(jìn)行狀態(tài)轉(zhuǎn)移和狀態(tài)推導(dǎo)的基礎(chǔ)上,設(shè)計(jì)了查詢(xún)匹配算法。 在實(shí)際應(yīng)用于列存儲(chǔ)時(shí),對(duì)元輔音樹(shù)進(jìn)一步改進(jìn),設(shè)計(jì)出元輔音根樹(shù)和數(shù)據(jù)塊元輔音樹(shù)的雙層結(jié)構(gòu),同時(shí)采用單模式和雙模式匹配相結(jié)合的策略,在一次單模式匹配基礎(chǔ)上進(jìn)行二次雙模式匹配,以此更進(jìn)一步提高查詢(xún)效率。
[Abstract]:Because of the huge growth of network data, the rapid development of data warehouse and OLAP and the demand of business data analysis. Column storage, which has an advantage in mass data storage and analysis, has grown rapidly, but a column-oriented physical layer storage structure means when designing a column storage module or column database physical layer. At the same time, many of the traditional optimization techniques and methods in column storage are generally inefficient, and the storage cost is high. The typical example is the index technology. It is of great significance for the development and application of column database to study the physical layer architecture and index technology of column storage. Based on the above requirements, the physical layer architecture of column storage is studied, and each module of the physical layer is designed. A prototype system of column storage is implemented. In data organization, the data block is fixed and the memory pool is managed based on large memory allocation. The LZW compression algorithm based on dictionary coding is adopted, and the performance of PPM compression algorithm based on statistical coding is compared. For the long string type of English word feature, a meta consonant tree is designed to reduce the uncorrelated retrieval data block. Firstly, aiming at the requirement of column storage index and the character of string. A simple tree structure is designed. Based on the structure of the tree, the state change of the string input process is studied, and the tuples of the finite automatic state machine are defined based on this. Then, the tuples of the tree structure and the finite automatic state machine are defined. The algorithms of tree initialization, storage, string scanning and so on are designed. A query matching algorithm is designed based on the state transfer and state derivation of the finite automatic state machine. In the practical application in column storage, the meta-consonant tree is further improved, and the two-layer structure of meta-consonant root tree and data block element consonant tree is designed. At the same time, the strategy of combining single pattern and double pattern matching is adopted. Second double pattern matching is carried out on the basis of single pattern matching so as to further improve the query efficiency.
【學(xué)位授予單位】:華中科技大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2013
【分類(lèi)號(hào)】:TP333
【參考文獻(xiàn)】
相關(guān)期刊論文 前3條
1 張文修,吳偉志;粗糙集理論介紹和研究綜述[J];模糊系統(tǒng)與數(shù)學(xué);2000年04期
2 王梅;楊思簫;樂(lè)嘉錦;;列存儲(chǔ)數(shù)據(jù)庫(kù)中壓縮位圖索引技術(shù)[J];計(jì)算機(jī)工程;2012年18期
3 鄭翠芳;;幾種常用無(wú)損數(shù)據(jù)壓縮算法研究[J];計(jì)算機(jī)技術(shù)與發(fā)展;2011年09期
本文編號(hào):1454030
本文鏈接:http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1454030.html
最近更新
教材專(zhuān)著