Research on GPU-Based Optimization Methods for Complex SQL Queries
Published: 2018-07-24 15:21
[Abstract]: With the development of information technology, the volume of data stored in databases keeps growing, and the data is characterized by large volume, many data types, and low value density. Against this background, database queries have expanded from simple single-dimension queries to complex multi-dimensional queries. Complex queries are an important means of analyzing data in a database system and play a key role in practical data analysis and processing: through query requests, enterprise decision-makers can quickly obtain the information they care about most. Using traditional database analysis methods to extract, store, and analyze massive data and obtain real-time results is becoming increasingly difficult, which in turn constrains the decisions of enterprise managers.
To speed up multi-dimensional complex queries over large-scale data, this thesis combines the parallel computing power of graphics processing units (GPUs) with the storage characteristics of column-store databases, and proposes a columnar storage model suited to parallel querying together with a strategy for GPU-accelerated parallel queries. The main contributions are as follows:
(1) The theory of complex database queries and the GPU parallel computing model are studied, and traditional database query optimization techniques are summarized, with emphasis on the storage strategies and compression algorithms of different databases.
(2) A physical storage model based on a sparse index is proposed. The model partitions column-stored data into segments, applies a difference compression algorithm chosen to suit the characteristics of the GPU, and uses the GPU's high degree of parallelism to compress the data in parallel.
(3) A GPU-based parallel execution algorithm for complex queries is proposed, which optimizes complex queries by combining GPU query primitives. The work focuses on optimizing range queries and group-by queries and proposes a strategy for merging group-by results. A pipeline scheduling strategy is proposed to address the excessive I/O time observed in the experiments, which speeds up query response to some extent.
(4) Experiments demonstrate the advantages of the GPU-accelerated compression and query algorithms. The proposed query model is compared with traditional databases on the TPC-H benchmark data set defined by the Transaction Processing Performance Council (TPC), and the results show that on large-scale data sets it achieves a speedup of 5 to 8 times over existing GPU databases.
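The storage model in item (2) and the range query in item (3) can be illustrated with a small sketch. It is not the thesis's implementation: it runs sequentially on the CPU, assumes the column is sorted, interprets "difference compression" as plain delta encoding, and uses a per-segment (min, max) pair as the sparse index. The names `SEGMENT_SIZE`, `build_column`, and `range_query` are hypothetical. In a GPU version, each segment would presumably be handled by its own thread block.

```python
from bisect import bisect_left, bisect_right

SEGMENT_SIZE = 4  # hypothetical segment length; kept tiny so the example is easy to trace


def delta_encode(values):
    """Store the first value as-is, then each value's difference from its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the stored differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def build_column(sorted_values, segment_size=SEGMENT_SIZE):
    """Split a sorted column into segments; keep (min, max) per segment as a sparse index."""
    segments, index = [], []
    for start in range(0, len(sorted_values), segment_size):
        chunk = sorted_values[start:start + segment_size]
        segments.append(delta_encode(chunk))
        index.append((chunk[0], chunk[-1]))
    return segments, index


def range_query(segments, index, low, high):
    """Return all values v with low <= v <= high, decoding only segments that can match."""
    result = []
    for deltas, (seg_min, seg_max) in zip(segments, index):
        if seg_max < low or seg_min > high:
            continue  # the sparse index rules this segment out without decompressing it
        values = delta_decode(deltas)
        result.extend(values[bisect_left(values, low):bisect_right(values, high)])
    return result


if __name__ == "__main__":
    segments, index = build_column([3, 5, 8, 13, 21, 34, 55, 89, 144])
    print(range_query(segments, index, 10, 60))  # prints [13, 21, 34, 55]
```

Because every segment is encoded and decoded independently, the segments can be compressed and scanned in parallel, which is the property the abstract attributes to the segmented design.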
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.13; TP333
Article ID: 2141804
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2141804.html