Research on GPU-Based Optimization Methods for Complex SQL Queries

Published: 2018-07-24 15:21

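[Abstract]: With the development of information technology, databases store ever larger volumes of data, characterized by large volume, many data types, and low value density. Against this background, database queries have expanded from simple single-dimension queries to complex multi-dimensional queries. Complex queries are an important means of data analysis in database systems and play a central role in practical data processing: through them, enterprise decision-makers can quickly obtain the information they care about most. However, extracting, storing, and analyzing massive data with traditional database analysis methods to obtain real-time results is becoming increasingly difficult, which in turn constrains managers' decision-making. To speed up multi-dimensional complex queries over large-scale data, this thesis combines the parallel computing capability of graphics processing units (GPUs) with the storage characteristics of column-store databases, and proposes a columnar storage model suited to parallel queries together with a strategy for GPU-accelerated parallel query processing. The main work of this thesis is as follows:
(1) It studies the theory of complex database queries and the GPU parallel computing model, and summarizes traditional database query optimization techniques, with particular attention to the storage strategies and compression algorithms of different databases.
(2) It proposes a physical storage model based on a sparse index. Building on column storage, the model partitions data into segments; to suit the GPU, it compresses data with a difference (delta) compression algorithm and exploits the GPU's high parallelism to compress data in parallel.
(3) It proposes a GPU-based parallel execution algorithm for complex queries, which optimizes complex queries by combining GPU query primitives. The work focuses on optimizing range queries and group-by queries, and proposes a strategy for merging group-by results. A pipeline scheduling strategy is also proposed to address the excessive I/O time observed in the experiments, which speeds up query response to some extent.
(4) Experiments demonstrate the advantages of the GPU-accelerated compression and query algorithms. The proposed query model was compared with traditional databases on the TPC-H benchmark dataset defined by the Transaction Processing Performance Council (TPC); on large-scale datasets, the model achieves a 5-8x speedup over existing GPU databases.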
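Contribution (2) pairs segment partitioning with a sparse index and difference (delta) compression. The abstract does not give the thesis's actual data layout, so what follows is only a minimal CPU sketch in plain Python under stated assumptions: `Segment`, `compress_column`, `decompress_segment`, `range_query`, and `SEGMENT_SIZE` are hypothetical names, the sparse index is assumed to keep a per-segment minimum and maximum, and the bit-packing that makes small deltas cheap to store is omitted. On a GPU, each delta could be computed by an independent thread and decompression would be a parallel prefix sum; here both run sequentially.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List

SEGMENT_SIZE = 4  # hypothetical; real column stores use far larger segments

@dataclass
class Segment:
    base: int           # first value of the segment, stored as-is
    deltas: List[int]   # differences between consecutive values
    min_val: int        # assumed sparse-index entry: smallest value in the segment
    max_val: int        # assumed sparse-index entry: largest value in the segment

def compress_column(values: List[int], seg_size: int = SEGMENT_SIZE) -> List[Segment]:
    """Split a column into fixed-size segments and delta-encode each one."""
    segments = []
    for start in range(0, len(values), seg_size):
        chunk = values[start:start + seg_size]
        # Each delta reads only two adjacent inputs, so on a GPU every
        # delta could be produced by an independent thread.
        deltas = [chunk[i] - chunk[i - 1] for i in range(1, len(chunk))]
        segments.append(Segment(chunk[0], deltas, min(chunk), max(chunk)))
    return segments

def decompress_segment(seg: Segment) -> List[int]:
    """Rebuild values as a running (prefix) sum of the deltas, seeded with base."""
    return list(accumulate(seg.deltas, initial=seg.base))

def range_query(segments: List[Segment], lo: int, hi: int) -> List[int]:
    """Return values in [lo, hi], decompressing only segments the index cannot rule out."""
    result = []
    for seg in segments:
        if seg.max_val < lo or seg.min_val > hi:
            continue  # pruned by the sparse index without decompression
        result.extend(v for v in decompress_segment(seg) if lo <= v <= hi)
    return result
```

For example, the column `[10, 12, 15, 15, 40, 41, 45, 50, 100, 101]` splits into three segments, and `range_query(segs, 41, 100)` skips the first segment from its [10, 15] index entry alone, returning `[41, 45, 50, 100]`.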
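Contribution (3) mentions a strategy for merging group-by results without describing it. A common pattern, assumed here rather than taken from the thesis, is two-phase aggregation: each partition (for example, one GPU thread block) builds a private partial table, and a final step merges the partials. The sketch below computes a grouped AVG this way in plain Python; `partial_aggregate`, `merge_partials`, `group_by_avg`, and the `num_partitions` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float]                  # (group key, value)
Partial = Dict[str, Tuple[float, int]]   # group key -> (sum, count)

def partial_aggregate(rows: List[Row]) -> Partial:
    """Phase 1: aggregate one partition into a private table (no shared writes)."""
    local: Partial = {}
    for key, value in rows:
        s, c = local.get(key, (0.0, 0))
        local[key] = (s + value, c + 1)
    return local

def merge_partials(partials: List[Partial]) -> Partial:
    """Phase 2: combine per-partition tables by adding sums and counts per key."""
    merged: Partial = {}
    for part in partials:
        for key, (s, c) in part.items():
            ms, mc = merged.get(key, (0.0, 0))
            merged[key] = (ms + s, mc + c)
    return merged

def group_by_avg(rows: List[Row], num_partitions: int = 4) -> Dict[str, float]:
    """SELECT key, AVG(value) ... GROUP BY key, via partition-then-merge."""
    size = max(1, -(-len(rows) // num_partitions))  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    merged = merge_partials([partial_aggregate(p) for p in partitions])
    # AVG is finalized only after merging, from the combined sum and count.
    return {key: s / c for key, (s, c) in merged.items()}
```

Carrying (sum, count) pairs through the merge, rather than per-partition averages, keeps AVG correct when a group's rows are spread unevenly across partitions.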
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification No.]: TP311.13; TP333



Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2141804.html
