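Research on Distributed OLAP Semantic Caching Algorithms
Published: 2018-05-23 18:21
Topic: closed cube + Spark; Source: Kunming University of Science and Technology, 2017 master's thesis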
[Abstract]: A data cube model built from a data warehouse can be compressed by removing the non-closed cells among its tuples and then partitioned into layers, which yields a layered closed cube. Spark is a fast, general-purpose, in-memory parallel computing framework for big data. Building on the layered closed cube and on Spark, this thesis designs and implements two distributed OLAP query algorithms: SLCCQuery and its optimized variant SLCC_LayeredQuery. Experiments on datasets with different parameters verify the effectiveness of the proposed distributed OLAP query algorithm in the Spark environment and the relative efficiency of the optimized variant. To further improve distributed OLAP query efficiency in the Spark environment, the thesis designs a new distributed OLAP semantic caching algorithm. Instead of storing individual data tuples, the algorithm stores the upper and lower bounds of equivalence classes to represent the tuples in a query result set. The cache entries, together with the semantic relationships among them, form an algebraic lattice; at query time, pruning along these semantic relationships further narrows the range that must be searched in the cache. The final experiments verify the effectiveness and relative efficiency of the distributed OLAP semantic caching algorithm. The main contents of the thesis are as follows: (1) the data cube is compressed by removing its non-closed cells and then layered to form a layered closed cube, and, based on Spark, two distributed OLAP query algorithms are designed and implemented: SLCCQuery and its optimized variant SLCC_LayeredQuery; (2) in view of the cache design needs of the distributed OLAP query algorithms, and because common caching techniques such as page caching and tuple caching do not exploit the semantic relationships among cached query items, the thesis proposes a new OLAP query caching technique, semantic OLAP caching; (3) using the semantic OLAP cache model and Spark, the thesis designs two distributed OLAP caching algorithms for the Spark environment and, in combination with different cache replacement policies, experimentally verifies the effectiveness and relative efficiency of the proposed distributed OLAP semantic caching algorithms.
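The idea of caching equivalence-class bounds rather than individual tuples can be illustrated with a minimal single-machine sketch. It is not the thesis's algorithm: the names `Cell`, `CacheEntry`, `SemanticCache`, and `generalizes` are hypothetical, the convention that the upper bound is the most specific cell and the lower bounds are the most general cells is an assumption borrowed from quotient-cube terminology, and a plain linear scan stands in for the lattice-based pruning that the abstract describes. Spark is not used here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALL = "*"  # wildcard: the dimension is aggregated away
Cell = Tuple[str, ...]


def generalizes(a: Cell, b: Cell) -> bool:
    """Return True if a equals b or is b with some values replaced by ALL."""
    return len(a) == len(b) and all(x == ALL or x == y for x, y in zip(a, b))


@dataclass(frozen=True)
class CacheEntry:
    """One cached equivalence class (hypothetical layout)."""
    upper: Cell                # assumed: most specific cell of the class
    lowers: Tuple[Cell, ...]   # assumed: most general cells of the class
    measure: float             # aggregate value shared by every cell in the class

    def covers(self, query: Cell) -> bool:
        # A query cell belongs to the class if it lies between
        # some lower bound and the upper bound.
        return generalizes(query, self.upper) and any(
            generalizes(low, query) for low in self.lowers
        )


class SemanticCache:
    """Linear-scan cache; a lattice index would prune this search."""

    def __init__(self) -> None:
        self.entries: List[CacheEntry] = []

    def put(self, entry: CacheEntry) -> None:
        self.entries.append(entry)

    def lookup(self, query: Cell) -> Optional[float]:
        for entry in self.entries:
            if entry.covers(query):
                return entry.measure
        return None  # cache miss: the query must go to the cube


# Usage with made-up dimensions (store, product, season) and a made-up measure.
cache = SemanticCache()
cache.put(CacheEntry(upper=("S1", "P1", "spring"),
                     lowers=(("*", "P1", "*"),),
                     measure=6.0))
hit = cache.lookup(("*", "P1", "spring"))   # lies between the bounds
miss = cache.lookup(("S1", "*", "*"))       # more general than every lower bound
```

A single entry answers every cell between its bounds, which is why storing bounds can take less space than storing each tuple separately.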
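[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13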
Article ID: 1925830
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1925830.html