Research on Multi-Dimensional Data Indexing for Cloud Computing

Published: 2018-08-20 10:07

[Abstract]: The widespread adoption of cloud computing has caused data volumes to grow explosively, posing new challenges for traditional data management techniques. Existing cloud storage systems generally store and retrieve data through a distributed hash table. This key-value model gives high access efficiency for single-dimensional queries but supports multi-dimensional queries poorly. When a user submits a multi-dimensional query over several attribute columns, the lack of an effective secondary index means that a MapReduce job must scan the entire data set, which lowers query efficiency. For this reason, auxiliary indexes for cloud storage have become an active research topic in recent years, with results appearing in top international conferences and journals in the database field. This thesis studies multi-dimensional data indexing in cloud computing environments. The work covers three aspects: multi-dimensional data indexing in cloud storage systems, a two-layer multi-dimensional data index based on a master-slave architecture, and a multi-dimensional data index that supports dynamic dimension expansion in a fully distributed environment. The main contributions are summarized as follows:

1. Existing cloud storage systems mainly support single key-value indexes and lack an effective multi-dimensional index, so multi-dimensional queries are inefficient. To address this, the thesis proposes CloudUB, a new multi-dimensional cloud data indexing scheme based on the UB-tree. The scheme first uses the Z-curve to reduce the dimensionality of the multi-dimensional space, then partitions the space along the Z-curve into Z-regions and organizes the Z-region information with a B+ tree to build an improved UB-tree index. When executing a multi-dimensional query, CloudUB uses the Z-regions to filter out data space that cannot contain query results, which improves query efficiency. The thesis also designs an HBase-based index construction and maintenance mechanism and proposes corresponding real-time and offline index construction algorithms. The mechanism stores the leaf nodes of the B+ tree built on the Z-curve reduction in HBase, turning the search problem in the original multi-dimensional space into a key-value query problem that existing cloud storage systems can support, and thereby allowing highly concurrent access to the index table through MapReduce. Finally, the thesis designs the multi-dimensional search algorithm for CloudUB and analyzes its efficiency. Tests on Hadoop 2.2 with data on the order of 10 million records show that the CloudUB scheme supports flexible and efficient real-time index construction and significantly improves multi-dimensional query efficiency.

2. Based on an in-depth study of how data is managed in cloud computing systems, the thesis proposes KD-R, a two-layer multi-dimensional data index that matches the master-slave management style of such systems. The scheme builds an R-tree index over the local data on each data server; together, the local R-tree indexes form the lower layer of the two-layer index. Part of the node information of each R-tree is then published to the global server layer, where it is used to build a unified KD-tree index. To decide which local index nodes to publish to the global index, the thesis designs an adaptive node publishing algorithm and a cost model for selecting nodes to publish; the cost model estimates the indexing cost of a local index node. The index system periodically checks the index nodes on the local data servers against the cost model and then uses the adaptive node publishing algorithm to adjust the set of published local index nodes, dynamically optimizing the KD-R index. Experimental results show that the multi-dimensional query algorithm based on the KD-R index achieves high memory utilization and query efficiency and demonstrates good usability.

3. User requirements in cloud computing systems are elastic, and query dimensions may be extended dynamically. To address this, the thesis proposes CB-index, a multi-dimensional cloud data index based on a Chord overlay network and partition bitmaps. The scheme builds the global index on a Chord overlay network, which overcomes the tendency of the global server in a master-slave structure to become a bottleneck and yields a fully distributed two-layer index architecture. The thesis also designs a partition bitmap encoding mechanism: the local data index on each data server is built from partition bitmaps, which ties the local index nodes to the Chord overlay network. Because the prefix of a partition bitmap code is extensible, the thesis designs a dynamic index dimension expansion algorithm that extends dimensions without completely rebuilding the index structure. In addition, the thesis designs an adaptive index node adjustment algorithm, a multi-dimensional query algorithm, and an index maintenance algorithm. Experimental results show that CB-index achieves high multi-dimensional query efficiency, supports flexible index dimension expansion, and can adapt to users' dynamic query requirements in cloud computing environments.
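The Z-curve reduction described in contribution 1 can be illustrated with a small sketch. The code below is not the CloudUB implementation: it only shows the general idea of interleaving coordinate bits into a single Z-value so that multi-dimensional points can be stored under ordered one-dimensional keys, as in a key-value table. The names `z_value` and `ZOrderIndex`, the non-negative integer coordinates, and the in-memory sorted list standing in for the key-value store are all assumptions made for illustration. The sketch also uses a simpler filter than the Z-region pruning the abstract describes: it scans every key between the Z-values of the query box corners and discards false positives.

```python
from bisect import bisect_left, bisect_right


def z_value(coords, bits=8):
    """Interleave the low `bits` bits of each coordinate into one integer key."""
    dims = len(coords)
    z = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            # Bit `bit` of dimension `d` goes to position bit * dims + d.
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z


class ZOrderIndex:
    """Toy stand-in for an ordered key-value table keyed by Z-value (hypothetical)."""

    def __init__(self, points, bits=8):
        self.bits = bits
        # Rows are kept sorted by Z-value, like row keys in an ordered store.
        self.rows = sorted((z_value(p, bits), p) for p in points)
        self.keys = [z for z, _ in self.rows]

    def range_query(self, low, high):
        """Return points p with low[i] <= p[i] <= high[i] in every dimension."""
        # The Z-value is monotone in each coordinate, so every point in the
        # box has a key between the keys of the two corners.
        z_lo = z_value(low, self.bits)
        z_hi = z_value(high, self.bits)
        start = bisect_left(self.keys, z_lo)
        stop = bisect_right(self.keys, z_hi)
        # The key range may contain points outside the box; filter them out.
        return [p for _, p in self.rows[start:stop]
                if all(l <= x <= h for l, x, h in zip(low, p, high))]


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    index = ZOrderIndex(grid, bits=2)
    print(index.range_query((1, 1), (2, 2)))
```

A UB-tree style index would go further than this sketch by skipping the Z-regions inside the key range that do not intersect the query box, rather than scanning the whole range.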
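[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP311.13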

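[Citation]: He Jing; Research on Multi-Dimensional Data Indexing for Cloud Computing [D]; University of Electronic Science and Technology of China; 2016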


Article ID: 2193230


Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2193230.html
