Research on a Distributed Query Method Based on Metadata Association Features
Published: 2018-06-01 07:05
Topic: association features + metadata query; Source: Huazhong University of Science and Technology, 2013 master's thesis
[Abstract]: The continuing development of information technology places higher demands on the capacity and performance of information storage, and cloud storage has emerged in response. Large-scale storage systems are increasingly widely used, with capacity growing from the TB (terabyte) level to the PB (petabyte) and even EB (exabyte) level. While using massive storage space, users also find that locating and managing data is becoming increasingly difficult. Existing metadata management methods suffer from weak scalability, low query efficiency, and poor real-time performance.

To address these shortcomings, a metadata query method based on association features is proposed, which makes full use of the association features of multidimensional metadata to improve query efficiency. The system builds a distributed index on top of data clustered by locality-sensitive hashing (LSH). The global index is partitioned within the buckets of the LSH hash tables, which improves scalability and avoids large-scale data migration. Index maintenance uses a layered architecture in which each layer is configured independently, giving good scalability and easing system management. To update the distributed index quickly, the index is stored in files and updated in batches on a per-version basis. Under this architecture, query requests are handled in a proxy-based mode: each request is assigned a query server as its proxy, and the proxy node forwards the query request, collects the query results, and returns them to the client.

Tests show that this query method is significantly more efficient than traditional approaches based on one-dimensional indexes, and that the system's response time grows quasi-linearly as the data scale increases. In addition, the version-based batch update strategy makes index updates about 10 times more efficient than with a MySQL database.
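The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: metadata vectors are grouped by LSH, buckets are spread across index servers, and a proxy forwards a query to the bucket owners and merges the results. It assumes a standard random-projection (p-stable) LSH scheme; the class names (`LSHTable`, `IndexServer`, `Cluster`), the modulo placement rule, and all parameter values are hypothetical and are not taken from the thesis.

```python
import math
import random
from collections import defaultdict


class LSHTable:
    """One random-projection LSH table (illustrative parameters)."""

    def __init__(self, dim, num_hashes, width, rng):
        self.width = width
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, width) for _ in range(num_hashes)]

    def key(self, vec):
        # Project, shift, and quantize: nearby vectors tend to share a key.
        return tuple(
            math.floor((sum(a * x for a, x in zip(plane, vec)) + b) / self.width)
            for plane, b in zip(self.planes, self.offsets)
        )


class IndexServer:
    """Holds the buckets assigned to one node."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, table_id, key, item):
        self.buckets[(table_id, key)].append(item)

    def lookup(self, table_id, key):
        return list(self.buckets.get((table_id, key), []))


class Cluster:
    def __init__(self, dim, num_servers=3, num_tables=4,
                 num_hashes=3, width=8.0, seed=0):
        rng = random.Random(seed)
        self.tables = [LSHTable(dim, num_hashes, width, rng)
                       for _ in range(num_tables)]
        self.servers = [IndexServer() for _ in range(num_servers)]

    def _owner(self, table_id, key):
        # Assumed placement rule (not from the thesis): key sum modulo servers.
        return self.servers[(table_id + sum(key)) % len(self.servers)]

    def insert(self, file_id, vec):
        for t, table in enumerate(self.tables):
            key = table.key(vec)
            self._owner(t, key).insert(t, key, (file_id, tuple(vec)))

    def query(self, vec, top_k=3):
        # Proxy role: forward to bucket owners, gather candidates, rank, return.
        candidates = {}
        for t, table in enumerate(self.tables):
            key = table.key(vec)
            for file_id, stored in self._owner(t, key).lookup(t, key):
                candidates[file_id] = math.dist(vec, stored)
        return sorted(candidates.items(), key=lambda kv: kv[1])[:top_k]


# Hypothetical metadata vectors: (size in KB, owner id, age in days).
cluster = Cluster(dim=3)
cluster.insert("a.log", [1024.0, 3.0, 12.0])
cluster.insert("b.log", [1030.0, 3.0, 11.0])
cluster.insert("c.iso", [900000.0, 7.0, 400.0])
print(cluster.query([1024.0, 3.0, 12.0]))
```

A query vector identical to a stored vector always hashes to the same bucket in every table, so an exact match is guaranteed to be among the candidates; similar but non-identical vectors are found only with some probability, which is why several tables are used.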
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 1963300
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1963300.html