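Research on a Highly Scalable RDF Data Storage System
Published: 2018-04-03 05:26
Topic: Resource Description Framework  Focus: semantic data representation  Source: Huazhong University of Science and Technology, master's thesis, 2012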
[Abstract]: Because RDF (Resource Description Framework) data is flexible to express and easy to exchange, its volume is growing at a remarkable rate. Traditional RDF data storage systems either use a relational database as the storage backend or store data locally, but both approaches run into scalability problems when storing large-scale RDF data. Storing RDF data at that scale requires reducing the storage footprint and speeding up query processing. The storage schemes proposed so far, however, are not compact enough and contain a large amount of redundant data, so a great deal of time is spent generating and executing query plans.

TripleBit, a highly scalable RDF data storage system, aims to provide an efficient storage and query solution for large-scale RDF data. Exploiting the characteristics of RDF data, the system represents the data as a bitmap matrix. To reduce the space used, a compression algorithm is designed for each data table according to its characteristics and role. At the lowest level, a memory-based storage scheme reduces I/O overhead during storage and querying, and data is stored in chunks, which both simplifies storage management and keeps the storage structure compact, accelerating query processing. To speed up RDF data lookups, the system provides two kinds of indexes: one accelerates locating data chunks, and the other accelerates queries whose predicate is unknown. At query time, the system generates query plans simply and effectively from heuristic rules. During execution, it chooses an execution strategy according to the query type and uses a parallel execution subsystem to make join operations more efficient. For query plans with multiple variables, a two-step execution strategy reduces the intermediate results produced during the query and adjusts the query plan dynamically.

Performance comparisons with RDF-3X, a popular RDF data storage system, show that TripleBit uses at least 40% less storage space than RDF-3X and improves query performance by at least a factor of three. The experiments further show that the query plan generation method and the indexing techniques used by TripleBit contribute substantially to query processing performance.
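The abstract does not say how the bitmap matrix is laid out. As a minimal sketch only, assume one column per triple and one row per dictionary-encoded subject or object, with columns grouped by predicate, so that each column has exactly two set bits and can be stored as a pair of row IDs. Every name below (build_triple_matrix, column_bits, subjects_of) is hypothetical and is not taken from the thesis.

```python
from collections import defaultdict


def build_triple_matrix(triples):
    """Dictionary-encode terms and group (subject_id, object_id) columns by predicate.

    Assumed layout, not the thesis's actual format: each triple is one matrix
    column whose only set bits are the subject row and the object row, so the
    column is kept as a pair of row IDs instead of a full bit vector.
    """
    entity_ids = {}                # term -> row index
    columns = defaultdict(list)    # predicate -> list of (subject_id, object_id)

    def encode(term):
        # Assign the next free row index the first time a term is seen.
        return entity_ids.setdefault(term, len(entity_ids))

    for s, p, o in triples:
        columns[p].append((encode(s), encode(o)))

    # Keep each predicate's columns sorted by subject ID, then object ID.
    for pairs in columns.values():
        pairs.sort()
    return entity_ids, dict(columns)


def column_bits(pair, n_rows):
    """Expand one stored column into its explicit bit vector (for illustration)."""
    bits = [0] * n_rows
    for row in pair:
        bits[row] = 1
    return bits


def subjects_of(columns, predicate, object_id):
    """Answer the pattern (?s, predicate, object) by scanning one predicate's columns."""
    return [s for s, o in columns.get(predicate, []) if o == object_id]


if __name__ == "__main__":
    data = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "age", "30"),
    ]
    ids, cols = build_triple_matrix(data)
    print(cols["knows"])
    print(column_bits(cols["knows"][0], len(ids)))
    print(subjects_of(cols, "knows", ids["carol"]))
```

Storing each column as an ID pair rather than as a full bit vector is what keeps such a matrix compact. The compression, chunking, and indexing described in the abstract would sit on top of a layout like this and are not modeled here.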
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333;TP391.3
Article ID: 1703837
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1703837.html