
Research on a Highly Scalable RDF Data Storage System

Published: 2018-04-03 05:26

Topic: Resource Description Framework; Focus: semantic data representation; Source: Huazhong University of Science and Technology, master's thesis, 2012

Abstract: Because RDF (Resource Description Framework) data is flexible to express and easy to exchange, its volume is growing at a remarkable rate. Traditional RDF data storage systems either use a relational database as the storage backend or keep the data in local storage, but both approaches face scalability problems when storing large-scale RDF data. Storing RDF data at that scale requires reducing the storage footprint and speeding up query processing. The storage schemes proposed so far, however, are not compact enough and hold a large amount of redundant data, so considerable time is spent generating and executing query plans. TripleBit, a highly scalable RDF data storage system, aims to provide efficient storage and querying for large-scale RDF data. Exploiting the characteristics of RDF data, the system represents the data as a bitmap matrix. To reduce the space the data occupies, it applies compression algorithms designed around the characteristics and role of each data table. At the lowest storage layer it uses memory-based storage, which reduces I/O overhead during storage and querying, and it stores data in chunks, which simplifies storage management, keeps the storage structure compact, and speeds up query processing. To speed up RDF data lookup, the system provides two kinds of indexes: one accelerates locating data chunks, and the other accelerates queries whose predicate is unknown. When querying RDF data, the system generates query plans simply and effectively from heuristic rules. When executing a query plan, it chooses an execution strategy according to the query type and uses a parallel execution subsystem to make join operations more efficient. For query plans with multiple variables, it uses a two-step execution strategy to reduce the intermediate results produced during the query, and it adjusts the query plan dynamically. A performance comparison with RDF-3X, a popular RDF data storage system, shows that TripleBit needs at least 40% less storage space than RDF-3X and delivers at least 3 times better query performance. The experiments further show that TripleBit's query plan generation and indexing techniques contribute substantially to query processing performance.
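The abstract does not spell out the layout of the bitmap matrix, so the following is only a minimal sketch under stated assumptions: rows stand for dictionary-encoded entities, each triple is one column with bits set at its subject and object rows, and columns are grouped by predicate. Because a column has at most two set bits, the sketch stores just the two row IDs. The class `TripleMatrix`, its methods, and the `entity_predicates` map (a stand-in for an index that helps queries with an unknown predicate) are hypothetical names for illustration, not TripleBit's actual code, and compression, chunking, and parallel joins are omitted.

```python
from collections import defaultdict


class TripleMatrix:
    """Toy bit-matrix view of RDF triples (illustrative assumption, not TripleBit itself)."""

    def __init__(self):
        self.ids = {}                              # entity string -> integer row ID
        self.columns = defaultdict(list)           # predicate -> list of (s_id, o_id) columns
        self.entity_predicates = defaultdict(set)  # entity ID -> predicates it appears with

    def _encode(self, term):
        # Dictionary encoding: an unseen term gets the next free integer ID.
        return self.ids.setdefault(term, len(self.ids))

    def add(self, s, p, o):
        s_id, o_id = self._encode(s), self._encode(o)
        # A column has at most two set bits, so the two row IDs describe it fully.
        self.columns[p].append((s_id, o_id))
        self.entity_predicates[s_id].add(p)
        self.entity_predicates[o_id].add(p)

    def column_bits(self, p, index):
        """Expand one stored column back into an explicit bit vector."""
        s_id, o_id = self.columns[p][index]
        return [1 if row in (s_id, o_id) else 0 for row in range(len(self.ids))]

    def match(self, s=None, p=None, o=None):
        """Return (s_id, predicate, o_id) tuples for a pattern; None is a wildcard."""
        s_id = self.ids.get(s) if s is not None else None
        o_id = self.ids.get(o) if o is not None else None
        if (s is not None and s_id is None) or (o is not None and o_id is None):
            return []  # a bound term that was never stored cannot match
        if p is not None:
            predicates = [p] if p in self.columns else []
        else:
            # Predicate unknown: use the entity -> predicate map to skip
            # predicate groups that cannot contain the bound entity.
            known = s_id if s_id is not None else o_id
            if known is not None:
                predicates = sorted(self.entity_predicates.get(known, ()))
            else:
                predicates = sorted(self.columns)
        return [
            (cs, pred, co)
            for pred in predicates
            for cs, co in self.columns[pred]
            if (s_id is None or cs == s_id) and (o_id is None or co == o_id)
        ]


if __name__ == "__main__":
    store = TripleMatrix()
    store.add("alice", "knows", "bob")
    store.add("alice", "worksAt", "acme")
    # Pattern with an unknown predicate: everything stated about "alice".
    print(store.match(s="alice"))
```

Grouping columns by predicate means a pattern with a bound predicate touches only one group, while the entity-to-predicate map plays the analogous role when the predicate is a variable.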
Degree-granting institution: Huazhong University of Science and Technology
Degree level: Master's
Year degree granted: 2012
Classification number: TP333; TP391.3
