RDF Data Storage and Management Based on Compressed Bitmap Indexes
Published: 2018-06-17 23:27
Topic: RDF + data storage; Source: Beijing Jiaotong University, master's thesis, 2017
[Abstract]: As the Resource Description Framework (RDF) has come into wide use across many fields, the storage and management of massive RDF datasets has become an active research topic in recent years. Most existing RDF data management systems store their data in traditional relational databases, an approach that can no longer manage massive data efficiently. Designing a high-performance RDF data storage and management system that can be extended to a distributed deployment is therefore of considerable importance. This thesis designs an RDF data storage scheme based on bitmap indexes, implements an RDF management system on top of it, and verifies the feasibility and effectiveness of the scheme through system testing. The main work is as follows. (1) Existing RDF data storage schemes are surveyed: the strengths and weaknesses of current mainstream data storage technologies and RDF storage models are analyzed and summarized. (2) A highly scalable low-level storage scheme based on bitmap indexes is proposed. In the persistence layer, RDF data files are split into blocks and stored sequentially, which makes the system scalable; in addition, a query index based on compressed bitmaps is built over RDF keywords, which reduces runtime memory consumption. (3) A data query algorithm based on this scheme is designed. The algorithm takes full advantage of the speed of logical operations on bitmap indexes, which keeps queries efficient. (4) fishdb, an RDF data storage and query system based on this scheme, is implemented, and its performance is tested with a test dataset in a single-machine pseudo-distributed environment. Compared with the open-source RDF management system Google Cayley, fishdb obtains a relatively large improvement in query performance at the cost of a relatively small amount of memory, which verifies the feasibility and effectiveness of the scheme.
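The abstract does not give fishdb's data structures, so the following is only a minimal sketch of the general idea behind points (2) and (3): each triple gets a row number, each RDF term gets a bitmap of the rows it appears in, and a triple pattern is answered by ANDing bitmaps. The class name `BitmapTripleIndex`, the per-position indexing, and the use of plain Python integers as uncompressed bitsets are all assumptions made for illustration; the thesis uses compressed bitmaps and block-wise on-disk storage, neither of which is modeled here.

```python
from collections import defaultdict


class BitmapTripleIndex:
    """Toy bitmap index over RDF triples (hypothetical, not fishdb's design).

    Bitmaps are plain Python ints used as uncompressed bitsets.
    """

    def __init__(self):
        # Row id -> (subject, predicate, object), appended in arrival order.
        self.triples = []
        # One bitmap per (position, term); bit i is set when triple i
        # has that term in that position (0 = s, 1 = p, 2 = o).
        self.bitmaps = defaultdict(int)

    def add(self, s, p, o):
        row = len(self.triples)
        self.triples.append((s, p, o))
        for pos, term in enumerate((s, p, o)):
            self.bitmaps[(pos, term)] |= 1 << row
        return row

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        # Start with every row set, then AND in the bitmap of each bound term.
        result = (1 << len(self.triples)) - 1
        for pos, term in enumerate((s, p, o)):
            if term is not None:
                result &= self.bitmaps.get((pos, term), 0)
        # Decode the surviving bits back into triples.
        matches = []
        row = 0
        while result:
            if result & 1:
                matches.append(self.triples[row])
            result >>= 1
            row += 1
        return matches


index = BitmapTripleIndex()
index.add("alice", "knows", "bob")
index.add("alice", "knows", "carol")
index.add("bob", "knows", "carol")
index.add("alice", "worksAt", "bjtu")

print(index.match(s="alice", p="knows"))
# [('alice', 'knows', 'bob'), ('alice', 'knows', 'carol')]
```

Because a pattern with several bound terms reduces to a few bitwise ANDs, the cost of filtering is largely independent of how the triples themselves are laid out on disk, which is presumably why the scheme pairs the index with simple sequential block storage.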
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP333; TP315