Research and Implementation of an RDF Data Storage and Query Method Based on Structured Indexing
Published: 2018-04-20 22:24
Topic: RDF + HBase; Source: master's thesis, Beijing Jiaotong University, 2013
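[Abstract]: With the development of the Internet and the Internet of Things, the volume of data on networks has grown explosively, placing new demands on data sharing and processing, and under big-data conditions increasingly complex semantic relationships must be processed and applied. Methods for storing and querying large-scale RDF data, and for effectively supporting data processing tasks such as data mining, have significant theoretical and practical value for data processing in many industries, including computing, manufacturing, and railways. Targeting the requirements of railway sensor applications, this thesis proposes an HBase-based RDF data storage and query method built on a structured index.

First, to meet the storage requirements of large-scale data, a structure-based RDF data indexing scheme is proposed. An index graph is constructed by analyzing the connection relationships between nodes in the data graph, and the index is used to partition the data so that data with the same structure are stored together. This lowers the cost of data queries and speeds them up.

Second, a scheme for storing RDF data in HBase is proposed. The data are partitioned according to the structured index, the HBase storage structure is implemented with a "predicate-subject-object" triple layout, and a row-key encoding method is proposed to handle the multi-value problem in RDF data. This effectively narrows the range of data that must be searched for a target query and improves query efficiency.

Third, an RDF data query method based on the structured index and SPARQL statement reordering is proposed. A correlation score is computed from the binding relationships of unknown variables shared between different statements in a query and from the cost of executing each statement, and the SPARQL statements are reordered on that basis. The reordered statements are then evaluated through two layers of operations, a structured-index lookup followed by a physical query, which noticeably improves query efficiency.

Finally, the overall query efficiency of the system was verified experimentally in the railway sensor application scenario; it achieved better query efficiency than Sesame, a classic RDF data storage and retrieval system. The thesis contains 29 figures, 20 tables, and 40 references.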
[English Abstract]: With the development of the Internet and the Internet of Things, the amount of data on networks has grown explosively, placing new requirements on data sharing and processing, and more complex semantic relations need to be processed and applied under big-data conditions. Methods for storing and querying large-scale RDF data are of great theoretical and practical significance for data processing in many industries, such as computing, manufacturing, and railways. Based on the requirements of railway sensor applications, this thesis proposes an HBase-based RDF data storage and query method that uses a structured index.
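First, to meet the storage requirements of large-scale data, a structure-based RDF data index method is proposed: an index graph is constructed by analyzing the connection relationships between nodes in the data graph, and the index is used to partition the data so that data with the same structure are stored together, which reduces query cost and speeds up queries.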
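As a rough illustration of the structure-based partitioning idea, the Python sketch below groups RDF subjects by a structural signature, builds an index graph over those groups, and partitions the triples so that same-structure data end up together. The abstract does not specify how the thesis constructs its index; the signature used here (the set of predicates on a node's outgoing edges, similar in spirit to characteristic sets) and the names `structural_signature` and `build_index` are hypothetical choices for illustration only.

```python
from collections import defaultdict


def structural_signature(triples):
    """Map each subject to the frozenset of predicates on its outgoing edges.

    Hypothetical signature: two nodes are treated as 'same structure'
    when they use exactly the same set of outgoing predicates.
    """
    preds = defaultdict(set)
    for s, p, _o in triples:
        preds[s].add(p)
    return {s: frozenset(ps) for s, ps in preds.items()}


def build_index(triples):
    """Return (partitions, index_edges).

    partitions:  signature -> triples whose subject has that signature,
                 i.e. same-structure data grouped for co-located storage.
    index_edges: (subject signature, predicate, object signature) for each
                 triple whose object is itself a subject in the data graph;
                 these edges form the (much smaller) index graph.
    """
    sig = structural_signature(triples)
    partitions = defaultdict(list)
    index_edges = set()
    for s, p, o in triples:
        partitions[sig[s]].append((s, p, o))
        if o in sig:  # object is a resource node with its own outgoing edges
            index_edges.add((sig[s], p, sig[o]))
    return dict(partitions), index_edges


# Toy railway-sensor data (invented for illustration).
triples = [
    ("sensor1", "type", "Sensor"),
    ("sensor1", "locatedAt", "station1"),
    ("sensor2", "type", "Sensor"),
    ("sensor2", "locatedAt", "station2"),
    ("station1", "name", "Beijing"),
    ("station2", "name", "Tianjin"),
]
partitions, index_edges = build_index(triples)
for signature, group in partitions.items():
    print(sorted(signature), len(group))
```

Here the two sensors collapse into one index node and the two stations into another, so a query that only touches sensor data can be routed to a single partition instead of scanning every triple.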
Second, a scheme for storing RDF data in HBase is put forward. The data are partitioned according to the structured index, the HBase storage structure is implemented with a "predicate-subject-object" triple layout, and a row-key encoding method is proposed to solve the multi-value problem in RDF data, which effectively narrows the range of data searched for a target query and improves query efficiency.
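HBase keeps rows sorted lexicographically by row key, so a predicate-first key lets a single prefix scan retrieve every value for a (predicate, subject) pair. The sketch below models this with an in-memory sorted list rather than a real HBase client. The key layout (partition, predicate, subject, sequence number), the `"\x00"` separator, and all function and class names are assumptions for illustration; the trailing sequence number is one plausible way to encode multi-valued properties, not necessarily the thesis's encoding.

```python
import bisect
from collections import defaultdict

SEP = "\x00"  # assumed never to occur inside IRIs or literals


def encode_row_key(partition, predicate, subject, seq):
    """Hypothetical row key: partition | predicate | subject | seq.

    The zero-padded sequence number gives each value of a multi-valued
    (predicate, subject) pair its own row instead of overwriting it.
    """
    return SEP.join([f"{partition:04d}", predicate, subject, f"{seq:04d}"])


class SortedTable:
    """In-memory stand-in for an HBase table: rows kept sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row whose key starts with prefix."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


def store(table, partition, triples):
    """Write one structural partition, numbering repeated (p, s) values.

    Sequence numbers restart on each call, so a partition is assumed to
    be written in a single batch.
    """
    seq = defaultdict(int)
    for s, p, o in triples:
        n = seq[(p, s)]
        seq[(p, s)] += 1
        table.put(encode_row_key(partition, p, s, n), o)


def objects_of(table, partition, predicate, subject):
    """All objects of (subject, predicate) in a partition via one prefix scan."""
    # Trailing SEP keeps "sensor1" from also matching "sensor10".
    prefix = SEP.join([f"{partition:04d}", predicate, subject]) + SEP
    return [value for _key, value in table.scan_prefix(prefix)]


table = SortedTable()
store(table, 1, [
    ("sensor1", "hasReading", "r1"),
    ("sensor1", "hasReading", "r2"),
    ("sensor10", "hasReading", "r3"),
])
print(objects_of(table, 1, "hasReading", "sensor1"))  # ['r1', 'r2']
```

Putting the predicate before the subject also means that all subjects for one predicate within a partition are contiguous, so a pattern such as `?s hasReading ?r` becomes a single bounded range scan.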
Third, an RDF data query method based on the structured index and SPARQL statement reordering is proposed. A correlation score is computed from the binding relationships of unknown variables shared between statements in a query and from the cost of executing each query statement, and the SPARQL statements are reordered accordingly. The reordered statements are evaluated through two layers of operations, a structured-index lookup and a physical query, which improves query efficiency.
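The reordering step can be approximated with a greedy heuristic: start from the cheapest triple pattern, then repeatedly pick the cheapest remaining pattern that shares an already-bound variable, so that each later lookup is constrained by earlier bindings. This is a hedged sketch, not the thesis's correlation formula, which the abstract does not give. The cost model in the hypothetical `estimated_cost` (per-predicate cardinality estimates, divided by an assumed factor of 100 for each constant subject or object) is invented for illustration.

```python
def variables(pattern):
    """Variables (terms starting with '?') in an (s, p, o) triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def estimated_cost(pattern, pred_card):
    """Hypothetical cost: predicate cardinality, cut 100x per constant s/o."""
    s, p, o = pattern
    est = pred_card.get(p, 1_000_000)  # unknown or variable predicate: assume large
    for term in (s, o):
        if not term.startswith("?"):
            est /= 100
    return est


def reorder(patterns, cost):
    """Greedy ordering: prefer patterns sharing a bound variable, then lowest cost."""
    remaining = list(patterns)
    ordered, bound = [], set()
    while remaining:
        connected = [p for p in remaining if variables(p) & bound]
        candidates = connected or remaining  # nothing connected yet: consider all
        # Ties are broken by original query order (remaining preserves it).
        best = min(candidates, key=lambda p: (cost(p), remaining.index(p)))
        ordered.append(best)
        remaining.remove(best)
        bound |= variables(best)
    return ordered


# Toy query: readings of sensors located at station1 (invented example).
query = [
    ("?s", "type", "Sensor"),
    ("?s", "hasReading", "?r"),
    ("?r", "value", "?v"),
    ("?s", "locatedAt", "station1"),
]
pred_card = {"type": 10_000, "hasReading": 500_000, "value": 500_000, "locatedAt": 10_000}
plan = reorder(query, lambda p: estimated_cost(p, pred_card))
for pattern in plan:
    print(pattern)
```

The selective `locatedAt station1` pattern is moved ahead of the expensive `hasReading` join, and `?r value ?v` runs last, once `?r` is already bound. In a two-layer design like the one described, each pattern in this order would first be resolved against the structured index to pick partitions and only then issued as a physical scan.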
Finally, the overall query efficiency of the system is verified experimentally in the railway sensor application scenario. Compared with Sesame, a classic RDF data storage and retrieval system, the proposed system achieves better query efficiency. The thesis contains 29 figures, 20 tables, and 40 references.
[Degree-Granting Institution]: Beijing Jiaotong University
[Degree Level]: Master's
[Year Conferred]: 2013
[Classification Number]: TP333