Research on Large-Scale Semantic Data Storage and Query Techniques

Published: 2019-05-08 07:34

[Abstract]: The Semantic Web is now widely used in fields such as medicine, biology, and geographic information services. With the arrival of the big data era and the steady growth of application systems, the volume of semantic data is increasing at a striking rate. Traditional semantic data storage and management systems built on relational databases can no longer store and manage this rapidly growing data effectively, and traditional serial semantic query techniques are poorly suited to query processing at this scale. Against this background, using parallel computing to store and query large-scale semantic data has become a research focus in both academia and industry. Parallel computing techniques, however, are closely tied to the application problem at hand, and application problems differ in complexity and kind. Processing large-scale semantic data is therefore technically challenging and calls for in-depth study of both storage and querying.

To address these problems, this thesis first analyzes the Resource Description Framework (RDF), the RDF query language SPARQL (SPARQL Protocol and RDF Query Language), and related technologies. Building on the industry-standard OpenRDF Sesame semantic data processing framework, it then proposes a distributed method for storing and querying large-scale semantic data based on HBase and Redis. The method uses hybrid indexes to build a layered storage architecture that improves query performance. On that basis, the thesis analyzes how the SPARQL query engine processes a query, builds a cost model to optimize join queries in the query model, and uses intermediate query result sets to optimize the query execution strategy so that queries remain efficient. To improve the reliability and availability of the query engine, it also examines fault-tolerance and scalability design for large-scale semantic data storage management and for the query engine. Finally, based on the proposed storage architecture and query optimization scheme, the thesis designs and implements a prototype system for large-scale semantic data storage and querying. Experimental results show that the proposed method is effective.

The work falls into two parts. Part one studies existing semantic data storage techniques, designs a storage model for large-scale semantic data, proposes a hybrid index storage method and a layered storage architecture based on that model, and proposes fault-tolerance and scalability solutions for the storage architecture. Part two analyzes the execution flow of the semantic data query engine. For query model optimization, it proposes a join optimization algorithm based on selectivity estimation; for query strategy optimization, it proposes an adaptive batch query scheme.
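The abstract names a join optimization algorithm based on selectivity estimation but does not describe it. As a rough illustration of the general idea only, and not the thesis's actual algorithm, the sketch below orders SPARQL triple patterns greedily: it starts with the pattern estimated to match the fewest triples and then prefers patterns that share a variable with those already joined. The names `TriplePattern`, `estimate_selectivity`, `order_joins`, and the `stats` table of per-term counts are all hypothetical, and the estimate assumes that the subject, predicate, and object positions are statistically independent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class TriplePattern:
    """One SPARQL triple pattern; terms starting with '?' are variables."""
    s: str
    p: str
    o: str

    def variables(self) -> Set[str]:
        return {t for t in (self.s, self.p, self.o) if t.startswith("?")}


# Hypothetical statistics: (position, term) -> number of triples containing it.
Stats = Dict[Tuple[str, str], int]


def estimate_selectivity(tp: TriplePattern, stats: Stats, total: int) -> float:
    """Estimated fraction of all triples matching tp (independence assumed)."""
    sel = 1.0
    for pos, term in (("s", tp.s), ("p", tp.p), ("o", tp.o)):
        if not term.startswith("?"):
            # Unknown bound terms are assumed rare (count of 1).
            sel *= stats.get((pos, term), 1) / total
    return sel


def order_joins(patterns: List[TriplePattern], stats: Stats,
                total: int) -> List[TriplePattern]:
    """Greedy join order: most selective first, avoiding Cartesian products."""
    if not patterns:
        return []
    remaining = sorted(
        patterns, key=lambda tp: estimate_selectivity(tp, stats, total))
    ordered = [remaining.pop(0)]
    bound = set(ordered[0].variables())
    while remaining:
        # Prefer patterns connected to the variables bound so far.
        connected = [tp for tp in remaining if tp.variables() & bound]
        nxt = (connected or remaining)[0]  # lists keep selectivity order
        remaining.remove(nxt)
        ordered.append(nxt)
        bound |= nxt.variables()
    return ordered


# Usage with made-up counts over a store of 1000 triples.
students = TriplePattern("?x", "rdf:type", "ub:Student")
takes = TriplePattern("?x", "ub:takesCourse", "?c")
named = TriplePattern("?c", "ub:name", '"Databases"')
demo_stats: Stats = {
    ("p", "rdf:type"): 500, ("o", "ub:Student"): 200,
    ("p", "ub:takesCourse"): 300,
    ("p", "ub:name"): 100, ("o", '"Databases"'): 1,
}
for tp in order_joins([students, takes, named], demo_stats, total=1000):
    print(tp)
```

A real engine would take these counts from the index layer and refine the estimates with the sizes of intermediate results, which is presumably where the cost model mentioned in the abstract comes in.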
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP311.13;TP333

Zhang Jian (張建). Research on Large-Scale Semantic Data Storage and Query Techniques [D]. Nanjing University, 2014.

Article ID: 2471720


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2471720.html
