Research on Big Data Graph Queries Based on Parallel Processing
Published: 2018-08-02 12:39
【Abstract】: With the rapid growth of the Internet, we have entered an era in which data is king. Data volumes have become enormous and the data itself increasingly complex, so finding useful data in this large and heterogeneous mass has become an urgent problem. Meanwhile, distributed cloud storage has become a common storage solution, which turns the problem into one of querying data held in distributed storage. A common and powerful tool for on-demand querying of large-scale distributed data is the graph: graph data structures are well suited to data with reference relationships, so queries over big data can be recast as graph query problems. One major class of graph queries asks, given two nodes in a data graph, whether one is reachable from the other; this is the graph reachability query problem. It has a wide range of practical applications and considerable research significance. Traditional approaches are either restricted to tree-based graph queries or tied to specific graph database systems. Most of them rely on indexing, and they show serious deficiencies in accuracy and performance when handling distributed big data graphs. To address these problems, this thesis proposes a parallel graph reachability query algorithm built on the MapReduce programming model of the Hadoop distributed computing platform, and proposes an index based on six-degree reachability queries to answer reachability queries in the local case. These algorithms aim to optimize reachability queries over large distributed graphs. Experiments on several real-world datasets, evaluated repeatedly against multiple metrics and from multiple angles, verify the accuracy and efficiency of the algorithms.
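The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a reachability query can be phrased as repeated map and reduce rounds. It is a single-process Python simulation, not the thesis's method and not Hadoop code. The names `map_phase`, `reduce_phase`, `reachable`, and the `max_hops` parameter are all hypothetical; `max_hops=6` is one possible reading of a "six-degree" bound, which is an assumption.

```python
from collections import defaultdict


def map_phase(frontier, adjacency):
    """Map step: every frontier node emits (neighbour, parent) pairs."""
    for node in frontier:
        for neighbour in adjacency.get(node, ()):
            yield neighbour, node


def reduce_phase(pairs, visited):
    """Reduce step: group emitted pairs by neighbour and keep unseen nodes."""
    grouped = defaultdict(list)
    for neighbour, parent in pairs:
        grouped[neighbour].append(parent)
    return {node for node in grouped if node not in visited}


def reachable(adjacency, source, target, max_hops=None):
    """Return True if target can be reached from source.

    Each loop iteration stands in for one map/reduce round (assumption:
    one round expands the search frontier by exactly one hop).
    max_hops is a hypothetical bound on the number of rounds.
    """
    if source == target:
        return True
    visited = {source}
    frontier = {source}
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        frontier = reduce_phase(map_phase(frontier, adjacency), visited)
        if target in frontier:
            return True
        visited |= frontier
        hops += 1
    return False


# Toy directed graph, purely illustrative.
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
print(reachable(graph, "a", "d"))
print(reachable(graph, "a", "d", max_hops=6))
```

In this sketch the number of rounds grows with the length of the shortest path between the two nodes, which is one reason a bounded-hop index for local queries can be attractive.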
【Degree-granting institution】: North China Electric Power University (Beijing)
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP311.13; O157.5
Article ID: 2159464
Article link: http://sikaile.net/kejilunwen/yysx/2159464.html