An Efficient Large-Scale Graph Data Processing Mechanism in the Spark Environment
Published: 2018-11-28 13:52
[Abstract]: To address the low efficiency and data storage structure problems of existing graph processing and graph management frameworks, this paper proposes a processing mechanism suited to large-scale graph data. It first analyzes the strengths and weaknesses of current graph processing models and graph storage frameworks. Next, based on an analysis of the characteristics of distributed computing, it designs the graph data processing framework around three aspects: a partitioning algorithm suited to large-scale graphs, optimized data extraction, and a mechanism that combines the cache, the compute layer, and the persistence layer. Finally, experiments using the PageRank and SSSP (single-source shortest path) algorithms compare its performance against the MapReduce framework and a Spark framework that uses HDFS as its persistence layer. The experiments show that the proposed framework is 90 times faster than the MapReduce framework and 2 times faster than the Spark framework with HDFS as the persistence layer, which makes it suitable for applications that require efficient graph data processing.
[Author affiliation]: School of Information, Yunnan University
[Funding]: National Natural Science Foundation of China (61170222)
[Classification number]: TP311.13
Article ID: 2363025
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2363025.html