Research and Implementation of a Large-Scale Multidimensional Network Analysis Model
Published: 2018-10-24 11:40
[Abstract]: As information technology has advanced and storage costs have fallen, enterprises have built large numbers of databases around their business needs and accumulated massive amounts of data. Using this data to guide business decisions is a hard problem for enterprise decision analysts. Online Analytical Processing (OLAP) is widely recognized as an effective solution: it analyzes massive data efficiently across multiple dimensions and granularities and provides decision support. After more than twenty years of research and development, OLAP technology is relatively mature and standardized, and many commercial database and data warehouse systems implement OLAP functionality. In recent years, emerging fields such as social networks, bioinformatics, and multi-source information fusion have developed rapidly, and large numbers of multidimensional heterogeneous networks have appeared in real applications, with network sizes continuing to grow. Traditional OLAP analyzes data organized into fact tables and dimension tables, in which facts are not related to one another, so it cannot effectively analyze multidimensional networks. Graph OLAP has gradually developed in response. Compared with traditional OLAP, it improves the information model, replaces the data cube with a graph cube, and supports multidimensional, multi-angle analysis of network data. However, Graph OLAP research is still in its early stages: the analytical power of existing models is limited, and most do not support effective and efficient analysis of multidimensional heterogeneous networks or of massive data. To address these shortcomings, this thesis proposes a new analysis model that supports multidimensional analysis of large-scale multidimensional heterogeneous networks.
The main research contributions are as follows: 1. A new information model for multidimensional heterogeneous networks is designed. Binary-relation meta-paths and n-ary-relation meta-paths in heterogeneous networks are defined, and the relationships among these meta-paths are studied as a new way to guide network aggregation. 2. The TSMH Graph Cube is designed, extending the traditional graph cube into a two-stage cube consisting of an entity hypercube and a dimension cube. On top of this cube model, traditional operations are given new semantics and additional Graph OLAP operations are proposed, making network analysis more diverse. 3. For the entity hypercube, a parallel aggregation algorithm and a materialization strategy are presented. For the dimension cube, nodes and dimension attributes are encoded, and a node encoding algorithm is designed so that dimension OLAP operations on nodes do not require joining the entity table with the dimension table, which greatly improves the efficiency of dimension OLAP operations. 4. To support massive data, the model's Graph OLAP operation algorithms are implemented on a parallel computing framework. Experiments on large-scale real and simulated data verify that the model can analyze large-scale multidimensional heterogeneous networks effectively and efficiently.
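As a rough illustration of what replacing a data cube with a graph cube means, the sketch below rolls a small attributed network up to a chosen set of dimensions: nodes that share the same attribute values collapse into one aggregate node, and edges are counted between the resulting groups. This is a generic single-machine sketch of graph aggregation, not the TSMH Graph Cube or the parallel algorithm of the thesis. The toy data and the names `nodes`, `edges`, and `aggregate` are assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy network: each node carries dimension attributes.
nodes = {
    "a": {"city": "Beijing", "role": "student"},
    "b": {"city": "Beijing", "role": "teacher"},
    "c": {"city": "Shanghai", "role": "student"},
    "d": {"city": "Shanghai", "role": "student"},
}
# Undirected edges between node ids.
edges = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "d")]


def aggregate(nodes, edges, dims):
    """Roll the network up to the dimensions listed in `dims`.

    Returns (node_weights, edge_weights): how many original nodes fall
    into each attribute group, and how many original edges connect
    each unordered pair of groups.
    """
    def key(node_id):
        # The group of a node is the tuple of its values on `dims`.
        return tuple(nodes[node_id][d] for d in dims)

    node_weights = Counter(key(n) for n in nodes)
    edge_weights = Counter()
    for u, v in edges:
        # Sort the pair so that (x, y) and (y, x) count as the same edge.
        pair = tuple(sorted((key(u), key(v))))
        edge_weights[pair] += 1
    return node_weights, edge_weights


# Aggregate on a single dimension, then on two dimensions (a finer view).
by_city_nodes, by_city_edges = aggregate(nodes, edges, ["city"])
by_both_nodes, by_both_edges = aggregate(nodes, edges, ["city", "role"])
```

Calling `aggregate` with fewer dimensions gives a coarser network and with more dimensions a finer one, which is the graph analogue of roll-up and drill-down on a data cube.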
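[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13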
Article ID: 2291299
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2291299.html