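Research on MapReduce-Oriented Recursive Hierarchical Data Center Networks in the Integrated Information Infrastructure
Published: 2018-05-30 21:26
Topics: military information systems + integrated information infrastructure. Source: National University of Defense Technology, doctoral dissertation, 2012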
[Abstract]: From the underlying hardware architecture to the upper-layer data processing model, data center networks and MapReduce together form the core of data centers and cloud computing. They are also key technologies that allow the integrated information infrastructure to process large volumes of military intelligence and data quickly, and they are a prerequisite for gaining information superiority in future network-centric warfare. In recent years, the continual emergence of new service demands has placed higher structural requirements on data center networks. In response, researchers have designed a number of new data center network structures. Among them, recursive hierarchical structures offer stronger self-organization, higher reliability, and better scalability than other structures, and they have significant research value and broad application prospects in both military and civilian domains. However, researchers have considered only how to improve network performance from the structure itself and have overlooked the practical requirements of data center network design, in particular the need to match the data processing mechanism of MapReduce. This dissertation studies the matching between recursive hierarchical data center networks and MapReduce in the integrated information infrastructure.
The main research work and innovations are as follows:
1) A reliability analysis method for recursive hierarchical data center networks is proposed.
Motivated by military requirements, a systematic method for analyzing and judging the reliability of recursive hierarchical data center networks is proposed. The reliability evaluation indexes are analyzed from the perspective of topology design: connectivity, aggregation, and sensitivity. Based on a formal description of recursive hierarchical data center networks, a concrete quantitative evaluation method is studied for each index. DCell, FiConn, and BCube, currently the three most typical recursive hierarchical structures, are used as case studies to test the feasibility and effectiveness of the method. The case analysis shows that although FiConn has the best sensitivity, its aggregation and connectivity are the worst. Combining the results for the three indexes, BCube is found to have the highest reliability.
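The abstract does not give the dissertation's formal definitions of connectivity, aggregation, or sensitivity, so the following is only a minimal sketch of a connectivity-style check on one of the case-study topologies. It assumes the standard BCube(n, k) addressing, in which each server has a (k+1)-digit base-n address and two servers share a switch when their addresses differ in exactly one digit. The function names and the choice to model server failures only (switches never fail) are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def bcube_servers(n, k):
    """All server addresses of BCube(n, k): (k+1)-digit tuples in base n."""
    return list(product(range(n), repeat=k + 1))


def neighbors(addr, n):
    """Servers one switch-hop away: addresses differing in exactly one digit."""
    for i, digit in enumerate(addr):
        for value in range(n):
            if value != digit:
                yield addr[:i] + (value,) + addr[i + 1:]


def reachable(n, k, src, failed=frozenset()):
    """Breadth-first search over surviving servers.

    Returns {server: hop count from src}. Switch failures are not modeled
    (an assumption of this sketch).
    """
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur, n):
            if nxt not in failed and nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def diameter(n, k):
    """Largest server-to-server hop count in a failure-free BCube(n, k)."""
    servers = bcube_servers(n, k)
    return max(max(reachable(n, k, s).values()) for s in servers)
```

Calling `reachable(4, 1, (0, 0), failed=frozenset({(0, 1), (1, 0)}))` and comparing the size of the result with the number of surviving servers gives a simple yes/no connectivity answer for one failure pattern. A quantitative index would aggregate such checks over many failure patterns.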
2) A rationality analysis method for MapReduce program design is proposed.
Based on object Petri nets, a systematic method is proposed for comprehensively analyzing and verifying the rationality of a MapReduce program design. The specific goals of such an analysis are summarized, and from these goals a set of rationality indexes for MapReduce programs is derived: the workflow must be logically executable, there must be no Stragglers or Map conflicts, and the running time must be reasonable. These indexes are used to judge whether a MapReduce program has design flaws. Object Petri nets are used to simulate the data processing steps of MapReduce because they describe the internal relationships of complex MapReduce programs well, simulate the execution of each step accurately, and require no manual intervention during simulation. For each rationality index, a different method is used to analyze the execution and results of the object Petri net and to judge whether the MapReduce program has design flaws, so that its rationality can be verified without running it on a data center network. Experimental analysis for each rationality index demonstrates the effectiveness of the method.
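To make the "logically executable workflow" index concrete, here is a minimal sketch that uses an ordinary place/transition Petri net rather than the object Petri nets of the dissertation, whose construction the abstract does not describe. The place names (`split`, `mapped`, `done`), the `WORKFLOW` net, and the firing policy (always fire the first enabled transition) are all hypothetical choices for illustration.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= need for place, need in inputs.items())


def fire(marking, transition):
    """Consume input tokens and produce output tokens; returns a new marking."""
    inputs, outputs = transition
    new = dict(marking)
    for place, need in inputs.items():
        new[place] -= need
    for place, made in outputs.items():
        new[place] = new.get(place, 0) + made
    return new


def run(marking, transitions, max_steps=1000):
    """Fire enabled transitions until none is enabled; return the final marking."""
    for _ in range(max_steps):
        ready = [t for t in transitions.values() if enabled(marking, t)]
        if not ready:
            return marking
        marking = fire(marking, ready[0])
    raise RuntimeError("step limit reached; the net may not terminate")


# Hypothetical job: two input splits, one Reduce task that needs both Map outputs.
WORKFLOW = {
    "map": ({"split": 1}, {"mapped": 1}),
    "reduce": ({"mapped": 2}, {"done": 1}),
}

final = run({"split": 2}, WORKFLOW)
# The workflow is logically executable if a token reaches the "done" place.
is_executable = final.get("done", 0) == 1
```

If the Reduce transition were declared to need three Map outputs while only two splits exist, the net would stop with no token in `done`, which is the kind of design flaw this index is meant to expose without running the job on a real cluster.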
3) A recursive hierarchical structure supporting MapReduce is designed.
Based on the BCube and Fat-tree structures, a recursive hierarchical structure supporting MapReduce, the Hyper-Fat-tree Network (HFN), is designed. HFN is constructed following the recursive rule of BCube: a lower-level network topology serves as a recursive unit, and multiple such units are connected according to the node connection pattern of a hypercube to form the next-higher-level topology. Unlike BCube, the smallest recursive unit of HFN uses a redundant construction similar to Fat-tree, and the relative positions and connections of master servers and worker servers are fixed according to the execution control process of MapReduce. This adapts the structure to the MapReduce data processing mechanism and improves the reliability of distributed data processing with MapReduce. By combining the advantages of the hypercube and Fat-tree, HFN has high connectivity, a small diameter, and good reliability. HFN also scales well: it can connect several times as many servers as BCube, which meets the demand of integrated information infrastructure development for an ever-growing number of data center network servers.
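The hypercube connection pattern mentioned above can be sketched as follows. This is not the HFN construction itself, whose details the abstract does not give. It only shows the generic hypercube rule, assuming recursive units are numbered with `dim`-bit integers and two units are linked when their numbers differ in exactly one bit. The function names are hypothetical.

```python
def hypercube_neighbors(unit_id, dim):
    """Units directly linked to unit_id: flip each of the dim bits in turn."""
    return [unit_id ^ (1 << bit) for bit in range(dim)]


def hop_distance(a, b):
    """Minimum number of inter-unit hops: the count of differing bits."""
    return bin(a ^ b).count("1")


def route(src, dst, dim):
    """Dimension-order route: correct differing bits from lowest to highest."""
    path = [src]
    cur = src
    for bit in range(dim):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path
```

Because any two `dim`-bit numbers differ in at most `dim` bits, the number of inter-unit hops never exceeds `dim`, which is one reason hypercube-based designs keep the diameter small as the network grows.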
4) A method for organizing and maintaining data files on general recursive hierarchical data center networks is proposed.
Based on the basic principle of the distributed hash table, a method is proposed for organizing and maintaining data files on general recursive hierarchical data center networks. The role of servers in organizing and maintaining data files is determined from the way servers are interconnected in recursive hierarchical structures. Using the basic principle of the distributed hash table, methods for storing, reading, and maintaining data on general recursive hierarchical data center networks are studied. The key table structure of the distributed hash table used by these methods is described, together with the routing method that uses this structure for data file maintenance operations. For server failures, a fault-tolerant method for organizing and maintaining data files on general recursive hierarchical data center networks is given. Experiments compare the average path length of data operations on HFN and BCube and the success rate of data operations under node failures. They show that the method delivers data operation requests to the destination servers quickly and effectively, and that using it to organize and maintain data files on general recursive hierarchical data center networks is feasible and effective.
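The general idea of using a distributed hash table to place files on addressed servers can be sketched as below. The key table structure and routing method of the dissertation are not described in the abstract, so everything here is an assumption for illustration: servers are addressed by `levels` base-`n` digits, a file name is hashed with SHA-1 to pick a primary server, and the fallback rule (shift the first digit) is a hypothetical stand-in for the dissertation's fault-tolerance scheme.

```python
import hashlib


def key_to_address(name, n, levels):
    """Hash a file name to a server address made of `levels` digits in base n."""
    value = int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")
    digits = []
    for _ in range(levels):
        value, digit = divmod(value, n)
        digits.append(digit)
    return tuple(digits)


def locate(name, n, levels, failed=frozenset(), replicas=3):
    """Return the first live server among the primary and its fallbacks.

    Hypothetical replica rule: fallbacks share all digits with the primary
    except the first, which is shifted by 1, 2, ... (requires replicas <= n).
    Returns None when every candidate has failed.
    """
    primary = key_to_address(name, n, levels)
    candidates = [((primary[0] + r) % n,) + primary[1:] for r in range(replicas)]
    for server in candidates:
        if server not in failed:
            return server
    return None
```

Because the address is computed from the file name alone, any server can work out where a file lives without consulting a central directory. A request then only needs to be routed to that address over the topology.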
5) A method for running MapReduce on general recursive hierarchical data center networks is proposed.
Based on the basic principle of the distributed hash table, a method is proposed for running MapReduce on general recursive hierarchical data center networks. Starting from the basic data processing mechanism of MapReduce, the dissertation studies how to distinguish master servers from worker servers on such networks, how to assign Map and Reduce tasks, and how to transfer intermediate data. The key table structure of the distributed hash table used by these methods is described, together with the routing method that uses this structure for task assignment and intermediate data transfer. Fault-tolerant methods for MapReduce on general recursive hierarchical data center networks are studied, including fault-tolerant routing and methods for handling server failures. Experiments compare the network performance of HFN and BCube when running MapReduce with this method, including load balance, throughput, and bandwidth. They show that the method distributes the load of the whole data center network evenly across servers and still supports bandwidth-intensive MapReduce applications and services when the node failure rate is high.
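As a small illustration of how hashing spreads intermediate data across workers, the sketch below runs a word count with a hash-based partitioner (intermediate key hashed modulo the number of Reduce tasks, the usual default in MapReduce implementations). It runs in a single process and does not reproduce the dissertation's key table or routing. The function names and the use of MD5 as a stable hash are assumptions for illustration.

```python
import hashlib
from collections import Counter


def partition(key, num_reducers):
    """Map an intermediate key to a reducer index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers


def shuffle(pairs, num_reducers):
    """Group intermediate (key, value) pairs by the reducer that owns the key."""
    buckets = {r: [] for r in range(num_reducers)}
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def word_count(docs, num_reducers):
    """Return (word counts, number of intermediate pairs sent to each reducer)."""
    pairs = [(word, 1) for doc in docs for word in doc.split()]  # Map phase
    buckets = shuffle(pairs, num_reducers)                       # Shuffle phase
    counts = Counter()
    for bucket in buckets.values():                              # Reduce phase
        for key, value in bucket:
            counts[key] += value
    load = {r: len(bucket) for r, bucket in buckets.items()}
    return dict(counts), load
```

The `load` dictionary returned by `word_count` gives a rough per-reducer load figure. Comparing its largest and smallest values is a simple way to see how evenly a given key distribution is spread.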
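[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: TP308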
Article ID: 1956872
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1956872.html