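Research and Application of Monitoring Technology for Large-Scale Distributed Systems
Published: 2018-01-19 08:29
Keywords: distributed system monitoring; call chain; monitoring sampling; fault diagnosis; aggregation operations. Source: Zhejiang University, 2017 master's thesis. Document type: degree thesis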
[Abstract]: Distributed systems are the main form taken by computer applications of ever-growing scale and complexity. Distributed tracing systems and distributed performance monitoring systems are important means by which large distributed systems diagnose anomalies, monitor performance, and keep the system stable: a distributed tracing system monitors the calls between the services of a distributed system, while a distributed performance monitoring system monitors the resources consumed by each of its components. Distributed systems suffer from several problems: errors are hard to locate quickly and accurately, the collected monitoring data is of low value, and collecting and querying monitoring data consumes substantial resources. Addressing monitoring data sampling, data analysis, and the storage and indexing of monitoring data, this thesis proposes schemes for rapid anomaly diagnosis and for reducing the resources consumed by collecting and querying monitoring data. The specific work is as follows:

1. A posterior (after-the-fact) call chain collection scheme is proposed. The proportion of abnormal call chains in existing large-scale distributed systems is very small. In view of this, the scheme has each node pre-judge whether a call is abnormal, and only erroneous call chains are reconstructed and stored. Compared with the fixed sampling rate used by traditional distributed monitoring and tracing systems, this raises the value of the stored call monitoring log data and saves network and storage resources.

2. A call chain fault diagnosis method based on decision tree classification is proposed, to address the difficulty of quickly and accurately locating the cause of errors in a distributed system. The method extracts features from a dataset of known abnormal call chains and classifies erroneous call chains into different error types, so that the cause of an error can be located quickly.

3. A time-series data indexing method based on a hashed summary forest is proposed, to reduce the resources and time consumed by statistical and aggregate queries over large time ranges when the volume of monitoring data is very large. The method uses the summary forest tree index scheme to speed up aggregation over time-series data, and combines it with an HBase-based hashed segment tree storage scheme to resolve the hotspot problem that arises when time-series data is stored in distributed HBase.

Based on the above, this thesis builds the Qiantang distributed tracing system (JTang Tracer), which traces and analyzes the call chains of a distributed system and visualizes the call data. Compared with traditional distributed monitoring systems, it saves more resources and collects more valuable data.
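The posterior collection idea in point 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Span` and `PosteriorSampler` names, the single in-memory buffer, and the abnormality rule (an error flag or a latency above a threshold) are all assumptions made for illustration. In a real deployment each node would make this judgment locally before anything is sent over the network.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """One call within a trace (hypothetical, simplified record)."""
    trace_id: str
    service: str
    duration_ms: float
    error: bool = False


class PosteriorSampler:
    """Buffers spans per trace and stores a trace only if some span looked abnormal."""

    def __init__(self, slow_threshold_ms: float = 1000.0):
        self.slow_threshold_ms = slow_threshold_ms
        self._buffer = defaultdict(list)  # trace_id -> spans seen so far
        self._flagged = set()             # trace_ids with at least one abnormal span
        self.stored = {}                  # trace_id -> spans kept for storage

    def is_abnormal(self, span: Span) -> bool:
        # Assumed rule: an explicit error or an unusually slow call.
        return span.error or span.duration_ms > self.slow_threshold_ms

    def record(self, span: Span) -> None:
        # Hold every span until the trace ends; flag the trace on the first anomaly.
        self._buffer[span.trace_id].append(span)
        if self.is_abnormal(span):
            self._flagged.add(span.trace_id)

    def finish(self, trace_id: str) -> bool:
        """Called when a trace completes; returns True if the trace was stored."""
        spans = self._buffer.pop(trace_id, [])
        if trace_id in self._flagged:
            self._flagged.discard(trace_id)
            self.stored[trace_id] = spans
            return True
        return False  # normal trace: buffered spans are simply dropped


sampler = PosteriorSampler(slow_threshold_ms=500.0)
sampler.record(Span("t1", "gateway", 12.0))
sampler.record(Span("t1", "orders", 30.0))
sampler.record(Span("t2", "gateway", 15.0))
sampler.record(Span("t2", "payments", 40.0, error=True))
kept_t1 = sampler.finish("t1")
kept_t2 = sampler.finish("t2")
```

In this sketch the decision is made after the spans are seen rather than at the start of the request, which is what distinguishes it from fixed-rate sampling: a normal trace costs only temporary buffer space, while every abnormal trace is retained in full.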
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP277
Article ID: 1443436
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1443436.html