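Research and Implementation of Cluster Fault Monitoring Based on a Cloud Platform
Published: 2018-06-30 05:21
Topics: cloud platform + monitoring system; Source: Beijing University of Posts and Telecommunications, master's thesis, 2014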
[Abstract]: As Internet technology spreads and information technology continues to advance, every sector of society is placing higher demands on informatization, and the volume of data to be processed keeps growing. Cloud computing has moved from concept to practical use and has matured into public and private clouds that are customizable, elastically scalable, and service-oriented. Quality of service matters greatly to a cloud platform, and monitoring is an essential component of the platform: it is a prerequisite for network analysis, system management, job scheduling, load balancing, event prediction, fault detection, and recovery operations. Monitoring helps the platform dynamically quantify resource usage, detect service defects, discover user usage patterns, and support the decisions of the resource scheduling module, and it thereby improves the platform's quality of service.

BC-PDM (Big Cloud of Parallel Data Mining) grew out of the business intelligence requirements of the world's largest telecom operator and aims to provide efficient, accurate, and convenient analysis services for massive data sets. The system is built on a Hadoop cluster, and this thesis describes the research and implementation of fault monitoring for that cluster.

The thesis first introduces the research background and the state of the art, then presents the overall functional design and the design of each module based on the project's requirements. It uses two open-source monitoring tools, Ganglia and Nagios. After an in-depth study of both tools, it summarizes their working principles, strengths, and weaknesses, combines their strengths, and improves Ganglia's fault-tolerance mechanism to provide fault monitoring and resource monitoring. Both Ganglia and Nagios have shortcomings in how they store monitoring data, so the system uses a persistence tool to transfer the monitoring data into a MySQL database, where it is managed and analyzed in one place.

Building on Ganglia and Nagios, and proceeding through requirements analysis and a study of the system's key points, the work delivers resource monitoring and fault monitoring. It monitors the physical, virtual, and service resources of the cloud platform, analyzes resource utilization, and, based on that analysis, reports faults through several channels such as email and SMS, keeping the cloud platform running normally.

Finally, a cloud platform monitoring system was implemented using this research, and its operation indicates that the approach is effective and feasible.
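The storage step described in the abstract, moving Ganglia monitoring data into a relational database for unified analysis, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the thesis does not publish its schema or persistence tool, so the `metrics` table, the function names, and the sample values are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server. The sample XML is modeled on the cluster/host/metric layout that Ganglia's gmond daemon reports.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a gmond XML report (values are made up).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="hadoop-demo" LOCALTIME="1400000000">
    <HOST NAME="node1" IP="10.0.0.1" REPORTED="1400000000">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="204800" TYPE="float" UNITS="KB"/>
    </HOST>
    <HOST NAME="node2" IP="10.0.0.2" REPORTED="1400000005">
      <METRIC NAME="load_one" VAL="3.10" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def parse_metrics(xml_text):
    """Flatten the XML into (cluster, host, metric, value, reported) tuples.

    Assumes every metric is numeric; real reports also carry string metrics,
    which this sketch does not handle.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                rows.append((
                    cluster.get("NAME"),
                    host.get("NAME"),
                    metric.get("NAME"),
                    float(metric.get("VAL")),
                    int(host.get("REPORTED")),
                ))
    return rows


def store_metrics(conn, rows):
    """Persist parsed rows into a hypothetical `metrics` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "cluster TEXT, host TEXT, metric TEXT, value REAL, reported INTEGER)"
    )
    conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def hosts_over(conn, metric, threshold):
    """Return the hosts whose stored value for `metric` exceeds `threshold`."""
    cur = conn.execute(
        "SELECT host FROM metrics WHERE metric = ? AND value > ? ORDER BY host",
        (metric, threshold),
    )
    return [row[0] for row in cur.fetchall()]


# sqlite3 in-memory database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
store_metrics(conn, parse_metrics(SAMPLE_XML))
overloaded = hosts_over(conn, "load_one", 2.0)
print(overloaded)  # prints ['node2']
```

Once the data sits in one table, a query such as `hosts_over` is the kind of analysis that could feed an email or SMS notification; the threshold of 2.0 is arbitrary and chosen only for the demonstration.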
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.09
Article ID: 2085085
Article link: http://sikaile.net/guanlilunwen/ydhl/2085085.html