Research on Adaptive Fault Prediction Algorithms for Supercomputers
Thesis topics: system fault tolerance + supercomputer; source: Chongqing University, master's thesis, 2014
[Abstract]: With the development of information technology, large-scale distributed systems such as cloud computing platforms have been widely deployed. As the software and hardware of these systems grow more complex, however, keeping a system running correctly over long periods and delivering high-quality service to its users has become a central concern in large-system design and development. If a large system can diagnose itself through a fault prediction strategy, its fault tolerance and resource scheduling improve substantially, which supports high availability and high reliability. Supercomputers are complex computer systems. Fault prediction research for supercomputers matters for improving their computing performance and fault tolerance, and an effective fault prediction strategy can also be applied to other large systems to improve their fault tolerance.
Working from the system logs of a supercomputer, this thesis first designs and implements a filtering algorithm based on semantic and temporal correlation (Semantic Time Filter Algorithm, abbreviated STF) to preprocess the log records. STF considers both the semantic correlation and the temporal correlation between log records and filters redundant records out of the raw log according to these two measures. Experiments show that the filtered record sequence effectively reflects how non-fault events evolve into fault events in the system, which greatly helps the subsequent analysis and the construction of a fault prediction model.
Based on an analysis of the filtered log records, the thesis applies the classification-based prediction approach from data mining: the time axis is divided into time windows of a given size, features are extracted for each window, and faults are predicted window by window. AdaBoost is used during the training of the SVM classifier to adjust the classifier's core parameters dynamically according to the training set, so that the classifier improves through adaptive learning. The resulting adaptive fault prediction model is called AdaBoostSVM.
The experimental data set is 215 days of system logs from the BlueGene/L supercomputer. After preprocessing, comparison experiments among prediction models are run on this data set. The results show that AdaBoostSVM achieves better classification and prediction performance than fault prediction models based on the time between failures (Time Between Failure, TBF), kNN, RIPPER, and SVM. In particular, on recall, an important metric in fault prediction, AdaBoostSVM exceeds the other prediction models by 10%-20%.
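The abstract does not specify STF's correlation measures or the features extracted per window, so the following is only a minimal Python sketch of the preprocessing idea under stated assumptions: token-overlap (Jaccard) similarity stands in for semantic correlation, a fixed gap in seconds stands in for temporal correlation, and each window is summarised by two simple counts. All names and thresholds (`LogRecord`, `filter_redundant`, `max_gap`, `min_similarity`, `window_features`, `recall`) are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class LogRecord:
    timestamp: float        # seconds since the start of the log (assumed unit)
    message: str
    is_fault: bool = False  # assumed label: True for a fault event


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, an assumed stand-in for semantic correlation."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def filter_redundant(records: Sequence[LogRecord],
                     max_gap: float = 300.0,
                     min_similarity: float = 0.8) -> List[LogRecord]:
    """Drop a record if an already-kept record is both close in time
    (within max_gap seconds) and similar in wording (>= min_similarity)."""
    kept: List[LogRecord] = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        redundant = any(
            rec.timestamp - prev.timestamp <= max_gap
            and jaccard(rec.message, prev.message) >= min_similarity
            for prev in kept
        )
        if not redundant:
            kept.append(rec)
    return kept


def window_features(records: Sequence[LogRecord],
                    window: float = 3600.0) -> Dict[int, Tuple[int, int]]:
    """Map each time-window index to (event_count, fault_count)."""
    features: Dict[int, Tuple[int, int]] = {}
    for rec in records:
        idx = int(rec.timestamp // window)
        count, faults = features.get(idx, (0, 0))
        features[idx] = (count + 1, faults + int(rec.is_fault))
    return features


def recall(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Recall = TP / (TP + FN) over per-window fault labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In a pipeline of this shape, the per-window tuples would be the inputs to a classifier, and recall would be computed over windows that actually contain a fault. A boosted SVM would replace the placeholder predictions; that step is not sketched here because the abstract does not describe how AdaBoost adjusts the SVM parameters.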
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP338