Design and Implementation of a Fault Injection Tool for the RAS Mechanisms of Multiple Processors
Published: 2018-05-29 13:11
Topics: reliability + availability. Source: Harbin Institute of Technology, master's thesis, 2012
[Abstract]: Server systems for mission-critical workloads need not only strong processing capability but also strong fault tolerance. To improve the fault tolerance and availability of their systems, computer vendors have designed and implemented a wide variety of RAS mechanisms. Evaluating a system's fault tolerance in light of its RAS mechanisms can give vendors targeted suggestions for improvement, and using fault injection greatly increases the efficiency of such evaluation. In addition, comparing the fault tolerance of existing server systems side by side with a unified set of test cases can reveal the strengths and weaknesses of the fault-tolerance mechanisms of different architectures.

Following the FARM model, this thesis starts from the causes of errors, classifies errors by severity level, builds a fault set oriented to RAS mechanisms, and constructs a fault injection model for evaluating the fault tolerance of multiple platforms. It then follows two fault injection approaches, driver-layer simulation and injection through testability interfaces, to design and implement a CPU fault injection tool and a register fault injection tool. The tools support three architectures (x86, ia64, and sparc) and their corresponding operating systems, Linux and Solaris. They can inject single-bit or multi-bit data errors into internal components such as the cache and TLB and into registers, targeting both critical system processes and application processes.

To verify the effectiveness of the tools, cache and register fault injection experiments were carried out on four different computer systems. The results confirm that the tools are effective: the same test cases can be run on all four systems, and the results can be compared side by side to evaluate each system's fault tolerance qualitatively. The experiments analyze the respective strengths and weaknesses of the MCA mechanism and the predictive recovery mechanism in error handling. From register fault injection tests on different server systems, the thesis concludes that improving fault tolerance requires hardware paired with a suitable operating system.
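The single-bit and multi-bit data errors mentioned in the abstract can be modeled as an XOR mask applied to a data word. The following is a minimal sketch in Python and is not the thesis's tool: the function names `flip_bits` and `inject_random_fault` and the 64-bit default width are assumptions made for illustration, and the code operates on an ordinary integer rather than on a real register, cache line, or TLB entry.

```python
import random
from typing import Iterable, List, Optional, Tuple


def flip_bits(value: int, positions: Iterable[int], width: int = 64) -> int:
    """Return `value` with the given bit positions inverted.

    Hypothetical helper: models a data-content fault as an XOR mask.
    The result is truncated to `width` bits.
    """
    mask = 0
    for pos in positions:
        if not 0 <= pos < width:
            raise ValueError(f"bit position {pos} is outside a {width}-bit word")
        mask |= 1 << pos
    return (value ^ mask) & ((1 << width) - 1)


def inject_random_fault(
    value: int,
    n_bits: int = 1,
    width: int = 64,
    rng: Optional[random.Random] = None,
) -> Tuple[int, List[int]]:
    """Flip `n_bits` distinct, randomly chosen bits of `value`.

    Returns the faulty word and the sorted list of flipped positions,
    so that an experiment log can record exactly what was injected.
    """
    rng = rng or random.Random()
    positions = rng.sample(range(width), n_bits)
    return flip_bits(value, positions, width), sorted(positions)


if __name__ == "__main__":
    original = 0x00FF
    faulty, flipped = inject_random_fault(original, n_bits=2, width=16)
    print(f"original={original:#06x} faulty={faulty:#06x} flipped bits={flipped}")
```

Passing a seeded `random.Random` makes an injection run repeatable, which is one way the same test case could be replayed on several systems for a side-by-side comparison.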
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP302.8
Article ID: 1950997
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1950997.html