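Research on Energy Consumption Management Techniques and Methods for High-Performance Computer Systems
Published: 2018-06-28 05:41
Topics: high-performance computing + energy consumption management; Source: Master's thesis, National University of Defense Technology, 2012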
【Abstract】: After theory and experiment, high-performance computing (HPC) is the third major tool and method humans use to understand the world. HPC is now widely used in:

- oil-exploration data processing
- biomedical research and development
- engineering design and simulation
- new energy and new materials
- environmental science
- basic science in many fields

Demand for HPC keeps growing. Countries around the world have drawn up development plans for high-performance computers and keep raising their performance, and the energy consumption of these systems has risen with it. For example, the fastest computer currently reaches a peak performance of 27 PFLOPS and draws 8.2 MW of power. Energy use on this scale makes high-performance computers expensive to operate. It also has direct and latent adverse effects on their reliability and availability. The energy consumption of high-performance computers has therefore become a major research topic for scholars in China and abroad.
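The calculation below puts the 27 PFLOPS / 8.2 MW figures in perspective. It converts them into peak energy efficiency and into yearly energy use. For illustration only, it assumes the machine draws its full 8.2 MW around the clock for a 365-day year. Actual average draw would be lower.

```python
# Figures quoted in the abstract: peak performance and power draw.
PEAK_FLOPS = 27e15      # 27 PFLOPS
POWER_WATTS = 8.2e6     # 8.2 MW

# Peak energy efficiency, in GFLOPS per watt.
gflops_per_watt = PEAK_FLOPS / POWER_WATTS / 1e9

# Assumption (illustrative only): full power draw for 24 h x 365 days.
HOURS_PER_YEAR = 24 * 365
annual_mwh = POWER_WATTS / 1e6 * HOURS_PER_YEAR

if __name__ == "__main__":
    print(f"{gflops_per_watt:.2f} GFLOPS/W")   # 3.29 GFLOPS/W
    print(f"{annual_mwh:,.0f} MWh per year")   # 71,832 MWh per year
```

Under this assumption, one such machine would consume roughly 71.8 GWh per year.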
This thesis analyzes the main sources of energy consumption in high-performance computers and how that consumption is distributed. The analysis shows that the computing subsystem is the largest energy consumer in the whole system.

A high-performance computer usually serves tens to hundreds of users at once, around the clock (7×24, shared use). Supercomputing centers typically allocate each user a fixed quantity of computing resources per week or per month. Users run jobs at unpredictable times and need unpredictable amounts of resources. As a result, resource usage is often unbalanced across the days of the week and the hours of the day. Some resources sit idle, which wastes both computing capacity and the electricity those idle resources draw.

This thesis studies how to manage the resources of the computing subsystem effectively, so as to raise both resource utilization and energy efficiency while minimizing the impact on running jobs.
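The waste described above can be estimated as idle node-hours multiplied by the power each idle node draws. The sketch below shows this bookkeeping. The node count, idle power and the 24-hour busy-node profile are made-up illustrative numbers, not TH-1A measurements.

```python
# Hypothetical illustration: all numbers below are assumptions, not TH-1A data.
TOTAL_NODES = 100          # nodes in a toy compute partition
IDLE_POWER_W = 200.0       # assumed draw of one powered-on but idle node

# Assumed busy-node count for each hour of one day (24 hourly samples):
# quiet night, busy working hours, moderate evening.
busy_per_hour = [40] * 8 + [90] * 10 + [60] * 6


def idle_energy_kwh(busy, total_nodes, idle_power_w, hours_per_sample=1.0):
    """Energy (kWh) drawn by nodes that are powered on but running no job."""
    idle_node_hours = sum((total_nodes - b) * hours_per_sample for b in busy)
    return idle_node_hours * idle_power_w / 1000.0


def utilization(busy, total_nodes):
    """Average fraction of nodes that were busy over the sampled period."""
    return sum(busy) / (len(busy) * total_nodes)


if __name__ == "__main__":
    print(f"average utilization: {utilization(busy_per_hour, TOTAL_NODES):.1%}")
    # -> average utilization: 65.8%
    print(f"idle energy: {idle_energy_kwh(busy_per_hour, TOTAL_NODES, IDLE_POWER_W):.1f} kWh")
    # -> idle energy: 164.0 kWh
```

In this toy profile, the night hours contribute most of the 820 idle node-hours. This is the kind of uneven, time-dependent idling that the energy management strategies aim to exploit.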
The thesis first takes the TH-1A system as a case study and analyzes:

- the energy usage of the TH-1A computing subsystem
- the power-saving control mechanisms supported by its compute nodes
- its resource management system, SLURM

Experiments with SLURM's power-saving module reveal its shortcomings. To address them, the thesis proposes an energy management strategy based on an active-resource utilization threshold. This strategy keeps a certain amount of idle resources available. The system thus cuts energy consumption while keeping the average job waiting time as low as possible.

Next, for future high-performance computer systems, the thesis proposes a hybrid energy management strategy based on workload patterns. This strategy exploits regularities in the system's workload. It also combines two power-saving techniques, node sleep and node shutdown, to reduce energy consumption further.

Finally, the thesis designs and implements a simulator of energy management methods for high-performance computers to evaluate both strategies. Experimental results show that both strategies effectively reduce the system's energy consumption.
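The abstract names the two strategies but gives no formulas. The sketch below shows one way they could be expressed under assumptions of our own:

- Active-resource utilization is taken to be busy nodes divided by powered-on nodes.
- The choice between sleep and shutdown compares estimated energy over a predicted idle period.
- In a workload-pattern-based strategy, that prediction would come from historical load. Here it is simply an input.

`PowerModel`, `active_nodes_target`, `break_even_seconds`, `low_power_state` and every power and transition figure are hypothetical. None of them is taken from the thesis or from TH-1A.

```python
# Hypothetical sketch of threshold-based and hybrid (sleep/shutdown) power
# management; not the thesis's actual algorithm. All numbers are assumptions.
import math
from dataclasses import dataclass


@dataclass
class PowerModel:
    idle_w: float = 200.0               # powered on, idle
    sleep_w: float = 20.0               # low-power sleep state
    off_w: float = 5.0                  # shut down (residual draw)
    sleep_transition_j: float = 2_000.0    # extra energy to enter + leave sleep
    off_transition_j: float = 60_000.0     # extra energy to shut down + reboot


def active_nodes_target(busy: int, total: int, threshold: float, min_idle: int = 0) -> int:
    """Powered-on node count that keeps busy/active <= threshold.

    Keeping active >= busy / threshold leaves a reserve of idle, powered-on
    nodes that can start newly arriving jobs without waiting for a wake-up.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    target = max(math.ceil(busy / threshold), busy + min_idle)
    return min(total, target)


def break_even_seconds(p_high: float, p_low: float, transition_j: float) -> float:
    """Idle time after which a lower-power state repays its transition energy."""
    return transition_j / (p_high - p_low)


def low_power_state(predicted_idle_s: float, pm: PowerModel) -> str:
    """Pick 'stay', 'sleep' or 'off' by lowest estimated energy over the idle period."""
    options = {
        "stay": pm.idle_w * predicted_idle_s,
        "sleep": pm.sleep_w * predicted_idle_s + pm.sleep_transition_j,
        "off": pm.off_w * predicted_idle_s + pm.off_transition_j,
    }
    return min(options, key=options.get)


if __name__ == "__main__":
    pm = PowerModel()
    total, busy, active = 100, 60, 100
    keep_on = active_nodes_target(busy, total, threshold=0.75)
    print(f"keep {keep_on} nodes on, power down {active - keep_on}")  # keep 80 nodes on, power down 20
    for idle_s in (5, 600, 8 * 3600):
        print(idle_s, "->", low_power_state(idle_s, pm))  # stay, sleep, off
    # Crossover between staying idle and sleeping under these assumptions: ~11.1 s
    print(f"{break_even_seconds(pm.idle_w, pm.sleep_w, pm.sleep_transition_j):.1f} s")
```

Lowering `threshold` enlarges the reserve of idle, powered-on nodes. This spends more idle power in exchange for shorter job waiting times, which is the trade-off the threshold strategy is described as balancing. The cost-based choice reflects the reason for combining sleep and shutdown: brief idle gaps do not repay any transition, medium gaps favour sleep, and long gaps favour shutdown.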
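【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year degree awarded】: 2012
【Classification number】: TP38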
Article number: 2076956
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2076956.html