Research on Maintenance Strategies for Multi-Yield Deteriorating Production Systems Based on Reinforcement Learning
Published: 2018-12-18 02:21
【Abstract】: In manufacturing systems, equipment deteriorates through fatigue, wear, aging, and similar causes. Equipment operating in a deteriorated state lowers product quality and raises production cost. Maintenance actions such as inspection, repair, or replacement keep equipment from running in a poor state. Excessive maintenance, however, interrupts production and increases equipment downtime and system maintenance cost. A sound equipment maintenance strategy is therefore very important for a manufacturing system. Although researchers in China and abroad have studied equipment maintenance in production systems extensively and from many angles, topics related to product quality management are rarely addressed in the literature. In real production systems, the state of a machine often affects the quality of its products. This gives rise to the multi-yield quality problem: as its state worsens, a machine produces defective products with higher probability. The machine state can therefore be inferred from product quality inspection data and used to determine the optimal maintenance strategy.

In recent years, maintenance strategies for flow-line systems have attracted growing attention, especially the two-machine flow line made up of an upstream machine and a downstream machine in series with one intermediate inventory buffer, known as the 2M1B system. Most of this work, however, rests on strong assumptions, for example that production and maintenance each take one unit of time, or that maintenance resources are sufficient and always available. Maintenance decisions made under these assumptions lack a realistic basis. This thesis therefore starts from the preventive maintenance strategy for a single deteriorating machine with the multi-yield quality problem, goes on to analyze preventive maintenance strategies for deteriorating machines in the 2M1B flow-line system, and then examines how limited maintenance resources affect those strategies. Finally, it improves the method used to solve the models. The main research content and results are as follows:

(1) A predictive maintenance method is proposed for a single deteriorating machine with the multi-yield quality problem. It works in two stages. First, a continuous-time, discrete-state semi-Markov model describes the deterioration of the machine, and a reinforcement learning method based on policy iteration solves the model to obtain a maintenance policy based on the observed machine state. Then the learned policy is applied in a new simulation of the system model to estimate future maintenance times. Numerical examples show that the time to the next maintenance decreases as the total number of products produced increases and, for a given total, also decreases as the number of defective products increases. A growing number of maintenance actions likewise brings the next maintenance forward.

(2) Building on the single-machine study, the maintenance strategy for deteriorating machines in the 2M1B flow-line system is analyzed. A two-agent semi-Markov decision process model describes the deterioration of the machines in the system. A distributed multi-agent reinforcement learning method, the costs-sharing-RL method, is proposed to solve the model. With the objective of minimizing the long-run expected average cost rate of the system, the method takes into account the link between each agent's local decisions and the global objective, and obtains the optimal maintenance strategy for the system.

(3) The maintenance strategy for deteriorating machines in the 2M1B flow-line system is then considered under limited maintenance resources. Limited resources are assumed to result in imperfect maintenance, and a continuous-time, discrete-state semi-Markov model describes the deterioration of the machines. A resource-constrained distributed multi-agent reinforcement learning method, the RC-costs-sharing-RL method, is used to solve the model. A numerical example of the 2M1B flow-line system shows that the RC-costs-sharing-RL method outperforms two other methods, the sequential PM method and the independent-RL method, and obtains the optimal maintenance strategy for the system.

(4) From the standpoint of practical application, and using the maintenance problem for deteriorating machines in the 2M1B flow-line system as the setting, a heuristically accelerated multi-agent reinforcement learning method, the HAMSL method, is proposed. The aim is to use a heuristic function to improve the learning efficiency of multi-agent reinforcement learning while still minimizing the average cost rate of the system. Experimental results show that the HAMSL method learns more efficiently than several reinforcement learning methods based on traditional heuristic search techniques: ε-greedy, neighborhood search, simulated annealing search, and tabu search multi-agent reinforcement learning.
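To make the single-machine setting described in the abstract more concrete, here is a minimal Python sketch. It is not the method of the thesis: it does not use reinforcement learning, and it only evaluates fixed state-threshold policies by simulation. It assumes a machine with three deterioration states, a defect probability that rises with the state (the multi-yield property), gamma-distributed sojourn times (so the process is semi-Markov rather than Markov), and a long-run average cost rate computed as total cost divided by total time. Every number, the choice of distribution, and the name `average_cost_rate` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical illustration values; none of these numbers come from the thesis.
DEFECT_PROB = [0.02, 0.10, 0.30]   # per-product defect probability in states 0, 1, 2
MEAN_SOJOURN = [10.0, 6.0, 4.0]    # mean time spent in each state before it worsens
PRODUCTION_RATE = 1.0              # products per unit time
DEFECT_COST = 2.0                  # cost per defective product
PM_TIME, PM_COST = 1.0, 5.0        # preventive maintenance: downtime and cost
CM_TIME, CM_COST = 4.0, 30.0       # corrective repair after failure: downtime and cost


def average_cost_rate(threshold, n_cycles=20000, seed=0):
    """Estimate the long-run cost per unit time of a state-threshold policy.

    The machine is preventively maintained as soon as it enters state
    `threshold`; with threshold == len(MEAN_SOJOURN) it runs to failure.
    """
    rng = random.Random(seed)
    total_cost = 0.0
    total_time = 0.0
    for _ in range(n_cycles):
        # Each cycle starts from the as-good-as-new state 0.
        for state in range(min(threshold, len(MEAN_SOJOURN))):
            # Gamma sojourn times (shape 2) are not memoryless, which makes
            # the deterioration process semi-Markov rather than Markov.
            sojourn = rng.gammavariate(2.0, MEAN_SOJOURN[state] / 2.0)
            total_time += sojourn
            # Expected defect cost accumulated while producing in this state.
            total_cost += DEFECT_COST * DEFECT_PROB[state] * PRODUCTION_RATE * sojourn
        if threshold < len(MEAN_SOJOURN):
            # Preventive maintenance restores the machine to state 0.
            total_time += PM_TIME
            total_cost += PM_COST
        else:
            # The machine failed after its last state: corrective repair.
            total_time += CM_TIME
            total_cost += CM_COST
    return total_cost / total_time


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(f"PM on entering state {m}: cost rate = {average_cost_rate(m):.3f}")
```

Under these made-up numbers, the expected cost per cycle divided by the expected cycle length is 5.4/11 for maintaining on entry to state 1, 6.6/17 for state 2, and 34/24 for running to failure, so the middle threshold should come out cheapest. This illustrates the trade-off the abstract describes: maintaining too early wastes downtime, while maintaining too late pays for defects and failures.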
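【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Doctoral
【Year degree granted】: 2014
【Classification number】: TP18; TH17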
Article ID: 2385159
Article link: http://sikaile.net/kejilunwen/jixiegongcheng/2385159.html