Research on Learning Methods for Agent-Based Multi-Robot Systems
[Abstract]: Compared with a single robot, a multi-robot system (MRS) offers many advantages and good prospects for development, and has become a research hotspot in robotics. A multi-robot system is a complex dynamic system, and when designing control strategies it is usually impossible to specify all optimal behaviors for each robot in advance. Behavior-based methods allow a multi-robot system to exhibit intelligent characteristics and accomplish complex tasks, which has greatly promoted the development of multi-robot systems. However, behavior-based methods alone cannot fully adapt to changing environments and different task requirements, so a multi-robot system must be able to learn autonomously. Improving the coordination and cooperation ability of individual robots while avoiding the limitations of any single learning method is an important development direction for multi-robot systems, so it is of great significance to combine different machine learning methods with behavior-based multi-robot systems. The main research contents are as follows.

Firstly, the theory of agents and multi-agent systems is studied, several architectures for single robots and multi-robot systems are analyzed, and a research approach to multi-robot cooperation that combines behavior-based and learning-based methods is explored. Behavior-based robot formation and robot soccer systems are designed. Learning ability plays an important role in many aspects of multi-robot research, while behavior-based methods are robust and flexible and, compared with other methods, enable robots to accomplish tasks better. For the two main application platforms of multi-robot systems, robot formation and robot soccer, behavior-based multi-robot systems are designed on the basis of the robot simulation software MissionLab and TeamBots, and these systems are used to verify the algorithms proposed in this thesis.

Secondly, particle swarm optimization (PSO) and case-based reasoning (CBR) are studied, and a method integrating PSO and CBR is proposed. Traditional behavior-based methods have many advantages, but their fixed behavior parameters are difficult to adapt to complex environments. CBR is an important technique in artificial intelligence; because cases are easy to retrieve and store, it is well suited to providing the parameters required by different behaviors. However, the traditional CBR method lacks effective learning ability, so this thesis proposes using PSO as an optimizer for CBR, allowing CBR to obtain better cases continuously, while PSO in turn obtains a better initial population from CBR. Compared with the genetic algorithm (GA), PSO is also a swarm intelligence method, but it has a simpler structure, better real-time performance, and is well suited to continuous optimization problems; the problems a genetic algorithm can solve can generally also be solved by PSO. Combining the PSO algorithm with the CBR method not only overcomes the shortcomings of CBR but also meets real-time requirements.

Then, the basic theory of reinforcement learning and the typical Q-learning method are studied to overcome the shortcomings of traditional Q-learning in multi-robot systems, namely the absence of information exchange and of structural credit assignment. An improved Q-learning algorithm using experience sharing and filtering techniques is proposed, which improves learning performance and efficiency.
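To make the PSO-as-CBR-optimizer idea concrete, the following is a minimal sketch, not the thesis implementation: the two behavior parameters, the fitness function, and the case structure are illustrative assumptions, and a real system would evaluate fitness by running a behavior-based simulation trial (e.g. in MissionLab or TeamBots).

```python
# Minimal sketch of PSO as a CBR optimizer (illustrative, not the thesis code).
# Assumptions: behavior parameters are a 2-vector [goal_gain, avoid_gain],
# fitness() is a stand-in for a simulated navigation trial, and the case base
# is a plain list of (situation, parameters) pairs.
import random

def fitness(params):
    # Hypothetical objective: prefer a particular balance of goal attraction
    # and obstacle avoidance; lower is better.
    goal_gain, avoid_gain = params
    return (goal_gain - 1.5) ** 2 + (avoid_gain - 0.8) ** 2

def pso_optimize(seed_cases, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5):
    dim = 2
    # Seed part of the swarm from retrieved CBR cases, the rest at random,
    # so PSO starts from a better initial population.
    positions = [list(p) for _, p in seed_cases][:n_particles]
    while len(positions) < n_particles:
        positions.append([random.uniform(0, 3) for _ in range(dim)])
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_val = [fitness(p) for p in positions]
    gbest = min(zip(pbest_val, pbest))[1][:]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            val = fitness(positions[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, positions[i][:]
                if val < fitness(gbest):
                    gbest = positions[i][:]
    return gbest

# Usage: retrieve similar cases, optimize, then store the improved case back.
case_base = [("open_field", [1.0, 1.0]), ("cluttered", [0.5, 2.0])]
best_params = pso_optimize(case_base)
case_base.append(("open_field_refined", best_params))
print("improved behavior parameters:", best_params)
```

Seeding part of the swarm from retrieved cases is what gives PSO a better initial population, while storing the optimized parameters back into the case base closes the CBR learning loop.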
The theoretical basis of Q-learning is the Markov decision process, and applying Q-learning directly to a multi-robot system violates this premise. Nevertheless, Q-learning is still widely used in robot learning because of its simple operation and small state-action space. Compared with multi-agent reinforcement learning, the traditional Q-learning algorithm lacks information exchange with other agents, so this thesis proposes a method in which agents share experience with each other. To speed up the convergence of Q-learning, instead of simply assigning the reward signal to each agent, a Kalman filter is used in distributing the reward: the received reward is treated as a combination of the true reward signal and a noise signal, which solves the structural credit assignment problem to a certain extent.

Finally, the multi-agent reinforcement learning algorithms Minimax-Q, Nash-Q, FFQ and CE-Q, as well as learning methods based on regret theory, are studied. To overcome the slow convergence of the traditional CE-Q algorithm, which stems from the lack of an effective action exploration strategy, a new CE-Q learning algorithm using a no-regret strategy is proposed. Markov game theory provides the theoretical framework for multi-agent reinforcement learning, and Nash equilibrium plays an important role in it, so these algorithms are also called equilibrium-based learning algorithms. Compared with the Nash-Q learning algorithm, the correlated equilibrium used in CE-Q is easier to compute, so CE-Q has better application prospects. Inspired by no-regret strategy theory, if each agent chooses to reduce its average regret as its action exploration strategy, the joint behavior of all agents tends to converge to a set of no-regret points; moreover, both Nash equilibrium and correlated equilibrium are in essence coarse correlated equilibria. Therefore, a new CE-Q learning algorithm is proposed that speeds up the convergence of CE-Q learning by reducing the average regret. Comparison with the traditional CE-Q learning algorithm verifies the effectiveness of the method.
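To make the reward-filtering idea concrete, here is a minimal toy sketch, assuming a single-state task and one scalar Kalman filter per action; the thesis applies filtering to the shared reward received by each robot, so the noise model, gains and task below are purely illustrative assumptions.

```python
# Minimal sketch: treat the received reward as "true reward + noise" and
# smooth it with a scalar Kalman filter before the Q-update (illustrative;
# not the thesis setup).
import random

class ScalarKalman:
    def __init__(self, q=1e-3, r=0.5):
        self.x, self.p = 0.0, 1.0   # reward estimate and its variance
        self.q, self.r = q, r       # process and measurement noise variances

    def update(self, z):
        self.p += self.q                 # predict step (identity dynamics)
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct with noisy observation z
        self.p *= (1 - k)
        return self.x

# Toy single-state task: two actions, action 1 is better, but the reward each
# robot receives is corrupted by noise from the rest of the team.
Q = [0.0, 0.0]
kf = [ScalarKalman(), ScalarKalman()]
alpha, eps = 0.1, 0.1

for step in range(2000):
    a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda i: Q[i])
    true_reward = 1.0 if a == 1 else 0.2
    observed = true_reward + random.gauss(0.0, 1.0)   # noisy shared reward
    filtered = kf[a].update(observed)                 # estimated true reward
    Q[a] += alpha * (filtered - Q[a])                 # Q-update on filtered reward

print("learned action values:", Q)   # should rank action 1 above action 0
```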
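The no-regret exploration idea can likewise be illustrated with regret matching on a single-stage matrix game; the payoff matrices and parameters below are assumptions, and the thesis embeds such a rule inside CE-Q over a full Markov game rather than a one-shot game.

```python
# Minimal sketch of regret matching as a no-regret action-selection rule
# (illustrative; the coordination game below stands in for one state of a
# Markov game, and each agent only observes its own payoffs).
import random

# payoff[i][a0][a1] = agent i's payoff when agent 0 plays a0 and agent 1 plays a1.
PAYOFF = {0: [[1.0, 0.0], [0.0, 1.0]],
          1: [[1.0, 0.0], [0.0, 1.0]]}

def regret_matching_policy(cum_regret):
    # Play actions with probability proportional to positive cumulative regret;
    # fall back to uniform if no action has positive regret yet.
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [r / total for r in positive]

def sample(probs):
    u, acc = random.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if u <= acc:
            return a
    return len(probs) - 1

cum_regret = {0: [0.0, 0.0], 1: [0.0, 0.0]}
joint_counts = {}

for t in range(5000):
    acts = {i: sample(regret_matching_policy(cum_regret[i])) for i in (0, 1)}
    key = (acts[0], acts[1])
    joint_counts[key] = joint_counts.get(key, 0) + 1
    a0, a1 = acts[0], acts[1]
    for i in (0, 1):
        played = PAYOFF[i][a0][a1]
        for a in (0, 1):
            alt = PAYOFF[i][a][a1] if i == 0 else PAYOFF[i][a0][a]
            cum_regret[i][a] += alt - played   # regret of not having played a

# The empirical joint distribution approaches the set of correlated equilibria.
print({k: v / 5000 for k, v in joint_counts.items()})
```

Because regret matching drives each agent's average regret toward zero, the empirical joint play converges to the set of (coarse) correlated equilibria, which is the property the improved CE-Q exploration strategy exploits.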
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Doctoral
【Year conferred】: 2016
【CLC number】: TP242