
Research on Agent-Based Learning Methods for Multi-Robot Systems

Published: 2018-08-26 11:03
【Abstract】: Compared with a single robot, a multi-robot system (MRS) offers many advantages and good prospects for development, and has become a research hotspot in robotics. A multi-robot system is a complex dynamic system: when designing robot control strategies, it is usually impossible to specify all optimal behaviors for every robot in advance. Behavior-based methods allow a multi-robot system to exhibit intelligent characteristics and accomplish fairly complex tasks, and have greatly promoted the development of multi-robot systems. However, behavior-based methods alone cannot fully adapt to a constantly changing environment and the demands of different tasks. Giving multi-robot systems autonomous learning ability while avoiding the limitations of any single learning method, so as to continually improve coordination and cooperation among individual robots, is therefore an important direction for the field, and combining different machine learning methods with behavior-based multi-robot systems is of real research significance. This thesis studies multi-robot systems from the perspective of agent theory; its main contributions are as follows.

First, the theory of agents and multi-agent systems is reviewed, several architectures for single-robot and multi-robot systems are analyzed, and a research approach that combines behavior-based and learning-based methods to explore multi-robot cooperation is proposed; behavior-based robot formation and robot soccer systems are designed. Among the many research topics in multi-robot systems, learning ability occupies an important position. Behavior-based methods are robust and flexible and, compared with other methods, allow robots to accomplish tasks more effectively. Taking behavior-based methods as the foundation and combining them with different machine learning methods, behavior-based multi-robot systems are built for the two main application platforms of multi-robot research, robot formation and robot soccer, on top of the robot simulation software Mission Lab and Teambots, so that the algorithms proposed in this thesis can be validated.

Second, particle swarm optimization (PSO) and case-based reasoning (CBR) are studied, and a hybrid method that fuses PSO with CBR is proposed to exploit the respective strengths of the two techniques. Although traditional behavior-based methods have many advantages, their fixed behavior parameters struggle to adapt to complex environments. CBR, an important technique in artificial intelligence, is well suited to providing parameters for different behaviors because cases are easy to retrieve and store, but traditional CBR lacks an effective learning ability. This thesis therefore uses PSO as the optimizer for CBR, so that CBR continually obtains better cases while PSO obtains a better initial population from CBR. Like the genetic algorithm (GA), PSO is a swarm intelligence method, but it has a simpler structure, better real-time performance, and is well suited to optimizing continuous problems; broadly speaking, any problem a genetic algorithm can solve, PSO can also solve. Combining PSO with CBR not only overcomes the shortcomings of CBR but also meets the requirements of real-time operation and continuous optimization. With behavior-based robot formation as the test platform, the effectiveness of the method is verified by comparison with standard PSO.

Then, the basic theory of reinforcement learning and the classical Q-learning method are studied. To address the shortcomings of traditional Q-learning in multi-robot systems, namely the lack of information exchange and the structural credit assignment problem, an improved Q-learning algorithm using experience sharing and filtering techniques is proposed, which improves learning performance and efficiency. The theoretical basis of Q-learning is the Markov decision process; applying Q-learning directly to a multi-robot system violates this premise, yet Q-learning is still widely used in robot learning because of its computational simplicity and small state-action space. Compared with multi-agent reinforcement learning methods, traditional Q-learning lacks information exchange with other agents, so this thesis adopts experience sharing: each agent shares the Q-value information of the other agents, learning proceeds in a gradual manner, and an ε-greedy strategy selects the learning experience of other agents with probability 1-ε. To accelerate the convergence of Q-learning, instead of simply distributing the reward signal uniformly to every agent, Kalman filtering is applied to reward distribution: the received reward is regarded as a combination of the true reward signal and noise, which alleviates the structural credit assignment problem to some extent. With robot soccer as the test platform, the effectiveness of the method is verified by comparison with traditional Q-learning.

Finally, several typical multi-agent reinforcement learning algorithms, Minimax-Q, Nash-Q, FFQ, and CE-Q, as well as learning methods based on regret theory, are studied. To address the slow convergence of traditional CE-Q, which stems from the lack of an effective action exploration strategy, a new CE-Q learning algorithm based on a no-regret strategy is proposed. Markov game theory provides a solid theoretical foundation for multi-agent reinforcement learning, and Nash equilibrium plays an important role in it, so these algorithms are also called equilibrium-based learning algorithms. Compared with computing Nash equilibria in Nash-Q, computing the correlated equilibria in CE-Q is easier, so CE-Q has better application prospects; however, traditional CE-Q lacks an effective action exploration strategy, which limits its convergence speed. Inspired by no-regret theory, if every agent adopts a rule that reduces its average regret as its exploration strategy, the behavior of all agents tends to converge to a set of no-regret points, known as the set of coarse correlated equilibria; analysis further shows that Nash equilibria and correlated equilibria are essentially coarse correlated equilibria. A new CE-Q learning algorithm that reduces the average regret value is therefore proposed to speed up the convergence of CE-Q learning. With robot soccer as the test platform, the effectiveness of the method is verified by comparison with the traditional CE-Q learning algorithm.
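As an illustration of how the PSO-CBR hybrid described above might operate, the following Python sketch seeds a standard PSO swarm from behavior-parameter cases retrieved by CBR and returns the best particle, which could then be stored back as an improved case. The function name, parameterization, and objective are hypothetical and only indicate the general mechanism, not the thesis's actual implementation.

```python
import numpy as np

def pso_refine_case(initial_cases, fitness, n_particles=20, n_iters=50,
                    w=0.7, c1=1.5, c2=1.5, bounds=(0.0, 1.0)):
    """Run standard PSO, seeding part of the swarm from CBR-retrieved cases."""
    dim = initial_cases.shape[1]
    rng = np.random.default_rng(0)
    # Seed the swarm: retrieved cases first, random particles for the rest.
    pos = rng.uniform(*bounds, size=(n_particles, dim))
    pos[:len(initial_cases)] = initial_cases[:n_particles]
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard PSO velocity and position update.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, *bounds)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest  # candidate to store back into the case base as an improved case

# Example: tune two formation-behavior gains against a placeholder objective.
cases = np.array([[0.3, 0.8], [0.5, 0.6]])      # cases retrieved by CBR (assumed)
objective = lambda p: np.sum((p - 0.42) ** 2)   # placeholder fitness function
print(pso_refine_case(cases, objective))
```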
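The two Q-learning modifications summarized above, experience sharing and Kalman-filtered reward assignment, could look roughly like the sketch below. The class names, the way shared Q-values are combined, and the filter constants are assumptions made for illustration; the thesis's algorithm may differ in detail.

```python
import random
from collections import defaultdict

class KalmanReward:
    """1-D Kalman filter: estimate the true reward behind a noisy team reward."""
    def __init__(self, q=1e-3, r=0.5):
        self.x, self.p, self.q, self.r = 0.0, 1.0, q, r   # state, variance, noise terms
    def filter(self, z):
        self.p += self.q                        # predict
        k = self.p / (self.p + self.r)          # Kalman gain
        self.x += k * (z - self.x)              # correct with measurement z
        self.p *= (1.0 - k)
        return self.x

class SharingQAgent:
    """Q-learning agent that can bootstrap from other agents' shared Q-tables."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.2):
        self.Q = defaultdict(float)
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps
        self.kf = KalmanReward()

    def act(self, s):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, reward, s_next, others):
        r = self.kf.filter(reward)              # filtered share of the team reward
        if random.random() < 1.0 - self.eps:    # use other agents' shared experience
            target_q = max(ag.Q[(s_next, b)] for ag in [self] + others
                           for b in self.actions)
        else:                                   # fall back to own estimate only
            target_q = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * target_q - self.Q[(s, a)])
```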
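The no-regret exploration idea can be made concrete with regret matching, a standard rule whose average regret vanishes over time, so the empirical joint play of all agents approaches the set of coarse correlated equilibria. The sketch below is a generic regret-matching loop shown only to illustrate the idea, not the thesis's exact CE-Q variant.

```python
import numpy as np

class RegretMatcher:
    """Regret matching: play each action with probability proportional to positive regret."""
    def __init__(self, n_actions):
        self.cum_regret = np.zeros(n_actions)

    def policy(self):
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        # Play uniformly until some action accumulates positive regret.
        return pos / total if total > 0 else np.full(len(pos), 1.0 / len(pos))

    def update(self, chosen, action_payoffs):
        # Regret of not having played each action instead of the chosen one.
        self.cum_regret += action_payoffs - action_payoffs[chosen]

# Usage sketch with placeholder per-round payoffs for three actions.
rm = RegretMatcher(3)
rng = np.random.default_rng(1)
for _ in range(100):
    a = rng.choice(3, p=rm.policy())
    payoffs = rng.random(3)          # placeholder per-action payoffs this round
    rm.update(a, payoffs)
print(rm.policy())
```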
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Doctoral
【Year of degree conferral】: 2016
【CLC number】: TP242


Document No.: 2204673



Link to this document: http://sikaile.net/shoufeilunwen/xxkjbs/2204673.html

