Research and Design of a Soccer Robot Decision-Making System Based on the SARSA Algorithm
Published: 2018-12-08 21:20
[Abstract]: The RoboCup 2D simulation soccer platform is a testbed for research on multi-agent robot systems, on which researchers can evaluate different machine learning algorithms. Reinforcement learning, one of the most important classes of machine learning algorithms, lets an agent interact continuously with its environment to maximize its cumulative reward, and under certain conditions it guarantees that the agent's learning converges to an optimal policy. Reinforcement learning has been applied successfully to games such as Go, Gomoku, Tetris, and Unreal Tournament, but it has not been studied thoroughly in the RoboCup 2D simulation competition. This thesis introduces the SARSA algorithm into the RoboCup 2D simulation competition and improves it. The player agent's state space is mapped from the positions of the defending players and of the ball, and the precondition function obtained from this mapping serves as the basis for SARSA's action selection; the algorithm is designed and implemented within the Helios framework. Drawing on soccer domain knowledge, the thesis proposes two reward-correction functions, one based on team dispersion and one based on the distance over which the ball is transferred, to improve team performance. In a multi-agent system, the Q-table that a single agent learns independently is often sparse and cannot represent the global situation of the whole system; to address this, the thesis studies sharing Q-tables among agents and proposes a multi-Q-table fusion algorithm that raises the team's winning rate. Because the design of a reinforcement learning algorithm must guarantee convergence of the Q-table, the thesis first compares the convergence of an adaptive ε-greedy action-selection strategy with that of a fixed ε-greedy strategy and adopts the adaptive ε-greedy strategy, which converges. For the design of the reward function, it compares the effect of different reward values on goals scored to determine an appropriate reward value, and compares the team's winning rate after the two reward corrections are introduced into SARSA; the experiments show that the reward corrections raise the winning rate. Finally, many matches were played against teams that have participated in RoboCup 2D, and statistical analysis of the results verifies the effectiveness of the proposed algorithms.
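For illustration, the following is a minimal Python sketch of the on-policy SARSA update combined with an adaptive (decaying) ε-greedy action-selection strategy of the kind the abstract describes. It is not the thesis's Helios-based implementation; the state and action encodings, the decay schedule, and all parameter values (alpha, gamma, eps_start, eps_min, eps_decay) are assumptions made for this example.

```python
import random
from collections import defaultdict

class SarsaAgent:
    """Tabular SARSA with adaptive (decaying) epsilon-greedy action selection."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 eps_start=0.9, eps_min=0.05, eps_decay=0.999):
        self.q = defaultdict(float)   # sparse Q-table: (state, action) -> value
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.eps = eps_start          # exploration rate, decayed toward eps_min
        self.eps_min = eps_min
        self.eps_decay = eps_decay

    def select_action(self, state):
        # Explore with probability eps, otherwise act greedily on the Q-table.
        if random.random() < self.eps:
            action = random.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.q[(state, a)])
        # Adaptive schedule: shrink eps so the policy becomes greedy over time,
        # which is what allows the Q-table to settle and converge.
        self.eps = max(self.eps_min, self.eps * self.eps_decay)
        return action

    def update(self, state, action, reward, next_state, next_action):
        # On-policy SARSA update:
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))
        td_target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```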
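The abstract also mentions two domain-knowledge reward corrections, one based on team dispersion and one based on how far the ball is transferred. The sketch below shows one plausible way to combine such terms with the base reward; the actual functional forms and weights used in the thesis are not given here, so corrected_reward, w_dispersion, and w_transfer are hypothetical.

```python
import itertools
import math

def corrected_reward(base_reward, teammate_positions, ball_transfer_distance,
                     w_dispersion=0.1, w_transfer=0.05):
    """Add illustrative domain-knowledge correction terms to the environment reward.

    teammate_positions: list of (x, y) coordinates of the team's players.
    ball_transfer_distance: distance the ball was moved by the chosen action.
    """
    # Team dispersion: mean pairwise distance between teammates, so that
    # spreading out over the pitch earns a small bonus.
    pairs = list(itertools.combinations(teammate_positions, 2))
    dispersion = sum(math.dist(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    return base_reward + w_dispersion * dispersion + w_transfer * ball_transfer_distance
```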
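Finally, to address the sparsity of Q-tables learned by individual agents, the abstract proposes fusing several agents' Q-tables into a shared one. A simple illustrative fusion rule is to average the values of every (state, action) pair that at least one agent has visited; the thesis's actual fusion algorithm may weight or combine the tables differently.

```python
def fuse_q_tables(q_tables):
    """Merge several sparse per-agent Q-tables (dicts keyed by (state, action))
    into one shared table by averaging the entries each agent actually visited."""
    sums, counts = {}, {}
    for table in q_tables:
        for key, value in table.items():
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```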
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2017
[CLC number]: TP242
Article ID: 2369012
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2369012.html