Research and Design of a Soccer Robot Decision-Making System Based on the SARSA Algorithm
Published: 2018-12-08 21:20
[Abstract]: The RoboCup 2D simulation soccer platform is a testbed for research on multi-agent robot systems, on which researchers can evaluate different machine learning algorithms. Reinforcement learning, one of the most important classes of machine learning algorithms, lets an agent interact continuously with its environment to maximize its cumulative reward, and under certain conditions it guarantees that the agent's learning converges to an optimal policy. Reinforcement learning has been applied successfully to games such as Go, Gomoku, Tetris, and Unreal Tournament, but it has not been studied thoroughly in the RoboCup 2D simulation competition. This thesis introduces the SARSA algorithm into the RoboCup 2D simulation competition and improves it. The player agent's state space is mapped from the positions of the defending players and of the ball, and the precondition function obtained from this mapping serves as the basis for SARSA's action selection; the algorithm is designed and implemented within the Helios framework. Drawing on soccer domain knowledge, the thesis proposes two reward-correction functions, one based on team dispersion and one based on the distance over which the ball is transferred, to improve team performance. In a multi-agent system, the Q-table that a single agent learns independently is often sparse and cannot represent the global situation of the whole system; to address this, the thesis studies sharing Q-tables among agents and proposes a multi-Q-table fusion algorithm that raises the team's winning rate. Because the design of a reinforcement learning algorithm must guarantee convergence of the Q-table, the thesis first compares the convergence of an adaptive ε-greedy action-selection strategy with that of a fixed ε-greedy strategy and adopts the adaptive ε-greedy strategy, which converges. For the design of the reward function, it compares the effect of different reward values on goals scored to determine an appropriate reward value, and compares the team's winning rate after the two reward corrections are introduced into SARSA; the experiments show that the reward corrections raise the winning rate. Finally, many matches were played against teams that have participated in RoboCup 2D, and statistical analysis of the results verifies the effectiveness of the proposed algorithms.
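For illustration, the following is a minimal Python sketch of the on-policy SARSA update combined with an adaptive (decaying) ε-greedy action-selection strategy of the kind the abstract describes. It is not the thesis's Helios-based implementation; the state and action encodings, the decay schedule, and all parameter values (alpha, gamma, eps_start, eps_min, eps_decay) are assumptions made for this example.

```python
import random
from collections import defaultdict

class SarsaAgent:
    """Tabular SARSA with adaptive (decaying) epsilon-greedy action selection."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 eps_start=0.9, eps_min=0.05, eps_decay=0.999):
        self.q = defaultdict(float)   # sparse Q-table: (state, action) -> value
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.eps = eps_start          # exploration rate, decayed toward eps_min
        self.eps_min = eps_min
        self.eps_decay = eps_decay

    def select_action(self, state):
        # Explore with probability eps, otherwise act greedily on the Q-table.
        if random.random() < self.eps:
            action = random.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.q[(state, a)])
        # Adaptive schedule: shrink eps so the policy becomes greedy over time,
        # which is what allows the Q-table to settle and converge.
        self.eps = max(self.eps_min, self.eps * self.eps_decay)
        return action

    def update(self, state, action, reward, next_state, next_action):
        # On-policy SARSA update:
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))
        td_target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```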
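The abstract also mentions two domain-knowledge reward corrections, one based on team dispersion and one based on how far the ball is transferred. The sketch below shows one plausible way to combine such terms with the base reward; the actual functional forms and weights used in the thesis are not given here, so corrected_reward, w_dispersion, and w_transfer are hypothetical.

```python
import itertools
import math

def corrected_reward(base_reward, teammate_positions, ball_transfer_distance,
                     w_dispersion=0.1, w_transfer=0.05):
    """Add illustrative domain-knowledge correction terms to the environment reward.

    teammate_positions: list of (x, y) coordinates of the team's players.
    ball_transfer_distance: distance the ball was moved by the chosen action.
    """
    # Team dispersion: mean pairwise distance between teammates, so that
    # spreading out over the pitch earns a small bonus.
    pairs = list(itertools.combinations(teammate_positions, 2))
    dispersion = sum(math.dist(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    return base_reward + w_dispersion * dispersion + w_transfer * ball_transfer_distance
```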
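Finally, to address the sparsity of Q-tables learned by individual agents, the abstract proposes fusing several agents' Q-tables into a shared one. A simple illustrative fusion rule is to average the values of every (state, action) pair that at least one agent has visited; the thesis's actual fusion algorithm may weight or combine the tables differently.

```python
def fuse_q_tables(q_tables):
    """Merge several sparse per-agent Q-tables (dicts keyed by (state, action))
    into one shared table by averaging the entries each agent actually visited."""
    sums, counts = {}, {}
    for table in q_tables:
        for key, value in table.items():
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```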
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2017
[CLC number]: TP242
Article ID: 2369012
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2369012.html