

基于強(qiáng)化學(xué)習(xí)的路徑規(guī)劃問(wèn)題研究

Published: 2018-06-24 04:12

Topic keywords: unknown environment + reinforcement learning; Source: master's thesis, Harbin Institute of Technology, 2017


[Abstract]: Research on human-machine symbiosis under uncertainty focuses on machine-learning-based environment and situation awareness, action or path planning and decision making, and the evaluation of decision outcomes. It involves both scientific theoretical questions and many engineering problems, and studying them has clear theoretical significance and practical value. This thesis studies reinforcement learning solutions to agent path planning in unknown environments. Path planning for a robot or agent in a given environment means finding a collision-free path from a specified start point to a goal point. The problem has been studied for a long time and many mature algorithms exist, but most of them assume a known environment model and combine it with search. In practice, however, the environment model is often hard to obtain. Moreover, control errors or environmental disturbances make the executed action deviate from the issued command, so the robot cannot follow the planned path and may even fail to reach the goal. Finally, the planned path may be tortuous and full of turning points, which is unfavorable for actual robot motion. To address these problems, this thesis applies the temporal-difference (TD) methods of reinforcement learning to path planning and proposes an improved solution to the exploration-exploitation balance problem. The main contributions are as follows: (1) The path planning problem is solved with a temporal-difference method. Compared with other algorithms, it needs no environment model, is adaptive and capable of self-learning, and can cope with uncertainty in the agent's motion. Simulation experiments show that the TD method converges quickly and finds a path to the goal from any starting position. (2) The exploration-exploitation balance of reinforcement learning in practical applications is improved. Exploration and exploitation coexist throughout learning: too much exploration lengthens training, while too much exploitation makes the agent converge to an incorrect solution, so balancing the two is an important research topic. Traditional methods usually reduce exploration as training time increases, ignoring the complexity of the environment and of the problem itself. Based on the path planning task, this thesis measures how well the agent has mastered the environment by its success rate of reaching the goal and adjusts the exploration factor dynamically: the agent explores more when its mastery of the environment is low and gradually shifts toward exploitation as mastery grows. Simulation results show that the improved exploration strategy balances exploration and exploitation better and lets the agent reach the goal point faster.
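This page does not include any code from the thesis, so the following is only a minimal sketch of the two ideas described in the abstract: a tabular temporal-difference method (Q-learning is used here, though the abstract does not specify which TD variant the thesis adopts) on a toy grid world, with the exploration factor tied to the agent's recent goal-reaching success rate. All names, rewards, and parameter values (GridWorld, epsilon_from_success, the window size, etc.) are illustrative assumptions, not taken from the thesis.

```python
# Sketch only: tabular Q-learning (a TD control method) on an assumed grid world,
# with the exploration factor epsilon driven by the recent success rate, loosely
# mirroring the idea in the abstract. Names and values are illustrative, not the
# thesis's actual design.
import random
from collections import defaultdict, deque

class GridWorld:
    """4-connected grid; obstacles block movement, reaching `goal` ends the episode."""
    def __init__(self, width=10, height=10, obstacles=frozenset(), goal=(9, 9)):
        self.width, self.height = width, height
        self.obstacles, self.goal = obstacles, goal

    def step(self, state, action):
        dx, dy = [(0, 1), (0, -1), (1, 0), (-1, 0)][action]
        nxt = (state[0] + dx, state[1] + dy)
        if not (0 <= nxt[0] < self.width and 0 <= nxt[1] < self.height) or nxt in self.obstacles:
            nxt = state                       # blocked moves leave the agent in place
        if nxt == self.goal:
            return nxt, 10.0, True            # reward for reaching the goal
        return nxt, -0.1, False               # small step cost encourages short paths

def epsilon_from_success(success_history, eps_min=0.05, eps_max=0.9):
    """Explore a lot while the recent success rate is low, exploit more as it rises."""
    if not success_history:
        return eps_max
    rate = sum(success_history) / len(success_history)
    return eps_max - (eps_max - eps_min) * rate

def q_learning(env, start=(0, 0), episodes=2000, alpha=0.1, gamma=0.95,
               max_steps=200, window=50):
    Q = defaultdict(lambda: [0.0] * 4)
    successes = deque(maxlen=window)          # sliding window of recent episode outcomes
    for _ in range(episodes):
        state, reached = start, False
        eps = epsilon_from_success(successes)  # exploration factor set from success rate
        for _ in range(max_steps):
            if random.random() < eps:                          # explore
                action = random.randrange(4)
            else:                                              # exploit
                action = max(range(4), key=lambda a: Q[state][a])
            nxt, reward, done = env.step(state, action)
            # TD(0) update toward the one-step bootstrapped target.
            target = reward + (0.0 if done else gamma * max(Q[nxt]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
            if done:
                reached = True
                break
        successes.append(1 if reached else 0)
    return Q

if __name__ == "__main__":
    env = GridWorld(obstacles=frozenset({(3, 3), (3, 4), (3, 5), (6, 2), (6, 3)}))
    Q = q_learning(env)
    print("Greedy action at start:", max(range(4), key=lambda a: Q[(0, 0)][a]))
```

An actual implementation of the thesis's method would differ in state representation, reward design, and how "mastery of the environment" is measured; the sketch only illustrates the TD update and a success-rate-driven exploration schedule of the kind the abstract describes.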
Degree-granting institution: Harbin Institute of Technology
Degree level: Master
Year awarded: 2017
CLC classification: TP18; TP242

【參考文獻(xiàn)】

相關(guān)期刊論文 前5條

1 朱大奇;顏明重;;移動(dòng)機(jī)器人路徑規(guī)劃技術(shù)綜述[J];控制與決策;2010年07期

2 喬俊飛;侯占軍;阮曉鋼;;基于神經(jīng)網(wǎng)絡(luò)的強(qiáng)化學(xué)習(xí)在避障中的應(yīng)用[J];清華大學(xué)學(xué)報(bào)(自然科學(xué)版);2008年S2期

3 孟憲權(quán);趙英男;薛青;;遺傳算法在路徑規(guī)劃中的應(yīng)用[J];計(jì)算機(jī)工程;2008年16期

4 赫東鋒;孫樹(shù)棟;;一種在線自學(xué)習(xí)的移動(dòng)機(jī)器人模糊導(dǎo)航方法[J];西安工業(yè)大學(xué)學(xué)報(bào);2007年04期

5 畢盛;朱金輝;閔華清;鐘漢如;;基于模糊邏輯的機(jī)器人路徑規(guī)劃[J];機(jī)電產(chǎn)品開(kāi)發(fā)與創(chuàng)新;2006年01期

相關(guān)博士學(xué)位論文 前1條

1 劉傳領(lǐng);基于勢(shì)場(chǎng)法和遺傳算法的機(jī)器人路徑規(guī)劃技術(shù)研究[D];南京理工大學(xué);2012年

相關(guān)碩士學(xué)位論文 前1條

1 傅曉霞;基于狀態(tài)預(yù)測(cè)強(qiáng)化學(xué)習(xí)的移動(dòng)機(jī)器人路徑規(guī)劃研究[D];山東大學(xué);2008年

,


Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2059956.html

