Research on Adaptive Reinforcement Learning Techniques Based on the Convex Polyhedra Abstract Domain
[Abstract]: Table-driven algorithms are an important family of methods for solving reinforcement learning problems, but because of the "curse of dimensionality" they cannot be applied directly to problems with continuous state spaces. There are two ways around this: discretizing the state space, or approximating the value function. Compared with function approximation, table-driven methods built on a discretized continuous state space offer intuitive principles, simple program structure, and lightweight computation. The key to discretization is to find a mechanism that balances computation against accuracy, so that numerical measures defined over the discrete abstract state space, such as the V and Q value functions, can still evaluate the original problem and compute the optimal policy π* accurately. This paper proposes an adaptive state-space discretization method based on the convex polyhedra abstract domain and implements the corresponding adaptive reinforcement learning algorithm, Adaptive Polyhedra Domain based Q(λ) (APDQ(λ)). Convex polyhedra are an abstract state representation widely used for evaluating the behavior of stochastic systems and for verifying numerical properties of programs. An abstraction function maps the concrete state space onto the abstract state space of the polyhedra domain, so that computing an optimal policy over a continuous state space becomes a tractable computation over a finite set of abstract states. Based on the sample information associated with each abstract state, several adaptive refinement mechanisms are designed, including BoxRefinement, LFRefinement, and MVLFRefinement; under these mechanisms the abstract state space is refined continually and adaptively, optimizing the discretization of the concrete state space and producing a statistical reward model consistent with the stream of online samples. APDQ(λ) is implemented on top of the polyhedra computation library PPL (Parma Polyhedra Library) and the high-precision arithmetic library GMP (GNU Multiple Precision), and a case study is carried out on two classic continuous-state benchmarks: Mountain Car (MC) and the acrobatic robot (Acrobot). The effects of the reinforcement learning parameters and of the refinement-related threshold parameters on the performance of APDQ(λ) are evaluated in detail, and the roles these parameters play during policy optimization under a dynamically changing abstract state space are explored. The experimental results show that when the discount factor γ is greater than 0.7, the algorithm performs well overall: the policy improves rapidly in the early stage and converges smoothly afterwards (Figs. 6-13), and it adapts well both to the learning rate α and to the various refinement parameters. When γ is less than 0.6, performance degrades rapidly. Applying abstract interpretation techniques within a statistical learning process is a promising route to solving continuous-state reinforcement learning problems.
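To make the overall scheme concrete, the following is a minimal Python sketch of Q(λ) learning over an abstract state space. It is an illustration under stated assumptions, not the paper's implementation: axis-aligned boxes stand in for general convex polyhedra (the actual APDQ(λ) is built on PPL and GMP), and every identifier here (Box, AbstractDomain, AbstractQLambda) is invented for the example.

```python
# Sketch: tabular Q(λ) indexed by abstract states instead of concrete ones.
# Assumptions: boxes replace general polyhedra; all names are illustrative.
from collections import defaultdict
import random

class Box:
    """An axis-aligned box [lo_i, hi_i) per dimension: a simple abstract state."""
    def __init__(self, lows, highs):
        self.lows, self.highs = tuple(lows), tuple(highs)

    def contains(self, s):
        return all(l <= x < h for x, l, h in zip(s, self.lows, self.highs))

class AbstractDomain:
    """Maps a concrete state to the abstract state that covers it."""
    def __init__(self, boxes):
        self.boxes = boxes

    def alpha(self, s):
        # The abstraction function α: S -> S#. A real polyhedra domain would
        # use containment tests from the polyhedra library instead.
        for b in self.boxes:
            if b.contains(s):
                return b
        raise ValueError("state outside the covered region")

class AbstractQLambda:
    """Q(λ) with eligibility traces, keyed by (abstract state, action)."""
    def __init__(self, domain, actions, gamma=0.9, alpha=0.1, lam=0.8, eps=0.1):
        self.domain, self.actions = domain, actions
        self.gamma, self.alpha, self.lam, self.eps = gamma, alpha, lam, eps
        self.Q = defaultdict(float)   # (abstract state, action) -> value
        self.E = defaultdict(float)   # eligibility traces

    def act(self, s):
        if random.random() < self.eps:
            return random.choice(self.actions)
        a_s = self.domain.alpha(s)
        return max(self.actions, key=lambda a: self.Q[(a_s, a)])

    def update(self, s, a, r, s2):
        a_s, a_s2 = self.domain.alpha(s), self.domain.alpha(s2)
        best = max(self.Q[(a_s2, b)] for b in self.actions)
        delta = r + self.gamma * best - self.Q[(a_s, a)]
        self.E[(a_s, a)] += 1.0       # accumulating trace
        # NOTE: Watkins's Q(λ) would reset traces after a non-greedy action;
        # that bookkeeping is omitted here for brevity.
        for key in list(self.E):
            self.Q[key] += self.alpha * delta * self.E[key]
            self.E[key] *= self.gamma * self.lam
```

The essential move is that the Q-table is indexed by abstract states returned by the abstraction function, so its size is bounded by the number of regions in the discretization rather than by the continuous state space.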
Many problems merit further study and discussion, such as sampling based on an approximate model and alternative value-function update schemes. The adaptive refinement step itself also admits many possible designs; one illustrative form is sketched after this paragraph.
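Below, in the same hedged spirit and reusing the Box class from the previous sketch, is one possible refinement step. Both the trigger (variance of sampled returns above a threshold) and the splitting rule (bisect the widest dimension) are assumptions made for illustration; the paper's BoxRefinement, LFRefinement, and MVLFRefinement mechanisms are defined in the full text and need not coincide with this rule.

```python
# Sketch of one adaptive refinement step: if the returns sampled inside an
# abstract state disagree too much, split the box along its widest dimension.
# `var_threshold` and the splitting rule are assumptions, not the paper's.

def refine_box(box, samples, var_threshold=1.0):
    """samples: list of (concrete_state, observed_return) pairs inside `box`.
    Returns [box] if no split is needed, otherwise the two half-boxes."""
    returns = [g for _, g in samples]
    n = len(returns)
    if n < 2:
        return [box]                  # too few samples to judge
    mean = sum(returns) / n
    var = sum((g - mean) ** 2 for g in returns) / n
    if var <= var_threshold:
        return [box]                  # samples agree: keep the box as-is
    # Split along the widest dimension at its midpoint.
    widths = [h - l for l, h in zip(box.lows, box.highs)]
    d = widths.index(max(widths))
    mid = (box.lows[d] + box.highs[d]) / 2.0
    left_high = list(box.highs); left_high[d] = mid
    right_low = list(box.lows);  right_low[d] = mid
    return [Box(box.lows, left_high), Box(right_low, box.highs)]
```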
【Author Affiliations】: School of Computer Science and Technology, Soochow University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (Jilin University)
【Funding】: National Natural Science Foundation of China (61272005, 61303108, 61373094, 61472262, 61502323, 61502329); Natural Science Foundation of Jiangsu Province (BK2012616); Natural Science Research Program of Jiangsu Higher Education Institutions (13KJB520020); Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04); Suzhou Applied Basic Research Program (SYG201422); Provincial Key Laboratory Fund of Soochow University (KJS1524); China Scholarship Council (201606920013); Natural Science Foundation of Zhejiang Province (LY16F010019)
【CLC Number】: TP181