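Multi-Agent Reinforcement Learning for Police Patrol Path Planning Based on the Stackelberg Strategy
Published: 2018-02-16 13:10
Keywords: patrol route planning; Stackelberg strong equilibrium strategy; multi-agent; reinforcement learning. Source: Transactions of Beijing Institute of Technology (《北京理工大學學報》), 2017, Issue 01. Paper type: journal article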
【Abstract】: Existing patrol path planning algorithms can handle only two-player games and ignore the presence of attackers. To address these limitations, a new multi-agent reinforcement learning algorithm is proposed. Given the distribution of attack targets, it plans optimal patrol paths for an arbitrary number of defenders and attackers. Because defenders and attackers do not choose their strategies simultaneously, the Stackelberg strong equilibrium strategy is adopted as the basis on which each agent selects its strategy. To validate the algorithm, it was tested on multiple patrol tasks. Quantitative and qualitative experimental results demonstrate the convergence and effectiveness of the algorithm.
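The abstract does not give the algorithm's details, so the following is only a minimal sketch of the selection rule it names: a pure-strategy strong Stackelberg equilibrium in a two-player matrix game, in which the defender (leader) commits first, the attacker (follower) best-responds, and follower ties are broken in the leader's favor. The function name `strong_stackelberg_pure` and the payoff matrices are hypothetical and are not taken from the paper; the paper's method applies such a rule to many agents inside a reinforcement learning loop, which is not reproduced here.

```python
from typing import List, Tuple


def strong_stackelberg_pure(
    leader_payoff: List[List[float]],
    follower_payoff: List[List[float]],
) -> Tuple[int, int, float]:
    """Return (leader_action, follower_action, leader_value).

    Rows index the leader's (defender's) actions; columns index the
    follower's (attacker's) actions. Hypothetical helper for illustration.
    """
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # The follower observes the leader's commitment i and best-responds.
        top = max(follower_row)
        responses = [j for j, v in enumerate(follower_row) if v == top]
        # "Strong" equilibrium: follower ties are resolved in the leader's favor.
        j = max(responses, key=lambda c: leader_payoff[i][c])
        value = leader_payoff[i][j]
        # The leader keeps the commitment with the highest resulting payoff.
        if best is None or value > best[2]:
            best = (i, j, value)
    return best


# Hypothetical two-target patrol game (values are illustrative only).
defender = [[1, -3], [-2, 2]]
attacker = [[-1, 3], [2, -2]]
result = strong_stackelberg_pure(defender, attacker)
print(result)  # (1, 0, -2)
```

In this toy game the defender does better by committing to the second target: the attacker then strikes the first target and the defender loses 2, whereas committing to the first target would cost 3.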
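【Author affiliation】: School of Network Security Protection, People's Public Security University of China
【Funding】: Fundamental Research Funds project of People's Public Security University of China (2014JKF01132)
【Classification number】: D631.1; TP18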
Article ID: 1515593
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1515593.html