Research on a Safe-Reinforcement-Learning Lane-Keeping Method and Its Validation in SUMO
Published: 2021-11-03 12:00
Autonomous driving will reshape everyday transportation in the near future, and a great deal of work has gone into decision-making and motion-control algorithms for autonomous vehicles. Reinforcement learning (RL) is currently a leading approach in this area. However, when RL is applied to autonomous driving, the actions it takes during exploration can create safety hazards, and its convergence can be too slow. Solving the safety problem in RL is therefore an urgent prerequisite for moving it out of the laboratory and onto real self-learning vehicles. This thesis proposes a Safe Reinforcement Learning algorithm for autonomous driving that guarantees safety during learning by adding constraints. Concretely, it presents Constrained Policy Optimization (CPO), whose key idea is to impose conditional constraints on the cost function. CPO builds on the Actor-Critic framework and enforces a hard constraint that limits the size of each policy update, keeping the update process safe. The main contributions are the theoretical derivation and proof of the CPO algorithm, its practical application, and an analysis of the simulation results. The proposed algorithm is evaluated on several maps, and its safety and stability on each map are assessed and analyzed. The thesis also compares CPO against traditional reinforcement...
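For context, the constrained update described above corresponds to the standard CPO surrogate problem from Achiam et al.'s CPO paper; the notation below is generic and not taken from the thesis itself. Each iteration maximizes a surrogate reward advantage subject to a bound on expected cost and a trust-region limit on how far the policy may move:

% Standard CPO surrogate problem (illustrative notation, not the thesis's own).
% \pi_k: current policy; A^{\pi_k}, A_C^{\pi_k}: reward and cost advantages;
% J_C: expected discounted cost; d: safety budget; \delta: trust-region radius.
\begin{align*}
\pi_{k+1} = \arg\max_{\pi}\;
  & \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\big[A^{\pi_k}(s,a)\big] \\
\text{s.t.}\quad
  & J_C(\pi_k) + \tfrac{1}{1-\gamma}\,\mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\big[A_C^{\pi_k}(s,a)\big] \le d \\
  & \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta
\end{align*}

The KL trust-region constraint is what "reduces the size of the policy update": the new policy may only improve the reward advantage within a small neighborhood of the current policy, and only while the surrogate cost stays within the budget d.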
Source: Tsinghua University, Beijing (Project 211, Project 985, directly administered by the Ministry of Education)
Pages: 70
Degree: Master's
Table of Contents:
ABSTRACT (CHINESE)
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1 GENERAL INTRODUCTION AND BACKGROUND
1.2 PROBLEM STATEMENT
1.3 OBJECTIVE
1.4 THESIS OUTLINE
CHAPTER 2. LITERATURE REVIEW
2.1 THE RESEARCH STATUS OF REINFORCEMENT LEARNING
2.2 REINFORCEMENT LEARNING THEORY AND STRUCTURE
2.2.1 MARKOV DECISION PROCESS AND STRUCTURE
2.2.2 BELLMAN EQUATION
2.3 REINFORCEMENT LEARNING CLASSIFICATIONS
2.4 REINFORCEMENT LEARNING ALGORITHMS
2.4.1 DYNAMIC PROGRAMMING
2.4.2 Q-LEARNING
2.4.3 SARSA ALGORITHM
2.4.4 POLICY GRADIENT METHODS
2.4.5 ACTOR-CRITIC
2.5 THE RESEARCH STATUS OF SAFE REINFORCEMENT LEARNING
2.5.1 BASED ON THE MODIFICATION IN OPTIMIZATION CRITERIA
2.5.2 BASED ON THE MODIFICATION IN EXPLORATION PROCESS
CHAPTER 3. CONSTRAINED POLICY OPTIMIZATION
3.1 CPO ALGORITHM
3.1.1 CONSTRAINED MARKOV DECISION PROCESS (CMDP)
3.1.2 TRUST REGION POLICY OPTIMIZATION (TRPO) ALGORITHM
3.1.3 TRUST REGION APPLIED TO CONSTRAINED POLICY OPTIMIZATION
3.2 LANE KEEPING BASED ON CONSTRAINED POLICY OPTIMIZATION ALGORITHM
3.2.1 MARKOV MODELING OF LANE KEEPING PROBLEMS
3.2.2 APPROXIMATE SOLUTION OF CPO ALGORITHM
CHAPTER 4. EXPERIMENT DESIGN & DATA ANALYSIS
4.1 EXPERIMENT DESIGN
4.2 MAP DESIGN AND ANALYSIS
4.2.1 STRAIGHT ROAD
4.2.2 S-SHAPED CURVED ROAD
4.2.3 LOOP
4.2.4 ROUNDABOUT
4.3 RL VS. CPO-ENHANCED SAFE-RL
CHAPTER 5. SIMULATION ANALYSIS
5.1 SUMO (SIMULATION OF URBAN MOBILITY)
5.2 INTRODUCTION TO TRACI
5.3 ANALYSIS OF LANE KEEPING PERFORMANCE
5.4 CHAPTER SUMMARY
CHAPTER 6. CONCLUSION AND FUTURE WORK
6.1 SUMMARY AND CONTRIBUTIONS
6.2 FUTURE WORK
REFERENCES
ACKNOWLEDGEMENT
RESUME
Article ID: 3473643
Link: http://sikaile.net/kejilunwen/qiche/3473643.html