1
Deep Reinforcement Learning in Navigation
Anwica Kashfeen
2
Reinforcement Learning
Involves an agent that interacts with an environment, which provides rewards. Goal: learn to take actions that maximize cumulative reward.
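To make this loop concrete, here is a minimal Python sketch; the one-dimensional Environment class and its reward scheme are invented for illustration, not taken from the slides.

import random

class Environment:
    """Hypothetical 1-D world: agent starts at 0, target sits at 5."""
    def __init__(self):
        self.position = 0
        self.target = 5

    def step(self, action):
        self.position += action                     # action: -1 or +1
        reward = -abs(self.target - self.position)  # closer to target = higher reward
        done = self.position == self.target
        return self.position, reward, done

env = Environment()
total_reward = 0
for t in range(1000):                # cap the episode length
    action = random.choice([-1, 1])  # a random policy, for now
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break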
3
Reinforcement Learning
(Diagram: the agent-environment interaction loop)
4
Reinforcement Learning
5
Reinforcement Learning
Optimal policy vs. random policy. Reward: negative reward for moving further from the target.
6
Reinforcement Learning: Making a robot move forward
Input: current position and joint angles. Output: torques applied to the joints. Reward: +1 each time step the robot moves forward.
7
Reinforcement Learning: Balance a pole
Input: current state of the pole and cart. Output: horizontal force applied to the cart. Reward: +1 for each time step the pole stays upright.
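This is the classic cart-pole benchmark; the gymnasium package's CartPole-v1 exposes essentially this interface. A minimal sketch with a random policy, just to make the input/output/reward plumbing concrete:

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)   # obs: cart position/velocity, pole angle/velocity
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random push: 0 = left, 1 = right
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # +1 for every step the pole stays upright
    done = terminated or truncated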
8
Reinforcement Learning: Mastering Atari games
Input: RGB image of the current state. Output: the paddle's movement. Reward: game score. Video link:
9
Challenges: complicated input signals; no supervisor; no instantaneous feedback; the agent's actions affect the environment. Model design criteria: use the environment's criticism of the agent's actions; "input signals" refers to the observations the agent makes.
10
Actor-Critic Network. Agent = actor; the actor network outputs the policy.
Example (source S, target T): moving up takes the agent further from the target; moving down brings it closer.
11
Actor-Critic Network. Environment = critic; the critic network outputs the value.
Example (source S, target T): from one state, no matter how good the next action is, it will take at least 5 steps to reach the target; from another, the target can be reached in a single step.
12
Actor-Critic Network. One single network for both actor and critic: the two share network parameters. Two different networks: the actor and critic do not share parameters. The actor needs to know the advantage of being in the current state. Choose the network model depending on the task; both options are sketched below.
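A minimal PyTorch sketch of the two design choices (the layer sizes and two-layer trunks are arbitrary assumptions, not from the slides):

import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """One network for both: actor and critic share the trunk's parameters."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h)

class Actor(nn.Module):
    """Two networks: this actor shares no parameters with the critic below."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.net(obs)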
13
Reinforcement Learning
Target-Driven Navigation; Collision Avoidance
14
Target-Driven Navigation
Objective: avoid collisions with static objects in the environment; find the optimal path from source to target.
15
Target-Driven Navigation
Global planning: requires a map; hard to deal with dynamic objects. Local planning: requires perfect sensing of the environment.
16
Target-Driven Navigation
Local planning. Input: RGB images of the current state and the target. Output: a policy that decides the agent's next step, and the value of the new state. Reward: +10 for reaching the goal, plus a small time penalty at each step.
17
Network Architecture
18
Network Architecture. One network:
optimizes policy and value concurrently; jointly embeds the target and the current state. Video link:
19
Target-Driven Navigation
Train only the scene-specific layers. Advantage of embedding the target and the current state together: adaptive to new targets; reduces the training load. A rough sketch of this idea follows.
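A rough PyTorch sketch, assuming precomputed image features; the layer sizes and the exact split between shared and scene-specific layers are simplifications, not the paper's exact architecture:

import torch
import torch.nn as nn

class TargetDrivenNet(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512, n_actions=4):
        super().__init__()
        # Scene-generic layers: embed current and target observations the same way.
        self.embed = nn.Linear(feat_dim, embed_dim)
        self.fuse = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU())
        # Scene-specific layers: the only part trained for a new scene.
        self.policy_head = nn.Linear(embed_dim, n_actions)
        self.value_head = nn.Linear(embed_dim, 1)

    def forward(self, current_feat, target_feat):
        # Jointly embed the current state and the target.
        h = torch.cat([torch.relu(self.embed(current_feat)),
                       torch.relu(self.embed(target_feat))], dim=-1)
        h = self.fuse(h)
        return self.policy_head(h), self.value_head(h)

# A new target needs no retraining at all; a new scene retrains only the
# scene-specific heads while the shared layers stay frozen:
net = TargetDrivenNet()
for p in list(net.embed.parameters()) + list(net.fuse.parameters()):
    p.requires_grad = False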
20
Collision Avoidance. Objective:
avoid collisions with static objects in the environment; avoid collisions with other agents.
21
Collision Avoidance. Centralized method: each agent is aware of all other agents' positions and velocities; requires perfect communication between every agent and a server. Decentralized method: each agent is aware of only its neighboring agents' positions and velocities; requires perfect sensing capability to obtain the neighbors' information.
22
Collision Avoidance: classical methods. Social force: each agent is considered a mass particle and keeps a certain distance from other agents and borders. RVO (reciprocal velocity obstacle): each agent acts independently, selecting a velocity outside the RVO; all agents follow the same policy. ORCA: identify potential collisions, then find an alternate collision-free velocity. A toy sketch of the social-force idea follows.
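As a toy illustration of the social-force idea (the constants and the exponential form follow the general spirit of the model; they are not taken from the slides or a specific paper):

import numpy as np

def social_force(pos_i, neighbors, A=2.0, B=0.5, radius=0.6):
    """Repulsive force on a mass-particle agent from all neighboring agents."""
    force = np.zeros(2)
    for pos_j in neighbors:
        diff = pos_i - pos_j
        dist = np.linalg.norm(diff)
        if dist == 0.0:
            continue
        # Exponential repulsion: grows sharply once agents come closer
        # than the preferred separation `radius`.
        force += A * np.exp((radius - dist) / B) * (diff / dist)
    return force

# Example: agent at the origin with two nearby neighbors.
f = social_force(np.array([0.0, 0.0]),
                 [np.array([0.5, 0.0]), np.array([0.0, -1.0])])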
23
Collision Avoidance
24
Network Architecture. Architecture of the collision-avoidance neural network: the actor network.
25
Network Architecture. Architecture of the collision-avoidance neural network: the critic network.
26
Network Architecture. Two networks:
actor = policy network, critic = value network. The parameters of the two networks are updated independently; the critic's value is incorporated into the policy network's update, as sketched below.
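A minimal sketch of this pattern using a plain one-step advantage actor-critic update; the paper itself uses PPO, so treat this as the general idea rather than its exact algorithm (network sizes here are arbitrary):

import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))   # policy logits
critic = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))  # state value
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(obs, action, reward, next_obs, gamma=0.99):
    # One-step bootstrapped target and advantage from the critic.
    with torch.no_grad():
        target = reward + gamma * critic(next_obs)
    value = critic(obs)
    advantage = (target - value).detach()

    # Actor update: the critic's value enters only through the advantage.
    logp = torch.log_softmax(actor(obs), dim=-1)[action]
    actor_loss = -logp * advantage
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Critic update: fit the value toward the bootstrapped target,
    # with its own optimizer, independent of the actor's parameters.
    critic_loss = (value - target).pow(2)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

update(torch.randn(8), action=2, reward=1.0, next_obs=torch.randn(8))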
27
Collision Avoidance. Generalizes well to avoiding dynamic obstacles;
generalizes to heterogeneous groups of agents. Video link:
28
Uncertainty-Aware Collision Avoidance
Objective: avoid collisions with static objects in the environment; move cautiously in an unknown environment.
29
Uncertainty-Aware Collision Avoidance
30
Uncertainty-Aware Collision Avoidance
Output of the NN: an uncertainty estimate, not an action. The cost function favors slow movement when the model is uncertain. A toy sketch follows.
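A toy sketch of the idea: disagreement within a bootstrapped ensemble of collision predictors serves as the uncertainty estimate, and the speed rule below (an invented stand-in for the paper's cost function) slows the robot down when that disagreement is high:

import torch
import torch.nn as nn

# Bootstrapped ensemble of collision predictors, each trained on a different
# resampling of the data (training omitted here).
ensemble = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                          nn.Linear(32, 1), nn.Sigmoid()) for _ in range(5)]

def collision_estimate(obs):
    with torch.no_grad():
        preds = torch.stack([m(obs) for m in ensemble])
    return preds.mean(), preds.std()   # collision probability, and disagreement

def choose_speed(obs, v_max=1.0, k=5.0):
    mean, uncertainty = collision_estimate(obs)
    # In an unknown environment the predictors disagree, uncertainty is high,
    # and the commanded speed drops: the robot moves cautiously.
    return v_max / (1.0 + k * (mean + uncertainty))

speed = choose_speed(torch.randn(16))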
31
Conclusion: using reinforcement learning in three different ways.
Target-driven navigation: the traditional actor-critic model, with one single network for both actor and critic. Decentralized multi-robot collision avoidance: separate networks for the actor and the critic. Uncertainty-aware reinforcement learning for collision avoidance: does not use the traditional actor-critic model; instead, a cost function favors the desired action.
32
References
Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine. Uncertainty-Aware Reinforcement Learning for Collision Avoidance. Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI.
Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan. Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning.
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning.