Deep Reinforcement Learning in Navigation Anwica Kashfeen
Reinforcement Learning: Involves an agent interacting with an environment, which provides numeric rewards. Goal: learn to take actions that maximize reward.
Reinforcement Learning (figure: agent and environment)
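The agent and environment interaction can be sketched in a few lines. This is a minimal illustration, not code from the talk: the `LineWorld` environment, its reward of 1 for reaching the right end, and the `run_episode` helper are hypothetical names chosen for the example.

```python
class LineWorld:
    """Hypothetical 1-D environment: positions 0..size-1, target at the right end."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the grid
        self.pos = max(0, min(self.size - 1, self.pos + action))
        done = self.pos == self.size - 1
        reward = 1.0 if done else 0.0  # assumed reward: 1 on reaching the target
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent (policy) act in the environment and sum up the rewards."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total += reward
        if done:
            break
    return total


def always_right(state):
    return +1


def always_left(state):
    return -1


print(run_episode(LineWorld(), always_right))
print(run_episode(LineWorld(), always_left))
```

The policy that always moves right collects the reward, while the one that always moves left never does; learning amounts to finding the former without being told which action is correct.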
Reinforcement Learning: Optimal policy vs. random policy. Policy: negative reward for moving further from the target.
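To make the contrast between the two policies concrete, here is a small sketch under stated assumptions: an unbounded 2-D grid, Manhattan distance, and a reward of -1 for each step that increases the distance to the target (0 otherwise). The function names are illustrative only.

```python
import random

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def distance(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step_reward(pos, new_pos, target):
    # Assumed reward: -1 when the step moves further from the target, else 0
    return -1.0 if distance(new_pos, target) > distance(pos, target) else 0.0


def greedy_policy(pos, target, rng):
    # Always pick a move that gets closest to the target
    return min(MOVES, key=lambda m: distance((pos[0] + m[0], pos[1] + m[1]), target))


def random_policy(pos, target, rng):
    return rng.choice(MOVES)


def rollout(policy, start, target, steps=20, seed=0):
    """Follow a policy for a fixed number of steps and return the summed reward."""
    rng = random.Random(seed)
    pos, total = start, 0.0
    for _ in range(steps):
        if pos == target:
            break
        m = policy(pos, target, rng)
        new_pos = (pos[0] + m[0], pos[1] + m[1])
        total += step_reward(pos, new_pos, target)
        pos = new_pos
    return total


print(rollout(greedy_policy, (0, 0), (3, 2)))
print(rollout(random_policy, (0, 0), (3, 2)))
```

The greedy policy never moves away from the target, so it collects no negative reward; the random policy can only do as well or worse.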
Reinforcement Learning: Make a robot move forward. Input: current position, angles of joints. Output: torques applied on the joints. Reward: 1 at each time step the robot moves forward.
Reinforcement Learning: Balance a pole. Input: current state of the pole. Output: horizontal force applied on the cart. Reward: 1 at each time step the pole is upright.
Reinforcement Learning: Mastering Atari games. Input: RGB image of the current state. Output: the paddle’s movement. Reward: score. Video link:
Challenges: complicated input signals; no supervisor; no instantaneous feedback; the agent’s actions affect the environment. Model design criteria: use the environment’s criticism of the agent’s actions; input signals refer to the observations the agent makes.
Actor-Critic Network. Agent: actor. Actor network: outputs the policy. Example (S = source, T = target): moving up takes the agent further from the target; moving down brings it closer to the target.
Actor-Critic Network. Environment: critic. Critic network: outputs the value of a state. Example (S, T): from one state, no matter how good the action in the next step, it will take at least 5 steps to reach the target; from another state, it is possible to reach the target in only 1 step.
Actor-Critic Network. Option 1: one single network for both actor and critic, sharing network parameters. Option 2: two different networks that do not share network parameters. The actor needs to know the advantage of being in the current state. Choose the network model depending on the task.
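How the critic's value reaches the actor can be sketched with a one-step advantage estimate. The code below is an illustrative tabular version, not the networks from the talk; the softmax policy over action preferences, the learning rates, and the function names are assumptions made for the example.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    # One-step advantage estimate: r + gamma * V(s') - V(s)
    target = reward + (0.0 if done else gamma * value_next)
    return target - value_s


def actor_critic_step(prefs, value_s, action, advantage, lr_actor=0.1, lr_critic=0.1):
    """Update the actor (preferences) and the critic (state value) for one transition."""
    probs = softmax(prefs)
    # For a softmax policy, grad log pi(action) = onehot(action) - probs
    new_prefs = [
        p + lr_actor * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]
    # The critic moves its value estimate toward the observed target
    new_value = value_s + lr_critic * advantage
    return new_prefs, new_value


# A transition that reaches the target with reward 1 from a state valued at 0
adv = td_advantage(reward=1.0, value_s=0.0, value_next=0.0, done=True)
prefs, value = actor_critic_step([0.0, 0.0], 0.0, action=1, advantage=adv)
print(softmax(prefs), value)
```

A positive advantage raises the probability of the action that was taken and raises the critic's estimate of the state; a negative advantage does the opposite.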
Reinforcement Learning: Target-Driven Navigation; Collision Avoidance
Target-Driven Navigation. Objective: avoid collisions with static objects in the environment; find the optimal path from source to target.
Target-Driven Navigation. Global planning: requires a map; hard to deal with dynamic objects. Local planning: requires perfect sensing of the environment.
Target-Driven Navigation. Local planning. Input: RGB images of the current and target states. Output: policy (decides the agent’s next step) and value (value of the new state). Reward: +10 for reaching the goal, +1 for a small step.
Network Architecture
Network Architecture. One network: optimizes policy and value concurrently; jointly embeds the target and current state. Video link:
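The single-network design can be illustrated with a toy forward pass: both observations go through the same embedding layer, the two embeddings are fused, and separate heads produce the policy and the value. This is only a sketch; the layer sizes, the four-action output, the random initialization, and the class name `SharedActorCritic` are assumptions, and small feature vectors stand in for the RGB images.

```python
import math
import random


def linear(weights, bias, x):
    """Fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedActorCritic:
    """Toy network with one shared body and two heads (policy and value)."""

    def __init__(self, obs_dim=6, embed_dim=4, n_actions=4, seed=0):
        rng = random.Random(seed)
        self.embed = init_layer(rng, embed_dim, obs_dim)      # same weights for both inputs
        self.fuse = init_layer(rng, embed_dim, 2 * embed_dim)  # joint embedding
        self.policy_head = init_layer(rng, n_actions, embed_dim)
        self.value_head = init_layer(rng, 1, embed_dim)

    def forward(self, current, target):
        e_cur = relu(linear(*self.embed, current))
        e_tgt = relu(linear(*self.embed, target))
        joint = relu(linear(*self.fuse, e_cur + e_tgt))  # list concatenation
        policy = softmax(linear(*self.policy_head, joint))
        value = linear(*self.value_head, joint)[0]
        return policy, value


net = SharedActorCritic()
policy, value = net.forward([0.1] * 6, [0.2] * 6)
print(policy, value)
```

Because the target is an input rather than something baked into the weights, the same parameters can be queried with a different target observation without changing the architecture.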
Target-Driven Navigation. Train only the scene-specific layer. Advantages of embedding the target and current state: adapts to new targets; reduces the training load.
Collision Avoidance. Objective: avoid collisions with static objects in the environment; avoid collisions with other agents.
Collision Avoidance. Centralized method: each agent is aware of the other agents’ positions and velocities; needs perfect communication between each agent and the server. Decentralized method: each agent is aware of only its neighboring agents’ positions and velocities; needs perfect sensing capability to obtain the neighboring agents’ information.
Collision Avoidance. Social force: each agent is considered to be a mass particle; an agent keeps a certain distance from other agents and borders. RVO (reciprocal velocity obstacle): each agent acts independently; selects a velocity outside the RVO; same policy for all agents. ORCA (optimal reciprocal collision avoidance): identify collisions; find an alternative collision-free velocity.
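The two steps listed for the velocity-based methods, identifying a collision and finding an alternative collision-free velocity, can be sketched for two disc-shaped agents. This is a deliberately simplified check against a finite set of candidate velocities, not an implementation of RVO or ORCA; the function names and the time horizon parameter are assumptions.

```python
import math


def time_to_collision(p_a, v_a, p_b, v_b, radius_sum):
    """Time until two discs moving at constant velocity touch, or None if they never do."""
    px, py = p_b[0] - p_a[0], p_b[1] - p_a[1]  # relative position
    vx, vy = v_b[0] - v_a[0], v_b[1] - v_a[1]  # relative velocity
    c = px * px + py * py - radius_sum ** 2
    if c <= 0:
        return 0.0  # already touching or overlapping
    a = vx * vx + vy * vy
    b = px * vx + py * vy
    if a == 0 or b >= 0:
        return None  # no relative motion, or moving apart
    disc = b * b - a * c
    if disc < 0:
        return None  # the paths miss each other
    # Smallest root of |p + v t|^2 = radius_sum^2
    return (-b - math.sqrt(disc)) / a


def pick_velocity(p_a, candidates, preferred, p_b, v_b, radius_sum, horizon):
    """Choose the candidate closest to the preferred velocity that is safe within the horizon."""

    def is_safe(v):
        t = time_to_collision(p_a, v, p_b, v_b, radius_sum)
        return t is None or t > horizon

    safe = [v for v in candidates if is_safe(v)]
    if not safe:
        return None
    return min(safe, key=lambda v: math.dist(v, preferred))


# Two agents heading straight at each other along the x-axis
print(time_to_collision((0, 0), (1, 0), (4, 0), (-1, 0), 1.0))
print(pick_velocity((0, 0), [(1, 0), (0, 1), (0, 0)], (1, 0), (4, 0), (-1, 0), 1.0, 2.0))
```

In the head-on case the preferred velocity leads to contact inside the horizon, so the agent falls back to the nearest safe candidate.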
Network Architecture. Architecture of the collision avoidance neural network: actor network.
Network Architecture. Architecture of the collision avoidance neural network: critic network.
Network Architecture. Two networks: actor (policy network) and critic (value network). The parameters of the two networks are updated independently. The critic’s value is incorporated in the policy network.
Collision Avoidance. Generalizes well to avoiding dynamic obstacles. Generalizes to heterogeneous groups of agents. Video link:
Uncertainty-Aware Collision Avoidance. Objective: avoid collisions with static objects in the environment; move cautiously in an unknown environment.
Uncertainty-Aware Collision Avoidance. Output of the NN: uncertainty (no action!). Cost function: favors slow movement.
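One way a cost function can favor slow movement under uncertainty is to scale a pessimistic collision estimate by the speed. The sketch below is a hypothetical formulation for illustration and is not claimed to match the cited paper: it assumes an ensemble of collision-probability predictions, uses their spread as the uncertainty, and subtracts a small term that rewards progress. All names and weights are assumptions.

```python
import statistics


def collision_cost(speed, predictions, risk_weight=1.0):
    """Speed times a pessimistic collision estimate (mean plus weighted spread)."""
    mean = statistics.mean(predictions)
    spread = statistics.pstdev(predictions)  # disagreement between ensemble members
    return speed * (mean + risk_weight * spread)


def choose_speed(candidate_speeds, predictions, goal_weight=0.1, risk_weight=1.0):
    """Pick the speed with the lowest total cost (collision cost minus progress term)."""

    def total_cost(speed):
        return collision_cost(speed, predictions, risk_weight) - goal_weight * speed

    return min(candidate_speeds, key=total_cost)


speeds = [0.0, 0.5, 1.0]
confident = [0.01, 0.01, 0.01]  # ensemble agrees that a collision is unlikely
uncertain = [0.0, 0.5, 1.0]     # ensemble disagrees strongly

print(choose_speed(speeds, confident))
print(choose_speed(speeds, uncertain))
```

When the predictions agree that the path is clear, the fastest speed has the lowest cost; when they disagree, the uncertainty term dominates and the slowest option wins.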
Conclusion. Reinforcement learning used in three different ways. Target-Driven Navigation: uses the traditional actor-critic model, with one single network for both. Decentralized Multi-Robot Collision Avoidance: separate networks for actor and critic. Uncertainty-Aware Reinforcement Learning for Collision Avoidance: does not use the traditional actor-critic model; the cost function favors the desired action.
References
Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine. Uncertainty-Aware Reinforcement Learning for Collision Avoidance. Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI.
Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan. Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning.
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning.