Deep Reinforcement Learning in Navigation


1 Deep Reinforcement Learning in Navigation
Anwica Kashfeen

2 Reinforcement Learning
An agent interacts with an environment, which provides reward signals. Goal: learn to take actions that maximize cumulative reward.
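Below is a minimal, self-contained sketch of this interaction loop, not taken from the slides: a hypothetical toy 1-D environment, a random placeholder policy, and reward accumulation.

```python
# A minimal sketch of the RL interaction loop. The toy 1-D environment
# and the random policy are illustrative assumptions.
import random

class Environment:
    """Toy 1-D world: the agent starts at cell 0; the target is cell 5."""
    def __init__(self):
        self.position = 0

    def step(self, action):                 # action is -1 (left) or +1 (right)
        self.position += action
        done = self.position == 5
        reward = 1 if done else 0           # reward only for reaching the target
        return self.position, reward, done

env = Environment()
done, total_reward = False, 0
for _ in range(10_000):                     # cap episode length for safety
    action = random.choice([-1, 1])         # the agent picks an action
    state, reward, done = env.step(action)  # the environment responds
    total_reward += reward                  # the agent seeks to maximize this
    if done:
        break
print("return:", total_reward)
```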

3 Reinforcement Learning
Diagram: the agent-environment interaction loop.

4 Reinforcement Learning

5 Reinforcement Learning
Random policy vs. optimal policy. Policy: negative reward for moving farther from the target.
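As a concrete illustration, a hedged one-liner for such a reward in a 1-D setting (the function name and setting are assumptions): the reward is positive when a step reduces the distance to the target and negative when it increases it.

```python
# A hedged sketch of the slide's reward shaping: penalize moving
# farther from the target, reward moving closer. Names are illustrative.
def distance_reward(prev_pos, new_pos, target):
    # +1 if the step brought the agent one cell closer, -1 if farther
    return abs(target - prev_pos) - abs(target - new_pos)
```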

6 Reinforcement Learning: Make robot move forward
Input: current position and angles of the joints. Output: torques applied to the joints. Reward: +1 at each time step the robot moves forward.

7 Reinforcement Learning: Balance a pole
Input: current state of the pole. Output: horizontal force applied to the cart. Reward: +1 at each time step the pole stays upright.
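A minimal sketch of this setup using the Gymnasium CartPole-v1 environment (assumes `pip install gymnasium`); the random policy stands in for a learned one.

```python
# Random policy on CartPole: observe the pole's state, push the cart
# left or right, and collect +1 per step the pole stays upright.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)   # obs: cart position/velocity, pole angle/velocity
episode_return = 0.0
while True:
    action = env.action_space.sample()      # horizontal push: left or right
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward                # +1 for each upright step
    if terminated or truncated:             # pole fell or time limit reached
        break
print("episode return:", episode_return)
```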

8 Reinforcement Learning: Mastering Atari Game
Input: RGB image of the current state. Output: the paddle's movement. Reward: game score. Video Link:

9 Challenges
Complicated input signals (the input signals are the observations the agent makes). No supervisor. No instantaneous feedback. The agent's actions affect the environment. Model design criterion: use the environment's criticism of the agent's actions.

10 Actor-Critic Network
Agent: actor. Actor network: outputs the policy. Example (S = source, T = target): moving up takes the agent farther from the target; moving down takes it closer.

11 Actor-Critic Network
Environment: critic. Critic network: outputs a value. Example (S = source, T = target): from one state, no matter how good the next action is, the agent needs at least 5 steps to reach the target; from another, the target is reachable in only 1 step.

12 Actor-Critic Network
One single network for both actor and critic: shares network parameters. Two different networks: do not share network parameters. The actor needs to know the advantage of being in the current state. Choose the network model depending on the task. Both options are sketched below.
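A minimal PyTorch sketch of the two design choices; the layer sizes and class names are illustrative assumptions, not any paper's exact architecture.

```python
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """One single network: actor and critic share trunk parameters."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h)

class SeparateActorCritic(nn.Module):
    """Two different networks: no shared parameters."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, n_actions))
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)
```

The shared variant lets the policy and value heads reuse one set of features; the separate variant keeps their gradients fully independent.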

13 Reinforcement Learning
Target-Driven Navigation Collision Avoidance

14 Target-Driven Navigation
Objective: avoid collisions with static objects in the environment; find an optimal path from source to target.

15 Target-Driven Navigation
Global planning: requires a map; hard to deal with dynamic objects. Local planning: requires perfect sensing of the environment.

16 Target-Driven Navigation
Local planning. Input: RGB images of the current and target states. Output: a policy that decides the agent's next step, and the value of the new state. Reward: +10 for reaching the goal; a small penalty for each step.
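In code, a hedged sketch of this reward: the +10 goal bonus comes from the slide, while the exact per-step penalty value is an assumption.

```python
GOAL_REWARD = 10.0     # from the slide
STEP_PENALTY = -0.01   # assumed magnitude for the small per-step penalty

def navigation_reward(reached_goal: bool) -> float:
    # large bonus on success, small cost per step to encourage short paths
    return GOAL_REWARD if reached_goal else STEP_PENALTY
```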

17 Network Architecture

18 Network Architecture
One network: optimizes the policy and value concurrently, and jointly embeds the target and current state. Video link:
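A minimal sketch of the joint-embedding idea, assuming pre-extracted image features as input; all layer names and sizes are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TargetDrivenNet(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512, n_actions=4):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, embed_dim)      # shared by both inputs
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)  # joint embedding
        self.policy_head = nn.Linear(embed_dim, n_actions) # scene-specific
        self.value_head = nn.Linear(embed_dim, 1)          # scene-specific

    def forward(self, current_feat, target_feat):
        e_cur = torch.relu(self.encoder(current_feat))
        e_tgt = torch.relu(self.encoder(target_feat))
        joint = torch.relu(self.fusion(torch.cat([e_cur, e_tgt], dim=-1)))
        return self.policy_head(joint), self.value_head(joint)  # policy and value
```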

19 Target-Driven Navigation
Train only the scene-specific layer. Advantage of embedding the target and current state: adaptive to new targets; reduces the training load.

20 Collision Avoidance
Objective: avoid collisions with static objects in the environment; avoid collisions with other agents.

21 Collision Avoidance
Centralized method: each agent is aware of every other agent's position and velocity; needs perfect communication between each agent and a server. Decentralized method: each agent is aware of only its neighboring agents' positions and velocities; needs perfect sensing capability to obtain the neighbors' information.

22 Collision Avoidance
Social force: each agent is considered a mass particle; an agent keeps a certain distance from other agents and borders. RVO (reciprocal velocity obstacle): each agent acts independently and selects a velocity outside the RVO; all agents follow the same policy. ORCA: identify potential collisions, then find an alternate collision-free velocity.
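As a small illustration of the first idea, a hedged sketch of a social-force computation; the constants and the exponential falloff are common choices assumed here, not taken from the slides.

```python
import numpy as np

def social_force(agent_pos, neighbor_positions, strength=2.0, falloff=1.0):
    """Sum of repulsive forces pushing a mass-particle agent away from neighbors."""
    force = np.zeros(2)
    for other in neighbor_positions:
        diff = agent_pos - other
        dist = np.linalg.norm(diff)
        if dist > 1e-6:
            # repulsion decays exponentially with distance, so the agent
            # keeps a certain distance from other agents
            force += strength * np.exp(-dist / falloff) * diff / dist
    return force
```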

23 Collision Avoidance

24 Network Architecture
Figure: architecture of the collision-avoidance neural network (actor network).

25 Network Architecture
Figure: architecture of the collision-avoidance neural network (critic network).

26 Network Architecture
Two networks. Actor: the policy network. Critic: the value network. The parameters of the two networks are updated independently; the critic's value is incorporated into the policy network's update.
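A hedged sketch of how the critic's value enters the policy network's update: the advantage (return minus predicted value) weights the policy-gradient loss. The loss forms are standard advantage actor-critic; the function name is an assumption.

```python
import torch

def actor_critic_losses(logits, value, action, ret):
    """One-step losses for a sampled action and its observed return."""
    advantage = ret - value.detach()                  # critic informs the actor
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    actor_loss = -advantage * log_prob                # policy-gradient term
    critic_loss = (ret - value).pow(2)                # value regression
    return actor_loss, critic_loss                    # backprop each separately
```

With two separate networks, each loss is minimized by its own optimizer, so the two parameter sets are indeed updated independently.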

27 Collision Avoidance
Generalizes well to avoiding dynamic obstacles. Generalizes to heterogeneous groups of agents. Video link:

28 Uncertainty-Aware Collision Avoidance
Objective: avoid collisions with static objects in the environment; move cautiously in an unknown environment.

29 Uncertainty-Aware Collision Avoidance

30 Uncertainty-Aware Collision Avoidance
Output of the NN: an uncertainty estimate, not just an action. The cost function favors slow movement when uncertainty is high.
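A hedged sketch of the idea, assuming an ensemble of hypothetical collision-prediction models; the cost form is illustrative, not the paper's exact formulation.

```python
import numpy as np

def collision_estimate(models, obs, action):
    """Mean and spread of the ensemble's collision predictions in [0, 1]."""
    preds = np.array([m(obs, action) for m in models])
    return preds.mean(), preds.std()   # std is the uncertainty proxy

def action_cost(models, obs, action, speed, risk_weight=10.0, progress_weight=1.0):
    p_collide, uncertainty = collision_estimate(models, obs, action)
    # the risk term grows with speed and with the ensemble's disagreement,
    # so in unfamiliar states the cheapest actions are the slow ones
    return risk_weight * speed * (p_collide + uncertainty) - progress_weight * speed
```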

31 Conclusion
Reinforcement learning is used in three different ways. Target-driven navigation: uses the traditional actor-critic model, with one single network for both. Decentralized multi-robot collision avoidance: separate networks for actor and critic. Uncertainty-aware reinforcement learning for collision avoidance: does not use the traditional actor-critic model; its cost function favors the desired action.

32 References
Uncertainty-Aware Reinforcement Learning for Collision Avoidance. Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine. Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI.
Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan.
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi.

