Deep Reinforcement Learning in Navigation Anwica Kashfeen
Reinforcement Learning Problems involve an agent interacting with an environment, which provides numeric reward signals Goal: learn to take actions that maximize cumulative reward
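A minimal sketch of this agent-environment loop. The one-dimensional GridEnv environment and the random policy below are hypothetical stand-ins, not from the slides:

```python
import random

class GridEnv:
    """Hypothetical 1-D grid environment: the agent starts at 0, the target is at 5."""
    def __init__(self):
        self.pos, self.target = 0, 5

    def step(self, action):                 # action is -1 (left) or +1 (right)
        self.pos += action
        reward = 1.0 if self.pos == self.target else -0.1  # reward signal from the environment
        done = self.pos == self.target
        return self.pos, reward, done       # observation, reward, termination flag

env = GridEnv()
obs, total_reward, done = env.pos, 0.0, False
while not done:                             # agent-environment interaction loop
    action = random.choice([-1, 1])         # a (random) policy maps observations to actions
    obs, reward, done = env.step(action)
    total_reward += reward                  # the goal is to maximize cumulative reward
print("return:", total_reward)
```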
Reinforcement Learning (diagram: agent-environment interaction loop)
Reinforcement Learning
Reinforcement Learning Random policy vs. optimal policy Reward shaping: negative reward for moving further from the target
Reinforcement Learning: Make a robot move forward Input: current position and joint angles Output: torques applied to the joints Reward: +1 at each time step the robot moves forward
Reinforcement Learning: Balance a pole Input: current state of the pole Output: horizontal force applied to the cart Reward: +1 at each time step the pole stays upright
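The reward specifications for the two locomotion examples above could be sketched as follows. The threshold and function names are illustrative assumptions, not the exact definitions used in any benchmark:

```python
def cartpole_reward(pole_angle, max_angle=0.2):
    """Illustrative reward: +1 for every time step the pole stays (roughly) upright."""
    return 1.0 if abs(pole_angle) < max_angle else 0.0

def walker_reward(prev_x, curr_x):
    """Illustrative reward: +1 for every time step the robot has moved forward."""
    return 1.0 if curr_x > prev_x else 0.0
```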
Reinforcement Learning: Mastering Atari Games Input: RGB image of the current state Output: paddle movement Reward: game score Video link: https://www.youtube.com/watch?v=V1eYniJ0Rnk
Challenges Complicated input signals No supervisor No instantaneous feedback The agent's actions affect the environment Model design criteria: Use the environment's criticism of the agent's actions Input signals refer to the observations the agent makes
Actor-Critic Network Agent: actor Actor network: outputs the policy Example (source S, target T): moving up leads further from the target, moving down leads closer to the target
Actor-Critic Network Environment: critic Critic network: outputs the value of a state Example (source S, target T): from one state it takes at least 5 steps to reach the target no matter how good the next action is, while from another state the target can be reached in only 1 step
Actor-Critic Network Option 1: one single network for both actor and critic, sharing network parameters Option 2: two different networks that do not share network parameters The actor needs to know the advantage of being in the current state Choose the network model depending on the task
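A minimal PyTorch sketch of the two design options, assuming a small fully connected architecture (layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Option 1: one network; actor and critic share parameters."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())  # shared trunk
        self.policy_head = nn.Linear(hidden, n_actions)  # actor head: action logits
        self.value_head = nn.Linear(hidden, 1)            # critic head: state value

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h)

class SeparateActorCritic(nn.Module):
    """Option 2: two networks; actor and critic do not share parameters."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, n_actions))
        self.critic = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)

obs = torch.randn(1, 8)
logits, value = SharedActorCritic(8, 4)(obs)  # the critic's value lets the actor estimate advantage
```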
Reinforcement Learning Target-Driven Navigation Collision Avoidance
Target-Driven Navigation Objective Avoid collisions with static objects in the environment Find an optimal path from source to target
Target-Driven Navigation Global planning Requires a map Hard to deal with dynamic objects Local planning Requires perfect sensing of the environment
Target-Driven Navigation Local planning Input: RGB images of the current and target states Output Policy: decides the agent's next step Value: value of the new state Reward: +10 for reaching the goal, a small time penalty for every other step
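A sketch of that reward. The -0.01 step penalty is an assumed value drawn from the target-driven navigation paper's setup; the goal bonus of +10 matches the slide:

```python
def navigation_reward(reached_goal, step_penalty=-0.01, goal_reward=10.0):
    """Target-driven navigation reward: a large bonus on reaching the goal,
    a small time penalty for every other step (values are illustrative)."""
    return goal_reward if reached_goal else step_penalty

# Example: a 20-step episode that ends at the goal earns 19 * (-0.01) + 10.0.
episode_return = sum(navigation_reward(t == 19) for t in range(20))
```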
Network Architecture
Network Architecture One network Optimizes policy and value concurrently Jointly embeds the target and the current state Video link: https://www.youtube.com/watch?v=SmBxMDiOrvs
Target-Driven Navigation Train only the scene-specific layers Advantages of embedding the target and the current state: adapts to new targets, reduces training load
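A simplified sketch of this layout: shared layers jointly embed the current and target observations, while per-scene heads stay scene-specific and can be trained on their own. Feature sizes, names, and the freezing step are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TargetDrivenNet(nn.Module):
    """Shared ("generic") joint embedding of current and target features,
    followed by scene-specific policy/value heads."""
    def __init__(self, feat_dim=2048, embed=512, n_actions=4, scenes=("scene1", "scene2")):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(2 * feat_dim, embed), nn.ReLU())  # shared layers
        self.scene_heads = nn.ModuleDict({
            s: nn.ModuleDict({"policy": nn.Linear(embed, n_actions),
                              "value": nn.Linear(embed, 1)})
            for s in scenes})                                                   # scene-specific layers

    def forward(self, current_feat, target_feat, scene):
        h = self.embed(torch.cat([current_feat, target_feat], dim=-1))          # joint embedding
        head = self.scene_heads[scene]
        return head["policy"](h), head["value"](h)

net = TargetDrivenNet()
# Adapting to a new scene: freeze the shared embedding, train only the scene-specific head.
for p in net.embed.parameters():
    p.requires_grad = False
```

Because the target is an input rather than baked into the network, a new target in a known scene needs no retraining at all, which is what keeps the training load low.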
Collision Avoidance Objective Avoid collisions with static objects in the environment Avoid collisions with other agents
Collision Avoidance Centralized method: each agent is aware of all other agents' positions and velocities; needs perfect communication between each agent and a central server Decentralized method: each agent is aware of only its neighboring agents' positions and velocities; needs perfect sensing capability to obtain the neighbors' information
Collision Avoidance Traditional methods: Social Force, RVO, ORCA Social Force: each agent is modeled as a mass particle that keeps a certain distance from other agents and borders RVO: each agent acts independently, selecting a velocity outside the reciprocal velocity obstacle; the same policy is used for all agents ORCA: identify potential collisions and find an alternative collision-free velocity
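A toy sketch of the social-force idea only (the constants and the linear force model are illustrative assumptions, not the original formulation):

```python
import numpy as np

def social_force(agent_pos, other_positions, desired_dist=1.0, strength=2.0):
    """Toy repulsive force: each neighbor closer than desired_dist pushes the agent away."""
    force = np.zeros(2)
    for other in other_positions:
        diff = agent_pos - other
        dist = np.linalg.norm(diff)
        if 1e-6 < dist < desired_dist:
            force += strength * (desired_dist - dist) * diff / dist  # push along the separation direction
    return force

# Example: one neighbor directly to the right pushes the agent to the left.
print(social_force(np.array([0.0, 0.0]), [np.array([0.5, 0.0])]))
```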
Collision Avoidance
Network Architecture Architecture of the collision avoidance neural network Actor Network
Network Architecture Architecture of the collision avoidance neural network Critic Network
Network Architecture Two networks Actor: policy network Critic: value network Update the parameters of the two networks independently The critic's value is incorporated into the policy network's update
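A minimal sketch of independently updated actor and critic. This is a simplified policy-gradient update with separate optimizers, not the exact training procedure of the collision-avoidance paper; dimensions and the placeholder returns are assumptions:

```python
import torch
import torch.nn as nn

obs_dim, n_actions, batch = 16, 5, 32
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)    # separate optimizers:
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # parameters updated independently

obs = torch.randn(batch, obs_dim)
actions = torch.randint(n_actions, (batch,))
returns = torch.randn(batch)                                  # placeholder Monte-Carlo returns

values = critic(obs).squeeze(-1)
advantage = (returns - values).detach()                       # critic's value feeds the policy update
log_probs = torch.log_softmax(actor(obs), dim=-1)[torch.arange(batch), actions]

actor_loss = -(log_probs * advantage).mean()                  # policy-gradient loss for the actor
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

critic_loss = (returns - critic(obs).squeeze(-1)).pow(2).mean()  # value-regression loss for the critic
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
```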
Collision Avoidance Generalizes well to avoiding dynamic obstacles Generalizes to heterogeneous groups of agents Video link: https://www.youtube.com/watch?v=Uj1yAmlL5lk
Uncertainty-Aware Collision Avoidance Objective Avoid collisions with static objects in the environment Move cautiously in an unknown environment
Uncertainty-Aware Collision Avoidance
Uncertainty-Aware Collision Avoidance Output of the NN: a collision prediction with an uncertainty estimate, not an action The cost function favors slow movement when uncertainty is high
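A sketch of the idea: candidate actions are scored by a cost that penalizes speed in proportion to the predicted collision probability and its uncertainty, so the robot slows down in unfamiliar situations. The cost form, constants, and names here are simplified stand-ins for the paper's formulation:

```python
import numpy as np

def action_cost(speed, collision_prob, uncertainty, lam=1.0):
    """Illustrative uncertainty-aware cost: moving fast is penalized in proportion to
    the predicted collision probability plus its uncertainty, with a small bonus
    for making progress."""
    return speed * (collision_prob + lam * uncertainty) - 0.1 * speed

candidate_speeds = np.linspace(0.1, 2.0, 20)
# Novel environment: the collision model is uncertain, so the lowest-cost action is slow.
slow = min(candidate_speeds, key=lambda v: action_cost(v, collision_prob=0.3, uncertainty=0.8))
# Familiar environment: low uncertainty, so the lowest-cost action moves faster.
fast = min(candidate_speeds, key=lambda v: action_cost(v, collision_prob=0.0, uncertainty=0.05))
print("uncertain:", slow, "confident:", fast)
```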
Conclusion Using Reinforcement Learning in three different ways Target-Driven Navigation: uses the traditional actor-critic model, with one single network for both actor and critic Decentralized Multi-Robot Collision Avoidance: separate networks for actor and critic Uncertainty-Aware Reinforcement Learning for Collision Avoidance: does not use the traditional actor-critic model; a cost function favors the desired action
References
Uncertainty-Aware Reinforcement Learning for Collision Avoidance. Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine. Berkeley AI Research (BAIR), University of California, Berkeley; OpenAI.
Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan.
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi.