DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning
Emir Zeylan - 6036791
Stylianos Filippou - 6224598

Learning physics-based locomotion skills
Learning physics-based locomotion is a difficult problem.
The paper presents a technique for achieving locomotion skills with limited prior knowledge.
The technique uses deep reinforcement learning.

Related work
Physics-based character control: controllers developed around an FSM structure that use optimization methods, such as policy search and trajectory optimization, to improve the results.
RL for simulated locomotion: agents in 2D and 3D physics-based simulations that learn specific tasks with minimal prior knowledge.
Motion planning: a path planner that can be used to compute steering and forward-speed commands for the locomotion controller to navigate the environment.

Overview: two-level hierarchical control framework
LLC (Low-Level Controller): responsible for coordinating joint torques to achieve the goal given by the HLC.
HLC (High-Level Controller): responsible for high-level, task-specific objectives.
SIM (Simulation): physics simulation of bipedal locomotion.
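A minimal sketch of how the two levels might interact in a control loop. The function names (hlc_policy, llc_policy, simulate_step) and the update rates shown are illustrative assumptions, not the paper's code.

```python
# Illustrative control loop for the two-level hierarchy.
def run_episode(hlc_policy, llc_policy, simulate_step, state, max_time=200.0):
    t, dt = 0.0, 1.0 / 600.0                    # physics timestep (illustrative)
    hlc_period, llc_period = 0.5, 1.0 / 30.0    # HLC at a few Hz, LLC at tens of Hz
    next_hlc, next_llc = 0.0, 0.0
    footstep_goal, pd_targets = None, None

    while t < max_time and not state.character_has_fallen():
        if t >= next_hlc:        # high level: task goal -> footstep plan
            footstep_goal = hlc_policy(state.hlc_features(), state.task_goal())
            next_hlc += hlc_period
        if t >= next_llc:        # low level: footstep plan -> PD targets
            pd_targets = llc_policy(state.llc_features(), footstep_goal)
            next_llc += llc_period
        state = simulate_step(state, pd_targets, dt)   # physics applies PD torques
        t += dt
    return state
```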

Policy representation and learning
Both controllers are trained with a common policy-learning algorithm.
During training, an action is selected according to either a stochastic or a deterministic policy (see the sketch below).
The objective is to obtain an optimal policy, i.e. one that maximizes the reward Rt in the long run.
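The equation shown on the original slide is not reproduced in the transcript; the standard forms below are a plausible reconstruction, assuming Gaussian exploration around a deterministic mean action and the usual expected discounted return objective.

```latex
% Exploratory (stochastic) action around a deterministic mean action \mu(s_t):
a_t \sim \pi(a \mid s_t) = \mathcal{N}\!\left(\mu(s_t \mid \theta),\, \Sigma\right)

% Objective: maximize the expected discounted return
J(\pi) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t \;\middle|\; \pi\right]
```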

Low-level controller
LLC state: consists mainly of features describing the character's configuration.
LLC goal: consists of a footstep plan, including the target position for the character's swing foot and the target root heading for the next step.
LLC action: the action aL specifies, for each joint, the target position for a PD controller (see the sketch below).
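A PD controller converts each target joint position into a torque. The formula below is the standard PD law rather than anything specific to the paper, and the gains are hypothetical placeholders.

```python
def pd_torque(q, q_dot, q_target, kp=300.0, kd=30.0):
    """Standard PD control law: torque driving joint angle q toward q_target.
    Gains are illustrative; a real character would use per-joint gains."""
    return kp * (q_target - q) - kd * q_dot
```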

Reference motion
The joint torques are coordinated so as to mimic a reference motion, which helps achieve a desired walking style.
At each timestep the reference motion provides a reference pose and a reference velocity.
Multiple reference motion clips are used to achieve better results.
A kinematic controller is constructed to make use of multiple motion clips; the most suitable clip is selected using features extracted from the clips (a sketch follows).
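A minimal sketch of clip selection by feature distance. The feature contents and the Euclidean distance metric are assumptions made for illustration; the paper defines its own clip features and selection mechanism.

```python
import numpy as np

def select_clip(clip_features, desired_features):
    """Pick the motion clip whose extracted features are closest to the
    desired ones (Euclidean distance, purely illustrative)."""
    dists = [np.linalg.norm(f - desired_features) for f in clip_features]
    return int(np.argmin(dists))
```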

Low-level reward
The reward rL lets the user guide the behaviour of the agent by changing the reward function provided as input to the system.
The LLC reward rL is defined as a weighted sum of objectives that encourage the character to imitate the style of the reference motion while following the footstep plan.
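As an illustration of such a weighted sum, the reward might combine imitation terms for pose, velocity, root, centre of mass, end effectors and heading; the term names and weights below are indicative, not a quotation of the paper's exact definition.

```latex
r_L = w_{pose}\, r_{pose} + w_{vel}\, r_{vel} + w_{root}\, r_{root}
    + w_{com}\, r_{com} + w_{end}\, r_{end} + w_{heading}\, r_{heading}
```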

Bilinear phase transform
Helps the LLC synchronize with the reference motion and better distinguish between different phases of the gait.
Inspired by bilinear pooling models.
The transform combines the phase φ with the current state and the current goal before they are fed to the network (a sketch follows).
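A sketch of the general bilinear (outer-product) idea the slide alludes to: a phase encoding is combined with the state-goal features by an outer product, so different phase bins effectively get their own copy of the features. The bin count and one-hot encoding here are assumptions, not the paper's exact formulation.

```python
import numpy as np

def bilinear_phase_transform(phase, state, goal, n_bins=4):
    """Outer-product combination of a phase encoding with the state/goal
    features. The one-hot phase binning is an illustrative choice."""
    bin_idx = min(int(phase * n_bins), n_bins - 1)   # phase assumed in [0, 1)
    phase_vec = np.zeros(n_bins)
    phase_vec[bin_idx] = 1.0
    features = np.concatenate([state, goal])
    return np.outer(phase_vec, features).ravel()     # network input
```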

Low-level network and training
LLC network: the LLC is represented by a four-layer neural network that receives sL and gL as input and outputs the action aL.
LLC training: training proceeds episodically; the character is initialized to a default pose at the beginning of each episode.
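A minimal PyTorch sketch of such a network. The layer widths and activation are assumptions made for illustration, and the real LLC additionally applies the bilinear phase transform to its inputs.

```python
import torch
import torch.nn as nn

class LLCNetwork(nn.Module):
    """Maps the LLC state s_L and goal g_L to PD target angles a_L.
    Layer sizes are illustrative, not the paper's exact architecture."""
    def __init__(self, state_dim, goal_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, action_dim),          # linear output: PD targets
        )

    def forward(self, s_l, g_l):
        return self.net(torch.cat([s_l, g_l], dim=-1))
```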

High-level controller
HLC state: consists of information about the character and the environment.
HLC goal: a specified high-level task.
HLC action: processing the high-level task goal gH produces an action aH, which serves as the goal for the low-level controller.

High-level network and training
HLC network: three convolutional layers process the terrain map; the result is merged with the character features and the goal gH and processed by two fully-connected layers; the final layer produces the action aH.
HLC training: during training the character is initialized to a default pose, and each episode ends at the 200 s timeout or when the character falls.
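A rough PyTorch sketch of that layout. Kernel sizes, channel counts, map resolution, and hidden widths are illustrative assumptions, not the paper's exact values.

```python
import torch
import torch.nn as nn

class HLCNetwork(nn.Module):
    """Terrain map -> conv features, merged with character features and goal
    g_H, then fully-connected layers produce the action a_H (the LLC goal).
    All sizes are illustrative."""
    def __init__(self, char_dim, goal_dim, action_dim, map_size=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = self.conv(torch.zeros(1, 1, map_size, map_size)).shape[1]
        self.fc = nn.Sequential(
            nn.Linear(conv_out + char_dim + goal_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, terrain_map, char_features, g_h):
        z = self.conv(terrain_map)
        return self.fc(torch.cat([z, char_features, g_h], dim=-1))
```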

High-level tasks
Path following: requires the character to navigate along paths carved into a rocky terrain.
Soccer dribbling: as the name indicates, requires the character to move a ball to a series of random target locations.
Pillar obstacles: requires the character to travel across an area densely populated with pillars; similar to the path-following task.

High-level tasks (continued)
Block obstacles: a variant of pillar obstacles with larger blocks.
Dynamic obstacles: a dynamically changing environment in which the character must reach a target location.

Results
LLC performance: depends on the motion clips used.
HLC performance: indicates that the HLC is able to learn the high-level tasks.

Contributions of the method
With a limited amount of prior knowledge, the method:
Achieves significantly more robust bipedal locomotion.
Achieves significantly more natural locomotion.
Achieves the ability to walk with multiple styles that can be interpolated.
Achieves challenging tasks such as soccer dribbling.

Conclusion
A hierarchical learning-based framework for 3D bipedal walking skills with minimal prior knowledge.
Easily directable control over motion style, producing highly robust controllers.
Hierarchical decomposition allows controllers to be reused.

Any questions?

Discussion

What are the limitations of this method?
Without a reference motion, the LLC fails to learn a successful walk, so it depends on the motion clips.
The default LLC training consists of constant-speed forward walks and turns but no stopping, which limits the options available to the HLC when avoiding obstacles.
The more difficult dynamic-obstacles environment proved challenging for the HLC: it reaches a competent level of performance but is still prone to occasional missteps, particularly when navigating around faster-moving obstacles.
Without the hierarchical decomposition, the LLCs failed to perform their respective tasks. To train policies without the control hierarchy, the LLC inputs were augmented with gH (and, for the path-following task, with the terrain map T, adding convolutional layers to that LLC); the augmented LLCs were then trained to both imitate the reference motions and perform the high-level tasks, and both failed.

Could we apply this method to other high-level tasks?
Yes, provided we supply appropriate motion clips for the LLC to learn a different style if necessary, and, if needed, modify the HLC (as was done for the dribbling task) so that the character is encouraged to achieve the specific goal.
However, if the high-level task requires the character to stop, then no, because of the limitation of this method noted above.

Can this technique be used in industry?
Not yet, but it could be applied in the medical field, for example in simulations that help patients who struggle with walking.
Given that the simulated character can only keep moving and never stop, the technique is currently inappropriate for games.
With further improvements, however, this technique could become successful in industry, as it already achieves quite complicated tasks.