Presentation on theme: "Emir Zeylan Stylianos Filippou"— Presentation transcript:

1 DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning
Emir Zeylan - 6036791, Stylianos Filippou - 6224598

2 Learning physics-based locomotion skills
Learning physics-based locomotion is a difficult problem. The paper presents a technique for achieving locomotion skills with limited prior knowledge, using reinforcement learning.

3 Related work
Physics-based Character Control: Controllers developed around an FSM structure that make use of optimization methods, such as policy search and trajectory optimization, to improve the results.
RL for Simulated Locomotion: Agents in 2D and 3D physics-based simulations that learn specific tasks with minimal prior knowledge.
Motion Planning: A path planner that computes steering and forward-speed commands for the locomotion controller to navigate the environment.

4 Overview: Two-level hierarchical control framework
LLC (Low-Level Controller): Responsible for coordinating joint torques to achieve the goal given by the HLC.
HLC (High-Level Controller): Responsible for high-level, task-specific objectives.
SIM (Simulation): Simulation of bipedal locomotion.

5

6 Policy representation and learning
Both controllers are trained with a common policy learning algorithm. During training, an action is selected according to either a stochastic or a deterministic policy. The objective is to find an optimal policy that maximizes the return Rt in the long run.
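The action-selection step and the return being maximized can be sketched as follows. This is a minimal illustration, not the paper's actual network: `policy_mean` stands in for the learned policy network, and additive Gaussian exploration noise is an assumption about how the stochastic policy is formed.

```python
import numpy as np

def policy_mean(state):
    # Placeholder for the learned policy network mu(s);
    # here just a fixed linear map for illustration.
    W = np.ones((2, len(state))) * 0.1
    return W @ state

def select_action(state, stochastic=True, noise_std=0.1, rng=None):
    """Deterministic action = network mean; the stochastic policy
    adds Gaussian exploration noise (an assumed, common choice)."""
    rng = rng or np.random.default_rng(0)
    a = policy_mean(state)
    if stochastic:
        a = a + rng.normal(0.0, noise_std, size=a.shape)
    return a

def discounted_return(rewards, gamma=0.99):
    # R_t = sum_k gamma^k * r_{t+k}: the quantity an optimal
    # policy maximizes in the long run.
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R
```

During training the stochastic variant drives exploration, while the deterministic variant is used for evaluation.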

7 Low-level controller
LLC State: Consists mainly of features describing the character's configuration.
LLC Goal: Consists of a footstep plan, including the target position for the character's swing foot and the root heading for the next step.
LLC Action: The action aL specifies, for each joint, the target angle for its PD controller.
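The PD controllers mentioned above turn the LLC's target joint angles into torques. A one-line sketch (the gains kp and kd are illustrative, not the paper's values):

```python
def pd_torque(theta_target, theta, theta_dot, kp=300.0, kd=30.0):
    """Proportional-derivative control: torque drives the joint angle
    theta toward the LLC's target theta_target while damping the
    joint velocity theta_dot. Gains are illustrative placeholders."""
    return kp * (theta_target - theta) - kd * theta_dot
```

The LLC therefore never outputs torques directly; it outputs targets, and the PD layer handles low-level tracking at the simulation rate.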

8 Reference Motion
Joint-torque coordination aims to mimic a reference motion, which helps achieve a desired walking style. At each timestep the reference motion provides a reference pose and a reference velocity. Multiple reference motion clips are used to achieve better results: a kinematic controller is constructed to make use of the multiple clips, and the most suitable motion is selected using the extracted clip features.

9 Low Level Reward
The reward rL lets the user guide the behaviour of the agent by changing the reward function provided as input to the system. The LLC reward rL is defined as a weighted sum of objectives that encourage the character to imitate the style of the reference motion while following the footstep plan.

10 Bilinear Phase Transform
Helps the LLC synchronize with the reference motion and better distinguish between different phases. It is inspired by bilinear pooling models. The transform indicates, at each phase φ, the current state and the current goal that must be achieved.
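A minimal sketch of such a transform, assuming the phase φ ∈ [0, 1) is discretized into a one-hot indicator whose outer product with the state-goal vector gives each phase bin its own copy of the features (the bin count of 4 is an assumption):

```python
import numpy as np

def bilinear_phase_transform(x, phi, n_bins=4):
    """Outer product of a one-hot phase indicator with the combined
    state-goal vector x. The network then sees phase-specific copies
    of the features, making phases easy to distinguish."""
    p = np.zeros(n_bins)
    p[int(phi * n_bins) % n_bins] = 1.0   # which phase bin is active
    return np.outer(p, x).flatten()        # shape: (n_bins * len(x),)
```

Only the slice belonging to the active phase bin is non-zero, so downstream layers can learn phase-dependent behaviour with a simple linear map.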

11 Low Level Network & Training
LLC Network: The LLC is represented by a 4-layer neural network that receives sL and gL as input and outputs the action aL.
LLC Training: Training proceeds episodically; the character is initialized to a default pose at the beginning of each episode.

12

13 High-level controller
HLC State: Consists of information about the character and the environment.
HLC Goal: A specified high-level task.
HLC Action: Processing the high-level task goal gH produces an action aH, which serves as the low-level goal.

14 High Level Network & Training
HLC Network: Three convolutional layers process the terrain map. The result is merged with the character features and the goal gH, then processed by fully connected layers. The final layer produces the action aH.
HLC Training: During training the character is initialized to a default pose; each episode ends at a 200 s timeout or when the character falls.
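The data flow through the HLC network can be sketched as below. This is an illustrative forward pass only: the kernel sizes, layer widths, 4-dimensional action, and random weights are assumptions, and a naive single-channel convolution stands in for the real conv layers.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation (single channel), a stand-in
    for one convolutional layer over the terrain map."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def hlc_forward(terrain, char_feats, goal, rng=None):
    """Illustrative HLC pass: conv layers over the terrain map,
    flatten, concatenate with character features and goal gH,
    then fully connected layers producing the footstep action aH."""
    rng = rng or np.random.default_rng(0)
    h = np.maximum(conv2d(terrain, rng.normal(size=(3, 3))), 0.0)
    h = np.maximum(conv2d(h, rng.normal(size=(3, 3))), 0.0)
    z = np.concatenate([h.ravel(), char_feats, goal])
    W1 = rng.normal(size=(64, z.size))
    W2 = rng.normal(size=(4, 64))
    return W2 @ np.maximum(W1 @ z, 0.0)  # aH, e.g. step target + heading
```

The key design point survives the simplification: spatial terrain information is compressed by convolutions before being fused with the non-spatial character and goal features.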

15

16 High Level Tasks
Path Following: Given a rocky terrain, the character must navigate along paths carved into it.
Soccer Dribbling: As the name indicates, the character must move a ball through several random locations.
Pillar Obstacles: Requires the character to travel across a dense area; similar to the path following task.

17 High Level Tasks
Block Obstacles: Another variant of pillar obstacles, with larger blocks.
Dynamic Obstacles: A dynamically changing environment in which the character must reach a target location.

18 Results
LLC Performance: LLC performance depends on the motion clips.
HLC Performance: Indicates that the HLC is able to learn high-level tasks.

19

20 Contributions of the method
With a limited amount of prior knowledge, the method:
Achieves significantly robust bipedal locomotion.
Achieves significantly more natural locomotion.
Achieves the ability to walk with multiple styles that can be interpolated.
Achieves challenging tasks such as soccer dribbling.

21 Conclusion
A hierarchical learning-based framework for 3D bipedal walking skills with minimal prior knowledge. Easily directable control over motion style, producing highly robust controllers. The hierarchical decomposition allows reuse of controllers.

22 Any questions?

23 Discussion

24 What are the limitations of this method?
The more difficult dynamic obstacles environment proved challenging for the HLC: it reaches a competent level of performance but is still prone to occasional missteps, particularly when navigating around faster-moving obstacles.
Without a reference motion, the LLC fails to learn a successful walk, so it depends on the motion clips.
The default LLC training consists of constant-speed forward walks and turns but no stopping, which limits the options available to the HLC when avoiding obstacles.
Without the hierarchical decomposition, the LLCs failed to perform their respective tasks. To train the policies without the control hierarchy, the LLC's inputs were augmented with gH, and for the path following task the terrain map T was also included as part of the input, with convolutional layers added to the path following LLC. The augmented LLCs were then trained to imitate the reference motions and perform the high-level tasks, and both failed.

25 Could we apply this method to other high-level tasks?
Yes, provided we supply the appropriate motion clips for the LLC to learn a different style if necessary, and, if needed, make some modifications to the HLC (as was done for the dribbling task) so that the character is encouraged to achieve the specific goal. If the high-level task requires the character to stop, however, then no, because of the limitation of this method noted above.

26 Can this technique be used in industry?
Not yet, but it could be used in the medical area for patients who struggle with walking: the technique could be applied in simulations to help such patients. Given that the character simulation consists only of moving motions with no stopping, it would currently be inappropriate for games. But with further improvements this technique could succeed in industry, as it already achieves really complicated tasks.
