University of Science and Technology of China

University of Science and Technology of China
Hindsight Experience Replay NIPS2017 by OpenAI Ze Liu University of Science and Technology of China

Background Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL), especially for robotics. To engineer a reward function is a common challenge. It needs RL expertise and domain-specific knowledge. It is not applicable in situations where we do not know what admissible behaviour may look like. One ability humans have can learn almost as much from achieving an undesired outcome as from the desired one.

Change the goal Hindsight --> 后见之明;事后聪明

Changes: ,

Assume: Given a state s we can easily find a goal g which is satisfied in this state. More formally, we assume that there is given a mapping : A strategy for sampling goals for replay

Algorithm

Experiments

Single goal It is advisable to train on multiple goals even if we care only about one of them.

How many goals should we replay each trajectory with and how to choose them?

Deployment on a physical robot

THANK YOU

University of Science and Technology of China

Similar presentations

Presentation on theme: "University of Science and Technology of China"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

University of Science and Technology of China

Similar presentations

Presentation on theme: "University of Science and Technology of China"— Presentation transcript:

Similar presentations

About project

Feedback