Presentation is loading. Please wait.

Presentation is loading. Please wait.

University of Science and Technology of China

Similar presentations


Presentation on theme: "University of Science and Technology of China"— Presentation transcript:

1 University of Science and Technology of China
Hindsight Experience Replay NIPS2017 by OpenAI Ze Liu University of Science and Technology of China

2 Background Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL), especially for robotics. To engineer a reward function is a common challenge. It needs RL expertise and domain-specific knowledge. It is not applicable in situations where we do not know what admissible behaviour may look like. One ability humans have can learn almost as much from achieving an undesired outcome as from the desired one.

3 Change the goal Hindsight --> 后见之明;事后聪明

4 Changes: ,

5 Assume: Given a state s we can easily find a goal g which is satisfied in this state. More formally, we assume that there is given a mapping : A strategy for sampling goals for replay

6 Algorithm

7 Experiments

8

9

10

11 Single goal It is advisable to train on multiple goals even if we care only about one of them.

12 How many goals should we replay each trajectory with and how to choose them?

13 Deployment on a physical robot

14 THANK YOU


Download ppt "University of Science and Technology of China"

Similar presentations


Ads by Google