Petar Kormushev, Sylvain Calinon and Darwin G. Caldwell

1 Reinforcement Learning in Robotics: Applications and Real-World Challenges
Petar Kormushev, Sylvain Calinon and Darwin G. Caldwell (2013)
Presenter: Wei Zhang

2 Outline
What is Reinforcement Learning (RL)?
Applications using RL
RL in robotics: the archery aiming task
Future of RL

3 What is Reinforcement Learning (RL)?
Learning by trial and error, not supervised learning. There is an agent interacting with an environment. Three key components: rewards, a policy function, and a value function.

4 Agent and Environment Figure from the book Reinforcement Learning: An Introduction by Sutton and Barto

5 Rewards A reward function maps a perceived state of the environment to a single number, a reward, indicating the desirability of that state. The rewards can be summed to form the return: G_t = R_{t+1} + R_{t+2} + R_{t+3} + ... + R_T. The return is used to define the value function on a later slide.
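The return formula above can be sketched in a few lines of Python; the reward sequence below is a made-up toy episode, and the optional `gamma` generalizes the slide's undiscounted sum to the discounted case:

```python
# Computing the return G_t = R_{t+1} + R_{t+2} + ... + R_T from a reward
# sequence; gamma=1.0 reproduces the undiscounted sum on the slide.
def compute_return(rewards, t=0, gamma=1.0):
    """Sum the rewards from step t+1 to T, optionally discounted by gamma."""
    g = 0.0
    for k, r in enumerate(rewards[t:]):
        g += (gamma ** k) * r
    return g

rewards = [0.0, 0.0, 1.0, 2.0]     # R_1 .. R_4 of a toy episode
print(compute_return(rewards))      # undiscounted: 3.0
```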

6 Value Function For a Markov Decision Process (MDP), the state-value function is the expected return when starting in state s and following policy π: v_π(s) = E_π[G_t | S_t = s]. The action-value function is defined analogously for taking action a in state s: q_π(s, a) = E_π[G_t | S_t = s, A_t = a].
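A minimal sketch of how v_π can be computed in practice: iterate the Bellman expectation equation over a tiny MDP until the values settle. The two-state MDP below (transitions, rewards, policy) is invented purely for illustration:

```python
# Iterative policy evaluation on a toy 2-state MDP:
# v(s) = sum_a pi(a|s) * sum_{s'} P(s'|s,a) * (r + gamma * v(s'))
# transitions[s][a] = list of (prob, next_state, reward); state 1 absorbs.
transitions = {
    0: {0: [(1.0, 1, 1.0)]},   # state 0, one action -> state 1, reward 1
    1: {0: [(1.0, 1, 0.0)]},   # state 1 loops on itself with reward 0
}
policy = {0: {0: 1.0}, 1: {0: 1.0}}   # deterministic policy
gamma = 0.9

v = {s: 0.0 for s in transitions}
for _ in range(100):                   # sweep until (approximately) converged
    for s in transitions:
        v[s] = sum(policy[s][a] * sum(p * (r + gamma * v[s2])
                                      for p, s2, r in outs)
                   for a, outs in transitions[s].items())

print(round(v[0], 3))  # 1.0: immediate reward of 1, then 0 forever
```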

7 Policy Function A policy maps perceived states of the environment to the actions the agent takes in those states. A greedy policy always takes the action that maximizes the reward in the long run. An exploring policy tries new actions without relying on prior knowledge, which may yield lower rewards in the short term but can discover better actions.
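The greedy-versus-exploring trade-off on this slide is often implemented as an epsilon-greedy policy: with probability epsilon pick a random action, otherwise the best-valued one. A minimal sketch (the action values are made up):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: estimated value per action; returns a chosen action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

q = [0.1, 0.5, 0.3]
print(epsilon_greedy(q, epsilon=0.0))  # epsilon=0 is purely greedy -> 1
```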

8 Applications of RL
Games: Atari, AlphaGo
Pancake flipping task
Bipedal walking energy minimization task
Archery aiming task

9 Archery Aiming Task First of the two algorithms used:
An Expectation-Maximization (EM) based RL algorithm called Policy learning by Weighting Exploration with the Returns (PoWER). The reward is a scalar function of the distance between the arrow's hit point and the target center. Drawback: the reward is one-dimensional, so learning takes longer.
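The core PoWER idea can be sketched in one dimension: perturb the policy parameters, then take a reward-weighted average of the explored perturbations (an EM-style update). The toy reward below, peaking at a hidden target parameter, is invented for illustration and is not the paper's actual reward:

```python
import math
import random

random.seed(0)
target = 2.0                                  # hypothetical optimal parameter
reward = lambda th: math.exp(-(th - target) ** 2)

theta = 0.0
for _ in range(50):
    # sample exploratory parameters around the current estimate
    samples = [theta + random.gauss(0.0, 0.5) for _ in range(10)]
    returns = [reward(s) for s in samples]
    # PoWER-style update: reward-weighted mean of the exploration offsets
    theta += sum(r * (s - theta) for s, r in zip(samples, returns)) / sum(returns)

print(theta)  # converges near the hidden target 2.0
```

Because the update only sees the scalar return, it must infer the direction of improvement statistically from many rollouts, which is the "single dimension, slower learning" drawback noted above.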

10 Archery Aiming Task Second algorithm used:
Augmented Reward Chained Regression (ARCHER). Instead of a scalar reward alone, ARCHER uses the 2-D position of the arrow relative to the target as an augmented reward r, together with the 3-D relative position of the hands Θ. From the rollouts, the returns r_{1..T} are computed, weights w are learned by regression, and the weights are applied to update the policy parameters.
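The advantage of a vector-valued reward can be sketched as follows: because the 2-D hit position tells the learner which direction to correct, a regression from parameter changes to hit-position changes can solve directly for the correction that lands the arrow on the target center. This is a drastically simplified, finite-difference stand-in for ARCHER's chained regression, and the hidden linear shot model is invented for illustration:

```python
def simulate_hit(theta, true_offset=(1.5, -0.8)):
    """Hypothetical shot model: hit position = theta minus a hidden offset."""
    return (theta[0] - true_offset[0], theta[1] - true_offset[1])

theta = [0.0, 0.0]
for _ in range(5):
    hit = simulate_hit(theta)
    # secant-style regression per dimension (the toy model is diagonal):
    # probe how each parameter moves the hit, then correct so the
    # predicted hit lands on the target center (0, 0)
    for d in range(2):
        probe = list(theta)
        probe[d] += 0.1
        slope = (simulate_hit(probe)[d] - hit[d]) / 0.1
        theta[d] -= hit[d] / slope

print(simulate_hit(theta))  # hit lands (numerically) at the target center
```

Each shot gives directional information, so far fewer rollouts are needed than with PoWER's scalar reward.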

11-13 Archery Aiming Task (figures)

14 Bipedal Energy Minimization

15 Pancake Flipping Task

16 Future of RL
Not only in games, but also in medical fields
Multi-task learning
Robotics

17 Thank you!

