Learning optimal behavior


1 Learning optimal behavior
Twan van Laarhoven

2 AIBO robot walking
Paper: "Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion", Nate Kohl, Peter Stone (2004)
Goal: walking speed
Why would you want this?

3

4 Parameterization
12 parameters: front ellipse, rear ellipse, body height, etc.
Front locus (3 parameters: height, x-pos., y-pos.)
Rear locus (3 parameters)
Locus length
Locus skew multiplier in the x-y plane (for turning)
Height of the front of the body
Height of the rear of the body
Time each foot takes to move through its locus
Fraction of time each foot spends on the ground
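
A minimal sketch of how this parameterization could be laid out in code; the field names are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class GaitParameters:
    """The 12 parameters of the Aibo walk (field names are illustrative)."""
    front_locus_height: float
    front_locus_x: float
    front_locus_y: float
    rear_locus_height: float
    rear_locus_x: float
    rear_locus_y: float
    locus_length: float
    locus_skew: float        # skew multiplier in the x-y plane, for turning
    front_body_height: float
    rear_body_height: float
    foot_move_time: float    # time each foot takes to move through its locus
    ground_fraction: float   # fraction of time each foot spends on the ground
```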

5 Learning
No simulator: every evaluation is a test run on the actual AIBO, which is expensive
Not an MDP: Q-learning does not apply
So: Gradient Reinforcement Learning

6 Gradient Reinforcement Learning
Parameter vector: π = {θ1, …, θN}
Generate random policies: Ri = {θ1 + Δ1, …, θN + ΔN}, each Δn chosen from {−εn, 0, +εn}
Group the evaluations per parameter n: S−ε,n, S0,n, S+ε,n
Average each group: Avg−ε,n = avgscore(S−ε,n), likewise Avg0,n and Avg+ε,n
Adjust each parameter: An = 0 if the unperturbed group scores best, otherwise An = Avg+ε,n − Avg−ε,n
Repeat
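
A rough Python sketch of one iteration of this procedure, assuming a score() function that evaluates a parameter vector (e.g. by timing a walk on the robot); the normalization and the step size eta are details not shown on the slide:

```python
import random

def policy_gradient_step(pi, epsilon, eta, num_policies, score):
    """One iteration of the hill-climbing policy-gradient search (sketch).

    pi           -- current parameter vector pi = [theta_1, ..., theta_N]
    epsilon      -- per-parameter perturbation sizes [eps_1, ..., eps_N]
    eta          -- scalar step size (an assumption, not on the slide)
    num_policies -- how many perturbed policies R_i to evaluate
    score        -- callable scoring a parameter vector (e.g. walking speed)
    """
    n_params = len(pi)

    # Generate random policies R_i: each parameter is perturbed by
    # -eps_n, 0 or +eps_n, chosen independently and uniformly.
    deltas = [[random.choice((-1, 0, +1)) for _ in range(n_params)]
              for _ in range(num_policies)]
    scores = [score([p + d * e for p, d, e in zip(pi, delta, epsilon)])
              for delta in deltas]

    adjustment = []
    for n in range(n_params):
        # Group the evaluations into S_{-eps,n}, S_{0,n}, S_{+eps,n} by how
        # parameter n was perturbed, and average the score of each group.
        avg = {}
        for sign in (-1, 0, +1):
            group = [s for s, d in zip(scores, deltas) if d[n] == sign]
            avg[sign] = sum(group) / len(group) if group else 0.0
        # A_n = 0 if leaving the parameter alone scores best,
        # otherwise A_n = Avg_{+eps,n} - Avg_{-eps,n}.
        if avg[0] > avg[+1] and avg[0] > avg[-1]:
            adjustment.append(0.0)
        else:
            adjustment.append(avg[+1] - avg[-1])

    # Normalize the adjustment vector and take a step of size eta.
    norm = sum(a * a for a in adjustment) ** 0.5 or 1.0
    return [p + eta * a / norm for p, a in zip(pi, adjustment)]
```

Because the num_policies evaluations are independent of each other, they can be run on several robots at once, which is the parallelism mentioned in the conclusion.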

7

8 Conclusion
Gradient Reinforcement Learning is very simple and gives good results
Evaluations can be done in parallel

9 Learning from experts
Paper: "Apprenticeship Learning for Motion Planning with Application to Parking Lot Navigation", Pieter Abbeel, Dmitri Dolgov, Andrew Y. Ng, Sebastian Thrun (2008)

10 Parking lot navigation
Path planning
Many cost functions: length, driving backwards, smoothness, going off road, etc.

11 Cost functions
forward length: fwd = Σfwd ||xi − xi−1||
reverse length: rev = Σrev ||xi − xi−1||
off-road: road = Σ¬road(i) ||xi − xi−1||
curvature: curv = Σ (Δxi+1 − Δxi)²
in lane: lane = Σ D(xi, θi, G)
direction: dir = Σ sin²(2(θi − αi))
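
A sketch of the first few of these feature costs in Python, assuming a path is given as a list of (x, y) waypoints plus a per-segment reverse flag, and an off_road() predicate standing in for the map; the lane and direction terms are omitted because they need the lane geometry G and the desired headings αi:

```python
import math

def path_feature_costs(points, reverse_flags, off_road):
    """Per-path feature costs k(s) for the parking-lot planner (sketch).

    points        -- list of (x, y) waypoints
    reverse_flags -- reverse_flags[i] is True if the segment into point i
                     is driven in reverse
    off_road      -- predicate off_road(point) -> bool, stands in for the map
    """
    def dist(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])

    fwd = rev = road = curv = 0.0
    for i in range(1, len(points)):
        step = dist(points[i - 1], points[i])
        # forward / reverse driving distance
        if reverse_flags[i]:
            rev += step
        else:
            fwd += step
        # distance driven while off the road surface
        if off_road(points[i]):
            road += step
    # curvature: squared change of the step vector, sum_i (dx_{i+1} - dx_i)^2
    for i in range(1, len(points) - 1):
        ddx = (points[i + 1][0] - points[i][0]) - (points[i][0] - points[i - 1][0])
        ddy = (points[i + 1][1] - points[i][1]) - (points[i][1] - points[i - 1][1])
        curv += ddx * ddx + ddy * ddy
    return {"fwd": fwd, "rev": rev, "road": road, "curv": curv}
```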

12 Path planning
Two-step approach: a coarse A* search, then refinement

13 Cost and paths
Total cost: Φ(s) = Σk wk k(s)
Best path: argmins∈S Φ(s)
Many cost functions: how to weigh them? Learn the weights from example paths
Goal: match the expert's costs: k(s) ≈ k(sE)
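
In code the total cost is just a weighted sum of the per-path feature costs, and the best path is the candidate with the lowest total; the candidate paths, feature values, and weights below are made-up placeholders:

```python
# Hypothetical precomputed feature costs k(s) for three candidate paths.
candidates = {
    "s1": {"fwd": 40.0, "rev": 5.0, "road": 0.0, "curv": 2.5},
    "s2": {"fwd": 35.0, "rev": 0.0, "road": 3.0, "curv": 1.0},
    "s3": {"fwd": 55.0, "rev": 0.0, "road": 0.0, "curv": 0.5},
}
weights = {"fwd": 1.0, "rev": 2.0, "road": 10.0, "curv": 4.0}

def total_cost(costs, w):
    """Phi(s) = sum_k w_k * k(s)."""
    return sum(w[k] * costs[k] for k in w)

# Best path: argmin over the candidate set S.
best = min(candidates, key=lambda s: total_cost(candidates[s], weights))
print(best, total_cost(candidates[best], weights))
```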

14 Apprenticeship learning
Random weights: w(0) = random
Find paths: si = argmins Φ(s)
Sum costs: μ(i)k = Σ k(si)
Find new weights such that w(j+1)k ≥ μk − μEk
Repeat until ||w(j+1)|| ≤ ε
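
A heavily simplified sketch of that loop, assuming a plan(weights) routine that returns the planner's best path under the current weights and a path_costs() routine like the feature-cost sketch above; the plain additive weight update here merely stands in for the max-margin / projection step of the actual algorithm:

```python
import random

def apprenticeship_learning(plan, path_costs, mu_expert, eps, max_iters=50):
    """Simplified sketch of the iterative reweighting loop.

    plan       -- plan(weights) -> path s_i minimizing Phi(s) under `weights`
    path_costs -- path_costs(path) -> dict of summed feature costs mu_k
    mu_expert  -- dict of the expert demonstration's feature costs mu^E_k
    eps        -- stop once the update is smaller than this
    """
    features = list(mu_expert)
    # w(0): random initial weights
    weights = {k: random.random() for k in features}

    for _ in range(max_iters):
        # Find the planner's best path under the current weights
        # and sum its feature costs.
        mu = path_costs(plan(weights))
        # Raise the weight of every feature on which the planner accrues
        # more cost than the expert (mu_k - mu^E_k > 0), lower it otherwise.
        update = {k: mu[k] - mu_expert[k] for k in features}
        if sum(u * u for u in update.values()) ** 0.5 <= eps:
            break  # feature costs (almost) match the expert's
        weights = {k: max(0.0, weights[k] + update[k]) for k in features}
    return weights
```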

15 Results: nice, sloppy, backwards

16 Results: nice, sloppy, backwards

17 Results: nice, sloppy, backwards

18 Conclusion
Always performs (nearly) as well as the expert: ||μ − μE|| ≤ ||w|| ≤ ε
The algorithm is difficult to understand
The paper uses confusing notation

19 EOF

20 More information
"Apprenticeship learning via inverse reinforcement learning", Pieter Abbeel, Andrew Y. Ng
Maximal margin method

21 More information
"Apprenticeship learning via inverse reinforcement learning", Pieter Abbeel, Andrew Y. Ng
Projection method

22 Apprenticeship learning
Random weights: w(0) = random
Find path(s): si = argmins Σk w(i)k k(s)
Sum costs: μ(i)k = Σ k(si)
Find weights: minw,x ||w|| s.t. μk = Σj xj μ(j)k and wk ≥ μk − μEk
Repeat with w(j+1) = w / ||w||
Combine

