Learning optimal behavior
Twan van Laarhoven
AIBO robot walking
Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, Nate Kohl, Peter Stone (2004)
Goal: speed. Why do you want this?
Parameterization
12 parameters (front ellipse, rear ellipse, body height, etc.):
- Front locus (3 parameters: height, x-pos., y-pos.)
- Rear locus (3 parameters)
- Locus length
- Locus skew multiplier in the x-y plane (for turning)
- The height of the front of the body
- The height of the rear of the body
- The time each foot takes to move through its locus
- The fraction of time each foot spends on the ground
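To make the parameterization concrete, here is a minimal sketch of one such 12-dimensional policy as a plain Python dictionary; the field names follow the list above, and the numeric values are arbitrary placeholders rather than the gait the paper learned.

```python
# Hypothetical 12-parameter gait policy, for illustration only: field names follow
# the list above, numeric values are arbitrary placeholders (not the paper's gait).
gait_params = {
    "front_locus_height": 7.7, "front_locus_x": 2.3, "front_locus_y": 4.9,
    "rear_locus_height": 5.2,  "rear_locus_x": 1.2,  "rear_locus_y": -2.8,
    "locus_length": 4.9,
    "locus_skew": 0.04,          # skew multiplier in the x-y plane, used for turning
    "front_body_height": 7.8,
    "rear_body_height": 11.0,
    "foot_move_time": 0.7,       # time each foot takes to move through its locus
    "ground_fraction": 0.5,      # fraction of time each foot spends on the ground
}
```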
Learning
No simulator: testing on the actual AIBO is expensive
Not an MDP, so no Q-learning
Therefore: Gradient Reinforcement Learning
Gradient Reinforcement Learning
Parameter vector: π = {θ1, …, θN}
Random policies: Ri = {θ1 + Δ1, …, θN + ΔN}, each Δn chosen from {−εn, 0, +εn}
Group the policies for each parameter n: S−ε,n / S0,n / S+ε,n
Averages: Avg−ε,n = avgscore(S−ε,n), and similarly Avg0,n and Avg+ε,n
Adjust: An = 0 if Avg0,n is highest, otherwise An = Avg+ε,n − Avg−ε,n
Repeat
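Below is a minimal Python sketch of one iteration of this scheme. The per-parameter perturbations {−ε, 0, +ε}, the grouping, and the zero-adjustment rule follow the slide; the number of sampled policies t, the step size eta, and the normalization of the adjustment vector are assumed defaults, not values taken from the paper.

```python
import random

def policy_gradient_step(pi, evaluate, eps, t=15, eta=2.0):
    """One iteration of the hill-climbing policy-gradient scheme sketched above.

    pi       -- current parameter vector (list of floats)
    evaluate -- maps a parameter vector to a score (e.g. walk speed); on the real
                AIBO this is an expensive physical trial, and the t evaluations
                below could be run in parallel on several robots
    eps      -- per-parameter perturbation sizes epsilon_n
    t, eta   -- number of sampled policies and step size (assumed values)
    """
    N = len(pi)
    # Generate t random policies: each parameter perturbed by -eps, 0, or +eps.
    deltas = [[random.choice((-1, 0, 1)) for _ in range(N)] for _ in range(t)]
    policies = [[pi[n] + d[n] * eps[n] for n in range(N)] for d in deltas]
    scores = [evaluate(p) for p in policies]

    adjust = []
    for n in range(N):
        # Partition the scores by how parameter n was perturbed, then average.
        groups = {-1: [], 0: [], 1: []}
        for d, s in zip(deltas, scores):
            groups[d[n]].append(s)
        avg = {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}
        if avg[1] is None or avg[-1] is None or (
            avg[0] is not None and avg[0] > avg[1] and avg[0] > avg[-1]
        ):
            adjust.append(0.0)               # zero perturbation did best: leave it
        else:
            adjust.append(avg[1] - avg[-1])  # move toward the better direction

    # Normalize the adjustment vector and take a step of size eta (assumed detail).
    norm = sum(a * a for a in adjust) ** 0.5
    if norm > 0:
        pi = [pi[n] + eta * adjust[n] / norm for n in range(N)]
    return pi
```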
Conclusion
Gradient Reinforcement Learning is very simple and gives good results
Evaluation can be done in parallel
Learning from experts
Apprenticeship Learning for Motion Planning with Application to Parking Lot Navigation, Pieter Abbeel, Dmitri Dolgov, Andrew Y. Ng, Sebastian Thrun (2008)
Parking lot navigation
Path planning
Many cost functions: length, backward driving, smoothness, off-road, etc.
Cost functions
- forward length: φ_fwd = ∑_{i ∈ fwd} ||x_i − x_{i−1}||
- reverse length: φ_rev = ∑_{i ∈ rev} ||x_i − x_{i−1}||
- off-road: φ_road = ∑_{i: ¬road(i)} ||x_i − x_{i−1}||
- curvature: φ_curv = ∑_i (Δx_{i+1} − Δx_i)²
- in lane: φ_lane = ∑_i D(x_i, θ_i, G)
- direction: φ_dir = ∑_i sin²(2(θ_i − α_i))
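As a rough sketch of how these feature sums could be accumulated over a discretized path; the argument names, the precomputed lane-distance term D, and the simplified geometry are my assumptions, not the authors' implementation.

```python
import math

def path_features(xs, thetas, reverse_flags, on_road_flags, lane_dist, lane_dir):
    """Accumulate the per-path cost features listed above.

    xs            -- list of (x, y) waypoints
    thetas        -- heading theta_i at each waypoint
    reverse_flags -- True where segment i is driven in reverse
    on_road_flags -- True where waypoint i lies on the road
    lane_dist     -- precomputed values of D(x_i, theta_i, G) (assumed helper)
    lane_dir      -- desired driving direction alpha_i at each waypoint
    """
    def seg(i):  # segment length ||x_i - x_{i-1}||
        return math.dist(xs[i], xs[i - 1])

    n = len(xs)
    fwd  = sum(seg(i) for i in range(1, n) if not reverse_flags[i])
    rev  = sum(seg(i) for i in range(1, n) if reverse_flags[i])
    road = sum(seg(i) for i in range(1, n) if not on_road_flags[i])
    # curvature: squared change of the consecutive difference vectors delta x_i
    curv = 0.0
    for i in range(1, n - 1):
        d1 = (xs[i][0] - xs[i - 1][0], xs[i][1] - xs[i - 1][1])
        d2 = (xs[i + 1][0] - xs[i][0], xs[i + 1][1] - xs[i][1])
        curv += (d2[0] - d1[0]) ** 2 + (d2[1] - d1[1]) ** 2
    lane = sum(lane_dist)
    dire = sum(math.sin(2 * (th - al)) ** 2 for th, al in zip(thetas, lane_dir))
    return {"fwd": fwd, "rev": rev, "road": road,
            "curv": curv, "lane": lane, "dir": dire}
```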
Path planning
Two-step approach: coarse A* search, then refinement
Cost and paths
Total cost: Φ(s) = ∑_k w_k φ_k(s)
Best path: s* = argmin_{s ∈ S} Φ(s)
Many cost functions: how to weigh them? Learn from examples.
Goal: match the expert's costs: φ_k(s) ≈ φ_k(s_E)
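A tiny sketch of this weighted cost, reusing the hypothetical feature dictionary from the sketch above (the helper name is mine, not the paper's):

```python
def total_cost(w, phi):
    """Phi(s) = sum_k w_k * phi_k(s): the weighted sum of the path's cost features."""
    return sum(w[k] * phi[k] for k in phi)

# The best path is the candidate s minimizing total_cost(w, path_features(s));
# apprenticeship learning then tunes w so that the planner's features phi_k(s)
# match the expert's phi_k(s_E).
```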
Apprenticeship learning
random weights: w^(0) = random
find paths: s_i = argmin_s Φ(s)
sum costs: μ^(i)_k = ∑ φ_k(s_i)
find new weights: w^(j+1) such that w_k ≥ μ_k − μ_{E,k}
repeat until ||w^(j+1)|| ≤ ε
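A minimal sketch of this outer loop, assuming one planned path per iteration and treating the planner, the feature extractor, and the weight-finding step (sketched after the last backup slide) as injected callables; the defaults for eps and max_iters and the stopping test on the un-normalized ||w|| are my assumptions.

```python
import numpy as np

def apprenticeship_loop(plan_path, path_features, mu_expert, find_weights,
                        eps=1e-3, max_iters=50):
    """Outer loop of the apprenticeship-learning procedure on this slide.

    plan_path     -- planner: given weights w, returns argmin_s Phi(s)
    path_features -- maps a path s to its feature vector (phi_1(s), ..., phi_K(s))
    mu_expert     -- feature vector mu_E of the expert demonstration
    find_weights  -- weight-finding step returning (w, x); see the backup-slide sketch
    """
    rng = np.random.default_rng()
    w = rng.random(len(mu_expert))           # random initial weights w^(0)
    mus = []                                  # feature vectors mu^(i) of the paths found so far
    for _ in range(max_iters):
        s = plan_path(w)                      # best path under the current cost weights
        mus.append(np.asarray(path_features(s), dtype=float))  # mu^(i)_k = phi_k(s_i)
        w_new, _x = find_weights(mus, mu_expert)
        if np.linalg.norm(w_new) <= eps:      # ||mu - mu_E|| <= ||w|| <= eps: done
            break
        w = w_new / np.linalg.norm(w_new)     # re-plan with normalized weights w^(j+1)
    return w, mus
```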
Results
One result per demonstrated driving style: Nice, Sloppy, and Backwards.
Conclusion
Always performs (almost) as well as the expert: ||μ − μ_E|| ≤ ||w|| ≤ ε
The algorithm is difficult to understand, and the paper uses confusing notation.
EOF
More information
Apprenticeship Learning via Inverse Reinforcement Learning, Pieter Abbeel, Andrew Y. Ng: the maximal-margin method
More information
Apprenticeship Learning via Inverse Reinforcement Learning, Pieter Abbeel, Andrew Y. Ng: the projection method
Apprenticeship learning
random weights: w^(0) = random
find path(s): s_i = argmin_s ∑_k w^(i)_k φ_k(s)
sum costs: μ^(i)_k = ∑ φ_k(s_i)
find weights: min_{w,x} ||w|| subject to μ_k = ∑_j x_j μ^(j)_k and w_k ≥ μ_k − μ_{E,k}
normalize and repeat: w^(j+1) = w / ||w||
combine the found paths (using the mixture coefficients x_j)
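A rough sketch of this weight-finding step using SciPy's SLSQP solver. The solver choice, the squared-norm objective, and the assumption that x is a convex combination (x_j ≥ 0, ∑_j x_j = 1, as in Abbeel and Ng's formulation) are mine; the slide does not specify them.

```python
import numpy as np
from scipy.optimize import minimize

def find_weights(mus, mu_expert):
    """Weight-finding step from this slide:
        min_{w,x} ||w||  s.t.  mu_k = sum_j x_j mu^(j)_k  and  w_k >= mu_k - mu_{E,k}

    mus       -- list of feature vectors mu^(j) of the paths found so far
    mu_expert -- expert feature vector mu_E
    Returns the (un-normalized) minimizer w and the mixture coefficients x;
    the outer loop normalizes w^(j+1) = w / ||w|| before re-planning.
    """
    M = np.asarray(mus, dtype=float)         # shape (J, K)
    mu_e = np.asarray(mu_expert, dtype=float)
    J, K = M.shape

    def split(z):                            # decision variables packed as z = [w, x]
        return z[:K], z[K:]

    def objective(z):
        w, _ = split(z)
        return float(w @ w)                  # minimize ||w||^2 (same minimizer as ||w||)

    constraints = [
        # w_k - (mu_k - mu_{E,k}) >= 0  with  mu = x @ M
        {"type": "ineq", "fun": lambda z: split(z)[0] - (split(z)[1] @ M - mu_e)},
        # assumed: x is a convex combination, so its entries sum to one
        {"type": "eq", "fun": lambda z: np.sum(split(z)[1]) - 1.0},
    ]
    bounds = [(None, None)] * K + [(0.0, None)] * J   # assumed: x_j >= 0
    z0 = np.concatenate([np.zeros(K), np.full(J, 1.0 / J)])
    res = minimize(objective, z0, method="SLSQP", bounds=bounds, constraints=constraints)
    return split(res.x)
```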