Planning Policies Using Dynamic Optimization
Chris Atkeson, 2012
Example: One Link Swing Up
One Link Swing Up
State: joint angle and angular velocity
Action: joint torque
Cost function:
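The equations for the state, action, and cost on this slide were images that did not survive extraction. Below is a minimal sketch of one common formulation, assuming state x = (theta, theta_dot), action u = joint torque, Euler-integrated dynamics, and a quadratic cost around the upright goal; the model parameters and cost weights are illustrative placeholders, not values from the talk.

```python
import numpy as np

# Assumed one-link pendulum: state x = (theta, theta_dot) with theta measured
# from hanging straight down, action u = joint torque. Parameters are placeholders.
m, l, g, dt = 1.0, 1.0, 9.81, 0.01   # mass, length, gravity, integration step

def dynamics(x, u):
    """One Euler step of the pendulum dynamics."""
    theta, theta_dot = x
    theta_ddot = (u - m * g * l * np.sin(theta)) / (m * l ** 2)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

def cost(x, u, x_goal=np.array([np.pi, 0.0]),
         Q=np.diag([1.0, 0.1]), R=0.01):
    """Quadratic penalty on distance from the upright goal plus control effort."""
    dx = x - x_goal
    return float(dx @ Q @ dx + R * u ** 2)
```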
Possible Trajectories
What is a policy? Function mapping state to command: u(x)
Policy
How can we compute a policy?
– Optimize a trajectory from every starting point. The value function is the cost of each of those trajectories.
– Parameterize the policy u(x,p) and optimize the parameters for some distribution of initial conditions.
– Dynamic programming.
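As a concrete illustration of the dynamic-programming option, here is a minimal value-iteration sketch that tabulates the value function on a coarse grid over (theta, theta_dot), reusing the dynamics() and cost() sketched above; the grid resolution, action set, and discount factor are assumptions made for illustration, not quantities from the slides.

```python
# Value iteration over a coarse grid of (theta, theta_dot); actions are a small
# discrete set of torques. Resolutions and the discount factor are assumptions.
thetas = np.linspace(-np.pi, np.pi, 51)
theta_dots = np.linspace(-8.0, 8.0, 51)
actions = np.linspace(-5.0, 5.0, 11)
gamma = 0.99

V = np.zeros((len(thetas), len(theta_dots)))   # value function table
policy = np.zeros_like(V)                      # greedy action at each grid cell

def nearest_cell(x):
    """Snap a continuous state to the nearest grid cell (wrapping the angle)."""
    th = (x[0] + np.pi) % (2 * np.pi) - np.pi
    i = int(np.argmin(np.abs(thetas - th)))
    j = int(np.argmin(np.abs(theta_dots - np.clip(x[1], -8.0, 8.0))))
    return i, j

for sweep in range(200):                       # repeated Bellman backups
    for i, th in enumerate(thetas):
        for j, thd in enumerate(theta_dots):
            x = np.array([th, thd])
            q = [cost(x, u) * dt + gamma * V[nearest_cell(dynamics(x, u))]
                 for u in actions]
            V[i, j] = min(q)
            policy[i, j] = actions[int(np.argmin(q))]
```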
Optimize Trajectory From Every Cell
Value Function
Types of tasks
– Regulator tasks: want to stay at x_d
– Trajectory tasks: go from A to B in time T, or attain goal set G
– Periodic tasks: cyclic behavior such as walking
Ways to Parameterize Policies
– Linear function: u(x,p) = p^T x = Kx
– Table
– Polynomial (nonlinear controller)
– Associated with a trajectory: u(t) = u_ff(t) + K(t)(x – x_d(t))
– Associated with trajectory(ies): u(x) = u_nn(x) + K_nn(x)(x – x_d,nn(x)), nn: nearest neighbor
– …
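Two of the parameterizations listed above, written out as a sketch; the gains, feedforward commands, and reference trajectory are placeholders that would come from a trajectory optimizer, not values given in the slides.

```python
# Sketches of two parameterizations from the list above. Gains, feedforward
# commands, and the reference trajectory are placeholders, not computed values.

def linear_policy(x, K):
    """u(x, p) = K x: the adjustable parameters p are the entries of the gain K."""
    return K @ x

def trajectory_policy(x, t, u_ff, K_t, x_d):
    """u(t) = u_ff(t) + K(t) (x - x_d(t)): feedforward plus time-varying feedback
    around a reference trajectory."""
    return u_ff[t] + K_t[t] @ (x - x_d[t])
```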
Optimizing Policies Using Function Optimization
Policy Search
Parameterized policy u = u(x,p), where p is a vector of adjustable parameters.
Simplest approach: run the policy for a while, measure the total cost, and use your favorite function-optimization method to search for the best p.
There are tricks to improve policy comparison, such as using the same perturbations in different trials and terminating a trial early if it is clearly bad (racing algorithms).
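A minimal policy-search sketch along these lines: roll out a parameterized policy, sum the cost, and hand that scalar to a generic derivative-free optimizer, aborting a rollout early once it is clearly bad. It reuses the dynamics() and cost() assumed earlier; the linear policy form, initial state, horizon, and Nelder-Mead optimizer are illustrative choices, not the method from the talk.

```python
from scipy.optimize import minimize

def rollout_cost(p, x0=np.array([np.pi - 0.3, 0.0]), T=500, abort_at=1e4):
    """Total cost of running the linear policy u = K (x - x_goal) from x0."""
    K = p.reshape(1, 2)
    x_goal = np.array([np.pi, 0.0])
    x, total = x0.copy(), 0.0
    for t in range(T):
        u = float(K @ (x - x_goal))
        total += cost(x, u) * dt
        x = dynamics(x, u)
        if total > abort_at:        # terminate a clearly bad trial early
            break
    return total

# Search for good gains by treating the rollout cost as a black-box function.
result = minimize(rollout_cost, x0=np.zeros(2), method="Nelder-Mead")
K_best = result.x.reshape(1, 2)
```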