
1 Planning Policies Using Dynamic Optimization © Chris Atkeson 2012

2 Example: One Link Swing Up

3 One Link Swing Up State: joint angle and angular velocity, x = (θ, θ̇). Action: joint torque u. Cost function: penalizes distance from the upright position and the torque used.
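
A minimal sketch of these definitions in Python, assuming the standard one-link pendulum model; the physical constants, the damping term, and the quadratic cost weights below are illustrative, not taken from the slides:

```python
import numpy as np

# State x = (theta, theta_dot): joint angle (0 = hanging down) and velocity.
# Action u: joint torque. Goal: reach the inverted position theta = pi.

def dynamics(x, u, dt=0.01, g=9.81, l=1.0, m=1.0, b=0.1):
    """One Euler step of m*l^2 * theta_dd = u - b*theta_dot - m*g*l*sin(theta)."""
    theta, theta_dot = x
    theta_ddot = (u - b * theta_dot - m * g * l * np.sin(theta)) / (m * l**2)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

def cost(x, u, dt=0.01, q_th=1.0, q_vel=0.1, r=0.01):
    """Quadratic one-step cost on distance from upright and on torque."""
    theta, theta_dot = x
    err = np.arctan2(np.sin(theta - np.pi), np.cos(theta - np.pi))  # wrapped angle error
    return dt * (q_th * err**2 + q_vel * theta_dot**2 + r * u**2)
```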

4 Possible Trajectories

5 What is a policy? Function mapping state to command: u(x)
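
For example, a simple linear policy around the upright goal for the pendulum above; the gains here are placeholders, not optimized values:

```python
import numpy as np

def policy(x, K=np.array([5.0, 1.0]), x_d=np.array([np.pi, 0.0])):
    """Linear policy u(x) = -K (x - x_d), mapping state directly to torque."""
    err = x - x_d
    err[0] = np.arctan2(np.sin(err[0]), np.cos(err[0]))  # wrap the angle error
    return float(-K @ err)
```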

6 Policy

7 How can we compute a policy?
– Optimize a trajectory from every starting point; the value function is the cost of each of those trajectories.
– Parameterize the policy u(x,p) and optimize the parameters for some distribution of initial conditions.
– Dynamic programming (a sketch follows this list).
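
The dynamic-programming option can be sketched as value iteration on a coarse state grid. This reuses the hypothetical dynamics() and cost() helpers from the earlier sketch; nearest-neighbor lookup stands in for the proper interpolation a real implementation would use:

```python
import numpy as np

thetas = np.linspace(-np.pi, np.pi, 51)        # angle grid
theta_dots = np.linspace(-8.0, 8.0, 51)        # velocity grid
actions = np.linspace(-3.0, 3.0, 9)            # discretized torques
V = np.zeros((len(thetas), len(theta_dots)))   # value function table

def nearest_cell(x):
    """Nearest grid cell for a continuous state (angle wrapped, velocity clipped)."""
    th = np.arctan2(np.sin(x[0]), np.cos(x[0]))
    i = int(np.abs(thetas - th).argmin())
    j = int(np.abs(theta_dots - np.clip(x[1], -8.0, 8.0)).argmin())
    return i, j

for sweep in range(100):
    for i, th in enumerate(thetas):
        for j, thd in enumerate(theta_dots):
            x = np.array([th, thd])
            # Bellman backup: best one-step cost plus (mildly discounted) value
            # at the successor cell; the discount stabilizes the coarse grid.
            V[i, j] = min(cost(x, u) + 0.999 * V[nearest_cell(dynamics(x, u))]
                          for u in actions)
```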

8 Optimize Trajectory From Every Cell

9 Value Function

10 Types of tasks
– Regulator tasks: want to stay at x_d.
– Trajectory tasks: go from A to B in time T, or attain a goal set G.
– Periodic tasks: cyclic behavior such as walking.

11 Ways to Parameterize Policies
– Linear function: u(x,p) = p^T x = Kx
– Table
– Polynomial (nonlinear controller)
– Associated with a trajectory (sketched below): u(t) = u_ff(t) + K(t)(x - x_d(t))
– Associated with a trajectory (or several): u(x) = u_nn(x) + K_nn(x)(x - x_d^nn(x)), where nn denotes the nearest neighbor
– …
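
A sketch of the trajectory-based form u(t) = u_ff(t) + K(t)(x - x_d(t)). Here u_ff, x_d, and K are placeholder arrays; a trajectory optimizer (e.g. DDP/iLQR) would actually supply them:

```python
import numpy as np

T = 300
u_ff = np.zeros(T)                              # feedforward torque per time step
x_d = np.zeros((T, 2))                          # reference state trajectory
K = np.tile(np.array([[-5.0, -1.0]]), (T, 1))   # time-varying feedback gains

def trajectory_policy(t, x):
    """u(t) = u_ff(t) + K(t) (x - x_d(t)): track the reference trajectory."""
    return float(u_ff[t] + K[t] @ (x - x_d[t]))
```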

12 Optimizing Policies Using Function Optimization

13 Policy Search
– Parameterized policy u = π(x,p), where p is a vector of adjustable parameters.
– Simplest approach: run the policy for a while and measure the total cost, then use your favorite function optimization method to search for the best p (see the sketch below).
– There are tricks to improve policy comparisons, such as using the same perturbations in different trials and terminating a trial early if it is clearly bad (racing algorithms).
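
A minimal random-search version of this loop, reusing the hypothetical dynamics() and cost() helpers from the earlier sketches. A plain linear policy is used for brevity (a richer parameterization would be needed to actually swing up); fixing the noise seed per iteration implements the "same perturbations in different trials" trick:

```python
import numpy as np

def rollout(p, x0=np.array([0.1, 0.0]), T=500, seed=0):
    """Total cost of running the linear policy u = p^T x from x0 under noise."""
    rng = np.random.default_rng(seed)   # same seed -> same perturbations
    x, total = x0.copy(), 0.0
    for _ in range(T):
        u = float(p @ x) + rng.normal(0.0, 0.05)  # policy plus actuation noise
        total += cost(x, u)
        x = dynamics(x, u)
    return total

best_p = np.zeros(2)
rng = np.random.default_rng(1)
for it in range(200):
    candidate = best_p + rng.normal(0.0, 0.5, size=2)
    # Compare candidate and incumbent under the SAME perturbations (seed=it).
    if rollout(candidate, seed=it) < rollout(best_p, seed=it):
        best_p = candidate
```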

