Planning Policies Using Dynamic Optimization
Chris Atkeson, 2012
Example: One Link Swing Up
One Link Swing Up
State: joint angle and angular velocity
Action: joint torque
Cost function:
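The equations for the state, action, and cost on this slide were images that did not survive extraction. Below is a minimal sketch of one common formulation, assuming state x = (theta, theta_dot), action u = joint torque, Euler-integrated dynamics, and a quadratic cost around the upright goal; the model parameters and cost weights are illustrative placeholders, not values from the talk.

```python
import numpy as np

# Assumed one-link pendulum: state x = (theta, theta_dot) with theta measured
# from hanging straight down, action u = joint torque. Parameters are placeholders.
m, l, g, dt = 1.0, 1.0, 9.81, 0.01   # mass, length, gravity, integration step

def dynamics(x, u):
    """One Euler step of the pendulum dynamics."""
    theta, theta_dot = x
    theta_ddot = (u - m * g * l * np.sin(theta)) / (m * l ** 2)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

def cost(x, u, x_goal=np.array([np.pi, 0.0]),
         Q=np.diag([1.0, 0.1]), R=0.01):
    """Quadratic penalty on distance from the upright goal plus control effort."""
    dx = x - x_goal
    return float(dx @ Q @ dx + R * u ** 2)
```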
Possible Trajectories
What is a policy? Function mapping state to command: u(x)
Policy
How can we compute a policy?
– Optimize a trajectory from every starting point. The value function is the cost of each of those trajectories.
– Parameterize the policy u(x,p) and optimize the parameters for some distribution of initial conditions.
– Dynamic programming.
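As a concrete illustration of the dynamic-programming option, here is a minimal value-iteration sketch that tabulates the value function on a coarse grid over (theta, theta_dot), reusing the dynamics() and cost() sketched above; the grid resolution, action set, and discount factor are assumptions made for illustration, not quantities from the slides.

```python
# Value iteration over a coarse grid of (theta, theta_dot); actions are a small
# discrete set of torques. Resolutions and the discount factor are assumptions.
thetas = np.linspace(-np.pi, np.pi, 51)
theta_dots = np.linspace(-8.0, 8.0, 51)
actions = np.linspace(-5.0, 5.0, 11)
gamma = 0.99

V = np.zeros((len(thetas), len(theta_dots)))   # value function table
policy = np.zeros_like(V)                      # greedy action at each grid cell

def nearest_cell(x):
    """Snap a continuous state to the nearest grid cell (wrapping the angle)."""
    th = (x[0] + np.pi) % (2 * np.pi) - np.pi
    i = int(np.argmin(np.abs(thetas - th)))
    j = int(np.argmin(np.abs(theta_dots - np.clip(x[1], -8.0, 8.0))))
    return i, j

for sweep in range(200):                       # repeated Bellman backups
    for i, th in enumerate(thetas):
        for j, thd in enumerate(theta_dots):
            x = np.array([th, thd])
            q = [cost(x, u) * dt + gamma * V[nearest_cell(dynamics(x, u))]
                 for u in actions]
            V[i, j] = min(q)
            policy[i, j] = actions[int(np.argmin(q))]
```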
Optimize Trajectory From Every Cell
Value Function
Types of tasks
– Regulator tasks: want to stay at x_d
– Trajectory tasks: go from A to B in time T, or attain goal set G
– Periodic tasks: cyclic behavior such as walking
Ways to Parameterize Policies
– Linear function: u(x,p) = p^T x = Kx
– Table
– Polynomial (nonlinear controller)
– Associated with a trajectory: u(t) = u_ff(t) + K(t)(x – x_d(t))
– Associated with trajectory(ies): u(x) = u_nn(x) + K_nn(x)(x – x_d,nn(x)), nn: nearest neighbor
– …
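Two of the parameterizations listed above, written out as a sketch; the gains, feedforward commands, and reference trajectory are placeholders that would come from a trajectory optimizer, not values given in the slides.

```python
# Sketches of two parameterizations from the list above. Gains, feedforward
# commands, and the reference trajectory are placeholders, not computed values.

def linear_policy(x, K):
    """u(x, p) = K x: the adjustable parameters p are the entries of the gain K."""
    return K @ x

def trajectory_policy(x, t, u_ff, K_t, x_d):
    """u(t) = u_ff(t) + K(t) (x - x_d(t)): feedforward plus time-varying feedback
    around a reference trajectory."""
    return u_ff[t] + K_t[t] @ (x - x_d[t])
```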
Optimizing Policies Using Function Optimization
Policy Search
Parameterized policy u = u(x,p), where p is a vector of adjustable parameters.
Simplest approach: run the policy for a while, measure the total cost, and use your favorite function-optimization method to search for the best p.
There are tricks to improve policy comparison, such as using the same perturbations in different trials and terminating a trial early if it is clearly bad (racing algorithms).
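A minimal policy-search sketch along these lines: roll out a parameterized policy, sum the cost, and hand that scalar to a generic derivative-free optimizer, aborting a rollout early once it is clearly bad. It reuses the dynamics() and cost() assumed earlier; the linear policy form, initial state, horizon, and Nelder-Mead optimizer are illustrative choices, not the method from the talk.

```python
from scipy.optimize import minimize

def rollout_cost(p, x0=np.array([np.pi - 0.3, 0.0]), T=500, abort_at=1e4):
    """Total cost of running the linear policy u = K (x - x_goal) from x0."""
    K = p.reshape(1, 2)
    x_goal = np.array([np.pi, 0.0])
    x, total = x0.copy(), 0.0
    for t in range(T):
        u = float(K @ (x - x_goal))
        total += cost(x, u) * dt
        x = dynamics(x, u)
        if total > abort_at:        # terminate a clearly bad trial early
            break
    return total

# Search for good gains by treating the rollout cost as a black-box function.
result = minimize(rollout_cost, x0=np.zeros(2), method="Nelder-Mead")
K_best = result.x.reshape(1, 2)
```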