Bayesian Brain: Probabilistic Approaches to Neural Coding
Chapter 12: Optimal Control Theory
Kenji Doya, Shin Ishii, Alexandre Pouget, and Rajesh P. N. Rao
Summarized by Seung-Joon Yi
Chapter overview
- Discrete control: dynamic programming, value iteration / policy iteration, Markov decision processes
- Continuous control: the Hamilton-Jacobi-Bellman equation
- Deterministic control: Pontryagin's maximum principle
- Linear-quadratic-Gaussian control: Riccati equations
Discrete control setting
- State: $x \in \mathcal{X}$, a finite set
- Action: $u \in \mathcal{U}(x)$, the actions available in state $x$
- Future state: $x' = \mathrm{next}(x, u)$
- Cost: $\mathrm{cost}(x, u) \geq 0$
- Problem: find an action sequence $(u_0, \ldots, u_{T-1})$ and corresponding state sequence $(x_0, \ldots, x_T)$ minimizing the total cost $\sum_{t=0}^{T-1} \mathrm{cost}(x_t, u_t)$
Dynamic programming
- Bellman optimality principle: if a state-action sequence is optimal, the subsequence obtained by removing its first state and action is also optimal.
- The optimal value function $v(x)$: the cost-to-go from state $x$ under the optimal policy
- The Bellman equation for the optimal policy: $v(x) = \min_{u \in \mathcal{U}(x)} \left[ \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \right]$, with $\pi(x) = \arg\min_{u \in \mathcal{U}(x)} \left[ \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \right]$
- Acyclic transition graph: $v$ can be computed directly by backward tracking
Value iteration and policy iteration
- Relaxation schemes for graphs with loops
- Value iteration update: $v^{(k+1)}(x) = \min_{u} \left[ \mathrm{cost}(x, u) + v^{(k)}(\mathrm{next}(x, u)) \right]$
- Policy iteration update: evaluate the current policy, $v^{\pi^{(k)}}(x) = \mathrm{cost}(x, \pi^{(k)}(x)) + v^{\pi^{(k)}}(\mathrm{next}(x, \pi^{(k)}(x)))$, then improve it, $\pi^{(k+1)}(x) = \arg\min_{u} \left[ \mathrm{cost}(x, u) + v^{\pi^{(k)}}(\mathrm{next}(x, u)) \right]$
- Both algorithms are guaranteed to converge in a finite number of iterations; see the sketch below
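A minimal sketch of value iteration on a small deterministic problem; the states, transitions, and costs below are hypothetical, invented only to make the Bellman backup concrete:

```python
# Value iteration for a small deterministic shortest-path problem.
# States, actions, transitions, and costs are hypothetical examples.
states = ["A", "B", "C", "goal"]
actions = {"A": ["toB", "toC"], "B": ["toC", "toGoal"], "C": ["toGoal"], "goal": []}
next_state = {("A", "toB"): "B", ("A", "toC"): "C",
              ("B", "toC"): "C", ("B", "toGoal"): "goal",
              ("C", "toGoal"): "goal"}
cost = {("A", "toB"): 1.0, ("A", "toC"): 3.0,
        ("B", "toC"): 1.0, ("B", "toGoal"): 5.0,
        ("C", "toGoal"): 1.0}

v = {x: 0.0 for x in states}           # initial guess of the value function
for _ in range(100):                   # relaxation sweeps
    v_new = {}
    for x in states:
        if not actions[x]:             # absorbing goal state: zero cost-to-go
            v_new[x] = 0.0
        else:                          # Bellman backup: min over actions
            v_new[x] = min(cost[x, u] + v[next_state[x, u]] for u in actions[x])
    if v_new == v:                     # converged in finitely many sweeps
        break
    v = v_new

policy = {x: min(actions[x], key=lambda u: cost[x, u] + v[next_state[x, u]])
          for x in states if actions[x]}
print(v)       # {'A': 3.0, 'B': 2.0, 'C': 1.0, 'goal': 0.0}
print(policy)  # {'A': 'toB', 'B': 'toC', 'C': 'toGoal'}
```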
Markov decision processes
- Stochastic transition case
- Transition function: $p(x' \mid x, u)$, the probability of moving to state $x'$ when taking action $u$ in state $x$
- Value function: $v(x) = \min_{u} \left[ \mathrm{cost}(x, u) + \mathbb{E}_{x' \sim p(\cdot \mid x, u)}\left[ v(x') \right] \right]$
- A Markov decision process is an optimal control problem with discrete states and stochastic state transitions; only the Bellman backup changes, as sketched below
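The only change from the deterministic sketch above is that the backup averages over the transition distribution; the arrays `P` and `C` below are hypothetical, with `P[u, x, y]` a transition probability and `C[x, u]` a cost:

```python
import numpy as np

# Value iteration with a stochastic Bellman backup.
# P[u, x, y] = p(y | x, u) and C[x, u] = cost(x, u) are hypothetical arrays;
# a discount factor gamma < 1 guarantees convergence on this generic model.
rng = np.random.default_rng(0)
n, m, gamma = 4, 2, 0.95
P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)       # normalize into transition distributions
C = rng.random((n, m))

v = np.zeros(n)
for _ in range(1000):
    # Q[x, u] = cost(x, u) + gamma * E_{y ~ p(.|x,u)}[ v(y) ]
    Q = C + gamma * np.einsum("uxy,y->xu", P, v)
    v_new = Q.min(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
policy = Q.argmin(axis=1)               # greedy policy w.r.t. the converged values
```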
Continuous-state control
- Real-valued state: $x \in \mathbb{R}^{n_x}$
- Real-valued control: $u \in \mathbb{R}^{n_u}$
- Controlled Ito diffusion process: $dx = f(x, u)\, dt + F(x, u)\, d\omega$, where $\omega$ is Brownian motion
- Total cost function: $J = \mathbb{E}\left[ h(x(T)) + \int_0^T \ell(x(t), u(t))\, dt \right]$
The Hamilton-Jacobi-Bellman equation
- Apply the DP approach to the time-discretized stochastic problem (the Bellman equation for the continuous case), then take a first-order approximation in the time step
- The resulting HJB equation: $-v_t(x, t) = \min_{u} \left[ \ell(x, u) + f(x, u)^\top v_x(x, t) + \tfrac{1}{2} \mathrm{tr}\left( F(x, u) F(x, u)^\top v_{xx}(x, t) \right) \right]$
- A derivation sketch is given below
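A compressed version of the derivation under the dynamics and cost defined above; this follows the standard argument rather than the slide's omitted figure:

```latex
% Discretize time with step D and apply the Bellman equation:
%   v(x,t) = min_u E[ l(x,u) D + v(x + dx, t + D) ],
% then expand v to second order in dx, using E[dx] = f D and Cov[dx] = F F^T D.
\begin{align*}
v(x,t) &= \min_u \Big[ \ell(x,u)\,\Delta + v(x,t) + v_t\,\Delta
          + f^\top v_x\,\Delta
          + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top v_{xx}\big)\,\Delta
          + o(\Delta) \Big] \\
-v_t(x,t) &= \min_u \Big[ \ell(x,u) + f(x,u)^\top v_x(x,t)
          + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top v_{xx}(x,t)\big) \Big]
\end{align*}
% The second line follows by cancelling v(x,t), dividing by D, and letting D -> 0.
```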
Solving the HJB equation
- A nonlinear, second-order PDE with respect to the unknown function $v$
- It does not always have a classical solution, and many weak solutions can exist
- The notion of viscosity solutions provides a reassuring answer: the viscosity solution is unique and coincides with the optimal value function
- Parametric methods give approximate solutions
Infinite-horizon cases
- Discounted-cost formulation: minimize $\mathbb{E}\left[ \sum_{t} \gamma^t\, \mathrm{cost}(x_t, u_t) \right]$ with discount factor $0 < \gamma < 1$ (in continuous time, the cost rate is weighted by $e^{-t/\tau}$)
- Average-cost-per-stage formulation: minimize $\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \int_0^T \ell(x(t), u(t))\, dt \right]$
Pontryagin's maximum principle
- Two fundamental ideas of optimal control theory: Bellman's DP and optimality principle, and Pontryagin's maximum principle
- The maximum principle applies only to deterministic problems and yields the same solutions as DP
- However, the maximum principle avoids the curse of dimensionality: it characterizes a single optimal trajectory through ODEs, rather than a value function over the entire state space
Continuous-time maximum principle
- HJB equation for deterministic dynamics: $-v_t(x, t) = \min_{u} \left[ \ell(x, u) + f(x, u)^\top v_x(x, t) \right]$
- $p = v_x(x, t)$: the costate vector, the gradient of the value function along the optimal trajectory
- The maximum principle: the optimal trajectory satisfies the ODEs $\dot{x} = f(x, u)$ and $-\dot{p} = \ell_x(x, u) + f_x(x, u)^\top p$, with $u = \arg\min_{u} H(x, u, p)$, where $H(x, u, p) = \ell(x, u) + f(x, u)^\top p$ is the Hamiltonian
- ODEs along one trajectory, instead of a PDE over the whole state space
- Linear dynamics, quadratic cost: minimizing the Hamiltonian is simple, as shown below
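A small worked case of the Hamiltonian minimization, assuming linear dynamics $f(x, u) = Ax + Bu$ and quadratic cost rate $\ell(x, u) = \tfrac{1}{2}(x^\top Q x + u^\top R u)$ with $R \succ 0$ (standard notation, not taken from the slide):

```latex
\begin{align*}
H(x, u, p) &= \tfrac{1}{2} x^\top Q x + \tfrac{1}{2} u^\top R u
              + (A x + B u)^\top p \\
\frac{\partial H}{\partial u} &= R u + B^\top p = 0
  \quad\Longrightarrow\quad u^* = -R^{-1} B^\top p
\end{align*}
```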
Discrete-time maximum principle
- Discrete-time optimal control problem: $x_{t+1} = f(x_t, u_t)$, minimize $h(x_T) + \sum_{t=0}^{T-1} \ell(x_t, u_t)$
- The maximum principle: the costate recursion $p_t = \ell_x(x_t, u_t) + f_x(x_t, u_t)^\top p_{t+1}$ with $p_T = h_x(x_T)$, and $u_t$ minimizing the Hamiltonian $H(x_t, u_t, p_{t+1}) = \ell(x_t, u_t) + f(x_t, u_t)^\top p_{t+1}$
- Can be solved using gradient descent on the control sequence: the costate pass yields the gradient $\partial J / \partial u_t = \ell_u(x_t, u_t) + f_u(x_t, u_t)^\top p_{t+1}$, as sketched below
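A minimal sketch of the gradient-descent solution: roll the dynamics forward, run the costate recursion backward, then step the controls along the resulting gradient. The linear dynamics and quadratic costs here are hypothetical stand-ins:

```python
import numpy as np

# Adjoint-based gradient descent on the control sequence.
# Hypothetical problem: steer a discretized double integrator to the origin.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])    # x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [dt]])
T = 50
r = 0.01                                  # running cost l(x,u) = (r/2) u^2
Qf = np.eye(2)                            # final cost h(x) = (1/2) x^T Qf x

def rollout(x0, u):
    x = np.zeros((T + 1, 2))
    x[0] = x0
    for t in range(T):
        x[t + 1] = A @ x[t] + B @ u[t]
    return x

x0 = np.array([1.0, 0.0])
u = np.zeros((T, 1))
for _ in range(500):
    x = rollout(x0, u)                    # forward pass
    p = Qf @ x[T]                         # costate boundary condition p_T = h_x
    grad = np.zeros_like(u)
    for t in reversed(range(T)):          # backward costate pass
        grad[t] = r * u[t] + B.T @ p      # dJ/du_t = l_u + f_u^T p_{t+1}
        p = A.T @ p                       # p_t = l_x + f_x^T p_{t+1}; l_x = 0 here
    u -= 0.2 * grad                       # gradient step on the controls
print(rollout(x0, u)[T])                  # final state, driven toward the origin
```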
Linear-quadratic-Gaussian control
- LQG case: linear dynamics, quadratic costs, additive Gaussian noise
- One of the rare cases where the optimal control law is available in closed form
- The optimal value function is quadratic, which allows minimization of the Hamiltonian in closed form
Continuous case
- LQG setting: $dx = (Ax + Bu)\, dt + F\, d\omega$, cost rate $\ell(x, u) = \tfrac{1}{2}(x^\top Q x + u^\top R u)$
- Guess the optimal value function in parametric form: $v(x, t) = \tfrac{1}{2} x^\top V(t)\, x + a(t)$
- Optimal control law: $u = -R^{-1} B^\top V(t)\, x$
- Continuous-time Riccati equation (derivation sketched below): $-\dot{V}(t) = Q + A^\top V(t) + V(t) A - V(t) B R^{-1} B^\top V(t)$
- The optimal control law does not depend on the noise covariance $S$; in the deterministic case this is the LQR (linear-quadratic regulator) controller
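How the Riccati equation falls out: substitute the quadratic guess into the HJB equation from the earlier slide (a compressed sketch, using the standard LQG forms above):

```latex
% With v = (1/2) x^T V(t) x + a(t):  v_x = V x,  v_xx = V,
% and the Hamiltonian minimizer is u* = -R^{-1} B^T V x.
\begin{align*}
-v_t &= \min_u \Big[ \tfrac{1}{2}(x^\top Q x + u^\top R u)
        + (Ax + Bu)^\top V x
        + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top V\big) \Big] \\
-\tfrac{1}{2} x^\top \dot{V} x - \dot{a}
     &= \tfrac{1}{2} x^\top\!\big( Q + A^\top V + V A
        - V B R^{-1} B^\top V \big) x
        + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top V\big)
\end{align*}
% Matching quadratic terms gives the Riccati equation; matching the
% constant terms gives -da/dt = (1/2) tr(F F^T V), the noise contribution.
```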
Discrete case
- LQR setting (deterministic): $x_{t+1} = A x_t + B u_t$, quadratic stage costs $\tfrac{1}{2}(x_t^\top Q x_t + u_t^\top R u_t)$
- Guess for the optimal value function: $v_t(x) = \tfrac{1}{2} x^\top V_t\, x$
- Optimal control law: $u_t = -L_t x_t$ with control gain $L_t = (R + B^\top V_{t+1} B)^{-1} B^\top V_{t+1} A$
- Discrete-time Riccati equation: $V_t = Q + A^\top V_{t+1} A - A^\top V_{t+1} B (R + B^\top V_{t+1} B)^{-1} B^\top V_{t+1} A$
- The optimal control law is linear in $x$, and the gains $L_t$ can be computed offline, as in the sketch below
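A minimal sketch of the offline backward pass; the `A`, `B`, `Q`, `R` matrices below are hypothetical placeholders:

```python
import numpy as np

# Backward Riccati recursion for finite-horizon discrete-time LQR.
# Dynamics x_{t+1} = A x_t + B u_t; all matrices here are hypothetical.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)            # state cost weight
R = np.array([[0.1]])    # control cost weight
T = 100

V = Q.copy()             # terminal condition (final cost taken equal to Q here)
gains = [None] * T
for t in reversed(range(T)):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)   # control gain L_t
    gains[t] = L
    V = Q + A.T @ V @ A - A.T @ V @ B @ L               # Riccati update
    V = 0.5 * (V + V.T)                                 # keep V symmetric

# Online, the controller is just a matrix multiply: u_t = -L_t x_t.
x = np.array([1.0, 0.0])
for t in range(T):
    u = -gains[t] @ x
    x = A @ x + B @ u
print(x)   # state driven toward the origin
```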
Optimal estimation and the Kalman filter
- Estimation is the dual of the optimal control problem
- The Kalman filter is the most widely used estimator
- Objective: compute the posterior $p(x_t \mid y_1, \ldots, y_t)$ over the hidden state given the observations
- Kalman filter result: for linear dynamics and observations with Gaussian noise, the posterior is Gaussian, $\mathcal{N}(\hat{x}_t, \Sigma_t)$, and its mean and covariance are propagated recursively (sketch below)
- Continuous-time case: the Kalman-Bucy filter; numerically refined variants: the square-root filter and the information filter
- The Kalman smoother also uses future observations, analogous to smoothing in HMMs
- Optimal in many senses (e.g., the minimum mean-squared-error estimator in the linear-Gaussian setting)
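A minimal predict-update sketch of the Kalman filter; the model matrices and noise covariances are hypothetical:

```python
import numpy as np

# One Kalman filter step: predict with the dynamics, update with the observation.
# Model: x' = A x + w, w ~ N(0, Sw);  y = H x + v, v ~ N(0, Sv).  All hypothetical.
def kalman_step(xhat, Sigma, y, A, H, Sw, Sv):
    # Predict
    xpred = A @ xhat
    Spred = A @ Sigma @ A.T + Sw
    # Update
    K = Spred @ H.T @ np.linalg.inv(H @ Spred @ H.T + Sv)   # Kalman gain
    xhat_new = xpred + K @ (y - H @ xpred)                  # innovation correction
    Sigma_new = (np.eye(len(xhat)) - K @ H) @ Spred
    return xhat_new, Sigma_new

# Track a 1D position from noisy measurements of the position only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # constant-velocity dynamics
H = np.array([[1.0, 0.0]])               # observe position
Sw = 0.01 * np.eye(2)
Sv = np.array([[0.5]])

rng = np.random.default_rng(0)
x_true = np.array([0.0, 1.0])
xhat, Sigma = np.zeros(2), np.eye(2)
for _ in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Sw)
    y = H @ x_true + rng.multivariate_normal(np.zeros(1), Sv)
    xhat, Sigma = kalman_step(xhat, Sigma, y, A, H, Sw, Sv)
print(x_true, xhat)   # the estimate should track the true state
```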
Beyond the Kalman filter
- Nonlinear dynamics, non-Gaussian noise, etc.
- Extended Kalman filter: uses local linearization centered at the current state estimate
- Unscented filter: uses deterministic sampling
- Particle filtering: propagates a cloud of points sampled from the posterior (see the sketch below)
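A minimal bootstrap particle filter, the simplest instance of the propagate-a-cloud idea; the scalar model and noise levels are hypothetical:

```python
import numpy as np

# Bootstrap particle filter for a scalar model with nonlinear dynamics
# (hypothetical): x' = sin(x) + w, w ~ N(0, 0.1^2);  y = x + v, v ~ N(0, 0.5^2).
rng = np.random.default_rng(0)
N = 1000
particles = rng.normal(0.0, 1.0, N)       # initial cloud sampled from the prior

def pf_step(particles, y):
    # Propagate each particle through the nonlinear dynamics.
    particles = np.sin(particles) + rng.normal(0.0, 0.1, N)
    # Weight by the observation likelihood p(y | x).
    w = np.exp(-0.5 * ((y - particles) / 0.5) ** 2)
    w /= w.sum()
    # Resample so the cloud again represents the posterior with equal weights.
    return particles[rng.choice(N, size=N, p=w)]

x_true = 0.5
for _ in range(20):
    x_true = np.sin(x_true) + rng.normal(0.0, 0.1)
    y = x_true + rng.normal(0.0, 0.5)
    particles = pf_step(particles, y)
print(x_true, particles.mean())           # posterior mean tracks the true state
```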
Duality of optimal control and optimal estimation
- The LQR controller and the Kalman filter are dual: their two Riccati equations have the same form under a substitution of matrices ($A \leftrightarrow A^\top$, $B \leftrightarrow H^\top$, cost weights $\leftrightarrow$ noise covariances)
- More generally, optimal control corresponds to MAP smoothing, and LQG control corresponds to Kalman smoothing
Optimal control as a theory of biological movement
- The brain generates the best behavior it can, subject to the constraints imposed by the body and environment.
- We can assume that, at least in natural and well-practiced tasks, the observed behavior will be close to optimal.
- Examples: minimum-energy, minimum-jerk, and minimum-torque-change models
- Research directions: motor learning and adaptation, neural implementation of optimal control laws, distributed and hierarchical control, inverse optimal control

© 2008, SNU Biointelligence Lab, http://bi.snu.ac.kr/