Bayesian Brain: Probabilistic Approaches to Neural Coding
Chapter 12: Optimal Control Theory
Kenji Doya, Shin Ishii, Alexandre Pouget, and Rajesh P. N. Rao
Summarized by Seung-Joon Yi
Chapter overview
- Discrete control: dynamic programming, value iteration / policy iteration, Markov decision processes
- Continuous control: the Hamilton-Jacobi-Bellman equation
- Deterministic control: Pontryagin's maximum principle
- Linear-quadratic-Gaussian control: Riccati equations
Discrete control setting
- State: $x \in \mathcal{X}$, a finite set
- Action: $u \in \mathcal{U}(x)$, the actions available in state $x$
- Future state: $x' = \mathrm{next}(x, u)$
- Cost: $\mathrm{cost}(x, u) \geq 0$
- Problem: find an action sequence $(u_0, \ldots, u_{T-1})$ and corresponding state sequence $(x_0, \ldots, x_T)$ minimizing the total cost $\sum_{t=0}^{T-1} \mathrm{cost}(x_t, u_t)$
Dynamic programming
- Bellman optimality principle: if a state-action sequence is optimal, the subsequence obtained by removing its first state and action is also optimal.
- The optimal value function $v(x)$: the cost-to-go from state $x$ under the optimal policy
- The Bellman equation for the optimal policy: $v(x) = \min_{u \in \mathcal{U}(x)} \left[ \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \right]$, with $\pi(x) = \arg\min_{u \in \mathcal{U}(x)} \left[ \mathrm{cost}(x, u) + v(\mathrm{next}(x, u)) \right]$
- Acyclic transition graph: $v$ can be computed directly by backward tracking
Value iteration and policy iteration
- Relaxation schemes for graphs with loops
- Value iteration update: $v^{(k+1)}(x) = \min_{u} \left[ \mathrm{cost}(x, u) + v^{(k)}(\mathrm{next}(x, u)) \right]$
- Policy iteration update: evaluate the current policy, $v^{\pi^{(k)}}(x) = \mathrm{cost}(x, \pi^{(k)}(x)) + v^{\pi^{(k)}}(\mathrm{next}(x, \pi^{(k)}(x)))$, then improve it, $\pi^{(k+1)}(x) = \arg\min_{u} \left[ \mathrm{cost}(x, u) + v^{\pi^{(k)}}(\mathrm{next}(x, u)) \right]$
- Both algorithms are guaranteed to converge in a finite number of iterations; see the sketch below
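A minimal sketch of value iteration on a small deterministic problem; the states, transitions, and costs below are hypothetical, invented only to make the Bellman backup concrete:

```python
# Value iteration for a small deterministic shortest-path problem.
# States, actions, transitions, and costs are hypothetical examples.
states = ["A", "B", "C", "goal"]
actions = {"A": ["toB", "toC"], "B": ["toC", "toGoal"], "C": ["toGoal"], "goal": []}
next_state = {("A", "toB"): "B", ("A", "toC"): "C",
              ("B", "toC"): "C", ("B", "toGoal"): "goal",
              ("C", "toGoal"): "goal"}
cost = {("A", "toB"): 1.0, ("A", "toC"): 3.0,
        ("B", "toC"): 1.0, ("B", "toGoal"): 5.0,
        ("C", "toGoal"): 1.0}

v = {x: 0.0 for x in states}           # initial guess of the value function
for _ in range(100):                   # relaxation sweeps
    v_new = {}
    for x in states:
        if not actions[x]:             # absorbing goal state: zero cost-to-go
            v_new[x] = 0.0
        else:                          # Bellman backup: min over actions
            v_new[x] = min(cost[x, u] + v[next_state[x, u]] for u in actions[x])
    if v_new == v:                     # converged in finitely many sweeps
        break
    v = v_new

policy = {x: min(actions[x], key=lambda u: cost[x, u] + v[next_state[x, u]])
          for x in states if actions[x]}
print(v)       # {'A': 3.0, 'B': 2.0, 'C': 1.0, 'goal': 0.0}
print(policy)  # {'A': 'toB', 'B': 'toC', 'C': 'toGoal'}
```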
Markov decision processes
- Stochastic transition case
- Transition function: $p(x' \mid x, u)$, the probability of moving to state $x'$ when taking action $u$ in state $x$
- Value function: $v(x) = \min_{u} \left[ \mathrm{cost}(x, u) + \mathbb{E}_{x' \sim p(\cdot \mid x, u)}\left[ v(x') \right] \right]$
- A Markov decision process is an optimal control problem with discrete states and stochastic state transitions; only the Bellman backup changes, as sketched below
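The only change from the deterministic sketch above is that the backup averages over the transition distribution; the arrays `P` and `C` below are hypothetical, with `P[u, x, y]` a transition probability and `C[x, u]` a cost:

```python
import numpy as np

# Value iteration with a stochastic Bellman backup.
# P[u, x, y] = p(y | x, u) and C[x, u] = cost(x, u) are hypothetical arrays;
# a discount factor gamma < 1 guarantees convergence on this generic model.
rng = np.random.default_rng(0)
n, m, gamma = 4, 2, 0.95
P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)       # normalize into transition distributions
C = rng.random((n, m))

v = np.zeros(n)
for _ in range(1000):
    # Q[x, u] = cost(x, u) + gamma * E_{y ~ p(.|x,u)}[ v(y) ]
    Q = C + gamma * np.einsum("uxy,y->xu", P, v)
    v_new = Q.min(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
policy = Q.argmin(axis=1)               # greedy policy w.r.t. the converged values
```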
Continuous-state control
- Real-valued state: $x \in \mathbb{R}^{n_x}$
- Real-valued control: $u \in \mathbb{R}^{n_u}$
- Controlled Ito diffusion process: $dx = f(x, u)\, dt + F(x, u)\, d\omega$, where $\omega$ is Brownian motion
- Total cost function: $J = \mathbb{E}\left[ h(x(T)) + \int_0^T \ell(x(t), u(t))\, dt \right]$
The Hamilton-Jacobi-Bellman equation
- Apply the DP approach to the time-discretized stochastic problem (the Bellman equation for the continuous case), then take a first-order approximation in the time step
- The resulting HJB equation: $-v_t(x, t) = \min_{u} \left[ \ell(x, u) + f(x, u)^\top v_x(x, t) + \tfrac{1}{2} \mathrm{tr}\left( F(x, u) F(x, u)^\top v_{xx}(x, t) \right) \right]$
- A derivation sketch is given below
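A compressed version of the derivation under the dynamics and cost defined above; this follows the standard argument rather than the slide's omitted figure:

```latex
% Discretize time with step D and apply the Bellman equation:
%   v(x,t) = min_u E[ l(x,u) D + v(x + dx, t + D) ],
% then expand v to second order in dx, using E[dx] = f D and Cov[dx] = F F^T D.
\begin{align*}
v(x,t) &= \min_u \Big[ \ell(x,u)\,\Delta + v(x,t) + v_t\,\Delta
          + f^\top v_x\,\Delta
          + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top v_{xx}\big)\,\Delta
          + o(\Delta) \Big] \\
-v_t(x,t) &= \min_u \Big[ \ell(x,u) + f(x,u)^\top v_x(x,t)
          + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top v_{xx}(x,t)\big) \Big]
\end{align*}
% The second line follows by cancelling v(x,t), dividing by D, and letting D -> 0.
```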
Solving the HJB equation
- A nonlinear, second-order PDE with respect to the unknown function $v$
- It does not always have a classical solution, and many weak solutions can exist
- The notion of viscosity solutions provides a reassuring answer: the viscosity solution is unique and coincides with the optimal value function
- Parametric methods give approximate solutions
Infinite-horizon cases
- Discounted-cost formulation: minimize $\mathbb{E}\left[ \sum_{t} \gamma^t\, \mathrm{cost}(x_t, u_t) \right]$ with discount factor $0 < \gamma < 1$ (in continuous time, the cost rate is weighted by $e^{-t/\tau}$)
- Average-cost-per-stage formulation: minimize $\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \int_0^T \ell(x(t), u(t))\, dt \right]$
Pontryagin's maximum principle
- Two fundamental ideas of optimal control theory: Bellman's DP and optimality principle, and Pontryagin's maximum principle
- The maximum principle applies only to deterministic problems and yields the same solutions as DP
- However, the maximum principle avoids the curse of dimensionality: it characterizes a single optimal trajectory through ODEs, rather than a value function over the entire state space
Continuous-time maximum principle
- HJB equation for deterministic dynamics: $-v_t(x, t) = \min_{u} \left[ \ell(x, u) + f(x, u)^\top v_x(x, t) \right]$
- $p = v_x(x, t)$: the costate vector, the gradient of the value function along the optimal trajectory
- The maximum principle: the optimal trajectory satisfies the ODEs $\dot{x} = f(x, u)$ and $-\dot{p} = \ell_x(x, u) + f_x(x, u)^\top p$, with $u = \arg\min_{u} H(x, u, p)$, where $H(x, u, p) = \ell(x, u) + f(x, u)^\top p$ is the Hamiltonian
- ODEs along one trajectory, instead of a PDE over the whole state space
- Linear dynamics, quadratic cost: minimizing the Hamiltonian is simple, as shown below
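A small worked case of the Hamiltonian minimization, assuming linear dynamics $f(x, u) = Ax + Bu$ and quadratic cost rate $\ell(x, u) = \tfrac{1}{2}(x^\top Q x + u^\top R u)$ with $R \succ 0$ (standard notation, not taken from the slide):

```latex
\begin{align*}
H(x, u, p) &= \tfrac{1}{2} x^\top Q x + \tfrac{1}{2} u^\top R u
              + (A x + B u)^\top p \\
\frac{\partial H}{\partial u} &= R u + B^\top p = 0
  \quad\Longrightarrow\quad u^* = -R^{-1} B^\top p
\end{align*}
```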
Discrete-time maximum principle
- Discrete-time optimal control problem: $x_{t+1} = f(x_t, u_t)$, minimize $h(x_T) + \sum_{t=0}^{T-1} \ell(x_t, u_t)$
- The maximum principle: the costate recursion $p_t = \ell_x(x_t, u_t) + f_x(x_t, u_t)^\top p_{t+1}$ with $p_T = h_x(x_T)$, and $u_t$ minimizing the Hamiltonian $H(x_t, u_t, p_{t+1}) = \ell(x_t, u_t) + f(x_t, u_t)^\top p_{t+1}$
- Can be solved using gradient descent on the control sequence: the costate pass yields the gradient $\partial J / \partial u_t = \ell_u(x_t, u_t) + f_u(x_t, u_t)^\top p_{t+1}$, as sketched below
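A minimal sketch of the gradient-descent solution: roll the dynamics forward, run the costate recursion backward, then step the controls along the resulting gradient. The linear dynamics and quadratic costs here are hypothetical stand-ins:

```python
import numpy as np

# Adjoint-based gradient descent on the control sequence.
# Hypothetical problem: steer a discretized double integrator to the origin.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])    # x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [dt]])
T = 50
r = 0.01                                  # running cost l(x,u) = (r/2) u^2
Qf = np.eye(2)                            # final cost h(x) = (1/2) x^T Qf x

def rollout(x0, u):
    x = np.zeros((T + 1, 2))
    x[0] = x0
    for t in range(T):
        x[t + 1] = A @ x[t] + B @ u[t]
    return x

x0 = np.array([1.0, 0.0])
u = np.zeros((T, 1))
for _ in range(500):
    x = rollout(x0, u)                    # forward pass
    p = Qf @ x[T]                         # costate boundary condition p_T = h_x
    grad = np.zeros_like(u)
    for t in reversed(range(T)):          # backward costate pass
        grad[t] = r * u[t] + B.T @ p      # dJ/du_t = l_u + f_u^T p_{t+1}
        p = A.T @ p                       # p_t = l_x + f_x^T p_{t+1}; l_x = 0 here
    u -= 0.2 * grad                       # gradient step on the controls
print(rollout(x0, u)[T])                  # final state, driven toward the origin
```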
Linear-quadratic-Gaussian control
- LQG case: linear dynamics, quadratic costs, additive Gaussian noise
- One of the rare cases where the optimal control law is available in closed form
- The optimal value function is quadratic, which allows minimization of the Hamiltonian in closed form
Continuous case
- LQG setting: $dx = (Ax + Bu)\, dt + F\, d\omega$, cost rate $\ell(x, u) = \tfrac{1}{2}(x^\top Q x + u^\top R u)$
- Guess the optimal value function in parametric form: $v(x, t) = \tfrac{1}{2} x^\top V(t)\, x + a(t)$
- Optimal control law: $u = -R^{-1} B^\top V(t)\, x$
- Continuous-time Riccati equation (derivation sketched below): $-\dot{V}(t) = Q + A^\top V(t) + V(t) A - V(t) B R^{-1} B^\top V(t)$
- The optimal control law does not depend on the noise covariance $S$; in the deterministic case this is the LQR (linear-quadratic regulator) controller
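How the Riccati equation falls out: substitute the quadratic guess into the HJB equation from the earlier slide (a compressed sketch, using the standard LQG forms above):

```latex
% With v = (1/2) x^T V(t) x + a(t):  v_x = V x,  v_xx = V,
% and the Hamiltonian minimizer is u* = -R^{-1} B^T V x.
\begin{align*}
-v_t &= \min_u \Big[ \tfrac{1}{2}(x^\top Q x + u^\top R u)
        + (Ax + Bu)^\top V x
        + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top V\big) \Big] \\
-\tfrac{1}{2} x^\top \dot{V} x - \dot{a}
     &= \tfrac{1}{2} x^\top\!\big( Q + A^\top V + V A
        - V B R^{-1} B^\top V \big) x
        + \tfrac{1}{2}\operatorname{tr}\!\big(F F^\top V\big)
\end{align*}
% Matching quadratic terms gives the Riccati equation; matching the
% constant terms gives -da/dt = (1/2) tr(F F^T V), the noise contribution.
```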
Discrete case
- LQR setting (deterministic): $x_{t+1} = A x_t + B u_t$, quadratic stage costs $\tfrac{1}{2}(x_t^\top Q x_t + u_t^\top R u_t)$
- Guess for the optimal value function: $v_t(x) = \tfrac{1}{2} x^\top V_t\, x$
- Optimal control law: $u_t = -L_t x_t$ with control gain $L_t = (R + B^\top V_{t+1} B)^{-1} B^\top V_{t+1} A$
- Discrete-time Riccati equation: $V_t = Q + A^\top V_{t+1} A - A^\top V_{t+1} B (R + B^\top V_{t+1} B)^{-1} B^\top V_{t+1} A$
- The optimal control law is linear in $x$, and the gains $L_t$ can be computed offline, as in the sketch below
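A minimal sketch of the offline backward pass; the `A`, `B`, `Q`, `R` matrices below are hypothetical placeholders:

```python
import numpy as np

# Backward Riccati recursion for finite-horizon discrete-time LQR.
# Dynamics x_{t+1} = A x_t + B u_t; all matrices here are hypothetical.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)            # state cost weight
R = np.array([[0.1]])    # control cost weight
T = 100

V = Q.copy()             # terminal condition (final cost taken equal to Q here)
gains = [None] * T
for t in reversed(range(T)):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)   # control gain L_t
    gains[t] = L
    V = Q + A.T @ V @ A - A.T @ V @ B @ L               # Riccati update
    V = 0.5 * (V + V.T)                                 # keep V symmetric

# Online, the controller is just a matrix multiply: u_t = -L_t x_t.
x = np.array([1.0, 0.0])
for t in range(T):
    u = -gains[t] @ x
    x = A @ x + B @ u
print(x)   # state driven toward the origin
```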
Optimal estimation and the Kalman filter
- Estimation is the dual of the optimal control problem
- The Kalman filter is the most widely used estimator
- Objective: compute the posterior $p(x_t \mid y_1, \ldots, y_t)$ over the hidden state given the observations
- Kalman filter result: for linear dynamics and observations with Gaussian noise, the posterior is Gaussian, $\mathcal{N}(\hat{x}_t, \Sigma_t)$, and its mean and covariance are propagated recursively (sketch below)
- Continuous-time case: the Kalman-Bucy filter; numerically refined variants: the square-root filter and the information filter
- The Kalman smoother also uses future observations, analogous to smoothing in HMMs
- Optimal in many senses (e.g., the minimum mean-squared-error estimator in the linear-Gaussian setting)
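A minimal predict-update sketch of the Kalman filter; the model matrices and noise covariances are hypothetical:

```python
import numpy as np

# One Kalman filter step: predict with the dynamics, update with the observation.
# Model: x' = A x + w, w ~ N(0, Sw);  y = H x + v, v ~ N(0, Sv).  All hypothetical.
def kalman_step(xhat, Sigma, y, A, H, Sw, Sv):
    # Predict
    xpred = A @ xhat
    Spred = A @ Sigma @ A.T + Sw
    # Update
    K = Spred @ H.T @ np.linalg.inv(H @ Spred @ H.T + Sv)   # Kalman gain
    xhat_new = xpred + K @ (y - H @ xpred)                  # innovation correction
    Sigma_new = (np.eye(len(xhat)) - K @ H) @ Spred
    return xhat_new, Sigma_new

# Track a 1D position from noisy measurements of the position only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # constant-velocity dynamics
H = np.array([[1.0, 0.0]])               # observe position
Sw = 0.01 * np.eye(2)
Sv = np.array([[0.5]])

rng = np.random.default_rng(0)
x_true = np.array([0.0, 1.0])
xhat, Sigma = np.zeros(2), np.eye(2)
for _ in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Sw)
    y = H @ x_true + rng.multivariate_normal(np.zeros(1), Sv)
    xhat, Sigma = kalman_step(xhat, Sigma, y, A, H, Sw, Sv)
print(x_true, xhat)   # the estimate should track the true state
```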
Beyond the Kalman filter
- Nonlinear dynamics, non-Gaussian noise, etc.
- Extended Kalman filter: uses local linearization centered at the current state estimate
- Unscented filter: uses deterministic sampling
- Particle filtering: propagates a cloud of points sampled from the posterior (see the sketch below)
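A minimal bootstrap particle filter, the simplest instance of the propagate-a-cloud idea; the scalar model and noise levels are hypothetical:

```python
import numpy as np

# Bootstrap particle filter for a scalar model with nonlinear dynamics
# (hypothetical): x' = sin(x) + w, w ~ N(0, 0.1^2);  y = x + v, v ~ N(0, 0.5^2).
rng = np.random.default_rng(0)
N = 1000
particles = rng.normal(0.0, 1.0, N)       # initial cloud sampled from the prior

def pf_step(particles, y):
    # Propagate each particle through the nonlinear dynamics.
    particles = np.sin(particles) + rng.normal(0.0, 0.1, N)
    # Weight by the observation likelihood p(y | x).
    w = np.exp(-0.5 * ((y - particles) / 0.5) ** 2)
    w /= w.sum()
    # Resample so the cloud again represents the posterior with equal weights.
    return particles[rng.choice(N, size=N, p=w)]

x_true = 0.5
for _ in range(20):
    x_true = np.sin(x_true) + rng.normal(0.0, 0.1)
    y = x_true + rng.normal(0.0, 0.5)
    particles = pf_step(particles, y)
print(x_true, particles.mean())           # posterior mean tracks the true state
```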
Duality of optimal control and optimal estimation
- The LQR controller and the Kalman filter are dual: their two Riccati equations have the same form under a substitution of matrices ($A \leftrightarrow A^\top$, $B \leftrightarrow H^\top$, cost weights $\leftrightarrow$ noise covariances)
- More generally, optimal control corresponds to MAP smoothing, and LQG control corresponds to Kalman smoothing
Optimal control as a theory of biological movement
- The brain generates the best behavior it can, subject to the constraints imposed by the body and environment.
- We can assume that, at least in natural and well-practiced tasks, the observed behavior will be close to optimal.
- Examples: minimum-energy, minimum-jerk, and minimum-torque-change models
- Research directions: motor learning and adaptation, neural implementation of optimal control laws, distributed and hierarchical control, inverse optimal control

© 2008, SNU Biointelligence Lab, http://bi.snu.ac.kr/