BTSM Seminar (Thu) Summarized by Joon Shik Kim


Ch 17. Optimal control theory and the linear Bellman equation, HJ Kappen
BTSM Seminar 12.07.19 (Thu), summarized by Joon Shik Kim

Introduction
Optimising a sequence of actions to attain some future goal is the general topic of control theory. In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost consisting of two terms. The first is a path cost that specifies the energy consumption needed to contract the muscles. The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it. The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.

Discrete Time Control (1/3)
The dynamics of the system are given by

x_{t+1} = x_t + f(t, x_t, u_t),    t = 0, 1, ..., T-1,

where x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t. A cost function assigns a cost to each sequence of controls:

C(x_0, u_{0:T-1}) = Φ(x_T) + Σ_{t=0}^{T-1} R(t, x_t, u_t),

where R(t,x,u) is the cost associated with taking action u at time t in state x, and Φ(x_T) is the cost associated with ending up in state x_T at time T.

Discrete Time Control (2/3)
The problem of optimal control is to find the sequence u_{0:T-1} that minimises C(x_0, u_{0:T-1}). The optimal cost-to-go is the minimal cost starting from state x at time t:

J(t, x) = min_{u_{t:T-1}} [ Φ(x_T) + Σ_{s=t}^{T-1} R(s, x_s, u_s) ],

which satisfies the Bellman recursion

J(t, x) = min_u [ R(t, x, u) + J(t+1, x + f(t, x, u)) ],    J(T, x) = Φ(x).

Discrete Time Control (3/3)
The algorithm to compute the optimal control, trajectory, and cost is given by
1. Initialization: J(T, x) = Φ(x).
2. Backwards: for t = T-1, ..., 0 and for all x compute
   u_t(x) = argmin_u { R(t, x, u) + J(t+1, x + f(t, x, u)) },
   J(t, x) = R(t, x, u_t(x)) + J(t+1, x + f(t, x, u_t(x))).
3. Forwards: for t = 0, ..., T-1 compute x_{t+1} = x_t + f(t, x_t, u_t(x_t)).
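The backward/forward scheme above can be sketched for a toy problem. Everything below (the 1-D integer state grid, the quadratic path cost u², and the end cost Φ(x) = x²) is an illustrative assumption, not from the chapter:

```python
import numpy as np

# Toy dynamic programming sketch (illustrative setup, not from the chapter).
# Dynamics: x_{t+1} = x_t + u_t; path cost R(t,x,u) = u^2; end cost Phi(x) = x^2.
T = 5
states = np.arange(-10, 11)          # discretized state grid
controls = np.array([-1, 0, 1])      # allowed actions

def step(x, u):
    # f(t, x, u) = u, clipped so the state stays on the grid
    return np.clip(x + u, states[0], states[-1])

# 1. Initialization: J(T, x) = Phi(x)
J = {T: {x: float(x) ** 2 for x in states}}
policy = {}

# 2. Backwards: J(t, x) = min_u [ R(t, x, u) + J(t+1, x + f(t, x, u)) ]
for t in range(T - 1, -1, -1):
    J[t], policy[t] = {}, {}
    for x in states:
        costs = [u ** 2 + J[t + 1][step(x, u)] for u in controls]
        best = int(np.argmin(costs))
        J[t][x] = costs[best]
        policy[t][x] = controls[best]

# 3. Forwards: roll out the optimal controls from x_0 = 4
x = 4
traj = [x]
for t in range(T):
    x = int(step(x, policy[t][x]))
    traj.append(x)

print(traj)  # → [4, 3, 2, 1, 0, 0]: the state is steered to 0, where the end cost vanishes
```

Note that the backward pass computes J and the policy for every state, after which the forward pass is a cheap table lookup; this is exactly why the scheme scales badly with the state dimension.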

The HJB Equation (1/2)
Taking the continuous-time limit of the Bellman recursion yields the Hamilton-Jacobi-Bellman (HJB) equation

-∂_t J(t, x) = min_u [ R(t, x, u) + f(t, x, u)ᵀ ∂_x J(t, x) ].

The optimal control at the current x, t is given by

u(x, t) = argmin_u [ R(t, x, u) + f(t, x, u)ᵀ ∂_x J(t, x) ].

The boundary condition is J(x, T) = Φ(x).
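The step from the discrete Bellman recursion to the HJB equation is a first-order Taylor expansion in dt; a sketch of this standard derivation:

```latex
% Bellman recursion over an infinitesimal time step dt:
J(t,x) = \min_u \left[ R(t,x,u)\,dt + J(t+dt,\; x + f(t,x,u)\,dt) \right]
% First-order Taylor expansion of the last term:
J(t+dt,\, x+dx) \approx J(t,x) + dt\,\partial_t J(t,x) + f(t,x,u)\,dt\,\partial_x J(t,x)
% Cancelling J(t,x) on both sides and dividing by dt:
-\partial_t J(t,x) = \min_u \left[ R(t,x,u) + f(t,x,u)^{\mathsf T}\,\partial_x J(t,x) \right]
```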

The HJB Equation (2/2) Optimal control of mass on a spring

Stochastic Differential Equations (1/2)
Consider a random walk on the line with x_0 = 0:

x_{t+1} = x_t + ξ_t,

where the ξ_t are independent increments with mean zero and variance ν. In closed form, x_t = Σ_{s=0}^{t-1} ξ_s, so ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt. In the continuous-time limit we define dx = dξ, with ⟨dξ⟩ = 0 and ⟨dξ²⟩ = ν dt. The conditional probability distribution is then Gaussian (a Wiener process):

ρ(y, t | x, 0) = (2πνt)^{-1/2} exp( -(y - x)² / (2νt) ).
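A small simulation sketch (parameter values are illustrative assumptions) of the property that survives the continuous-time limit, namely that the walk's variance grows linearly in time, ⟨x_t²⟩ = νt:

```python
import numpy as np

# Simulate many independent walks with Gaussian increments of
# mean 0 and variance nu*dt, then check <x_T^2> ≈ nu*T.
rng = np.random.default_rng(0)
nu, dt, T, n_paths = 1.0, 0.01, 1.0, 20000
n_steps = int(T / dt)

steps = rng.normal(0.0, np.sqrt(nu * dt), size=(n_paths, n_steps))
x = np.cumsum(steps, axis=1)     # x_t along each path

var_end = x[:, -1].var()         # empirical <x_T^2>
print(var_end)                   # close to nu * T = 1.0
```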

Stochastic Optimal Control Theory (2/2)
The dynamics are now stochastic:

dx = f(x, u, t) dt + dξ,

where f is the drift, dξ is a Wiener process with ⟨dξ⟩ = 0 and ⟨dξ²⟩ = ν dt, and ν is the diffusion. Since ⟨dx²⟩ is of order dt, we must make a Taylor expansion of J up to order dx². This yields the stochastic Hamilton-Jacobi-Bellman equation

-∂_t J(t, x) = min_u [ R(t, x, u) + f(x, u, t) ∂_x J(t, x) + (ν/2) ∂²_x J(t, x) ].
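Written out, the second-order Taylor step behind the stochastic HJB equation is:

```latex
% Expectation over the noise requires expanding J to second order in dx:
\langle J(t+dt,\; x+dx) \rangle
  = J + dt\,\partial_t J + \langle dx \rangle\,\partial_x J
    + \tfrac{1}{2}\langle dx^2 \rangle\,\partial_x^2 J + \dots
% with <dx> = f dt (drift) and <dx^2> = nu dt (diffusion), giving
-\partial_t J = \min_u \left[ R + f\,\partial_x J + \tfrac{\nu}{2}\,\partial_x^2 J \right]
```

The first-order noise term averages to zero, but the second-order term ⟨dx²⟩ is itself of order dt, which is why the diffusion contributes the extra ∂²_x J term absent from the deterministic HJB equation.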

Path Integral Control (1/2)
When the control enters the dynamics linearly, dx = (f + u) dt + dξ, and the cost is quadratic in the control, R(x, u, t) = V(x, t) + ½Ru², the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go, J(x, t) = -λ log ψ(x, t) with λ = νR. HJB becomes

∂_t ψ = ( V/λ - f ∂_x - (ν/2) ∂²_x ) ψ,

which is linear in ψ.
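The algebra of the linearization can be sketched as follows (the standard log-transform argument, for dynamics linear in the control and a cost quadratic in the control):

```latex
% Optimal control from the stochastic HJB equation:
u = -\tfrac{1}{R}\,\partial_x J
% Log transform of the cost-to-go, with lambda = nu R:
J(x,t) = -\lambda \log \psi(x,t)
% The quadratic term -\tfrac{1}{2R}(\partial_x J)^2 and the diffusion term
% produce (\partial_x \psi)^2 contributions that cancel exactly when
% lambda = nu R, leaving an equation linear in psi:
\partial_t \psi = \left( \tfrac{V}{\lambda} - f\,\partial_x - \tfrac{\nu}{2}\,\partial_x^2 \right) \psi
```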

Path Integral Control (2/2)
Let ρ(y, τ | x, t) describe a diffusion process for τ > t, defined by the Fokker-Planck equation

∂_τ ρ = -(V/λ) ρ - ∂_y (f ρ) + (ν/2) ∂²_y ρ.   (1)

Then ψ can be computed by propagating this diffusion to the horizon time T:

ψ(x, t) = ∫ dy ρ(y, T | x, t) exp( -Φ(y)/λ ).

The Diffusion Process as a Path Integral (1/2)
Let's look at the first term of equation (1) on the previous slide. It describes a process that kills a sample trajectory at a rate V(x, t) dt / λ. This yields a sampling procedure for Monte Carlo evaluation of ψ: with probability 1 - V(x, t) dt/λ the trajectory continues to diffuse, and with probability V(x, t) dt/λ the path is killed.
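The killed-diffusion sampler can be sketched on a toy case (all choices below are illustrative assumptions: constant potential V = v, zero drift f = 0, end cost Φ = 0). With these choices the surviving-path estimate of ψ should reproduce the exact answer exp(-vT/λ):

```python
import numpy as np

# Monte Carlo estimate of psi via diffusion with killing (toy setup).
rng = np.random.default_rng(1)
nu, lam, v = 1.0, 1.0, 0.5
dt, T, n_paths = 0.01, 1.0, 50000
n_steps = int(T / dt)

x = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
for _ in range(n_steps):
    # with probability V*dt/lam a path is killed ...
    alive &= rng.random(n_paths) >= v * dt / lam
    # ... otherwise it diffuses: dx = dxi with <dxi^2> = nu*dt
    x[alive] += rng.normal(0.0, np.sqrt(nu * dt), alive.sum())

# psi(x0, 0) ≈ (fraction of surviving paths) * <exp(-Phi(x_T)/lam)>;
# here Phi = 0, so the exponential factor is 1.
psi_hat = alive.mean()
print(psi_hat, np.exp(-v * T / lam))   # estimate vs exact ~ 0.6065
```

With a nonzero Φ one would average exp(-Φ(x_T)/λ) over the surviving paths instead of just counting them.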

The Diffusion Process as a Path Integral (2/2)
In the continuum limit the sum over surviving trajectories becomes a path integral,

ψ(x, t) = ∫ [dx] exp( -S(x(t → T)) / λ ),

where ψ is a partition function, J = -λ log ψ is a free energy, S is the energy of a path, and λ is the temperature.

Discussion
One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other, to maximise a common reward function. The path integral method has great potential for application in robotics.