Unresolved questions in motor control: A UCL-JHU workshop

Reasons to be careful about reward
A flow (policy) cannot be specified with a scalar function of states: the fundamental theorem of vector calculus, aka the Helmholtz decomposition
Any (curl-free) flow specified with reward can only have a fixed-point attractor: reward cannot specify itinerant movement or policies
Value is produced by flow, not its cause: reward is a consequence of (defined by) behaviour, not its cause
The inherent tautology of reward: explaining behaviour in terms of maximising reward is like explaining the evolution of the eye by saying it maximises adaptive value
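The first two points can be checked numerically. The sketch below is illustrative only and is not taken from the slides: the potential V(x) = |x|^4/4 - |x|^2/2, the rotation rate OMEGA, and all function names are assumptions. It splits a planar flow into a curl-free part (-grad V, the part a scalar function can specify) and a divergence-free part (a rotation), then compares where trajectories end up with and without the rotation.

```python
import numpy as np

OMEGA = 2.0  # assumed rotation rate of the divergence-free (solenoidal) part


def grad_V(x):
    """Gradient of the illustrative potential V(x) = |x|^4/4 - |x|^2/2."""
    return (x @ x - 1.0) * x


def flow(x, omega):
    """Curl-free part (-grad V) plus divergence-free part (omega * J x)."""
    J = np.array([[0.0, -1.0], [1.0, 0.0]])
    return -grad_V(x) + omega * (J @ x)


def curl(f, x, h=1e-5):
    """Scalar 2-D curl, d f_y/dx - d f_x/dy, by central differences."""
    ex, ey = np.array([h, 0.0]), np.array([0.0, h])
    dfy_dx = (f(x + ex)[1] - f(x - ex)[1]) / (2 * h)
    dfx_dy = (f(x + ey)[0] - f(x - ey)[0]) / (2 * h)
    return dfy_dx - dfx_dy


def integrate(omega, x0, dt=1e-3, steps=20000):
    """Forward-Euler integration; returns the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * flow(x, omega)
    return x


x0 = [0.3, 0.1]
probe = np.array([0.7, -0.4])

x_grad = integrate(0.0, x0)    # flow specified by the scalar V alone
x_mix = integrate(OMEGA, x0)   # same V plus a solenoidal component

speed_grad = np.linalg.norm(flow(x_grad, 0.0))
speed_mix = np.linalg.norm(flow(x_mix, OMEGA))
curl_grad = curl(lambda x: flow(x, 0.0), probe)
curl_mix = curl(lambda x: flow(x, OMEGA), probe)

print("gradient flow: curl", curl_grad, "final speed", speed_grad)
print("with rotation: curl", curl_mix, "final speed", speed_mix)
```

Under these assumptions the gradient-only flow has zero curl and comes to rest, because V decreases along every trajectory and so no state can be revisited. Adding the rotation leaves V unchanged but keeps the state circulating on a limit cycle, which is the kind of itinerant movement the scalar function alone cannot express.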
A Physicist, An Engineer, An Economist
Random dynamical systems; Random attractors with small measure; Kolmogorov forward equation; Free-energy formulation; Ergodic theorem; Helmholtz decomposition; Value and reward; Free energy upper bounds expected cost
Value and reward; Helmholtz decomposition; Optimal control theory
Value and reward
Forward models in motor control
(Diagram labels: Intrinsic frame of reference; Extrinsic frame of reference; hidden states; control; Optimal control; Motor commands; Efference copy; Forward model; State estimation; Sensory mapping; Cost function; Plant kinetics)
Predictive coding in motor control
(Diagram labels: Intrinsic frame of reference; Extrinsic frame of reference; sensations; control; Optimal control; Motor commands; Efference copy; Sensory mapping; Cost function; Plant kinetics; Forward model; Top-down predictions; Bottom-up prediction error)
Active inference
(Diagram labels: Intrinsic frame of reference; Extrinsic frame of reference; sensations; movements; Classical reflex; Corollary discharge; Sensory mapping; Prior beliefs; Plant kinetics; Forward model; Bottom-up prediction error; Proprioceptive predictions)
Action with point attractors (cf. the equilibrium point hypothesis)
(Figure labels: visual input; proprioceptive input; Descending proprioceptive predictions; Exteroceptive predictions)
Action with heteroclinic cycles
(Figure labels: action; observation; Heteroclinic cycle; Descending proprioceptive predictions; axes: position (x), position (y))
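For readers unfamiliar with heteroclinic cycles, a minimal sketch follows. The slides do not say which system generates the cycle shown, so the three-state May-Leonard competition model used here is a stand-in, and the parameters A and B, the initial state, and the function names are all assumptions. The point is only that the state visits a sequence of unstable fixed points in a fixed order rather than settling at one point.

```python
import numpy as np

A, B = 2.0, 0.5  # assumed competition strengths (B < 1 < A and A + B > 2)


def may_leonard(x):
    """Three competing states; each saddle point hands over to the next one."""
    x1, x2, x3 = x
    return np.array([
        x1 * (1.0 - x1 - A * x2 - B * x3),
        x2 * (1.0 - B * x1 - x2 - A * x3),
        x3 * (1.0 - A * x1 - B * x2 - x3),
    ])


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def dominant_sequence(x0, dt=0.01, steps=6000):
    """Integrate and record the index of the largest state each time it changes."""
    x = np.array(x0, dtype=float)
    visits = [int(np.argmax(x))]
    for _ in range(steps):
        x = rk4_step(may_leonard, x, dt)
        k = int(np.argmax(x))
        if k != visits[-1]:
            visits.append(k)
    return visits


visits = dominant_sequence([0.6, 0.3, 0.1])
print(visits)  # indices of the dominant state, in order of visitation
```

With these assumed parameters the cycle is attracting, so the trajectory lingers longer near each saddle on every pass. A sequence of movements generated this way is itinerant by construction, in contrast to the point-attractor case.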
Reasons to be careful about reward
A flow (policy) cannot be specified with a scalar function of states: the fundamental theorem of vector calculus, aka the Helmholtz decomposition
Any (curl-free) flow specified with reward can only have a fixed-point attractor: reward cannot specify itinerant movement or policies
Value is produced by flow, not its cause: reward is a consequence of (defined by) behaviour, not its cause
The inherent tautology of reward: explaining behaviour in terms of maximising reward is like explaining the evolution of the eye by saying it maximises adaptive value (cf. intelligent design)