Reinforcement Learning Control with Robust Stability

Chuck Anderson, Matt Kretchmar, Department of Computer Science
Peter Young, Department of Electrical and Computer Engineering
Douglas Hittle, Department of Mechanical Engineering
Colorado State University, Fort Collins, CO

Reinforcement Learning Agent in Parallel with Controller
The reinforcement learning algorithm guides adjustment of the actor's weights. The IQC analysis places a bounding box in the weight space of the actor network, beyond which stability has not been verified.

Incorporating Time-Varying IQC in Reinforcement Learning
[Figure: weight space (high-dimensional)]
Step 0: initial weight vector
Step 1: initial guaranteed-stable region
Step 2: trajectory of weights while learning
Step 3: must find new stable region
Step 4: next guaranteed-stable region
Step 5: Now learning can continue until the edge of the new bounding box is encountered.
…

[Figure: weight space (high-dimensional). Labels: UNSTABLE REGION!; final weight vector; weight trajectory with robust constraints; weight trajectory without robust constraints]

Trajectory of Weights and Bounds on Regions of Stability
[Figure labels: A, B, C, D, E; initial weight vector]

Motivation
Robust control theory
  Guarantees stability
  Results in less aggressive controllers
Reinforcement learning
  Optimizes the performance of a controller
  No guarantee of stability while learning

Experimental HVAC System
[Figure]

Reinforcement Learning
[Equations not preserved in this text. Annotations on the equations:]
  Subtract right side from left to get algorithm for updating Q
  Replace expectation with sample (Monte Carlo approach)
  Temporal-difference error
[Symbol labels: action, state, policy function, value function, discount factor, reinforcement (|error|)]

Robust Control based on IQCs
[Figure: feedback loop of Uncertainties (Δ) and Controller/Plant (M), connected by signals v and w]
An Integral Quadratic Constraint (IQC) describes the relationship between signals as
[equation not preserved in this text]
Stability of the closed-loop system is guaranteed if
[equation not preserved in this text]
for all ω and for ε > 0.
Given specific IQCs for a particular system, this inequality problem becomes a linear matrix inequality (LMI) problem.
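
The two IQC statements in the panel above lost their equations. As a hedged reconstruction, the block below writes them in the standard form used in the IQC literature, under the usual well-posedness assumptions. The multiplier symbol Π, the hats for Fourier transforms, and the exact layout are assumptions; the poster's own notation may have differed.

```latex
% Standard IQC definition; notation is an assumption, not taken from the poster.
% \hat{v}, \hat{w}: Fourier transforms of the signals v and w
% \Pi(j\omega): Hermitian-valued multiplier that defines the IQC
\[
\int_{-\infty}^{\infty}
\begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix}^{*}
\Pi(j\omega)
\begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix}
\, d\omega \;\ge\; 0
\]

% Sufficient condition for stability of the loop formed by M and \Delta
\[
\begin{bmatrix} M(j\omega) \\ I \end{bmatrix}^{*}
\Pi(j\omega)
\begin{bmatrix} M(j\omega) \\ I \end{bmatrix}
\;\le\; -\epsilon I
\quad \text{for all } \omega \in \mathbb{R} \text{ and some } \epsilon > 0
\]
```

When Π is parameterized linearly by a finite set of decision variables, the second inequality is linear in those variables, which is why the search reduces to an LMI problem.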

[Figure labels: Reference, Output; Nominal (Good response), Perturbed (Terrible response)]

Robust Reinforcement Learning
[Figure panels: Perturbed case, no learning; Perturbed case, with learning]
Through learning, the controller has been fine-tuned to the actual dynamics of the real plant without losing the guarantee of stability!
[Figure: Sum Squared Error for the Nominal Controller, Robust Controller, and Robust RL Controller]

Conclusions
IQC bounds on the parameters of tanh and sigmoid networks exist for which the combination of a reinforcement learning agent and a feedback control system satisfies the requirements of robust stability theorems (static and dynamic stability).
The robust reinforcement learning algorithm improves control performance while avoiding instability on several simulated problems.
Because of these stability guarantees, reinforcement learning is now more acceptable in practical applications as an adaptive controller that modifies its behavior over time.
The initial, conservative robust controller becomes more aggressive through adaptation to the actual physical system.

Integral Quadratic Constraints
[Equation involving M not recoverable from this text]

Neural Net and Robust Control with IQCs
[Figure legend: bounds on neural net weight adjustment in green; neural net as reinforcement learning actor in blue; robust controller and plant in red]

First Example
[Figure]

Second Example
Without robust constraints, becomes unstable before learning final stable solution.

Third Example: Distillation Column
[Figure]

Fourth Example
[Figure panels: 1st Order, 2nd Order]
Without robust constraints, becomes unstable before learning final stable solution.
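
The procedure in the "Incorporating Time-Varying IQC in Reinforcement Learning" panel (learn inside a guaranteed-stable box, stop at its edge, find a new box, continue) can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `stable_box` is a hypothetical stand-in that returns a fixed-size box where the real method would run the IQC/LMI analysis, and the names `learn_within_boxes`, `update`, and `half_width`, as well as the toy update rule, are invented for the example.

```python
import numpy as np


def stable_box(center, half_width):
    """Hypothetical stand-in for the IQC/LMI stability analysis.

    A real implementation would solve an LMI to decide how far each weight
    may move; here we simply return a fixed-size box around `center`.
    """
    center = np.asarray(center, dtype=float)
    return center - half_width, center + half_width


def learn_within_boxes(w0, update, n_steps, half_width=0.5):
    """Apply `update(w)` repeatedly without ever leaving the current box."""
    w = np.asarray(w0, dtype=float)            # Step 0: initial weight vector
    lower, upper = stable_box(w, half_width)   # Step 1: initial stable region
    boxes = [(lower, upper)]
    for _ in range(n_steps):
        proposed = w + update(w)               # Step 2: learning moves the weights
        if np.any(proposed < lower) or np.any(proposed > upper):
            # Step 3: edge reached, so stop at the edge and look for a new region
            w = np.clip(proposed, lower, upper)
            lower, upper = stable_box(w, half_width)  # Step 4: next region
            boxes.append((lower, upper))
        else:
            w = proposed                       # Step 5: keep learning inside the box
    return w, boxes


# Toy update rule (an assumption): move 10% of the way toward a fixed target.
target = np.array([2.0, -1.0])
final_w, boxes = learn_within_boxes([0.0, 0.0], lambda w: 0.1 * (target - w), 200)
```

In this toy run the weights reach the target only by passing through several successive boxes, which mirrors the sequence of regions drawn on the poster. In the actual method the size and shape of each box would come from the stability analysis, and a new region is not guaranteed to be as large as the previous one.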