1
Lectures 5 & 6: Least Squares Parameter Estimation
Dr Martin Brown, Room: E1k, Control Systems Centre, Telephone:
2
L5&6: Resources
Core texts: Ljung, Chapters 4 & 7; Norton, Chapter 4; on-line notes, Chapters 4 & 5.
In these two lectures, we're looking at basic discrete-time representations of linear, time-invariant plants and models, and seeing how their parameters can be estimated using the normal equations. The key example is the first-order, linear, stable RC electrical circuit which we met last week, and which has an exponential response.
3
L5&6: Learning Objectives
L5: Linear models and quadratic performance criterion
- ARX & ARMAX discrete-time, linear systems
- Predictive models, regression and exemplar data
- Residual signal
- Performance criterion
L6: Normal equations, interpretation and properties
- Quadratic cost functions
- Derive the normal equations for parameter estimation
- Examples
We're not too concerned with system dynamics today; we're concentrating on the general form of least squares parameter estimation.
4
Introduction to Parametric System Identification
In a full, physical, linear model, the model's structure and coefficients can be determined from first principles. In most cases, we have to estimate/tune the parameters because of an incomplete understanding of the full system (unknown drag, …). We can use exemplar data (input/output examples), {x(t), y(t)}, to estimate the unknown parameters. Initially, assume that the structure is known (unrealistic, but …), and all that remains to be estimated are the parameter values.
[Block diagram: the plant with true parameters θ and the model with estimated parameters θ̂ are both driven by the input u(t); the plant output y(t) is affected by the noise e(t) and disturbances w(t), v(t), and the model produces the prediction ŷ(t).]
5
Recursive Parameter Estimation Framework
[Block diagram: closed loop with a controller, the plant (parameters θ) and the model (parameters θ̂(t−1)); the model prediction ŷ(t) is compared with the measured plant output y(t).]
where:
- θ, θ̂(t−1) are the real and estimated parameter vectors, respectively
- u(t) is the control input sequence
- y(t), ŷ(t) are the real and estimated outputs, respectively
- e(t) is a white noise sequence (output/measurement noise)
- w(t) is the disturbance from measurable sources
6
Basic Assumptions in System Identification
- It is assumed that the unobservable disturbances can be aggregated and represented by a single additive noise e(t). There may also be input noise. Generally, it is assumed to be zero-mean and Gaussian.
- The system is assumed to be linear with time-invariant parameters, so θ is not time-varying. This is only approximately true within certain limits.
- The input signal u(t) is assumed exactly known. Often there is noise associated with reading/measuring it.
- The system noise e(t) is assumed to be uncorrelated with the input process u(t). This is unlikely to be true, for instance due to feedback of y(t).
- The input signals need to be sufficiently exciting: they need to excite all relevant modes in the model for identification and testing.
7
Discrete-Time Transfer Function Models
On this course, we're primarily concerned with discrete-time signals and systems, although real-world physical, mechanical and electrical systems are continuous. Consider the CT resistor-capacitor circuit, RC dy/dt + y(t) = u(t). Sampling with period D and approximating the derivative by a forward difference gives y(t) = (1 − D/RC)y(t−1) + (D/RC)u(t−1) (cf. Slide 18, where D/RC = 0.5). So, letting q⁻¹ denote the backward shift operator, q⁻¹y(t) = y(t−1), we have (1 − (1 − D/RC)q⁻¹)y(t) = (D/RC)q⁻¹u(t). NB we can use the c2d() Matlab function to go from the continuous-time (transfer function, state space) domain to the discrete-time, z-domain.
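A minimal sketch of the c2d() route (this assumes the Control System Toolbox; the values RC = 2 s and D = 1 s are assumptions chosen so that D/RC = 0.5 as on Slide 18):

RC = 2;                 % assumed time constant (seconds)
D  = 1;                 % assumed sample time (seconds)
Gc = tf(1, [RC 1]);     % CT transfer function G(s) = 1/(RC*s + 1)
Gd = c2d(Gc, D, 'zoh')  % DT (z-domain) equivalent under a zero-order hold

Note that the zero-order-hold discretisation gives coefficients exp(−D/RC) ≈ 0.61 and 1 − exp(−D/RC) ≈ 0.39, whereas the 0.5/0.5 difference equation used in these lectures corresponds to the simpler forward-difference approximation above.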
8
Transfer Function/ARX DT LTI Model
The previous model is an example of an ARX model (AutoRegressive with eXogenous input), which can be expressed more generally as A(q)y(t) = B(q)u(t) + e(t), with polynomials A(q) = a0 + a1q⁻¹ + … + anq⁻ⁿ and B(q) = b0 + b1q⁻¹ + … + bmq⁻ᵐ in the shift operator. Some comments about the form of this model:
- The degrees of the polynomials determine the complexity of the system's response and the number of parameters that have to be estimated.
- The roots of A(q) determine system stability.
- a0 = 1, without loss of generality, so the model can be rearranged as a predictive model in which y(t) is expressed in terms of y(t−1), …, u(t−1), …
- b0 = 0, as it is assumed that an input cannot instantly affect the output, so there must be a delay of at least one time instant between u and y (this assumes a fast enough sample time relative to the system dynamics).
- Typically e ~ N(0, σ²), independent and identically distributed.
- There is a close relationship between the q-shift operator and the z-transform.
- When n = 0, this produces a finite impulse response.
9
Linear Regression
The ARX system's prediction model can be expressed as ŷ(t) = θ1y(t−1) + … + θny(t−n) + θn+1u(t−1) + … + θn+mu(t−m), so the model's parameters can be written as a single vector θ. Treat the model as a deterministic system; this is natural if the error term is considered to be insignificant or difficult to guess. This denotes the model structure M (linear, time invariant, for example), and a particular model with a parameter value θ is M(θ). This can be written as a linear regression structure, ŷ(t) = xT(t)θ, where:
- Parameter vector: θ = [θ1, θ2, …, θn+m]T
- Input vector: x(t) = [y(t−1), …, y(t−n), u(t−1), …, u(t−m)]T
The term regression comes from the statistics literature and provides a powerful set of techniques for determining the parameters and interpreting the models. Note that we need access to the previous outputs y(t−1), …
10
LTI DT ARMAX Model
A more general discrete-time, linear time-invariant model also includes Moving Average terms on the error/residual signal: A(q)y(t) = B(q)u(t) + C(q)e(t). Here, we describe the equation error term as a moving average of white noise (non-iid measurement errors). Simple example:
y(t) = 0.5y(t-1) + 0.3y(t-2) + 1.2u(t-1) - 0.3u(t-2) + 0.5e(t) + 0.5e(t-1)
This can be written as a pseudolinear regression, because the regression vector then contains past noise terms, which themselves have to be estimated.
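A minimal sketch simulating the example ARMAX difference equation above for a unit step input; the data length and noise level are assumptions:

T = 50;
u = ones(T,1);                 % unit step input (assumed)
e = 0.1*randn(T,1);            % white noise sequence with assumed std 0.1
y = zeros(T,1);                % system initially at rest
for t = 3:T
    y(t) = 0.5*y(t-1) + 0.3*y(t-2) + 1.2*u(t-1) - 0.3*u(t-2) + 0.5*e(t) + 0.5*e(t-1);
end
plot(y)                        % step response corrupted by the MA-filtered noise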
11
Exemplar Training Data
To estimate the unknown parameters θ, we need to collect some exemplar input-output data; system identification is then a process of estimating the parameter values that best fit the data. The data is generated by a system of noisy ARX linear equations of the form y = Xθ + e, where:
- y is a column vector of measured plant outputs (T, 1)
- X is a matrix of input regressors (T, n+m)
- θ is the "true" parameter vector (n+m, 1)
- e is the error vector (T, 1)
Each row of X represents a single input/output sample; each column of X represents a time-delayed output or input. Note that there is a "burn-in" period to measure the time-delayed outputs y(1), y(2), …, which are necessary to form the time-delayed input vector [y(t−1), …, y(t−n)].
12
Example: Data for 1st Order ARX Model
1st Order model representation
First-order plant model (exponential decay) with no external disturbances, where the measurement noise is additive (Slide 7): y(t) = θ1y(t−1) + θ2u(t−1) + e(t).
Input vector, output signal and parameters
At time t, the 1st-order DT model is represented as:
- Output: y(t)
- Input: x(t) = [y(t-1); u(t-1)]
- Parameters: θ = [θ1; θ2]
Data
As there are two parameters, if the system is truly first order and there is no measurement noise on any of the signals, we just need two (linearly independent) samples to estimate θ. If there is measurement noise in y(t), we need to collect more data to reduce the effect of the random noise. Store X = [y(1) u(1); y(2) u(2); y(3) u(3); …], y = [y(2); y(3); y(4); …].
13
Prediction Residual Signal
The residual signal (measured − predicted) is defined as r(t) = y(t) − ŷ(t), and can be represented as r(t) = y(t) − xT(t)θ̂. A simple regression interpretation is shown below (each × represents an exemplar sample from a single-input, single-output system).
[Figure: block diagram in which the plant (parameters θ, with output measurement noise e(t)) and the model (parameters θ̂) are both driven by x(t); the residual r(t) is the difference between the measured output y(t) and the prediction ŷ(t). A scatter plot shows the exemplar samples around the fitted regression line.]
14
Measures of Model Goodness
The model's response can be expressed as ŷ(t) = xT(t)θ̂, where θ̂ is the model's estimated parameter vector and x(t) is the input vector. If ŷ(t) = y(t), the model's response is correct for that single time sample, and the residual r(t) = y(t) − ŷ(t) is zero. The residual's magnitude gives us an idea of the "goodness" of the parameter vector estimate for that data point. For a set of measured outputs and predictions {y(t), ŷ(t)}t, the "size" of the residual vector r = y − ŷ is an estimate of the parameter goodness. We can determine the size by looking at the norm of r.
15
Residual Norm Measures
A vector p-norm (of a vector r) is defined by ||r||p = (Σt |r(t)|p)1/p. The most common p-norm is the 2-norm, ||r||2 = (Σt r(t)²)1/2 = (rTr)1/2. The vector p-norm has the properties that:
- ||r|| ≥ 0
- ||r|| = 0 iff r = 0
- ||kr|| = |k| ||r||
- ||r1 + r2|| ≤ ||r1|| + ||r2||
For the residual vector, the norm is only zero if all the residuals are zero. Otherwise, a small norm means that, on average, the individual residuals are small in magnitude.
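A quick Matlab illustration of these norms and properties, for an assumed residual vector:

r = [0.3; -0.1; 0.4];        % assumed residual values, for illustration only
norm(r, 1)                   % 1-norm: sum of absolute residuals
norm(r, 2)                   % 2-norm: sqrt(r'*r)
norm(r, Inf)                 % infinity-norm: largest absolute residual
norm(2*r, 2) / norm(r, 2)    % scaling property: returns 2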
16
Sum of Squared Residuals
The most common discrete-time performance index is the sum of squared residuals (the 2-norm squared): f(θ) = Σt r(t)² = ||r||2² = (y − Xθ)T(y − Xθ). For each data point, the model's output is compared against the plant's, and the error is squared and summed over the remaining points. Any non-zero value for any of the residuals will mean that the performance index is positive. The performance function f(θ) is a function of the parameter values, because some parameter values will cause large residuals and others will cause small residuals. We want the parameter values that minimize f(θ) (≥ 0).
17
Relationship between Noise & Residual
The aim of parameter estimation is to estimate the values of θ that minimize this performance index (sum of squared residuals or errors, SSE). When the model predicts the plant exactly, r(t) = e(t): the residual signal is equal to the additive noise signal. Note that the SSE is often replaced by the mean squared error, MSE = SSE/T ≈ σ² (the variance of the additive noise signal), which is the variance of the residual signal. This simply represents the average squared error and ensures that the performance function does not depend on the amount of data. Example: when we have 1000 repeated trials (step responses) of 9 data points for the DT electric circuit, with additive noise N(0, 0.01), MSE = ||r||2²/T ≈ σ² and RMSE ≈ σ.
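A small sketch computing the SSE, MSE and RMSE; the data matrix X, measured outputs y and parameter estimate thetaHat below are assumed values for illustration only:

X        = [0 1; 0.5 1; 0.75 1];   % assumed regressor rows [y(t-1) u(t-1)]
y        = [0.55; 0.71; 0.92];     % assumed noisy measured outputs
thetaHat = [0.5; 0.5];             % assumed parameter estimate
r    = y - X*thetaHat;             % residual vector
SSE  = r'*r;                       % sum of squared residuals, ||r||2 squared
T    = length(y);
MSE  = SSE/T;                      % approximates the noise variance sigma^2
RMSE = sqrt(MSE);                  % approximates the noise standard deviation sigma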
18
Example: DT RC Electrical Circuit
Consider the DT, first-order, LTI representation of the RC circuit, which is an ARX model (Slides 7 & 12). Assume that D/RC = 0.5; then:
y(t) = 0.5*y(t-1) + 0.5*u(t-1)
Here the system is initially at rest, y(0) = 0. Note that u here refers to a step signal which is switched on at t = 1 with u(0) = 0, rather than the control signal. Assuming that 10 steps are taken, we collect 9 data points for system identification (with y and u stored as row vectors):
>> X = [y(1:end-1)' u(1:end-1)'];
>> y1 = y(2:end)';
Gaussian random noise of standard deviation 0.05 was also added to y1:
>> y1e = y1 + 0.05*randn(size(y1));
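A self-contained sketch of this experiment, using column vectors throughout (the step length, noise level and D/RC = 0.5 follow the slide):

N = 10;
u = [0; ones(N-1,1)];                % step input: u(0) = 0, then switched on
y = zeros(N,1);                      % system initially at rest, y(0) = 0
for t = 2:N
    y(t) = 0.5*y(t-1) + 0.5*u(t-1);  % noise-free plant with D/RC = 0.5
end
X   = [y(1:end-1) u(1:end-1)];       % 9 regressor rows [y(t-1) u(t-1)]
y1  = y(2:end);                      % 9 target outputs y(t)
y1e = y1 + 0.05*randn(size(y1));     % noisy measured outputs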
19
Example: Noisy Electric Circuit
NB randn('state', ) was used to set the random seed. Note that here we're cheating a bit by assuming the exact measurement y(t-1) is available at the model's input, but only the noisy measurement ye(t) is available for the model's output. NB, in these notes, y() generally denotes the noisy output.
20
Parameter Estimation
An important part of system identification is being able to estimate the parameters of a linear model, when a quadratic performance function is used to measure the model's goodness. This produces the well-known normal equations for least squares estimation:
- This is a closed-form solution
- Efficiently and robustly solved (in Matlab)
- Permits a statistical interpretation
- Can be solved recursively
- Investigated over the next 3-4 lectures
21
Noise-free Parameter Determination
Parameter estimation works by assuming a plant/model structure, which is taken to be exactly known. If there are n+m parameters in the model, we can collect n+m pieces of data (linearly independent, to ensure that the input/data matrix X is invertible), Xθ = y, and invert the matrix to find the exact parameter values: θ = X⁻¹y. In Matlab, both of the following forms are equivalent:
theta = inv(X)*y;
theta = X\y;
theta = [ ] % Previous example
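As a sketch, two linearly independent noise-free samples from the Slide 18 circuit determine its two parameters exactly (the very first regressor row, [0 0], is skipped because it would make the matrix singular):

X2    = [0 1; 0.5 1];    % two linearly independent rows [y(t-1) u(t-1)]
y2    = [0.5; 0.75];     % the corresponding noise-free outputs y(t)
theta = X2\y2            % equivalently inv(X2)*y2; recovers [0.5; 0.5] exactly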
22
Linear Model and Quadratic Performance
When the model is linear and the data is noisy (missing inputs, unmeasurable disturbances), the Sum Squared Error (SSE) performance index can be expressed as f(θ) = (y − Xθ)T(y − Xθ). This expression is quadratic in θ. Typically size(X,1) >> size(X,2). For 2 inputs/parameters it takes the form of a quadratic (bowl-shaped) surface in (θ1, θ2). The equivalent system of linear equations, Xθ = y + e, is inconsistent: with more equations than unknowns and noisy outputs, no single θ satisfies every equation exactly.
23
Quadratic Matrix Representation
This can also be expressed in matrix form: f(θ) = yTy − 2θTXTy + θTXTXθ. The general form for a quadratic is f(θ) = const − 2θTc + θTHθ, where H = XTX is the Hessian/covariance matrix and c = XTy is the cross-correlation vector.
24
Normal Equations for a Linear Model
When the parameter vector is optimal, the gradient of the performance function is zero: ∂f/∂θ = 0. For the quadratic SSE with a linear model, ∂f/∂θ = 2XTXθ − 2XTy (the MSE simply scales this by 1/T). At optimality, therefore, XTXθ̂ = XTy, giving θ̂ = (XTX)⁻¹XTy. In Matlab, the normal equations are:
thetaHat = inv(X'*X)*X'*y;
thetaHat = pinv(X)*y;
thetaHat = X\y;
[Figure: the quadratic performance function f plotted against θ, with its minimum at θ̂.]
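Continuing the Slide 18 sketch (X and y1e as constructed there), the normal-equation quantities can be formed explicitly; all three solutions below are equivalent, with the backslash form preferred numerically:

H        = X'*X;           % Hessian / variance-covariance matrix
c        = X'*y1e;         % cross-correlation vector
thetaHat = H\c             % solves the normal equations H*thetaHat = c
thetaHat = pinv(X)*y1e;    % equivalent, via the pseudo-inverse
thetaHat = X\y1e;          % equivalent, QR-based least squares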
25
Example 1: 2 Parameter Model
Data: 3 data points and 2 unknowns. Find the least squares solution to Xθ = y:
1. Form the variance/covariance matrix XTX and the cross-correlation vector XTy
2. Invert the variance/covariance matrix
3. Compute the least squares solution θ̂ = (XTX)⁻¹XTy
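The slide's numerical values are not reproduced here, so the sketch below uses assumed data of the same shape (3 samples, 2 unknowns) purely to illustrate the same steps:

X = [1 1; 1 2; 1 3];    % assumed regressor matrix, 3 data and 2 unknowns
y = [1; 2; 2];          % assumed measured outputs
H = X'*X                % variance/covariance matrix
c = X'*y                % cross-correlation vector
thetaHat = inv(H)*c     % least squares solution (equivalently X\y)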
26
Example 2: Electrical Circuit ARX Model
See Slides 7, 12, 18 & 19: 9 exemplars and 2 parameters, with additive measurement noise. The steps are:
1. Form the Hessian (variance/covariance) matrix and the correlation vector
2. Invert the Hessian matrix
3. Compute the least squares solution
NB randn('state', ) sets the random seed.
27
Investigation into the Performance Function
We can "plot" the performance index against different parameter values in a model. As shown earlier, f(θ) is a quadratic function of θ. It is "centred" at θ̂, i.e. f(θ̂) = min f(θ). The shape of the contours depends on the Hessian matrix XTX; this influences the ability to identify the plant (see the next lectures).
[Figure: contour plot of f over the (θ1, θ2) plane, with the minimum at θ̂.]
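A sketch of producing such a contour plot, continuing the Slide 18 example (X, y1e and thetaHat as above); the grid ranges are assumptions:

[t1, t2] = meshgrid(0:0.02:1, 0:0.02:1);  % grid of candidate (theta1, theta2) values
f = zeros(size(t1));
for i = 1:numel(t1)
    r    = y1e - X*[t1(i); t2(i)];        % residual for this candidate parameter pair
    f(i) = r'*r;                          % SSE at this grid point
end
contour(t1, t2, f, 30)
xlabel('\theta_1'); ylabel('\theta_2')
hold on; plot(thetaHat(1), thetaHat(2), 'x'); hold off   % mark the least squares minimum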
28
L5&6 Summary
- ARX and ARMAX discrete-time linear models are widely used
- System identification is being considered simply as parameter estimation
- The residual vector is used to assess the quality of the model (parameter vector)
- The sum squared error/residual (2-norm squared) is commonly used to measure the residual's size, because it can be interpreted as the noise variance and because it is analytically convenient
- For a linear model, the SSE is a quadratic function of the parameters, which can be differentiated to estimate the optimal parameters via the normal equations
29
L5&6 Lab
Theory: Make sure you can derive the normal equations (S22-24).
Matlab:
- Implement the DT RC circuit simulation (S18), so you can perform a least squares parameter estimation given noisy data about the electrical circuit
- Set the Gaussian random seed, as per S26, and check that your estimates are the same
- Set a different seed and note that the optimal parameter values are different
- Perform the step experiment 10, 100, 1000, … times and note that the estimated optimal parameter values tend towards the true values of [ ] (see the sketch after this list)
- Load the data into the identification toolbox GUI and create a first-order parametric model with model orders [1 1 1]. NB you do not need to remove the means from the data (why not?). Calculate the model and view the values of the parameters and the model fit, as well as checking the step response and validating the model.
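A sketch for the repeated-experiment exercise, continuing the Slide 18 simulation (X and y1 as built there); the trial count and noise level are assumptions:

nTrials = 1000;
est = zeros(nTrials, 2);
for k = 1:nTrials
    y1e      = y1 + 0.05*randn(size(y1));  % fresh measurement noise for each trial
    est(k,:) = (X\y1e)';                   % least squares estimate for this trial
end
mean(est)    % average estimate; approaches the true parameters as nTrials grows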