System Identification. Ali Karimpour, Assistant Professor, Ferdowsi University of Mashhad. Reference: “System Identification: Theory for the User”, Lennart Ljung.
Lecture 10: Computing the Estimate. Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
Introduction. In Chapter 7 three basic parameter estimation methods were considered: 1- The prediction-error approach, in which a certain function $V_N(\theta, Z^N)$ is minimized with respect to θ. 2- The correlation approach, in which a certain equation $f_N(\theta, Z^N) = 0$ is solved for θ. 3- The subspace approach to estimating state-space models. In this chapter we discuss how these problems are best solved numerically.
Linear Regression and Least Squares.
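Linear Regression and Least Squares. For linear regression we have $\hat{y}(t|\theta) = \varphi^T(t)\theta$. The least-squares criterion $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[y(t) - \varphi^T(t)\theta\right]^2$ leads to $\hat{\theta}_N^{LS} = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t)$. An alternative form is the normal equations $R(N)\hat{\theta}_N^{LS} = f(N)$ (I), with $R(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)$ and $f(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t)$. Note that the basic equation for the IV method, $\left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\varphi^T(t)\right]\hat{\theta}_N^{IV} = \frac{1}{N}\sum_{t=1}^{N}\zeta(t)y(t)$, is quite analogous, so most of what is said in this section about the LS method also applies to the IV method.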
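The normal equations (I) translate directly into a few lines of NumPy. The sketch below assumes a hypothetical first-order ARX system, y(t) - 0.8 y(t-1) = 0.5 u(t-1) + e(t), simulated with made-up noise; the helper name `ls_normal_equations` is illustrative and not from the source.

```python
import numpy as np

def ls_normal_equations(Phi, Y):
    """Solve the normal equations R(N) theta = f(N)."""
    N = Phi.shape[0]
    R = Phi.T @ Phi / N      # R(N) = (1/N) sum phi(t) phi(t)^T
    f = Phi.T @ Y / N        # f(N) = (1/N) sum phi(t) y(t)
    return np.linalg.solve(R, f), R

# Hypothetical first-order ARX system: y(t) - 0.8 y(t-1) = 0.5 u(t-1) + e(t)
rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + e[t]

# phi(t) = [-y(t-1), u(t-1)]^T and theta = [a, b]; true values a = -0.8, b = 0.5
Phi = np.column_stack([-y[:-1], u[:-1]])
theta, R = ls_normal_equations(Phi, y[1:])
print(theta)
```

Forming R(N) explicitly is exactly what the square-root methods below try to avoid when R(N) is ill-conditioned.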
Linear Regression and Least Squares. The normal equations may be ill-conditioned, especially when the dimension of R(N) is high. The underlying idea of the methods below is that the matrix R(N) should not be formed at all; instead, a matrix $\mathcal{R}$ is constructed with the property $\mathcal{R}\mathcal{R}^T = R(N)$. This class of methods is commonly known as “square-root algorithms”, but the term “quadratic methods” is more appropriate. How to derive $\mathcal{R}$? Householder transformations, Cholesky decomposition, the Gram-Schmidt procedure, QR decomposition, Bjorck decomposition.
Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. The QR factorization of an n × d matrix A is defined as A = QR, where Q is a unitary (for real A, orthogonal) n × n matrix and R is an n × d upper triangular matrix.
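Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. Define $Y = [y(1)\ \cdots\ y(N)]^T$ and $\Phi = [\varphi(1)\ \cdots\ \varphi(N)]^T$ (an N × d matrix), so that the LS criterion is $V_N(\theta) = \frac{1}{2N}|Y - \Phi\theta|^2$. Let Q be an N × N unitary matrix; then, since the norm is invariant under orthonormal transformations, $|Y - \Phi\theta|^2 = |Q^T(Y - \Phi\theta)|^2$.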
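Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. Now introduce the QR factorization $[\Phi\ \ Y] = Q\begin{bmatrix}R_0\\ 0\end{bmatrix}$, $R_0 = \begin{bmatrix}R_1 & R_2\\ 0 & R_3\end{bmatrix}$, where $R_0$ is (d+1) × (d+1) upper triangular, $R_1$ is d × d, $R_2$ is d × 1 and $R_3$ is a scalar. This means that $|Y - \Phi\theta|^2 = |Q^TY - Q^T\Phi\theta|^2 = |R_2 - R_1\theta|^2 + |R_3|^2$, which clearly is minimized for $R_1\hat{\theta}_N = R_2$ (II); the minimum value is $|R_3|^2$.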
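A minimal sketch of the QR route, assuming NumPy: `np.linalg.qr` with `mode='r'` returns only the triangular factor $R_0$, so the big matrix Q is never formed. The random data and the helper name `ls_qr` are hypothetical.

```python
import numpy as np

def ls_qr(Phi, Y):
    """LS estimate from the QR factorization [Phi Y] = Q [R0; 0]."""
    d = Phi.shape[1]
    # mode='r' returns only the (d+1) x (d+1) upper-triangular factor; Q is never formed
    R0 = np.linalg.qr(np.column_stack([Phi, Y]), mode='r')
    R1, R2, R3 = R0[:d, :d], R0[:d, d], R0[d, d]
    theta = np.linalg.solve(R1, R2)   # R1 theta = R2 (back-substitution would suffice)
    return theta, R3 ** 2, R0         # R3^2 is the minimal value of |Y - Phi theta|^2

rng = np.random.default_rng(1)
Phi = rng.standard_normal((100, 3))
Y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)
theta, min_sq, R0 = ls_qr(Phi, Y)
```

The estimate agrees with `np.linalg.lstsq(Phi, Y, rcond=None)[0]`, and `min_sq` equals the residual sum of squares.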
Linear Regression and Least Squares. Exercise: Suppose that for t = 1 to 11 the values of u and y are: Consider the simple model for the system: 1) Derive $\hat{\theta}_N$ from eq. (I) and find the condition number of R(N). 2) Derive $\hat{\theta}_N$ from eq. (II) and find the condition number of $R_1$.
Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. There are three important advantages with this way of solving for the LS estimate: 1- The condition number of $R_1$ is the square root of that of R(N), since $R(N) = \frac{1}{N}R_1^TR_1$; therefore $R_1$ is much better conditioned than R(N). 2- $R_1$ is a triangular matrix, so the equation $R_1\hat{\theta}_N = R_2$ is easy to solve by back-substitution. 3- If the QR factorization is performed for a regressor of size d*, then the solutions for all models with fewer parameters (using the first d ≤ d* regressors) are easily obtained from $R_0$. Note that the big matrix Q never needs to be computed: all the information is contained in the “small” matrix $R_0$.
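Advantages 1 and 3 can be checked numerically. The sketch below uses hypothetical, deliberately near-collinear regressors; it compares cond R(N) with (cond $R_1$)² and then solves every lower-order model from the same $R_0$, comparing with a direct LS solve.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 200, 4
# Hypothetical, deliberately near-collinear regressors, so that R(N) is ill-conditioned
base = rng.standard_normal(N)
Phi = np.column_stack([base + 1e-3 * rng.standard_normal(N) for _ in range(d)])
Y = Phi @ np.ones(d) + 0.01 * rng.standard_normal(N)

RN = Phi.T @ Phi / N                                      # R(N) of the normal equations
R0 = np.linalg.qr(np.column_stack([Phi, Y]), mode='r')    # small (d+1) x (d+1) factor
R1 = R0[:d, :d]
cond_RN, cond_R1 = np.linalg.cond(RN), np.linalg.cond(R1)
print(cond_RN, cond_R1 ** 2)    # equal up to rounding, since R(N) = R1^T R1 / N

# Advantage 3: each model using only the first k regressors is solved from the same R0
nested_ok = []
for k in range(1, d + 1):
    theta_k = np.linalg.solve(R0[:k, :k], R0[:k, d])
    ref = np.linalg.lstsq(Phi[:, :k], Y, rcond=None)[0]
    nested_ok.append(np.allclose(theta_k, ref, atol=1e-6))
print(nested_ok)
```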
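Linear Regression and Least Squares. Initial conditions: “windowed” data. The regression vector is $\varphi(t) = [z^T(t-1)\ \ z^T(t-2)\ \cdots\ z^T(t-n)]^T$, where z(t-1) is an r-dimensional vector. For example, for the ARX model (with na = nb = n) $z(t) = [-y(t)\ \ u(t)]^T$, and for the AR model $z(t) = -y(t)$. R(N) will then be an n × n block matrix with r × r blocks $R_{ij}(N) = \frac{1}{N}\sum_{t=1}^{N}z(t-i)z^T(t-j)$, i, j = 1, …, n.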
Linear Regression and Least Squares. Initial conditions: “windowed” data. If we have knowledge of z(t) only for 1 ≤ t ≤ N, the question arises of how to deal with the unknown initial values z(1-n), …, z(0) that appear in R(N). Two possibilities are: 1 - Start the summation at t = n+1 rather than t = 1. 2 - Replace the unknown initial values by zeros.
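The two ways of handling the unknown initial conditions differ only in how the regressor matrix is built. This sketch assumes the ARX case z(t) = [-y(t) u(t)]^T with na = nb = n; the helper `arx_regressors` and the simulated second-order system are hypothetical.

```python
import numpy as np

def arx_regressors(y, u, n, prewindow):
    """Hypothetical helper: rows phi(t)^T = [z(t-1)^T ... z(t-n)^T], z(t) = [-y(t), u(t)]^T.

    prewindow=False: option 1, start the sum at t = n+1 (only measured data are used).
    prewindow=True:  option 2, replace the unknown z(t), t <= 0, by zeros and start at t = 1.
    """
    N = len(y)
    z = np.column_stack([-y, u])              # z[i] holds z(i+1), i.e. data for t = 1..N
    if prewindow:
        z = np.vstack([np.zeros((n, 2)), z])  # z[i] now holds z(i+1-n); first n rows are zeros
        idx, targets = range(n, n + N), y
    else:
        idx, targets = range(n, N), y[n:]
    Phi = np.array([np.concatenate([z[i - k] for k in range(1, n + 1)]) for i in idx])
    return Phi, targets

rng = np.random.default_rng(4)
N, n = 500, 2
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(2, N):   # hypothetical ARX data: A = 1 - 1.2 q^-1 + 0.5 q^-2, B = q^-1 + 0.5 q^-2
    y[t] = 1.2 * y[t - 1] - 0.5 * y[t - 2] + u[t - 1] + 0.5 * u[t - 2] + 0.05 * rng.standard_normal()

results = {}
for pw in (False, True):
    Phi, Yt = arx_regressors(y, u, n, prewindow=pw)
    # parameter order follows phi(t): [a1, b1, a2, b2]
    results[pw] = (Phi.shape, np.linalg.lstsq(Phi, Yt, rcond=None)[0])
print(results)
```

Option 1 loses n rows of data; option 2 keeps all N rows at the price of a few rows built on made-up zeros.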
Numerical Solution by Iterative Search Method
Numerical Solution by Iterative Search Method. In general, the function $V_N(\theta, Z^N)$ cannot be minimized, nor can the equation $f_N(\theta, Z^N) = 0$ be solved, by analytical methods. Numerical minimization: methods for numerical minimization of a function V(θ) update the estimate of the minimizing point iteratively by $\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} + \alpha f^{(i)}$, where $f^{(i)}$ is a search direction based on information about V(θ) and α is a positive constant (the step size). Depending on the information used to determine $f^{(i)}$, there are three groups: 1- Methods using function values only. 2- Methods using values of the function as well as of its gradient. 3- Methods using values of the function, its gradient and its Hessian.
Numerical Solution by Iterative Search Method. Depending on the information used to determine $f^{(i)}$, there are three groups. Methods using values of the function, its gradient and its Hessian: Newton algorithms, $f^{(i)} = -[V''(\hat{\theta}^{(i)})]^{-1}V'(\hat{\theta}^{(i)})$. Methods using values of the function V as well as of its gradient: an estimate of the Hessian is formed and used in place of V'' (quasi-Newton algorithms). Methods using function values only: an estimate of the gradient is formed (e.g., from finite differences), and then a quasi-Newton algorithm is applied.
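Numerical Solution by Iterative Search Method. In general, consider the function $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\ell(\varepsilon(t,\theta))$. The gradient is $V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\,\ell'(\varepsilon(t,\theta))$. Here ψ(t,θ) is the d × 1 gradient of the predictor: $\psi(t,\theta) = \frac{d}{d\theta}\hat{y}(t|\theta) = -\frac{d}{d\theta}\varepsilon(t,\theta)$.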
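Numerical Solution by Iterative Search Method. Some explicit search schemes. Consider the special case $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\theta)$. The gradient is $V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\varepsilon(t,\theta)$. A general family of search routines is given by $\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[R_N^{(i)}\right]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$, where $R_N^{(i)}$ is a d × d matrix that modifies the search direction and $\mu_N^{(i)}$ is the step size.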
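Numerical Solution by Iterative Search Method. Some explicit search schemes. For the quadratic criterion $V_N = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\theta)$, let $R_N^{(i)} = I$; then we have $\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}V_N'(\hat{\theta}_N^{(i)}, Z^N)$. This is the gradient or steepest-descent method. It is fairly inefficient close to the minimum.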
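To see why steepest descent is slow close to the minimum, compare it with a Newton step on a hypothetical ill-conditioned quadratic criterion (Hessian eigenvalues 1 and 100). With μ = 1/100 the stiff direction is solved at once, while the error along the other direction only shrinks by a factor 0.99 per iteration.

```python
import numpy as np

# Hypothetical quadratic criterion V(theta) = 0.5 theta^T H theta - b^T theta
H = np.diag([1.0, 100.0])                # Hessian with condition number 100
b = np.array([1.0, 1.0])
theta_star = np.linalg.solve(H, b)       # the minimizer

def grad(theta):
    return H @ theta - b                 # V'(theta)

def steepest_descent(theta, mu, iters):
    for _ in range(iters):
        theta = theta - mu * grad(theta) # R = I in the general search family
    return theta

def newton_step(theta):
    return theta - np.linalg.solve(H, grad(theta))   # R = V'' (exact Hessian)

theta0 = np.zeros(2)
sd_err = np.linalg.norm(steepest_descent(theta0, mu=0.01, iters=100) - theta_star)
nt_err = np.linalg.norm(newton_step(theta0) - theta_star)
print(sd_err, nt_err)
```

After 100 steepest-descent iterations the error is still about 0.37 (that is, 0.99^100), whereas the single Newton step lands on the minimizer.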
Numerical Solution by Iterative Search Method. The Newton-Raphson (tangent) method for solving f(x) = 0: Make an initial guess $x_0$. Draw the tangent line at $x_0$; its equation is $y = f(x_0) + f'(x_0)(x - x_0)$. Let $x_1$ be the x-intercept of the tangent line, given by $x_1 = x_0 - f(x_0)/f'(x_0)$. Now repeat with $x_1$ as the initial guess, and so on. Close to a simple root the iteration converges quadratically.
Numerical Solution by Iterative Search Method. Newton-Raphson method for solving f(x) = 0: some difficulties. Zero derivatives: if $f'(x_i) = 0$, the tangent line is horizontal and has no x-intercept. Divergence: from a poor initial guess, the iterates may move away from the root.
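A plain-Python sketch of the tangent iteration, including the two difficulties listed above: a horizontal tangent raises an error, and failure to converge within `max_iter` steps is reported. The function name and tolerances are illustrative choices.

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=50):
    """Solve f(x) = 0 by repeatedly taking the x-intercept of the tangent line."""
    x = x0
    for _ in range(max_iter):
        d = fprime(x)
        if d == 0.0:
            raise ZeroDivisionError("zero derivative: the tangent line is horizontal")
        x_new = x - f(x) / d       # x-intercept of the tangent line at x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence (the iteration may be diverging)")

# Root of x^2 - 2 starting from x0 = 1
root = newton_raphson(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Starting the same problem at x0 = 0 hits the zero-derivative case immediately.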
Numerical Solution by Iterative Search Method. Gradient or steepest-descent method for finding the minimum of f(x): $x_{k+1} = x_k - \mu f'(x_k)$, with step size μ > 0.
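Numerical Solution by Iterative Search Method. Some explicit search schemes. Consider the special case $V_N = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\theta)$. The gradient or steepest-descent method is fairly inefficient close to the minimum. The gradient and the Hessian of $V_N$ are $V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\varepsilon(t,\theta)$ and $V_N''(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\psi^T(t,\theta) - \frac{1}{N}\sum_{t=1}^{N}\psi'(t,\theta)\varepsilon(t,\theta)$, where ψ'(t,θ) is the d × d matrix of second derivatives of $\hat{y}(t|\theta)$. Let $R_N^{(i)} = V_N''(\hat{\theta}_N^{(i)}, Z^N)$; then we have the Newton method. But it is not an easy task to compute the Hessian because of the second-derivative term ψ'(t,θ).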
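Numerical Solution by Iterative Search Method. Some explicit search schemes. For the same special case, the Newton method needs the full Hessian, which is not easy to compute because of the term $\frac{1}{N}\sum_{t=1}^{N}\psi'(t,\theta)\varepsilon(t,\theta)$. Suppose, however, that there is a value θ0 such that ε(t, θ0) = e0(t) is a sequence of independent random variables. Since ψ'(t,θ) depends only on data up to time t-1, it is uncorrelated with e0(t), so near θ0 the second-derivative term is approximately zero and $V_N''(\theta, Z^N) \approx \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\psi^T(t,\theta)$.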
Numerical Solution by Iterative Search Method. Newton method. So the choice $R_N^{(i)} = H_N(\hat{\theta}_N^{(i)}) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\hat{\theta}_N^{(i)})\psi^T(t,\hat{\theta}_N^{(i)})$ is, in the vicinity of the minimum, a good estimate of the Hessian. This is known as the Gauss-Newton method. In the statistical literature it is called the “method of scoring”. In the control literature the terms “modified Newton-Raphson” and “quasilinearization” have also been used.
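A minimal damped Gauss-Newton sketch, assuming NumPy and a hypothetical model that is nonlinear in its parameters, ŷ(t|θ) = θ1 exp(-θ2 t), fitted to noise-free data. The matrix $H_N = \frac{1}{N}\sum\psi\psi^T$ replaces the Hessian, and the step length μ is halved until the criterion decreases; this step-halving rule is an assumed, common choice.

```python
import numpy as np

def damped_gauss_newton(t, y, theta, iters=50):
    """Damped Gauss-Newton for the hypothetical model yhat(t|theta) = theta1 * exp(-theta2 * t)."""
    def predict(th):
        return th[0] * np.exp(-th[1] * t)

    def V(th):
        with np.errstate(over='ignore', invalid='ignore'):
            v = 0.5 * np.mean((y - predict(th)) ** 2)
        return v if np.isfinite(v) else np.inf

    for _ in range(iters):
        eps = y - predict(theta)
        # psi(t, theta) = d yhat / d theta, one row per sample
        psi = np.column_stack([np.exp(-theta[1] * t),
                               -theta[0] * t * np.exp(-theta[1] * t)])
        H = psi.T @ psi / len(t)                          # Gauss-Newton approximation of V''
        step = np.linalg.solve(H, psi.T @ eps / len(t))   # = -H^{-1} V'_N
        mu = 1.0
        while V(theta + mu * step) > V(theta) and mu > 1e-8:
            mu /= 2.0                                     # damping: halve until V_N decreases
        theta = theta + mu * step
    return theta

t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-0.5 * t)          # noise-free data, true theta = [2, 0.5]
theta = damped_gauss_newton(t, y, np.array([1.0, 1.0]))
print(theta)
```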
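Numerical Solution by Iterative Search Method. Newton method. Dennis and Schnabel reserve the term “Gauss-Newton” for the iteration with unit step, $\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - [H_N^{(i)}]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$, and for the version with an adjusted step length $\mu_N^{(i)}$ the term “damped Gauss-Newton” has been used.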
Numerical Solution by Iterative Search Method. Newton method. Even though $R_N = H_N$ is assured to be positive semidefinite, it may be singular or close to singular (for example, if the model is over-parameterized or the data are not informative enough). Various ways to overcome this problem exist and are known as “regularization techniques”. Goldfeld, Quandt and Trotter, as well as Levenberg and Marquardt, suggest using $H_N^{(i)} + \lambda I$, λ ≥ 0, instead of $H_N^{(i)}$; in the Levenberg-Marquardt form, $\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - [H_N^{(i)} + \lambda I]^{-1}V_N'(\hat{\theta}_N^{(i)}, Z^N)$, with λ rather than μ adjusted. With λ = 0 we have the Gauss-Newton case; increasing λ means that the step size is decreased and the search direction is turned towards the gradient.
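Regularization matters when $H_N$ is singular. The sketch below uses a deliberately over-parameterized hypothetical model ŷ = θ1 θ2 x: the two columns of ψ are parallel, so $H_N$ has rank one and a pure Gauss-Newton step is undefined. Adding λI, decreasing λ after a successful step and increasing it after a failed one (a common Levenberg-Marquardt heuristic, assumed here rather than taken from the text), still drives the identifiable product θ1 θ2 to its true value.

```python
import numpy as np

def levenberg_marquardt(x, y, theta, lam=1e-2, iters=100):
    """LM iterations for the hypothetical over-parameterized model yhat = theta1 * theta2 * x."""
    def V(th):
        return 0.5 * np.mean((y - th[0] * th[1] * x) ** 2)

    for _ in range(iters):
        eps = y - theta[0] * theta[1] * x
        psi = np.column_stack([theta[1] * x, theta[0] * x])   # the two columns are parallel
        H = psi.T @ psi / len(x)                              # rank one: H_N is singular
        g = psi.T @ eps / len(x)                              # equals -V'_N(theta)
        step = np.linalg.solve(H + lam * np.eye(2), g)
        if V(theta + step) < V(theta):
            theta = theta + step
            lam = max(lam / 10.0, 1e-8)    # success: move towards the Gauss-Newton direction
        else:
            lam *= 10.0                    # failure: shorter step, turned towards the gradient
    return theta

rng = np.random.default_rng(5)
x = rng.standard_normal(100)
y = 1.5 * x                 # noise-free; only the product theta1 * theta2 = 1.5 is identifiable
theta = levenberg_marquardt(x, y, np.array([1.0, 1.0]))
H_rank = np.linalg.matrix_rank(np.column_stack([theta[1] * x, theta[0] * x]))
print(theta, theta[0] * theta[1], H_rank)
```

The lower bound on λ keeps $H_N + \lambda I$ numerically invertible even though $H_N$ itself is singular.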
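Numerical Solution by Iterative Search Method. Remember that we want to minimize $V_N(\theta, Z^N)$ (I), or solve the correlation equation $f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\varepsilon(t,\theta) = 0$ (II). Applying the Newton method to (I) amounts to solving $V_N'(\theta, Z^N) = 0$, so solving equation (II) is quite analogous to the minimization of (I). Newton-Raphson method to solve (II): $\hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu_N^{(i)}\left[f_N'(\hat{\theta}_N^{(i)}, Z^N)\right]^{-1}f_N(\hat{\theta}_N^{(i)}, Z^N)$. Substitution method to solve (II).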
Computing Gradients
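Computing Gradients. The amount of work required to compute ψ(t,θ) is highly dependent on the model structure, and sometimes one may have to resort to numerical differentiation. Example 10.1: Consider the ARMAX model $A(q)y(t) = B(q)u(t) + C(q)e(t)$. The predictor is given by $C(q)\hat{y}(t|\theta) = B(q)u(t) + [C(q) - A(q)]y(t)$. Differentiation with respect to $a_k$ gives $C(q)\frac{\partial}{\partial a_k}\hat{y}(t|\theta) = -y(t-k)$; similarly, $C(q)\frac{\partial}{\partial b_k}\hat{y}(t|\theta) = u(t-k)$ and $C(q)\frac{\partial}{\partial c_k}\hat{y}(t|\theta) = y(t-k) - \hat{y}(t-k|\theta) = \varepsilon(t-k,\theta)$.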
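Computing Gradients. Now, with $\varphi(t,\theta) = [-y(t-1)\ \cdots\ -y(t-n_a)\ \ u(t-1)\ \cdots\ u(t-n_b)\ \ \varepsilon(t-1,\theta)\ \cdots\ \varepsilon(t-n_c,\theta)]^T$, we obtain $C(q)\psi(t,\theta) = \varphi(t,\theta)$; that is, the gradient is obtained by filtering the regression vector through 1/C(q).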
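The filtering relation $C(q)\psi(t,\theta) = \varphi(t,\theta)$ can be verified numerically. A sketch for the first-order case (na = nb = nc = 1), assuming zero initial conditions; the gradient is compared with central finite differences of ε(t,θ) on random data. The function name is hypothetical.

```python
import numpy as np

def armax1_eps_psi(theta, y, u):
    """eps(t, theta) and psi(t, theta) = -d eps / d theta for the first-order ARMAX model
    y(t) + a y(t-1) = b u(t-1) + e(t) + c e(t-1), theta = [a, b, c].
    Zero initial conditions (eps(0) = 0) are an assumption of this sketch."""
    a, b, c = theta
    N = len(y)
    eps = np.zeros(N)
    psi = np.zeros((N, 3))
    for t in range(1, N):
        phi = np.array([-y[t - 1], u[t - 1], eps[t - 1]])        # phi(t, theta)
        eps[t] = y[t] + a * y[t - 1] - b * u[t - 1] - c * eps[t - 1]
        psi[t] = phi - c * psi[t - 1]                            # C(q) psi(t, theta) = phi(t, theta)
    return eps, psi

# Check against central finite differences on hypothetical data
rng = np.random.default_rng(6)
y, u = rng.standard_normal(200), rng.standard_normal(200)
theta = np.array([-0.7, 1.0, 0.5])
eps, psi = armax1_eps_psi(theta, y, u)
h = 1e-6
fd = np.column_stack([
    -(armax1_eps_psi(theta + h * np.eye(3)[k], y, u)[0]
      - armax1_eps_psi(theta - h * np.eye(3)[k], y, u)[0]) / (2 * h)
    for k in range(3)])
max_err = np.max(np.abs(fd - psi))
print(max_err)
```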
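Computing Gradients. SISO black-box model. The general model structure is $A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$, and its predictor is $\hat{y}(t|\theta) = \frac{D(q)B(q)}{C(q)F(q)}u(t) + \left[1 - \frac{D(q)A(q)}{C(q)}\right]y(t)$, so we have $\varepsilon(t,\theta) = \frac{D(q)}{C(q)}\left[A(q)y(t) - \frac{B(q)}{F(q)}u(t)\right]$. With $w(t,\theta) = A(q)y(t) - \frac{B(q)}{F(q)}u(t)$, the gradient components are $\frac{\partial\varepsilon}{\partial a_k} = \frac{D(q)}{C(q)}y(t-k)$, $\frac{\partial\varepsilon}{\partial b_k} = -\frac{D(q)}{C(q)F(q)}u(t-k)$, $\frac{\partial\varepsilon}{\partial f_k} = \frac{D(q)B(q)}{C(q)F^2(q)}u(t-k)$, $\frac{\partial\varepsilon}{\partial c_k} = -\frac{1}{C(q)}\varepsilon(t-k,\theta)$ and $\frac{\partial\varepsilon}{\partial d_k} = \frac{1}{C(q)}w(t-k,\theta)$, and $\psi(t,\theta) = -\frac{\partial}{\partial\theta}\varepsilon(t,\theta)$.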
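Computing Gradients. SISO black-box model. As a special case consider the OE model $y(t) = \frac{B(q)}{F(q)}u(t) + e(t)$ (A = C = D = 1), whose predictor is $\hat{y}(t|\theta) = \frac{B(q)}{F(q)}u(t)$. Now $\frac{\partial}{\partial b_k}\hat{y}(t|\theta) = \frac{1}{F(q)}u(t-k)$ and $\frac{\partial}{\partial f_k}\hat{y}(t|\theta) = -\frac{B(q)}{F^2(q)}u(t-k) = -\frac{1}{F(q)}\hat{y}(t-k|\theta)$.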
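Computing Gradients. SISO black-box model. For the OE model, therefore, $\psi(t,\theta) = \frac{1}{F(q)}\varphi(t,\theta)$ with $\varphi(t,\theta) = [u(t-1)\ \cdots\ u(t-n_b)\ \ -\hat{y}(t-1|\theta)\ \cdots\ -\hat{y}(t-n_f|\theta)]^T$.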
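Putting the OE gradient to work: a sketch of a damped Gauss-Newton search for a hypothetical first-order OE model, with ψ obtained by filtering through 1/F(q) as above. Parameter values that make F(q) unstable give a non-finite criterion and are rejected by step halving; this safeguard and all names are assumptions of the sketch.

```python
import numpy as np

def oe1_predict(theta, u):
    """yhat(t) = b u(t-1) - f yhat(t-1) and psi = d yhat / d [b, f] (first-order OE model)."""
    b, f = theta
    N = len(u)
    yhat, psi = np.zeros(N), np.zeros((N, 2))
    for t in range(1, N):
        yhat[t] = b * u[t - 1] - f * yhat[t - 1]
        # F(q) psi(t) = phi(t) with phi(t) = [u(t-1), -yhat(t-1)]
        psi[t, 0] = u[t - 1] - f * psi[t - 1, 0]
        psi[t, 1] = -yhat[t - 1] - f * psi[t - 1, 1]
    return yhat, psi

def oe1_gauss_newton(y, u, theta, iters=30):
    def V(th):
        with np.errstate(over='ignore', invalid='ignore'):
            v = 0.5 * np.mean((y - oe1_predict(th, u)[0]) ** 2)
        return v if np.isfinite(v) else np.inf   # unstable F(q): treat as infinitely bad

    for _ in range(iters):
        yhat, psi = oe1_predict(theta, u)
        step = np.linalg.solve(psi.T @ psi, psi.T @ (y - yhat))   # Gauss-Newton direction
        mu = 1.0
        while V(theta + mu * step) > V(theta) and mu > 1e-8:
            mu /= 2.0
        theta = theta + mu * step
    return theta

# Hypothetical data: y(t) = 0.5 q^-1 / (1 - 0.8 q^-1) u(t) + e(t), i.e. b = 0.5, f = -0.8
rng = np.random.default_rng(7)
N = 500
u = rng.standard_normal(N)
y = oe1_predict(np.array([0.5, -0.8]), u)[0] + 0.1 * rng.standard_normal(N)
theta = oe1_gauss_newton(y, u, np.array([1.0, 0.0]))
print(theta)
```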
Two-Stage and Multistage Method
Two-Stage and Multistage Method. Numerical solution by iterative search methods offers guaranteed convergence to a local minimum, efficiency, and applicability to general model structures, while linear regression and least squares offer efficient methods with an analytic solution. Two-stage and multistage methods combine the two.
Two-Stage and Multistage Method. Why are we interested in this topic? It helps us understand the identification literature, and it is useful for providing initial estimates to be used in iterative methods. Some important two-stage or multistage methods: 1- Bootstrap methods. 2- Bilinear parameterization. 3- Separate least squares. 4- High-order AR(X) models. 5- Separating dynamics and noise models. 6- Determining ARMA models. 7- Subspace methods for estimating state space models.
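Two-Stage and Multistage Method. Bootstrap methods. Consider the correlation formulation $f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\left[y(t) - \varphi^T(t,\theta)\theta\right] = 0$. This formulation contains a number of common situations: IV (instrumental variable) methods, with φ(t,θ) = φ(t) and instruments ζ(t,θ) = ζ(t); PLR (pseudo-linear regression) methods, with ζ(t,θ) = φ(t,θ); and minimizing the quadratic criterion, with ζ(t,θ) = ψ(t,θ).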
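Two-Stage and Multistage Method. Bootstrap methods. For the correlation formulation above, it is natural to iterate $\hat{\theta}_N^{(i+1)} = \left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t,\hat{\theta}_N^{(i)})\varphi^T(t,\hat{\theta}_N^{(i)})\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\zeta(t,\hat{\theta}_N^{(i)})y(t)$. It is called a bootstrap method since it alternates between computing ζ and φ for the current estimate and solving the resulting linear equations for θ. It does not necessarily converge to a solution. A convergence analysis is given by: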
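A sketch of the bootstrap iteration for the PLR case (ζ = φ) on a hypothetical first-order ARMAX system. Initializing with an ARX estimate (c = 0) is an assumed, common choice. Convergence is not guaranteed in general; with C(q) = 1 + 0.5q^-1, as here, the iteration is expected to settle close to the true parameters.

```python
import numpy as np

def plr_bootstrap(y, u, iters=20):
    """Bootstrap iterations for PLR on y(t) + a y(t-1) = b u(t-1) + e(t) + c e(t-1).

    Each pass (i) computes eps(t, theta) and phi(t, theta) = [-y(t-1), u(t-1), eps(t-1, theta)]
    for the current estimate, then (ii) solves the resulting LS problem with phi held fixed."""
    N = len(y)
    a, b = np.linalg.lstsq(np.column_stack([-y[:-1], u[:-1]]), y[1:], rcond=None)[0]
    theta = np.array([a, b, 0.0])           # ARX initialization (assumed)
    for _ in range(iters):
        a, b, c = theta
        eps = np.zeros(N)
        for t in range(1, N):
            eps[t] = y[t] + a * y[t - 1] - b * u[t - 1] - c * eps[t - 1]
        Phi = np.column_stack([-y[:-1], u[:-1], eps[:-1]])
        theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
    return theta

# Hypothetical system: a = -0.7, b = 1.0, c = 0.5
rng = np.random.default_rng(8)
N = 5000
u, e = rng.standard_normal(N), 0.5 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 1.0 * u[t - 1] + e[t] + 0.5 * e[t - 1]
theta = plr_bootstrap(y, u)
print(theta)
```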
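Two-Stage and Multistage Method. Bilinear parameterization. For some models the predictor is bilinear in the parameters. For example, consider the ARARX model $A(q)y(t) = B(q)u(t) + \frac{1}{D(q)}e(t)$. Its predictor is $\hat{y}(t|\theta) = D(q)B(q)u(t) + [1 - D(q)A(q)]y(t)$. Let ρ contain the coefficients of A and B, and η the coefficients of D. Bilinear means that $\hat{y}(t|\theta)$ is linear in ρ for fixed η and linear in η for fixed ρ.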
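Two-Stage and Multistage Method. Bilinear parameterization. In the ARARX model, with this situation, a natural way of minimizing $V_N(\theta, Z^N)$ is to treat it as a sequence of LS problems: let $\hat{\rho}^{(i+1)}$ minimize $V_N$ with respect to ρ for fixed $\eta = \hat{\eta}^{(i)}$, and then let $\hat{\eta}^{(i+1)}$ minimize $V_N$ with respect to η for fixed $\rho = \hat{\rho}^{(i+1)}$. Exercise 10T.3: Show that this minimization procedure is a special case of (10.40). According to Exercise 10T.3, the bilinear parameterization approach is thus indeed a descent method, and it converges to a local minimum.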
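A sketch of the alternating LS idea for a hypothetical first-order ARARX system with A(q) = 1 + a q^-1, B(q) = b q^-1 and D(q) = 1 + d q^-1. For fixed d, filtering y and u through D(q) turns the problem into ordinary ARX least squares for (a, b); for fixed (a, b), the residual w = Ay - Bu gives an AR least-squares problem for d. The function name and data are illustrative.

```python
import numpy as np

def ararx_alternating_ls(y, u, iters=30):
    """Alternating LS for A(q)y = B(q)u + e/D(q) with A = 1 + a q^-1, B = b q^-1, D = 1 + d q^-1.
    eps(t) = D(q)[A(q)y(t) - B(q)u(t)] is linear in (a, b) for fixed d and linear in d for fixed (a, b)."""
    d = 0.0
    for _ in range(iters):
        # Step 1: fixed d -> filter the data by D(q) and solve an ARX LS problem for (a, b)
        yF = y[1:] + d * y[:-1]
        uF = u[1:] + d * u[:-1]
        Phi = np.column_stack([-yF[:-1], uF[:-1]])
        a, b = np.linalg.lstsq(Phi, yF[1:], rcond=None)[0]
        # Step 2: fixed (a, b) -> w = A(q)y - B(q)u, then eps = w(t) + d w(t-1) is linear in d
        w = y[1:] + a * y[:-1] - b * u[:-1]
        d = np.linalg.lstsq(-w[:-1, None], w[1:], rcond=None)[0][0]
    return a, b, d

# Hypothetical system: a = -0.7, b = 1.0, d = -0.5
rng = np.random.default_rng(9)
N = 5000
u, e = rng.standard_normal(N), 0.5 * rng.standard_normal(N)
v, y = np.zeros(N), np.zeros(N)
for t in range(1, N):
    v[t] = 0.5 * v[t - 1] + e[t]                   # v = e / D(q)
    y[t] = 0.7 * y[t - 1] + 1.0 * u[t - 1] + v[t]  # A(q) y = B(q) u + v
a_hat, b_hat, d_hat = ararx_alternating_ls(y, u)
print(a_hat, b_hat, d_hat)
```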
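Two-Stage and Multistage Method. Separate least squares. A more general situation than the bilinear case is when one set of parameters enters linearly and another set nonlinearly in the predictor: $\hat{y}(t|\theta, \eta) = \varphi^T(t,\eta)\theta$. The identification criterion then becomes $V_N(\theta, \eta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[y(t) - \varphi^T(t,\eta)\theta\right]^2 = \frac{1}{2N}|Y - \Phi(\eta)\theta|^2$. For given η this criterion is an LS criterion and is minimized with respect to θ by $\hat{\theta}(\eta) = [\Phi^T(\eta)\Phi(\eta)]^{-1}\Phi^T(\eta)Y$. We can thus insert it into $V_N$ and define the problem as $\hat{\eta}_N = \arg\min_\eta V_N(\hat{\theta}(\eta), \eta, Z^N)$, $\hat{\theta}_N = \hat{\theta}(\hat{\eta}_N)$.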
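Two-Stage and Multistage Method. Separate least squares. Inserting $\hat{\theta}(\eta)$, the identification criterion becomes $V_N(\hat{\theta}(\eta), \eta, Z^N) = \frac{1}{2N}|(I - P(\eta))Y|^2$, where $P(\eta) = \Phi(\eta)[\Phi^T(\eta)\Phi(\eta)]^{-1}\Phi^T(\eta)$ is the projection onto the column space of Φ(η). The method is called separate least squares since the LS part has been separated out, and the problem is reduced to a minimization problem of lower dimension (over η only).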
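A sketch of separate least squares for a hypothetical model ŷ(t) = θ1 exp(-ηt) + θ2, which is linear in θ and nonlinear in η. The LS step is done inside the criterion, so the remaining search is one-dimensional. Golden-section search is used here under the assumption that the concentrated criterion is unimodal on the chosen interval.

```python
import numpy as np

def theta_hat(eta, t, y):
    """For fixed eta the predictor is linear in theta: LS solution theta_hat(eta)."""
    Phi = np.column_stack([np.exp(-eta * t), np.ones_like(t)])
    return np.linalg.lstsq(Phi, y, rcond=None)[0], Phi

def V_bar(eta, t, y):
    """Concentrated criterion V_N(theta_hat(eta), eta) = (1/2N)|(I - P(eta)) Y|^2."""
    theta, Phi = theta_hat(eta, t, y)
    return 0.5 * np.mean((y - Phi @ theta) ** 2)

def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a scalar function assumed unimodal on [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                    # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
        else:                          # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)

t = np.linspace(0.0, 10.0, 200)
y = 3.0 * np.exp(-0.5 * t) + 1.0     # hypothetical noise-free data: theta = [3, 1], eta = 0.5
eta = golden_section(lambda e: V_bar(e, t, y), 0.05, 3.0)
theta, _ = theta_hat(eta, t, y)
print(eta, theta)
```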
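Two-Stage and Multistage Method. High-order AR(X) models. Suppose the true system is $y(t) = G_0(q)u(t) + H_0(q)e(t)$. An ARX structure of order M, $A_M(q)y(t) = B_M(q)u(t) + e(t)$, is used. Hannan and Kavalieris, and Ljung and Wahlberg, show that $\hat{B}_M(q)/\hat{A}_M(q) \to G_0(q)$ and $1/\hat{A}_M(q) \to H_0(q)$ as the model order M and the number of data N tend to infinity (with M growing sufficiently slowly compared with N). So a high-order ARX model is capable of approximating any linear system arbitrarily well.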
Two-Stage and Multistage Method. High-order AR(X) models. It is of course desirable to reduce this high-order model to more tractable, lower-order versions:
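A numerical illustration, under assumptions: data from a hypothetical ARMAX system with $G_0(q) = 0.5q^{-1}/(1 - 0.8q^{-1})$ and $H_0(q) = (1 + 0.6q^{-1})/(1 - 0.8q^{-1})$, which no low-order ARX model describes exactly, are fitted by an ARX model of order M = 20. The impulse response of $\hat{B}_M/\hat{A}_M$ is then compared with that of $G_0$. Helper names are illustrative.

```python
import numpy as np

def arx_fit(y, u, M):
    """LS fit of A_M(q) y(t) = B_M(q) u(t) + e(t), deg A_M = deg B_M = M, one-sample delay in B_M."""
    Phi = np.array([np.concatenate([-y[t - M:t][::-1], u[t - M:t][::-1]])
                    for t in range(M, len(y))])
    theta = np.linalg.lstsq(Phi, y[M:], rcond=None)[0]
    return theta[:M], theta[M:]            # a_1..a_M, b_1..b_M

def impulse_response(a, b, L):
    """First L impulse-response coefficients g(0..L-1) of B(q)/A(q), b_k multiplying q^-k."""
    g = np.zeros(L)
    for t in range(L):
        bt = b[t - 1] if 1 <= t <= len(b) else 0.0
        g[t] = bt - sum(a[k - 1] * g[t - k] for k in range(1, min(t, len(a)) + 1))
    return g

rng = np.random.default_rng(10)
N = 5000
u, e = rng.standard_normal(N), 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + e[t] + 0.6 * e[t - 1]

a_M, b_M = arx_fit(y, u, M=20)
g_hat = impulse_response(a_M, b_M, 30)
g_true = np.array([0.0] + [0.5 * 0.8 ** (k - 1) for k in range(1, 30)])
max_err = np.max(np.abs(g_hat - g_true))
print(max_err)
```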
Two-Stage and Multistage Method. Separating dynamics and noise models. The general model structure is $A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$.
Two-Stage and Multistage Method Determining ARMA Models.
Two-Stage and Multistage Method. Subspace methods for estimating state space models. The subspace methods can also be regarded as two-stage methods, being built up from two LS steps.
Local Solutions and Initial Values
Local Solutions and Initial Values. Local minima. The general numerical schemes in Section 10.2 typically have the property that, with a suitably chosen step length μ, they will converge to a solution of $V_N'(\hat{\theta}_N, Z^N) = 0$. This equation may have several solutions. With a positive definite $R_N^{(i)}$ the scheme is a descent method and will typically converge to a local minimum of $V_N(\theta, Z^N)$, but it is the global minimum that interests us. Local minima do not necessarily create problems in practice, if the resulting model passes the validation tests (Sections 16.5 and 16.6). To find the global solution, one must start the search at several different feasible initial values. An important possibility is to use some preliminary estimation procedure to produce a good initial value.
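A toy illustration of the multi-start idea, using a hypothetical scalar criterion with two local minima (near θ ≈ -1.30 and θ ≈ 1.13) and a plain gradient search from several initial values; the candidate with the lowest criterion value is kept.

```python
def V(th):
    """Hypothetical scalar criterion with two local minima."""
    return th ** 4 - 3.0 * th ** 2 + th

def dV(th):
    return 4.0 * th ** 3 - 6.0 * th + 1.0

def local_search(th, mu=0.01, iters=2000):
    """Plain gradient search: converges to the local minimum in whose basin it starts."""
    for _ in range(iters):
        th = th - mu * dV(th)
    return th

starts = [-2.0, -0.5, 0.5, 2.0]                  # several feasible initial values
candidates = [local_search(th0) for th0 in starts]
best = min(candidates, key=V)
print(candidates, best)
```

Starts at 0.5 and 2.0 end in the inferior local minimum near 1.13; only comparing the candidates reveals the global one.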
Local Solutions and Initial Values. Remember from Chapter 8:
Local Solutions and Initial Values. Results from SISO black-box models. The general model structure is $A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$. Consider the assumption that the system can be described within the model set: $\mathcal{S} \in \mathcal{M}$. The results are listed below for the general SISO model set and refer to
Local Solutions and Initial Values Results from SISO Black-box Models
Local Solutions and Initial Values. Initial parameter values. Due to the possible occurrence of undesired local minima in the criterion function, it is worthwhile to put some effort into producing good initial values. Also, since Newton-type methods have a good local convergence rate, good initial values pay off again. 1- For a physically parameterized model structure: use your physical insight. 2- For a linear black-box model structure:
Local Solutions and Initial Values. Initial filter conditions. In some configurations we need initial values φ(0,θ). 1 - Start the summation at t = n+1 rather than t = 1. 2 - Take the initial conditions into account by:
Subspace Methods for Estimating State Space Models
Subspace Methods for Estimating State Space Models. Let us now consider how to estimate the system matrices A, B, C and D in the state-space model $x(t+1) = Ax(t) + Bu(t) + w(t)$, $y(t) = Cx(t) + Du(t) + v(t)$. Let the output y(t) be a p-dimensional column vector and the input u(t) an m-dimensional column vector; the order of the system is n. We also assume that this state-space representation is a minimal realization. We know that many different representations describe the same system, namely $A_T = T^{-1}AT$, $B_T = T^{-1}B$, $C_T = CT$, $D_T = D$, where T is any invertible matrix (corresponding to the change of state variables $x = Tx_T$).
Subspace Methods for Estimating State Space Models. Let the state-space model be as above. Subspace procedure: ► Estimating B and D ► Finding A and C from the observability matrix ► Estimating the extended observability matrix ► Finding the states and estimating the noise statistics.
Subspace Methods for Estimating State Space Models ► Estimating B and D. For given and fixed A and C, the model structure $\hat{y}(t|B,D) = C(qI - A)^{-1}Bu(t) + Du(t)$ is clearly linear in B and D. If the system operates in open loop, we can thus consistently estimate B and D according to Theorem 8.4, even if the noise sequence is non-white.
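Subspace Methods for Estimating State Space Models ► Estimating B and D. Let us write the predictor in the standard linear regression form $\hat{y}(t|B,D) = \varphi^T(t)\theta$, where θ contains all the entries of B and D, and the p × (nm + pm) matrix $\varphi^T(t)$ is formed from u(t) and from u(t) filtered through $C(qI - A)^{-1}$.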
Subspace Methods for Estimating State Space Models ► Estimating B and D
Subspace Methods for Estimating State Space Models ► Estimating B and D. If desired, the initial state x0 = x(0) can also be estimated in an analogous way, since the predictor with initial values taken into account is $\hat{y}(t|B,D,x_0) = CA^{t}x_0 + C(qI - A)^{-1}Bu(t) + Du(t)$, which is linear also in x0. Here $CA^{t}x_0 = Cq(qI - A)^{-1}x_0\delta(t)$, where δ(t) is the unit pulse at time 0.
Subspace Methods for Estimating State Space Models ► Finding A and C from the observability matrix. Suppose that an $rp \times n^*$ dimensional matrix G is given that is related to the extended observability matrix $\mathcal{O}_r = [C^T\ \ (CA)^T\ \cdots\ (CA^{r-1})^T]^T$. We have to determine A and C from G. There are two situations: known system order and unknown system order. Known system order: suppose first that we know $G = \mathcal{O}_r$, so that n* = n. To find C is then immediate: $C = \mathcal{O}_r(1:p,\ 1:n)$.
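Subspace Methods for Estimating State Space Models ► Finding A and C from the observability matrix. Similarly, we can find A from the shift-structure equation $\mathcal{O}_r(p+1:pr,\ 1:n) = \mathcal{O}_r(1:p(r-1),\ 1:n)\,A$, which has a unique solution A when $\mathcal{O}_{r-1}$ has full rank n.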
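A sketch of the known-order case in NumPy: C is read off the first p rows of G, and A is obtained from the shift equation with `lstsq`, which gives the exact solution when G is noise-free. The random system matrices are hypothetical; the last lines show that a basis change T turns the result into $T^{-1}AT$, in line with the role of the state-space basis.

```python
import numpy as np

def extended_observability(A, C, r):
    """O_r = [C; CA; ...; CA^(r-1)]."""
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(r)])

def a_c_from_observability(G, p):
    """Known order (G = O_r): C = first p rows; A solves O_r(p+1:pr, :) = O_r(1:p(r-1), :) A."""
    C = G[:p, :]
    A = np.linalg.lstsq(G[:-p, :], G[p:, :], rcond=None)[0]
    return A, C

rng = np.random.default_rng(11)
n, p, r = 3, 2, 4
A = 0.5 * rng.standard_normal((n, n))       # hypothetical system matrices
C = rng.standard_normal((p, n))
A_hat, C_hat = a_c_from_observability(extended_observability(A, C, r), p)

# In another basis the observability matrix is O_r T, and the same procedure returns T^-1 A T
T = rng.standard_normal((n, n))
A_T, _ = a_c_from_observability(extended_observability(A, C, r) @ T, p)
```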
Subspace Methods for Estimating State Space Models ► Finding A and C from the observability matrix. Note: a large r leads to numerical problems. Role of the state-space basis: the extended observability matrix depends on the choice of basis in the state-space representation. It is easy to verify that, in the basis given by $A_T = T^{-1}AT$ and $C_T = CT$, the observability matrix would be $\mathcal{O}_rT$, since $C_TA_T^k = CA^kT$.
Subspace Methods for Estimating State Space Models. Unknown system order. Suppose now that the true order of the system is unknown, and that n*, the number of columns of G, is just an upper bound for the order. Then $G = \mathcal{O}_r\tilde{T}$ for some n × n* matrix $\tilde{T}$ of rank n.
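Subspace Methods for Estimating State Space Models. Perform the singular value decomposition $G = USV^T = [U_1\ \ U_2]\begin{bmatrix}S_1 & 0\\ 0 & 0\end{bmatrix}[V_1\ \ V_2]^T = U_1S_1V_1^T$, where $S_1$ is the n × n diagonal matrix of non-zero singular values; the number of non-zero singular values equals rank G = n.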
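Subspace Methods for Estimating State Space Models. Multiplying $G = \mathcal{O}_r\tilde{T} = U_1S_1V_1^T$ by $V_1$ from the right gives $\mathcal{O}_r\tilde{T}V_1 = U_1S_1$. Multiplying this by $S_1^{-1}$ from the right gives $U_1 = \mathcal{O}_r\tilde{T}V_1S_1^{-1}$. Since $\tilde{T}V_1S_1^{-1}$ is an invertible n × n matrix, $U_1$ is an extended observability matrix in some state-space basis; or, for some invertible matrix R, $\hat{\mathcal{O}}_r = U_1R$, from which A and C can be determined as in the known-order case.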
Subspace Methods for Estimating State Space Models. Using a noisy estimate of the extended observability matrix. Let us now assume that the given matrix G is a noisy estimate of the true observability matrix, $G = \mathcal{O}_r\tilde{T} + E_N$, where $E_N$ is small and tends to zero as N → ∞. The rank of $\mathcal{O}_r$ is not known, while the noise matrix $E_N$ is likely to have full rank. It is reasonable to proceed as above and perform an SVD on G: $G = USV^T$. Due to the noise, S will typically have all singular values non-zero.
Subspace Methods for Estimating State Space Models. The first n singular values will be supported by $\mathcal{O}_r$, while the remaining ones will stem from $E_N$. If the noise is small, one should expect the latter to be significantly smaller than the former. Therefore determine n as the number of singular values that are significantly larger than 0. Then use $\hat{\mathcal{O}}_r = U_1R$ to determine $\hat{A}$ and $\hat{C}$ as before. However, in the noisy case $\hat{\mathcal{O}}_r$ will not be exactly subject to the shift structure $\hat{\mathcal{O}}_r(p+1:pr,\ 1:n) = \hat{\mathcal{O}}_r(1:p(r-1),\ 1:n)\hat{A}$, so this system of equations should be solved in a least-squares sense.
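A sketch of the noisy case, assuming a hypothetical second-order system (eigenvalues 0.9 and 0.5), an overestimated n* = 4 and a small noise matrix $E_N$. The order is taken as the number of singular values above 1% of the largest one; this threshold is an illustrative judgment call, not a rule from the text. With R = I, $U_1$ plays the role of $\hat{\mathcal{O}}_r$, and the shift equation is solved by least squares.

```python
import numpy as np

rng = np.random.default_rng(12)
# Hypothetical true system: n = 2, p = 1, eigenvalues 0.9 and 0.5
A0 = np.array([[0.9, 1.0], [0.0, 0.5]])
C0 = np.array([[1.0, 0.0]])
r, p, n_star = 6, 1, 4
O_r = np.vstack([C0 @ np.linalg.matrix_power(A0, k) for k in range(r)])   # rp x n
# G = O_r T_tilde + E_N, with n* = 4 columns and a small full-rank noise matrix
T_tilde = rng.standard_normal((2, n_star))
G = O_r @ T_tilde + 1e-4 * rng.standard_normal((r * p, n_star))

U, s, Vt = np.linalg.svd(G)
print(s)                                   # expected: two large values, two much smaller ones
n = int(np.sum(s > 1e-2 * s[0]))           # "significantly larger than 0" (illustrative threshold)
U1 = U[:, :n]                              # estimate of O_r with R = I
C_hat = U1[:p, :]
A_hat = np.linalg.lstsq(U1[:-p, :], U1[p:, :], rcond=None)[0]   # shift equation, LS sense
eig = np.sort(np.linalg.eigvals(A_hat).real)
print(n, eig)
```

The estimated A lives in the basis defined by $U_1$, so only basis-independent quantities such as its eigenvalues can be compared with the true system.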
Subspace Methods for Estimating State Space Models. Using weighting matrices in the SVD. For more flexibility, we could pre- and post-multiply G before performing the SVD, $\tilde{G} = W_1GW_2 = USV^T \approx U_1S_1V_1^T$, and then use $\hat{\mathcal{O}}_r = W_1^{-1}U_1R$ to determine $\hat{A}$ and $\hat{C}$. Here R is an arbitrary invertible matrix that determines the coordinate basis for the state representation. The post-multiplication by $W_2$ just corresponds to a change of basis in the state space, and the pre-multiplication by $W_1$ is eliminated by the factor $W_1^{-1}$. In the noiseless case E = 0 these weightings are without consequence. However, when noise is present, they have an important influence on the space spanned by $U_1$, and hence on the quality of the estimates $\hat{A}$ and $\hat{C}$. Remark: post-multiplying $W_2$ by an orthogonal matrix does not affect the $U_1$-matrix in the decomposition. Exercise: Prove the remark (10.E10).
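The remark can be checked numerically (a check, not the requested proof). Singular vectors are only determined up to sign, so the sketch compares the projectors $U_1U_1^T$ rather than $U_1$ itself; all matrices are random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(13)
G = rng.standard_normal((6, 4))                     # hypothetical matrix to be decomposed
W2 = rng.standard_normal((4, 4))                    # hypothetical post-weighting
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # a random orthogonal matrix

U_a, s_a, _ = np.linalg.svd(G @ W2)
U_b, s_b, _ = np.linalg.svd(G @ W2 @ Q)
n = 2
P_a = U_a[:, :n] @ U_a[:, :n].T                     # projector onto span(U1) for W2
P_b = U_b[:, :n] @ U_b[:, :n].T                     # projector onto span(U1) for W2 Q
print(np.allclose(s_a, s_b), np.allclose(P_a, P_b))
```

The reason is visible in $(GW_2Q)(GW_2Q)^T = GW_2W_2^TG^T$: the left singular vectors and singular values depend on $W_2$ only through $W_2W_2^T$.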
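Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. Remember: $x(t+1) = Ax(t) + Bu(t) + w(t)$, $y(t) = Cx(t) + Du(t) + v(t)$. Now, $y(t+k) = CA^kx(t) + \sum_{j=0}^{k-1}CA^{k-1-j}\left[Bu(t+j) + w(t+j)\right] + Du(t+k) + v(t+k)$.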
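Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. Now form the vectors $Y_r(t) = [y^T(t)\ \ y^T(t+1)\ \cdots\ y^T(t+r-1)]^T$ and $U_r(t) = [u^T(t)\ \ u^T(t+1)\ \cdots\ u^T(t+r-1)]^T$. The scalar r is the maximal prediction horizon. Then $Y_r(t) = \mathcal{O}_rx(t) + S_rU_r(t) + V(t)$, where $S_r = \begin{bmatrix}D & 0 & \cdots & 0\\ CB & D & \cdots & 0\\ \vdots & & \ddots & \vdots\\ CA^{r-2}B & \cdots & CB & D\end{bmatrix}$.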
Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. The k-th block component of V(t) is $V_k(t) = \sum_{j=0}^{k-2}CA^{k-2-j}w(t+j) + v(t+k-1)$, k = 1, …, r.
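Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. Collecting all time instants, $\mathbf{Y} = \mathcal{O}_r\mathbf{X} + S_r\mathbf{U} + \mathbf{V}$, with $\mathbf{Y} = [Y_r(1)\ \cdots\ Y_r(N)]$ and similarly for $\mathbf{X}$, $\mathbf{U}$ and $\mathbf{V}$. We must eliminate the U-term and make the noise influence disappear asymptotically.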
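Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. We must eliminate the U-term and make the noise influence disappear asymptotically. Removing the U-term: form the matrix $\Pi^\perp_{\mathbf{U}^T} = I - \mathbf{U}^T(\mathbf{U}\mathbf{U}^T)^{-1}\mathbf{U}$, which satisfies $\mathbf{U}\Pi^\perp_{\mathbf{U}^T} = 0$. Multiplying from the right by $\Pi^\perp_{\mathbf{U}^T}$ leads to $\mathbf{Y}\Pi^\perp_{\mathbf{U}^T} = \mathcal{O}_r\mathbf{X}\Pi^\perp_{\mathbf{U}^T} + \mathbf{V}\Pi^\perp_{\mathbf{U}^T}$. Since the last term is made up of noise contributions, the idea is to correlate it away with a suitable matrix.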
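Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. Removing the noise term. Since the last term is made up of noise contributions, the idea is to correlate it away with a suitable matrix. Define the matrix $\Phi = [\varphi_s(1)\ \cdots\ \varphi_s(N)]$. Here $\varphi_s(t)$ acts as an instrument, and we must define it such that $\lim_{N\to\infty}\frac{1}{N}\mathbf{V}\Pi^\perp_{\mathbf{U}^T}\Phi^T = 0$ and $\lim_{N\to\infty}\frac{1}{N}\mathbf{X}\Pi^\perp_{\mathbf{U}^T}\Phi^T = \bar{T}$ has full rank n.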
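Subspace Methods for Estimating State Space Models ► Estimating the Extended Observability Matrix. With an instrument $\varphi_s(t)$ satisfying these conditions, define $G = \frac{1}{N}\mathbf{Y}\Pi^\perp_{\mathbf{U}^T}\Phi^T$; then $G = \mathcal{O}_r\frac{1}{N}\mathbf{X}\Pi^\perp_{\mathbf{U}^T}\Phi^T + \frac{1}{N}\mathbf{V}\Pi^\perp_{\mathbf{U}^T}\Phi^T \to \mathcal{O}_r\bar{T}$ as N → ∞. The matrix G can thus be seen as a noisy estimate of the extended observability matrix. But we still need to define $\varphi_s(t)$.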
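Subspace Methods for Estimating State Space Models. Finding good instruments. The only remaining question is how to achieve $\frac{1}{N}\mathbf{V}\Pi^\perp_{\mathbf{U}^T}\Phi^T \to 0$ and $\frac{1}{N}\mathbf{X}\Pi^\perp_{\mathbf{U}^T}\Phi^T \to \bar{T}$ with full rank n. Remember the instrumental-variable idea: the instruments should be correlated with the regressors but uncorrelated with the noise. Remember also that the law of large numbers states that sample sums converge to their respective expected values, so $\frac{1}{N}\mathbf{V}\Pi^\perp_{\mathbf{U}^T}\Phi^T = \frac{1}{N}\mathbf{V}\Phi^T - \frac{1}{N}\mathbf{V}\mathbf{U}^T\left(\frac{1}{N}\mathbf{U}\mathbf{U}^T\right)^{-1}\frac{1}{N}\mathbf{U}\Phi^T \to \bar{E}V(t)\varphi_s^T(t) - \bar{E}V(t)U_r^T(t)\,R_u^{-1}\,\bar{E}U_r(t)\varphi_s^T(t)$, where $R_u = \bar{E}U_r(t)U_r^T(t)$.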
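Subspace Methods for Estimating State Space Models. Finding good instruments. Assume the input u is generated in open loop, so that it is independent of the noise V; then $\bar{E}V(t)U_r^T(t) = 0$. The k-th block component of V(t) is $V_k(t) = \sum_{j=0}^{k-2}CA^{k-2-j}w(t+j) + v(t+k-1)$, which involves noise values at time t and later only. Now let $\varphi_s(t) = [y^T(t-1)\ \cdots\ y^T(t-s_1)\ \ u^T(t-1)\ \cdots\ u^T(t-s_2)]^T$. If w and v are white, these past data are uncorrelated with V(t), so $\bar{E}V(t)\varphi_s^T(t) = 0$ and the first condition holds.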
Subspace Methods for Estimating State Space Models. Finding good instruments. Similarly, we have $\frac{1}{N}\mathbf{X}\Pi^\perp_{\mathbf{U}^T}\Phi^T \to \bar{E}x(t)\varphi_s^T(t) - \bar{E}x(t)U_r^T(t)\,R_u^{-1}\,\bar{E}U_r(t)\varphi_s^T(t) = \bar{T}$. A formal proof that $\bar{T}$ has full rank is not immediate and will involve properties of the input. See Problem 10G.6 and Van Overschee and De Moor (1996).
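The steps so far can be assembled into a small end-to-end sketch. It assumes a hypothetical second-order open-loop system with white measurement noise, W1 = W2 = I, R = I, r = s = 5, and an order n = 2 taken as known. G is computed without forming the N × N projection matrix, using $\mathbf{Y}\Pi^\perp_{\mathbf{U}^T}\Phi^T = \mathbf{Y}\Phi^T - \mathbf{Y}\mathbf{U}^T(\mathbf{U}\mathbf{U}^T)^{-1}\mathbf{U}\Phi^T$.

```python
import numpy as np

rng = np.random.default_rng(14)
# Hypothetical true system (n = 2, m = p = 1, D = 0), open loop, white measurement noise
A0 = np.array([[0.8, 0.3], [0.0, 0.5]])
B0 = np.array([0.0, 1.0])
C0 = np.array([1.0, 0.0])
N = 6000
u = rng.standard_normal(N)
y = np.zeros(N)
x = np.zeros(2)
for t in range(N):
    y[t] = C0 @ x + 0.05 * rng.standard_normal()
    x = A0 @ x + B0 * u[t]

r = s = 5
ts = range(s, N - r + 1)
Y = np.array([y[t:t + r] for t in ts]).T      # columns Y_r(t) = [y(t) ... y(t+r-1)]^T
U = np.array([u[t:t + r] for t in ts]).T      # columns U_r(t)
Phi = np.array([np.concatenate([y[t - s:t][::-1], u[t - s:t][::-1]]) for t in ts]).T  # phi_s(t)
Nt = Y.shape[1]

# G = (1/N) Y Pi Phi^T without forming the Nt x Nt projection matrix
G = (Y @ Phi.T - Y @ U.T @ np.linalg.solve(U @ U.T, U @ Phi.T)) / Nt

Us, sv, _ = np.linalg.svd(G)                  # W1 = W2 = I
print(sv)                                     # expected: two dominant singular values
n = 2                                         # order assumed known in this sketch
U1 = Us[:, :n]                                # estimate of O_r with R = I
C_hat = U1[:1, :]
A_hat = np.linalg.lstsq(U1[:-1, :], U1[1:, :], rcond=None)[0]
eig = np.sort(np.linalg.eigvals(A_hat).real)
print(eig)                                    # should be close to [0.5, 0.8]
```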
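Subspace Methods for Estimating State Space Models. Finding the states and estimating the noise statistics (recalled from Chapter 7). Let a system be given by the impulse response representation $y(t) = \sum_{j=0}^{\infty}h_u(j)u(t-j) + \sum_{j=0}^{\infty}h_e(j)e(t-j)$ (I). Let the formal k-step-ahead predictors be defined by just deleting the last terms, i.e., the contributions of u and e after time t: $\hat{y}(t+k|t) = \sum_{j=k}^{\infty}h_u(j)u(t+k-j) + \sum_{j=k}^{\infty}h_e(j)e(t+k-j)$. Define $\hat{Y}_r(t) = [\hat{y}^T(t+1|t)\ \cdots\ \hat{y}^T(t+r|t)]^T$.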
Subspace Methods for Estimating State Space Models. Finding the states and estimating the noise statistics (recalled from Chapter 7). Then the following is true (see Chapter 4, Appendix 4.A): 1- The system (I) has an n-th order minimal state-space description if and only if the rank of $\bar{E}\hat{Y}_r(t)\hat{Y}_r^T(t)$ is equal to n for all r ≥ n. 2- The state vector of any minimal realization can be chosen as linear combinations of the components of $\hat{Y}_r(t)$.
Subspace Methods for Estimating State Space Models. Finding the states and estimating the noise statistics. Let a system be given by the impulse response representation (I). For practical reasons, the predictors must be based on a finite number of past data. Such a predictor can be determined efficiently by linear least squares, or, dealing with all r predictors simultaneously, by one multivariable LS problem.
Subspace Methods for Estimating State Space Models. Finding the states and estimating the noise statistics. According to Chapter 7: by LS we have: By the matrix inversion lemma:
Subspace Methods for Estimating State Space Models. Finding the states and estimating the noise statistics. According to Chapter 7, the predicted output is: So we have:
Subspace Methods for Estimating State Space Models Finding the States and Estimating the Noise statistics So let:
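Subspace Methods for Estimating State Space Models. With the states given, we can estimate the process and measurement noises as $\hat{w}(t) = \hat{x}(t+1) - \hat{A}\hat{x}(t) - \hat{B}u(t)$ and $\hat{v}(t) = y(t) - \hat{C}\hat{x}(t) - \hat{D}u(t)$, and their covariance matrices as the sample covariances of these residuals.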
Subspace Methods for Estimating State Space Models. Putting it all together: the family of subspace algorithms. 1. From the input-output data, form $G = \frac{1}{N}\mathbf{Y}\Pi^\perp_{\mathbf{U}^T}\Phi^T$. Remember: many algorithms choose $\varphi_s(t)$ to consist of past inputs and outputs with s1 = s2 = s, so the scalar s is a design variable. The scalar r is the maximal prediction horizon, and many algorithms use r = s.
Subspace Methods for Estimating State Space Models. The family of subspace algorithms. 2. Select weighting matrices W1 and W2 and perform the SVD $\tilde{G} = W_1GW_2 = USV^T \approx U_1S_1V_1^T$. The choice of the weighting matrices W1 and W2 is perhaps the most important one. Existing algorithms employ the following choices:
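Subspace Methods for Estimating State Space Models. The family of subspace algorithms. 3. Select a full-rank matrix R and define the matrix $\hat{\mathcal{O}}_r = W_1^{-1}U_1R$; then solve $\hat{C} = \hat{\mathcal{O}}_r(1:p,\ 1:n)$ and $\hat{\mathcal{O}}_r(p+1:pr,\ 1:n) = \hat{\mathcal{O}}_r(1:p(r-1),\ 1:n)\hat{A}$ for $\hat{C}$ and $\hat{A}$. The latter equation should be solved in a least-squares sense. Typical choices for R are R = I, R = S1 or $R = S_1^{1/2}$. 4. Estimate $\hat{B}$, $\hat{D}$ and $\hat{x}_0$ from the linear regression problem $\min_{B,D,x_0}\frac{1}{N}\sum_{t=1}^{N}\left|y(t) - \hat{C}(qI - \hat{A})^{-1}Bu(t) - Du(t) - \hat{C}\hat{A}^{t}x_0\right|^2$.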
Subspace Methods for Estimating State Space Models. The family of subspace algorithms. 5. If a noise model is sought, form the state estimates $\hat{x}(t)$ as described above, and estimate the noise contributions $\hat{w}(t)$ and $\hat{v}(t)$, and their covariances, from the residuals of the state-space equations.