1
System Identification
Ali Karimpour, Assistant Professor, Ferdowsi University of Mashhad. Reference: "System Identification: Theory for the User", Lennart Ljung.
2
Computing the estimate
Lecture 10 Computing the estimate Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
3
Introduction. In Chapter 7 three basic parameter-estimation methods were considered: 1- the prediction-error approach, in which a certain function VN(θ, ZN) is minimized with respect to θ; 2- the correlation approach, in which a certain equation fN(θ, ZN) = 0 is solved for θ; 3- the subspace approach to estimating state-space models. In this chapter we discuss how these problems are best solved numerically.
4
Linear Regression and Least Squares.
Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
5
Linear Regression and Least Squares.
For linear regression we have ŷ(t|θ) = φ^T(t)θ. The least-squares criterion VN(θ, ZN) = (1/N) Σ ½[y(t) − φ^T(t)θ]² leads to θ̂N = R(N)⁻¹ f(N), with R(N) = Σ φ(t)φ^T(t) and f(N) = Σ φ(t)y(t). An alternative form is the normal equations R(N) θ̂N = f(N) (I). Note that the basic equation for the IV method is quite analogous, so most of what is said in this section about the LS method also applies to the IV method.
6
Linear Regression and Least Squares.
Normal equations: R(N) may be ill-conditioned, especially when its dimension is high. The underlying idea in the methods discussed here is that the matrix R(N) should not be formed; instead a matrix R is constructed with the property R^T R = R(N). This class of methods is commonly known as "square-root algorithms", although the term "quadratic methods" would be more appropriate. How to derive R: Householder transformations, Cholesky decomposition, the Gram–Schmidt procedure, QR factorization (Björck).
7
Linear Regression and Least Squares.
Solving for the LS estimates by QR factorization. The QR factorization of an n × d matrix A is defined as A = QR, where Q is a unitary n × n matrix and R is an n × d matrix that is zero below its main diagonal.
8
Linear Regression and Least Squares.
Solving for the LS estimates by QR factorization.
9
Linear Regression and Least Squares.
Solving for the LS estimates by QR factorization. Define Y = [y(1) … y(N)]^T and Φ = [φ(1) … φ(N)]^T, so that, up to a constant factor, the LS criterion can be written VN(θ) = |Y − Φθ|². Let Q be a unitary matrix; then |Q^T x|² = |x|² for any vector x, so VN(θ) = |Q^T Y − Q^T Φ θ|².
10
Linear Regression and Least Squares.
Solving for the LS estimates by QR factorization. Now introduce the QR factorization of the N × (d+1) matrix [Φ Y]: [Φ Y] = Q R0, with R0 = [R1 R2; 0 r3; 0 0], where R1 is d × d upper triangular, R2 is a d-vector and r3 is a scalar. This means that VN(θ) = |R2 − R1θ|² + r3², which clearly is minimized for R1 θ̂N = R2 (II), i.e. θ̂N = R1⁻¹R2, with minimal value r3².
11
Linear Regression and Least Squares.
Exercise: Suppose the values of u and y for t = 1 to 11 are given (table not reproduced here). Consider the simple model for the system. 1) Derive the estimate from eq. (I) and find the condition number of R(N). 2) Derive the estimate from eq. (II) and find the condition number of R1.
12
Linear Regression and Least Squares.
Solving for the LS estimates by QR factorization. There are three important advantages with this way of computing the LS estimate: 1- The condition number of R1 is the square root of that of R(N), so R1 is much better conditioned than R(N). 2- R1 is a triangular matrix, so equation (II) is easy to solve. 3- If the QR factorization is performed for a regressor of size d*, then the solutions for all models with fewer parameters are easily obtained from R0. Note that the big matrix Q never needs to be formed; all the information is contained in the "small" matrix R0. (See the numerical sketch below.)
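A minimal numerical sketch of the two routes (the data, model and variable names are illustrative, not those of the exercise): the estimate is computed once from the normal equations (I) and once from the QR factorization of [Φ Y] (II), and the two condition numbers are compared.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 200, 2
    x = rng.standard_normal(N)
    Phi = np.column_stack([x, x + 1e-3 * rng.standard_normal(N)])   # nearly collinear regressors
    Y = Phi @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(N)

    # (I) normal equations: R(N) theta = f(N)
    R = Phi.T @ Phi
    f = Phi.T @ Y
    theta_normal = np.linalg.solve(R, f)

    # (II) QR factorization of [Phi Y]: R1 theta = R2
    _, R0 = np.linalg.qr(np.column_stack([Phi, Y]))
    R1, R2 = R0[:d, :d], R0[:d, d]
    theta_qr = np.linalg.solve(R1, R2)

    # cond(R1) is the square root of cond(R(N)), and R1 is triangular and easy to solve
    print(theta_normal, theta_qr)
    print(np.linalg.cond(R), np.linalg.cond(R1) ** 2)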
13
Linear Regression and Least Squares.
Initial conditions: "windowed" data. The regression vector is φ(t) = [z^T(t−1), …, z^T(t−n)]^T, where z(t) is an r-dimensional vector. For example, for the ARX model z(t) = [−y(t)  u(t)]^T, and for the AR model z(t) = −y(t). R(N) is then formed from these regression vectors.
14
Linear Regression and Least Squares.
Initial conditions: "windowed" data. If we have knowledge of z(t) only for 1 ≤ t ≤ N, the question arises of how to deal with the unknown initial conditions when forming R(N). Two options: 1- start the summation at t = n+1 rather than t = 1; 2- replace the unknown initial values by zeros. (A small sketch of the two options is given below.)
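A small sketch of the two options for an ARX-style regressor (the helper below is hypothetical and only illustrates the bookkeeping):

    import numpy as np

    def arx_regression(y, u, na, nb, zero_initial=False):
        """Build (Phi, Y) for an ARX(na, nb) model, phi(t) = [-y(t-1)..-y(t-na), u(t-1)..u(t-nb)].

        zero_initial=False: start the summation at t = n+1 (the first n samples are dropped).
        zero_initial=True : replace the unknown pre-sample values of y and u by zeros.
        """
        n = max(na, nb)
        if zero_initial:
            y = np.concatenate([np.zeros(n), y])
            u = np.concatenate([np.zeros(n), u])
        rows, targets = [], []
        for t in range(n, len(y)):
            rows.append(np.concatenate([-y[t - na:t][::-1], u[t - nb:t][::-1]]))
            targets.append(y[t])
        return np.array(rows), np.array(targets)

    # With zero-padding every measured sample contributes one equation;
    # without it the first n equations are simply not formed.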
15
Numerical Solution by Iterative Search Method
Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
16
Numerical Solution by Iterative Search Method
In general, neither the function VN(θ, ZN) can be minimized nor the equation fN(θ, ZN) = 0 solved by analytical methods. Numerical minimization: methods for numerical minimization of a function V(θ) update the minimizing point iteratively by θ̂(i+1) = θ̂(i) + α f(i), where f(i) is a search direction based on information about V(θ) and α is a positive constant (step size). Depending on the information used to determine f(i), there are three groups: 1- methods using function values only; 2- methods using values of the function as well as of its gradient; 3- methods using values of the function, its gradient and its Hessian.
17
Numerical Solution by Iterative Search Method
Depending on the information used to determine f(i), there are three groups. Methods using values of the function, its gradient and its Hessian: Newton algorithms. Methods using values of the function as well as of its gradient: an estimate of the Hessian is formed and a Newton-like step is taken; these are quasi-Newton algorithms. Methods using function values only: an estimate of the gradient is formed and then a quasi-Newton algorithm is applied.
18
Numerical Solution by Iterative Search Method
In general, consider the function VN(θ, ZN) = (1/N) Σ l(ε(t, θ)). The gradient is V′N(θ) = −(1/N) Σ ψ(t, θ) l′(ε(t, θ)). Here ψ(t, θ) = (d/dθ) ŷ(t|θ) is the gradient of the predictor, a d-dimensional column vector.
19
Numerical Solution by Iterative Search Method
Some explicit search schemes. Consider the special case l(ε) = ½ε², i.e. VN(θ, ZN) = (1/2N) Σ ε²(t, θ). The gradient is V′N(θ) = −(1/N) Σ ψ(t, θ) ε(t, θ). A general family of search routines is given by θ̂(i+1) = θ̂(i) − μ(i) [R(i)]⁻¹ V′N(θ̂(i)), where the matrix R(i) modifies the search direction.
20
Numerical Solution by Iterative Search Method
Some explicit search schemes Consider the special case
21
Numerical Solution by Iterative Search Method
Some explicit search schemes. Consider the special case l(ε) = ½ε². Let R(i) = I; then θ̂(i+1) = θ̂(i) − μ(i) V′N(θ̂(i)). This is the gradient or steepest-descent method. This method is fairly inefficient close to the minimum.
22
Numerical Solution by Iterative Search Method
The tangent (Newton–Raphson) construction for solving f(x) = 0. Make an initial guess x0 and draw the tangent line to f at x0; its equation is y = f(x0) + f′(x0)(x − x0). Let x1 be the x-intercept of the tangent line; it is given by x1 = x0 − f(x0)/f′(x0). Now repeat with x1 as the new initial guess. (Figure: successive iterates x0, x1, x2.)
23
Numerical Solution by Iterative Search Method
Some difficulties of the tangent (Newton–Raphson) iteration for solving f(x) = 0: a zero (or near-zero) derivative sends the next iterate far away, and the iteration may diverge. (Figure: diverging iterates x0, x1, x2.)
24
Numerical Solution by Iterative Search Method
Gradient or steepest-descent method for finding the minimum of f(x).
25
Numerical Solution by Iterative Search Method
Gradient or steepest-descent method for finding the minimum of f(x).
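A minimal steepest-descent sketch on a deliberately ill-conditioned quadratic (the function and step size are illustrative, not from the lecture), showing the slow progress close to the minimum:

    import numpy as np

    H = np.diag([1.0, 100.0])          # V(theta) = 0.5 * theta' H theta, condition number 100
    theta = np.array([1.0, 1.0])
    mu = 0.015                         # step size; must satisfy mu < 2 / (largest eigenvalue)
    for i in range(200):
        grad = H @ theta               # V'(theta)
        theta = theta - mu * grad

    # theta converges quickly along the steep direction but only slowly along the flat one:
    # "fairly inefficient close to the minimum"
    print(theta)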
26
Numerical Solution by Iterative Search Method
Some explicit search schemes. Consider the special case l(ε) = ½ε². The gradient or steepest-descent method is fairly inefficient close to the minimum. The gradient and the Hessian of VN are V′N(θ) = −(1/N) Σ ψ(t, θ) ε(t, θ) and V″N(θ) = (1/N) Σ ψ(t, θ) ψ^T(t, θ) − (1/N) Σ ψ′(t, θ) ε(t, θ). Let R(i) = V″N(θ̂(i)); then θ̂(i+1) = θ̂(i) − μ(i)[V″N(θ̂(i))]⁻¹ V′N(θ̂(i)). This is the Newton method. But it is not an easy task to compute the Hessian, because of the second-derivative term ψ′(t, θ) = (d/dθ) ψ(t, θ).
27
Numerical Solution by Iterative Search Method
Some explicit search schemes. Consider the special case l(ε) = ½ε². This is the Newton method, but it is not an easy task to compute the Hessian because of the term Σ ψ′(t, θ) ε(t, θ). Suppose that there is a value θ0 such that ε(t, θ0) = e0(t) is white noise; then this term has zero mean and becomes negligible, for large N, in the vicinity of θ0.
28
Numerical Solution by Iterative Search Method
Newton method. So choosing RN(θ) = (1/N) Σ ψ(t, θ) ψ^T(t, θ) in the vicinity of the minimum gives a good estimate of the Hessian. This is known as the Gauss–Newton method. In the statistical literature it is called the "method of scoring"; in the control literature the terms "modified Newton–Raphson" and "quasilinearization" have also been used.
29
Numerical Solution by Iterative Search Method
Newton method. Dennis and Schnabel reserve the term "Gauss–Newton" for the step with μ = 1, and for a general step size μ(i) the term "damped Gauss–Newton" has been used.
30
Numerical Solution by Iterative Search Method
Newton method. Even though RN is assured to be positive semidefinite, it may be singular or close to singular (for example, if the model is over-parameterized or the data are not informative enough). Various ways to overcome this problem exist and are known as regularization techniques. Goldfeld, Quandt and Trotter and, in a related form, Levenberg and Marquardt suggest replacing RN by RN + λI. With λ = 0 we have the Gauss–Newton case; increasing λ means that the step size is decreased and the search direction is turned towards the gradient. (A sketch of the resulting step is given below.)
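A compact sketch of one such regularized step for the quadratic criterion, assuming the residuals ε(t, θ) and the predictor gradients ψ(t, θ) at the current iterate are already available (the function name and array layout are choices made here, not the book's):

    import numpy as np

    def damped_gauss_newton_step(theta, eps, Psi, mu=1.0, lam=0.0):
        """One damped Gauss-Newton step with Levenberg-Marquardt-style regularization.

        eps : residuals eps(t, theta), shape (N,)
        Psi : predictor gradients psi(t, theta) stacked row-wise, shape (N, d)
        lam = 0 gives the (damped) Gauss-Newton direction; increasing lam shortens the
        step and turns the search direction towards the gradient.
        """
        N = len(eps)
        gradV = -(Psi.T @ eps) / N                            # V'_N(theta)
        H = (Psi.T @ Psi) / N + lam * np.eye(len(theta))      # regularized R_N
        return theta - mu * np.linalg.solve(H, gradV)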
31
Numerical Solution by Iterative Search Method
Remember that we want either to minimize VN(θ, ZN) (I) or to solve the correlation equation fN(θ, ZN) = 0 (II). Solving equation (II) is quite analogous to the minimization of (I): one may apply a Newton–Raphson method to (II), or a substitution (fixed-point) method in which (II) is rearranged so that a new estimate is computed from the previous one.
32
Computing Gradients Topics to be covered include:
Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
33
Computing Gradients. The amount of work required to compute ψ(t, θ) is highly dependent on the model structure, and sometimes one may have to resort to numerical differentiation. Example: consider the ARMAX model A(q)y(t) = B(q)u(t) + C(q)e(t); the predictor satisfies C(q)ŷ(t|θ) = B(q)u(t) + [C(q) − A(q)]y(t). Differentiation with respect to ak gives C(q) ∂ŷ(t|θ)/∂ak = −y(t−k); similarly C(q) ∂ŷ(t|θ)/∂bk = u(t−k) and C(q) ∂ŷ(t|θ)/∂ck = ε(t−k, θ).
34
Computing Gradients. Thus ψ(t, θ) is obtained by filtering the lagged signals −y(t−k), u(t−k) and ε(t−k, θ) through the filter 1/C(q). (A computational sketch is given below.)
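A sketch of this gradient computation; the helper name and the parameter ordering θ = [a1..a_na, b1..b_nb, c1..c_nc] are assumptions made here for illustration:

    import numpy as np
    from scipy.signal import lfilter

    def armax_gradient(theta, y, u, na, nb, nc):
        """Residuals eps(t, theta) and gradients psi(t, theta) for an ARMAX model
        A(q)y = B(q)u + C(q)e (a sketch; zero initial conditions assumed)."""
        a = np.concatenate([[1.0], theta[:na]])
        b = theta[na:na + nb]
        c = np.concatenate([[1.0], theta[na + nb:]])
        # residuals: C(q) eps(t) = A(q) y(t) - B(q) u(t)
        eps = lfilter(a, c, y) - lfilter(np.concatenate([[0.0], b]), c, u)
        yf = lfilter([1.0], c, y)          # (1/C) y
        uf = lfilter([1.0], c, u)          # (1/C) u
        ef = lfilter([1.0], c, eps)        # (1/C) eps
        Psi = np.zeros((len(y), na + nb + nc))
        for k in range(1, na + 1):         # d yhat / d a_k = -(1/C) y(t-k)
            Psi[k:, k - 1] = -yf[:-k]
        for k in range(1, nb + 1):         # d yhat / d b_k =  (1/C) u(t-k)
            Psi[k:, na + k - 1] = uf[:-k]
        for k in range(1, nc + 1):         # d yhat / d c_k =  (1/C) eps(t-k)
            Psi[k:, na + nb + k - 1] = ef[:-k]
        return eps, Psi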
35
Computing Gradients SISO black box model
The general model structure is y(t) = G(q, θ)u(t) + H(q, θ)e(t), with predictor ŷ(t|θ) = H⁻¹(q, θ)G(q, θ)u(t) + [1 − H⁻¹(q, θ)]y(t). So we have ψ(t, θ) = H⁻¹(q, θ)[(∂G(q, θ)/∂θ) u(t) + (∂H(q, θ)/∂θ) ε(t, θ)].
36
Computing Gradients SISO black box model
The general model structure and its predictor are as above. As a special case, consider the OE (output-error) model y(t) = [B(q)/F(q)]u(t) + e(t), so that G(q, θ) = B(q)/F(q), H(q, θ) = 1 and ŷ(t|θ) = [B(q)/F(q)]u(t).
37
Computing Gradients SISO black box model
As a special case, consider the OE model. Now ∂ŷ(t|θ)/∂bk = [1/F(q)] u(t−k) and ∂ŷ(t|θ)/∂fk = −[1/F(q)] ŷ(t−k|θ), i.e. the gradient is obtained by filtering u and ŷ through 1/F(q).
38
Two-Stage and Multistage Method
Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
39
Two-Stage and Multistage Method
Numerical solution by iterative search methods combines guaranteed convergence to a local minimum, efficiency, and applicability to general model structures. Linear regression and least squares give efficient methods with an analytic solution.
40
Two-Stage and Multistage Method
Why we are interested in this topic: it helps in understanding the identification literature, and it is useful for providing initial estimates for the iterative methods. Some important two-stage or multistage methods: 1- Bootstrap methods. 2- Bilinear parameterization. 3- Separate least squares. 4- High-order AR(X) models. 5- Separating dynamics and noise models. 6- Determining ARMA models. 7- Subspace methods for estimating state-space models.
41
Two-Stage and Multistage Method
Bootstrap Methods. Consider the correlation formulation fN(θ, ZN) = (1/N) Σ ζ(t, θ) ε(t, θ) = 0. This formulation contains a number of common situations: IV (instrumental variable) methods, in which ζ(t, θ) is a vector of instruments; PLR (pseudolinear regression) methods, in which ζ(t, θ) is the pseudo-regression vector; and minimization of the quadratic criterion, in which ζ(t, θ) = ψ(t, θ).
42
Two-Stage and Multistage Method
Bootstrap Methods. Consider the correlation formulation fN(θ, ZN) = 0. It is called a bootstrap method since it alternates between computing ζ(t, θ̂) and ε(t, θ̂) for the current estimate and solving the resulting linear equation for a new estimate. It does not necessarily converge to a solution; a convergence analysis is available in the literature.
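A sketch of one such bootstrap (pseudolinear-regression) iteration for an ARMAX model: lagged residuals from the previous pass are used as extra regressors in a plain LS fit, and the two steps are alternated. The function name and the fixed iteration count are illustrative choices:

    import numpy as np

    def plr_armax(y, u, na, nb, nc, iters=10):
        """Pseudolinear regression (extended LS) for ARMAX(na, nb, nc) - a sketch."""
        n = max(na, nb, nc)
        eps = np.zeros_like(y)                             # start with zero residuals
        for _ in range(iters):
            rows = [np.concatenate([-y[t - na:t][::-1],
                                    u[t - nb:t][::-1],
                                    eps[t - nc:t][::-1]]) for t in range(n, len(y))]
            Phi = np.array(rows)
            theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
            eps = np.concatenate([np.zeros(n), y[n:] - Phi @ theta])   # refreshed residuals
        return theta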
43
Two-Stage and Multistage Method
Bilinear Parameterization. For some models the predictor is bilinear in the parameters. For example, consider the ARARX model A(q)y(t) = B(q)u(t) + [1/D(q)]e(t), whose predictor is ŷ(t|θ) = D(q)B(q)u(t) + [1 − D(q)A(q)]y(t). Let ρ collect the coefficients of A and B, and η the coefficients of D. Bilinear means that the predictor is linear in ρ for fixed η and linear in η for fixed ρ.
44
Two-Stage and Multistage Method
Bilinear Parameterization. In the ARARX model, with this situation, a natural way of minimizing the criterion is to treat it as a sequence of LS problems: fix η and minimize over ρ by least squares, then fix ρ and minimize over η by least squares, and iterate. Exercise 10T.3: show that this minimization problem is a special case of such an alternating scheme. According to Exercise 10T.3, bilinear parameterization is thus indeed a descent method; it converges to a local minimum.
45
Two-Stage and Multistage Method
Separate Least Squares. A more general situation than the bilinear case is when one set of parameters enters linearly and another set nonlinearly in the predictor: ŷ(t|θ, η) = φ^T(t, η)θ. The identification criterion then becomes VN(θ, η) = Σ |y(t) − φ^T(t, η)θ|². For given η this criterion is an LS criterion, minimized with respect to θ by θ̂(η) = [Σ φ(t, η)φ^T(t, η)]⁻¹ Σ φ(t, η)y(t). We can thus insert this into VN and define the problem as minimizing VN(θ̂(η), η) with respect to η only.
46
Two-Stage and Multistage Method
Separate Least Squares. The identification criterion then becomes a problem in η alone. The method is called separate least squares since the LS part has been separated out, and the problem is reduced to a minimization problem of lower dimension. (A toy sketch is given below.)
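A toy sketch of the idea, for a hypothetical predictor ŷ(t) = θ·exp(−ηt) (one parameter enters linearly, one nonlinearly): the inner LS problem is solved in closed form and only the scalar η is searched over.

    import numpy as np
    from scipy.optimize import minimize_scalar

    t = np.linspace(0.0, 5.0, 100)
    rng = np.random.default_rng(1)
    y = 2.0 * np.exp(-0.7 * t) + 0.05 * rng.standard_normal(t.size)

    def theta_hat(eta):                        # inner LS problem, closed form
        phi = np.exp(-eta * t)
        return (phi @ y) / (phi @ phi)

    def reduced_criterion(eta):                # V_N(theta_hat(eta), eta)
        phi = np.exp(-eta * t)
        return np.sum((y - theta_hat(eta) * phi) ** 2)

    res = minimize_scalar(reduced_criterion, bounds=(0.01, 5.0), method="bounded")
    print(res.x, theta_hat(res.x))             # close to (0.7, 2.0)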
47
Two-Stage and Multistage Method
High Order AR(X) Models. Suppose the true system is y(t) = G0(q)u(t) + H0(q)e(t). An ARX structure of order M is used: AM(q)y(t) = BM(q)u(t) + e(t). Hannan and Kavalieris, and Ljung and Wahlberg, show that as M (and N) tend to infinity at suitable rates, B̂M/ÂM → G0 and 1/ÂM → H0. So a high-order ARX model is capable of approximating any linear system arbitrarily well.
48
Two-Stage and Multistage Method
High Order AR(X) Models. So a high-order ARX model is capable of approximating any linear system arbitrarily well. It is of course desirable to reduce this high-order model to a more tractable, lower-order description. (A sketch of the high-order fit is given below.)
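A sketch of the first stage, the high-order ARX fit by least squares. It reuses the hypothetical arx_regression helper from the earlier sketch (assumed to be in scope), and M = 20 is just an illustrative order:

    import numpy as np

    def fit_high_order_arx(y, u, M):
        Phi, Y = arx_regression(y, u, na=M, nb=M)       # helper from the earlier sketch
        theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
        A = np.concatenate([[1.0], theta[:M]])          # A_M(q) = 1 + a1 q^-1 + ... + aM q^-M
        B = np.concatenate([[0.0], theta[M:]])          # B_M(q) = b1 q^-1 + ... + bM q^-M
        return A, B                                     # B_M/A_M ~ G0, 1/A_M ~ H0 for large M

    # A, B = fit_high_order_arx(y, u, M=20)   # the high-order model can then be reduced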
49
Two-Stage and Multistage Method
Separating Dynamics And Noise Models. The general model structure is y(t) = G(q, θ)u(t) + H(q, θ)e(t); the dynamics G and the noise model H can then be estimated in separate stages.
50
Two-Stage and Multistage Method
Determining ARMA Models.
51
Two-Stage and Multistage Method
Subspace Methods For Estimating State Space Models. The subspace methods can also be regarded as two-stage methods, being built up from two LS steps.
52
Local Solutions and Initial Values
Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
53
Local Solutions and Initial Values
Local Minima. The general numerical schemes of Section 10.2 typically have the property that, with a suitably chosen step length μ, they converge to a solution of V′N(θ, ZN) = 0, which may have several solutions. For positive definite R we then have a local minimum of VN(θ, ZN), while it is the global minimum that interests us. Local minima do not necessarily create problems in practice, if a model passes the validation tests (Sections 16.5 and 16.6). To find the global solution one must start at different feasible initial values. An important possibility is to use some preliminary estimation procedure to produce a good initial value.
54
Local Solutions and Initial Values
Remember from Chapter 8.
55
Local Solutions and Initial Values
Results from SISO Black-box Models. The general model structure is y(t) = G(q, θ)u(t) + H(q, θ)e(t). Consider the assumption that the system can be described within the model set: S ∈ M. The results below are listed for this general SISO model set.
56
Local Solutions and Initial Values
Results from SISO Black-box Models
57
Local Solutions and Initial Values
Initial parameter values. Due to the possible occurrence of undesired local minima in the criterion function, it is worthwhile to put some effort into producing good initial values. Also, since Newton-type methods have a good local convergence rate, it is again worthwhile to put some effort into producing good initial values. 1- For a physically parameterized model structure: use your physical insight. 2- For a linear black-box model structure: use a preliminary (two- or multistage) estimation procedure to produce the initial estimate.
58
Local Solutions and Initial Values
Initial filter conditions. In some configurations we need initial values φ(0, θ). 1- Start the summation at t = n+1 rather than t = 1. 2- Handle the initial conditions explicitly (e.g. set them to zero or estimate them along with the parameters).
59
Subspace Methods for Estimating State Space Models
Topics to be covered include: Linear Regression and Least Squares. Numerical Solution by Iterative Search Method. Computing Gradients. Two-Stage and Multistage Method. Local Solutions and Initial Values. Subspace Methods for Estimating State Space Models.
60
Subspace Methods for Estimating State Space Models
Let us now consider how to estimate the system matrices A, B, C and D in the state-space model x(t+1) = Ax(t) + Bu(t) + w(t), y(t) = Cx(t) + Du(t) + v(t). The output y(t) is a p-dimensional column vector and the input u(t) is an m-dimensional column vector; the order of the system is n. We also assume that this state-space representation is a minimal realization. We know that many different representations describe the same system: Ã = TAT⁻¹, B̃ = TB, C̃ = CT⁻¹, D̃ = D, where T is any invertible matrix. We also have x̃(t) = Tx(t).
61
Subspace Methods for Estimating State Space Models
Let the ss as: ► Estimating B and D ► Finding A and C from Observability matrix ► Estimating the Extended Observability matrix ► Finding the States and Estimating the noise Statistics.
62
Subspace Methods for Estimating State Space Models
Let the ss as: Subspace procedure. ► Estimating B and D ► Finding A and C from Observability matrix ► Estimating the Extended Observability matrix ► Finding the States and Estimating the noise Statistics.
63
Subspace Methods for Estimating State Space Models
► Estimating B and D. For given and fixed A and C, the model structure ŷ(t|B, D) = C(qI − A)⁻¹B u(t) + D u(t) is clearly linear in B and D. If the system operates in open loop, we can thus consistently estimate B and D, according to Theorem 8.4, even if the noise sequence is non-white.
64
Subspace Methods for Estimating State Space Models
► Estimating B and D. Let us write the predictor in the standard linear-regression form ŷ(t|θ) = φ^T(t)θ, where θ contains the entries of B and D.
65
Subspace Methods for Estimating State Space Models
► Estimating B and D
66
Subspace Methods for Estimating State Space Models
► Estimating B and D. If desired, the initial state x0 = x(0) can also be estimated in an analogous way, since the predictor with the initial values taken into account is linear also in x0. Here δ(t) denotes the unit pulse at time 0. (A regression sketch is given below.)
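A time-domain sketch of this regression (the function name and the column-stacking conventions are choices made here; noise-free predictor and open-loop data assumed). It uses y(t) = C A^t x0 + Σ_{k<t} C A^(t−1−k) B u(k) + D u(t), which is linear in vec(B), vec(D) and x0:

    import numpy as np

    def estimate_B_D_x0(A, C, u, y):
        """Given A (n x n) and C (p x n), estimate B, D and x0 by linear regression.
        u: (N, m) inputs, y: (N, p) outputs. Plain O(N^2) sketch, not an efficient implementation."""
        n = A.shape[0]
        p, m, N = C.shape[0], u.shape[1], u.shape[0]
        CA = [C]                                   # CA[t] = C A^t
        for _ in range(N - 1):
            CA.append(CA[-1] @ A)
        rows = []
        for t in range(N):
            blk_B = np.zeros((p, n * m))
            for k in range(t):                     # sum_k (u(k)^T kron C A^(t-1-k)) vec(B)
                blk_B += np.kron(u[k][None, :], CA[t - 1 - k])
            blk_D = np.kron(u[t][None, :], np.eye(p))
            rows.append(np.hstack([blk_B, blk_D, CA[t]]))
        theta, *_ = np.linalg.lstsq(np.vstack(rows), y.reshape(-1), rcond=None)
        B = theta[:n * m].reshape(n, m, order="F")
        D = theta[n * m:n * m + p * m].reshape(p, m, order="F")
        x0 = theta[n * m + p * m:]
        return B, D, x0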
67
Subspace Methods for Estimating State Space Models
► Finding A and C from the observability matrix. Suppose that an (rp × n*) matrix G is given that is related to the extended observability matrix Or. We have to determine A and C from G. There are two situations: known system order and unknown system order. Known system order: suppose first we know the order, so that n* = n. To find C is then immediate: C is the first block row (the first p rows) of Or.
68
Subspace Methods for Estimating State Space Models
► Finding A and C from the observability matrix. Similarly, we can find A from the shift equation: Or with its last block row (p rows) deleted, multiplied by A, equals Or with its first block row deleted.
69
Subspace Methods for Estimating State Space Models
► Finding A and C from the observability matrix. Note: large r leads to numerical problems. Role of the state-space basis: the extended observability matrix depends on the choice of basis in the state-space representation. It is easy to verify that in the basis x̃ = Tx the observability matrix would be Õr = Or T⁻¹.
70
Subspace Methods for Estimating State Space Models
Unknown system order. Suppose now that the true order of the system is unknown, and that n* — the number of columns of G — is just an upper bound for the order.
71
Subspace Methods for Estimating State Space Models
72
Subspace Methods for Estimating State Space Models
73
Subspace Methods for Estimating State Space Models
Perform the SVD G = U S V^T = U1 S1 V1^T, where S1 holds the nonzero singular values. Now multiplying this by V1 from the right gives G V1 = U1 S1, and multiplying by S1⁻¹ from the right gives G V1 S1⁻¹ = U1. Hence Or = U1 R for some invertible matrix R.
74
Subspace Methods for Estimating State Space Models
Using a Noisy Estimate of the Extended Observability Matrix. Let us now assume that the given matrix G is a noisy estimate of the true observability matrix, G = Or + EN, where EN is small and tends to zero as N → ∞. The rank of Or is not known, while the noise matrix EN is likely to have full rank. It is reasonable to proceed as above and perform an SVD on G; due to the noise, S will typically have all singular values non-zero.
75
Subspace Methods for Estimating State Space Models
The first n singular values will be supported by Or, while the remaining ones stem from EN. If the noise is small, one should expect the latter to be significantly smaller than the former. Therefore, determine n as the number of singular values that are significantly larger than zero. Then use U1 to determine A and C, as before. However, in the noisy case U1 will not exactly satisfy the shift structure, so this system of equations should be solved in a least-squares sense. (A sketch is given below.)
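A sketch of this step (the rank threshold, the basis choice R = I, and the function name are choices made here for illustration):

    import numpy as np

    def A_C_from_observability(G, p, tol=1e-3):
        """Extract (A, C) from a (possibly noisy) estimate G of the extended observability
        matrix. G has r block rows of height p; the order n is taken as the number of
        singular values larger than tol times the largest one."""
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        n = int(np.sum(s > tol * s[0]))                 # estimated order
        Or = U[:, :n]                                   # basis for the range space (R = I)
        C = Or[:p, :]                                   # first block row
        # shift structure: Or(without last block) @ A = Or(without first block), solved by LS
        A, *_ = np.linalg.lstsq(Or[:-p, :], Or[p:, :], rcond=None)
        return A, C, n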
76
Subspace Methods for Estimating State Space Models
Using Weighting Matrices in the SVD. For more flexibility we can pre- and post-multiply G before performing the SVD, W1 G W2 = U S V^T, and then use the resulting U1 to determine Â and Ĉ as above. Here R is an arbitrary invertible matrix that fixes the coordinate basis for the state representation. The post-multiplication by W2 just corresponds to a change of basis in the state space, and the pre-multiplication by W1 is eliminated. In the noiseless case E = 0 these weightings are without consequence. However, when noise is present, they have an important influence on the space spanned by U1, and hence on the quality of the estimates Â and Ĉ. Remark: post-multiplying W2 by an orthogonal matrix does not affect the U1 matrix in the decomposition. Exercise: prove this remark (10.E10).
77
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. Remember that x(t+1) = Ax(t) + Bu(t) and y(t) = Cx(t) + Du(t) (plus noise). Now, by iterating the state equation, y(t+k) can be written in terms of x(t), the inputs u(t), …, u(t+k) and noise contributions.
78
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. Now, form the vectors Yr(t) = [y(t); y(t+1); …; y(t+r−1)] and Ur(t) = [u(t); u(t+1); …; u(t+r−1)]. The scalar r is the maximum prediction horizon.
79
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. Stacking the expressions for y(t+k) gives Yr(t) = Or x(t) + Sr Ur(t) + V(t), where Sr is a block lower-triangular matrix of impulse-response coefficients and the k-th block component of V(t) collects the noise contributions to y(t+k−1).
80
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. In Yr(t) = Or x(t) + Sr Ur(t) + V(t), the term Or x(t) is the one we want. We must eliminate the U-term and make the noise influence disappear asymptotically.
81
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. We must eliminate the U-term and make the noise influence disappear asymptotically. Removing the U-term: form the projection matrix onto the orthogonal complement of the rows of the stacked input matrix. Multiplying from the right by this projection removes the Sr Ur term. Since the remaining extra term is made up of noise contributions, the idea is to correlate it away with a suitable matrix.
82
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. Removing the noise term: since the last term is made up of noise contributions, the idea is to correlate it away with a suitable matrix. Define the matrix φs(t) of past data; it acts as an instrument, and we must define it such that it is uncorrelated with the noise but well correlated with the state.
83
Subspace Methods for Estimating State Space Models
► Estimating the Extended Observability Matrix. Here φs(t) acts as an instrument, and we must define it such that its sample correlation with the noise term tends to zero while its sample correlation with the state x(t) has full rank. Then the resulting matrix G can be seen as a noisy estimate of the extended observability matrix (up to a full-rank right factor). But we still need to define φs(t).
84
Subspace Methods for Estimating State Space Models
Finding Good Instruments. The only remaining question is how to achieve the two conditions above: vanishing correlation with the noise and full-rank correlation with the state. Remember the instrumental-variable idea. Remember also that the law of large numbers states that the sample sums converge to their respective expected values, so it is enough to check the corresponding expectations.
85
Subspace Methods for Estimating State Space Models
Finding Good Instruments. Assume the input u is generated in open loop, so that it is independent of the noise V. The k-th block component of V(t) depends only on the noise acting at time t and later. Now let φs(t) consist of past inputs and outputs.
86
Subspace Methods for Estimating State Space Models
Finding Good Instruments. Similarly, the correlation between V(t) and φs(t) vanishes, since φs(t) is built from data up to time t−1 while the noise contributions in V(t) occur at time t or later. A formal proof that the correlation with the state has full rank is not immediate and involves properties of the input; see Problem 10G.6 and Van Overschee and De Moor (1996).
87
Subspace Methods for Estimating State Space Models
Finding the States and Estimating the Noise Statistics (material from Chapter 7). Let a system be given by an impulse-response representation. Let the formal k-step-ahead predictors be defined by simply deleting the last terms, and define the vector of stacked predictors.
88
Subspace Methods for Estimating State Space Models
Finding the States and Estimating the Noise Statistics (material from Chapter 7). Then the following is true (see Chapter 4, Appendix A): 1- The system (I) has an n-th order minimal state-space description if and only if the rank of the matrix of stacked predictors is equal to n for all r ≥ n. 2- The state vector of any minimal realization can be chosen as linear combinations of these k-step-ahead predictors.
89
Subspace Methods for Estimating State Space Models
Finding the States and Estimating the Noise Statistics. Let a system be given by the impulse-response representation. For practical reasons we work with a finite past. The predictor can then be determined effectively by a linear regression on the past data, or, dealing with all r predictors simultaneously, as one matrix regression.
90
Subspace Methods for Estimating State Space Models
Finding the States and Estimating the Noise Statistics. According to Chapter 7, the predictor coefficients are obtained by LS; the matrix inversion lemma can be used to rewrite the resulting expression.
91
Subspace Methods for Estimating State Space Models
Finding the States and Estimating the Noise Statistics. According to Chapter 7, the predicted outputs are obtained from the estimated predictor coefficients, so we have estimates of the stacked k-step-ahead predictors from the data.
92
Subspace Methods for Estimating State Space Models
Finding the States and Estimating the Noise Statistics. So let the state estimates be formed as n linear combinations of these stacked predictors.
93
Subspace Methods for Estimating State Space Models
With the states given, we can estimate the process and measurement noises as ŵ(t) = x̂(t+1) − Âx̂(t) − B̂u(t) and v̂(t) = y(t) − Ĉx̂(t) − D̂u(t), and their covariances from the sample averages of these residuals.
94
The family of subspace algorithms
Subspace Methods for Estimating State Space Models: Putting It All Together. The family of subspace algorithms. 1. From the input-output data, form G (the estimate of the extended observability matrix). Remember: many algorithms choose φs(t) to consist of past inputs and outputs with s1 = s2 = s, so the scalar s is a design variable. The scalar r is the maximal prediction horizon, and many algorithms use r = s.
95
The family of subspace algorithms
Subspace Methods for Estimating State Space Models: Putting It All Together. 2. Select weighting matrices W1 and W2 and perform the SVD of W1 G W2. The choice of the weighting matrices W1 and W2 is perhaps the most important one; existing algorithms employ different choices.
96
The family of subspace algorithms
Subspace Methods for Estimating State Space Models: Putting It All Together. 3. Select a full-rank matrix R, define the estimate of the observability matrix as U1 R, and solve the shift equations for Â and Ĉ. Typical choices for R are R = I or R = S1. In the noisy case the shift equation should be solved in a least-squares sense. 4. Estimate B̂, D̂ (and, if desired, x̂0) from the linear regression problem described earlier.
97
The family of subspace algorithms
Subspace Methods for Estimating State Space Models: Putting It All Together. 5. If a noise model is sought, form the state estimates as described above and estimate the noise contributions from the state and output residuals. (A condensed end-to-end sketch follows.)
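A condensed end-to-end sketch of steps 1–3 (with W1 = W2 = I and R = I; the helper A_C_from_observability from the earlier sketch is assumed to be in scope, and B, D can then be obtained with the regression sketch given before). Forming the L × L projection explicitly is acceptable for a sketch; real implementations use QR factorizations instead.

    import numpy as np

    def subspace_sketch(u, y, r, s, tol=1e-2):
        """u: (N, m) inputs, y: (N, p) outputs, r: prediction horizon (r > n), s: past horizon."""
        N, m = u.shape
        p = y.shape[1]
        ts = range(s, N - r + 1)
        L = len(ts)
        Yf  = np.column_stack([y[t:t + r].reshape(-1) for t in ts])      # stacked future outputs
        Uf  = np.column_stack([u[t:t + r].reshape(-1) for t in ts])      # stacked future inputs
        Phi = np.column_stack([np.concatenate([y[t - s:t][::-1].reshape(-1),
                                               u[t - s:t][::-1].reshape(-1)]) for t in ts])
        # 1. remove the U-term: project onto the orthogonal complement of the rows of Uf
        Pi = np.eye(L) - np.linalg.pinv(Uf) @ Uf
        # 2. correlate the noise away with the past data (instruments)
        G = (Yf @ Pi @ Phi.T) / L
        # 3. SVD, order selection and (A, C), reusing the earlier sketch
        A, C, n = A_C_from_observability(G, p, tol)
        return A, C, n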