Computacion Inteligente Least-Square Methods for System Identification.



2 Contents
• System Identification: an Introduction
• Least-Squares Estimators
• Statistical Properties of least-squares estimators
• Maximum likelihood (ML) estimator
• Maximum likelihood estimator for linear model
• LSE for Nonlinear Models
• Developing Dynamic models from Data
• Example: Tank level modeling

3 System Identification: Introduction
• Goal
  – Determine a mathematical model for an unknown system (the target system) by observing its input-output data pairs

4 System Identification: Introduction
• Purposes
  – To predict a system's behavior, as in time series prediction and weather forecasting
  – To explain the interactions and relationships between the inputs and outputs of a system

5 System Identification: Introduction
• Context example
  – To design a controller based on the model of a system, as in aircraft or ship control
  – To simulate the system under control once the model is known

6 Why cover System Identification
• System Identification
  – It is a well-established and easy-to-use technique for modeling a real-life system.
  – It will be needed for the section on fuzzy-neural networks.

7 Spring Example
[Table: Experimental data. Columns: Experiment | Force (newtons) | Length (inches)]
What will the length be when the force is 5.0 newtons?

8 Components of System Identification
• There are 2 main steps involved
  – Structure identification
  – Parameter identification

9 Structure identification
• Structure identification
  – Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is to be conducted
  – This class of models is denoted by a function $y = f(u, \theta)$, where:
    y is the model output
    u is the input vector
    θ is the parameter vector

10 Structure identification
• Structure identification
  – $f(u, \theta)$ depends on
    the problem at hand
    the designer's experience
    the laws of nature governing the target system

11 Parameter identification
  – Training data is used for both the target system and the model.
  – The difference between the target system output, $y_i$, and the mathematical model output, $\hat{y}_i$, is used to update the parameter vector, θ.

12 Parameter identification
• Parameter identification
  – The structure of the model is known; however, we need to apply optimization techniques
  – in order to determine the parameter vector $\theta = \hat{\theta}$ such that the resulting model $\hat{y} = f(u, \hat{\theta})$ describes the system appropriately

13 System Identification Process
• The data set composed of m desired input-output pairs $(u_i, y_i)$, $i = 1, \dots, m$, is called the training data
• System identification needs to do both structure and parameter identification repeatedly until a satisfactory model is found

14 System Identification: Steps
  1. Specify and parameterize a class of mathematical models representing the system to be identified
  2. Perform parameter identification to choose the parameters that best fit the training data set
  3. Conduct a validation test to see if the identified model responds correctly to an unseen data set
  4. Terminate the procedure once the results of the validation test are satisfactory. Otherwise, select another class of models and repeat steps 2 to 4
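
The four steps can be sketched as a small loop. This is only an illustration under assumptions that are not in the slides: the model classes are polynomials of increasing degree, the data are synthetic, and the names `fit_polynomial`, `rms_error`, `identify` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def fit_polynomial(u, y, degree):
    """Step 2: least-squares fit of y = theta_0 + theta_1*u + ... + theta_d*u^d."""
    A = np.vander(u, degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def rms_error(u, y, theta):
    """Root-mean-square error of the fitted polynomial on the data (u, y)."""
    A = np.vander(u, len(theta), increasing=True)
    return float(np.sqrt(np.mean((y - A @ theta) ** 2)))

def identify(u_train, y_train, u_val, y_val, max_degree=5, tol=0.1):
    """Try model classes in turn until the validation test is satisfactory."""
    for degree in range(1, max_degree + 1):               # step 1: choose a class
        theta = fit_polynomial(u_train, y_train, degree)  # step 2: fit parameters
        err = rms_error(u_val, y_val, theta)              # step 3: validate
        if err <= tol:                                    # step 4: stop or repeat
            break
    return degree, theta, err

# Synthetic data (an assumption): a quadratic system with small measurement noise.
rng = np.random.default_rng(0)
u = np.linspace(-1.0, 1.0, 60)
y = 1.0 - 2.0 * u + 0.5 * u**2 + 0.01 * rng.standard_normal(u.size)

# Even-indexed samples train the model, odd-indexed samples validate it.
degree, theta, err = identify(u[::2], y[::2], u[1::2], y[1::2], tol=0.05)
print(degree, theta, err)
```

The validation set plays the role of the unseen data in step 3; the straight-line model fails the test here, so the loop moves on to the next class.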

15 System Identification Process
Structure and parameter identification may need to be done repeatedly

16 Least-Squares Estimators

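17 Objective of Linear Least Squares fitting
• Given a training data set $\{(u_i, y_i),\ i = 1, \dots, m\}$ and the general form function $y = f(u, \theta)$
• Find the parameters $\theta_1, \dots, \theta_n$ such that the estimate $\hat{y}_i = f(u_i, \theta)$ is as close as possible to $y_i$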

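18 The linear model
• The linear model: $y = \theta_1 f_1(u) + \theta_2 f_2(u) + \dots + \theta_n f_n(u) = f^T(u)\,\theta$
• where:
  – $u = (u_1, \dots, u_p)^T$ is the model input vector
  – $f_1, \dots, f_n$ are known functions of u, collected in $f(u) = (f_1(u), \dots, f_n(u))^T$
  – $\theta_1, \dots, \theta_n$ are unknown parameters to be estimated, collected in $\theta = (\theta_1, \dots, \theta_n)^T$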

19 Least-Squares Estimators
• The task of fitting data using a linear model is referred to as linear regression, where:
  – $u = (u_1, \dots, u_p)^T$ is the input vector
  – $f_1(u), \dots, f_n(u)$ are the regressors
  – $\theta_1, \dots, \theta_n$ form the parameter vector

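20 Least-Squares Estimators
• We collect the training data set $\{(u_i, y_i),\ i = 1, \dots, m\}$
• The system's equations become: $f_1(u_i)\theta_1 + f_2(u_i)\theta_2 + \dots + f_n(u_i)\theta_n = y_i$, for $i = 1, \dots, m$
• Which is equivalent to: $A\theta = y$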

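21 Least-Squares Estimators
• Which is equivalent to: $A\theta = y$, where
  – $A$ is the $m \times n$ matrix with entries $A_{ij} = f_j(u_i)$
  – θ is the $n \times 1$ vector of unknown parameters
  – $y$ is the $m \times 1$ output vector
• If $A$ is square ($m = n$) and nonsingular, the solution is $\theta = A^{-1} y$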

22 Least-Squares Estimators
• We have
  – m outputs, and
  – n fitting parameters to find
• Or
  – m equations, and
  – n unknown variables
• Usually m is greater than n

23 Least-Squares Estimators
• Since
  – the model is just an approximation of the target system, and
  – the observed data might be corrupted,
• an exact solution is not always possible.
• To overcome this, an error vector e is added to compensate: $A\theta + e = y$

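24 Least-Squares Estimators
• Our goal now is to find the estimate $\hat{\theta}$ that reduces the error between the observed output $y$ and the model output $\hat{y} = A\hat{\theta}$
• The problem: find the estimate $\hat{\theta}$ that minimizes the sum of squared errors $E(\theta) = \sum_{i=1}^{m} (y_i - a_i^T\theta)^2$, where $a_i^T$ is the $i$-th row of $A$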

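25 Least-Squares Estimators
• If $e = y - A\theta$, then: $E(\theta) = e^T e = (y - A\theta)^T (y - A\theta)$
• We need to compute: $\hat{\theta} = \arg\min_{\theta}\, (y - A\theta)^T (y - A\theta)$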

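26 Least-Squares Estimators
• Theorem [least-squares estimator]
  – The squared error is minimized when $\theta = \hat{\theta}$ satisfies the normal equation $A^T A\,\hat{\theta} = A^T y$
  – If $A^T A$ is nonsingular, $\hat{\theta}$ is unique and is given by $\hat{\theta} = (A^T A)^{-1} A^T y$
  – $\hat{\theta}$ is called the least-squares estimator (LSE)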
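
A minimal numerical sketch of the theorem, assuming NumPy and a small made-up data set that lies exactly on the line y = 1 + 2u (so the regressors are f_1(u) = 1 and f_2(u) = u). The function names are illustrative, not from the slides.

```python
import numpy as np

def lse_normal_equations(A, y):
    """Solve the normal equation (A^T A) theta = A^T y for theta."""
    return np.linalg.solve(A.T @ A, A.T @ y)

def lse_lstsq(A, y):
    """Same estimate through NumPy's least-squares solver."""
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

# m = 4 equations, n = 2 unknowns: columns are f_1(u) = 1 and f_2(u) = u.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_ne = lse_normal_equations(A, y)
theta_ls = lse_lstsq(A, y)
print(theta_ne, theta_ls)
```

Solving the normal equation directly mirrors the formula on the slide; in practice a solver such as `np.linalg.lstsq` is usually preferred because forming A^T A squares the condition number of the problem.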

27 Spring Example
  – Structure identification can be done using domain knowledge.
  – Hooke's law: the change in length of a spring is proportional to the force applied.
  – Model: length = k_0 + k_1 * force

28 Spring Example
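
The spring model length = k_0 + k_1 * force can be fitted with the least-squares estimator. The measurements below are hypothetical placeholders, since the experimental table is not available here; only the model structure comes from the slides.

```python
import numpy as np

# Hypothetical measurements (NOT the data from the slides).
force = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0])    # newtons
length = np.array([2.1, 2.4, 3.1, 3.4, 4.6, 4.9])   # inches

# Regressors f_1 = 1 and f_2 = force give the matrix A of the linear model.
A = np.column_stack([np.ones_like(force), force])

# Least-squares estimate of (k_0, k_1).
(k0, k1), *_ = np.linalg.lstsq(A, length, rcond=None)

# Predicted length when the force is 5.0 newtons.
predicted_length = k0 + k1 * 5.0
print(k0, k1, predicted_length)
```

The residual vector of the fit is orthogonal to the columns of A, which is another way of stating the normal equation.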

29 Statistical Properties of least-squares estimators

30 Statistical qualities of LSE
• Definition [unbiased estimator]
  – An estimator $\hat{\theta}$ of the parameter θ is unbiased if $E[\hat{\theta}] = \theta$, where E[.] is the statistical expectation

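31 Statistical qualities of LSE
• Definition [minimal variance]
  – An estimator $\hat{\theta}$ is a minimum variance estimator if, for any other estimator $\theta^*$: $\mathrm{Cov}(\hat{\theta}) \le \mathrm{Cov}(\theta^*)$, meaning that $\mathrm{Cov}(\theta^*) - \mathrm{Cov}(\hat{\theta})$ is positive semidefinite
  – where $\mathrm{Cov}(\theta)$ is the covariance matrix of the random vector θ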

32 Statistical qualities of LSE
• Theorem [Gauss-Markov]
  – Gauss-Markov conditions: the error vector e is a vector of m uncorrelated random variables, each with zero mean and the same variance $\sigma^2$.
  – This means that: $E[e] = 0$ and $E[e\,e^T] = \sigma^2 I$

33 Statistical qualities of LSE
• Theorem [Gauss-Markov]
  – Under the Gauss-Markov conditions, the LSE is unbiased and has minimum variance among all linear unbiased estimators.
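
The unbiasedness claim can be checked in a few lines. The sketch below assumes that the data really are generated by $y = A\theta + e$ with a deterministic matrix $A$, that $A^T A$ is nonsingular, and that $e$ satisfies the Gauss-Markov conditions $E[e] = 0$ and $E[e\,e^T] = \sigma^2 I$. It covers unbiasedness and the covariance of the LSE only, not the full minimum-variance argument.

```latex
\begin{aligned}
\hat{\theta} &= (A^T A)^{-1} A^T y
              = (A^T A)^{-1} A^T (A\theta + e)
              = \theta + (A^T A)^{-1} A^T e \\
E[\hat{\theta}] &= \theta + (A^T A)^{-1} A^T E[e] = \theta \\
\mathrm{Cov}(\hat{\theta}) &= E\big[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T\big]
              = (A^T A)^{-1} A^T E[e\,e^T] A (A^T A)^{-1}
              = \sigma^2 (A^T A)^{-1}
\end{aligned}
```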

34 Maximum likelihood (ML) estimator

35 Maximum likelihood (ML) estimator
• The problem
  – Suppose we observe m independent samples $x_1, x_2, \dots, x_m$,
  – coming from a probability density function with parameters $\theta_1, \dots, \theta_r$

36 Maximum likelihood (ML) estimator
• The criterion for choosing θ is:
  – Choose the parameters θ that maximize the probability of the data
• Which one do you prefer? Why?

37 Maximum likelihood (ML) estimator
• Likelihood function definition:
  – For a sample of m observations $x_1, x_2, \dots, x_m$,
  – drawn independently from a probability density function f with parameters θ,
  – the likelihood function L is defined by $L(\theta) = \prod_{i=1}^{m} f(x_i; \theta)$
  – L is the joint probability density of the sample

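38 Maximum likelihood (ML) estimator
• The ML estimator is defined as the value of θ which maximizes L: $\hat{\theta} = \arg\max_{\theta} L(\theta)$
• or equivalently: $\hat{\theta} = \arg\max_{\theta} \ln L(\theta)$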

39 Maximum likelihood (ML) estimator
• Example: ML estimation for normal distribution
  – Suppose we have m independent samples $x_1, x_2, \dots, x_m$, coming from a Gaussian distribution with parameters μ and $\sigma^2$. What are the MLEs of μ and $\sigma^2$?

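40 Maximum likelihood (ML) estimator
• Example: ML estimation for normal distribution
  – For m observations $x_1, x_2, \dots, x_m$, we have: $L(\mu, \sigma^2) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$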

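41 Maximum likelihood (ML) estimator
• Example: ML estimation for normal distribution
  – For m observations $x_1, x_2, \dots, x_m$, setting the derivatives of $\ln L$ with respect to μ and $\sigma^2$ to zero, we have: $\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} x_i$ and $\hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \hat{\mu})^2$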
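
For the Gaussian example, the ML estimates are the sample mean and the sample variance with divisor m (not m - 1). The sketch below computes them and evaluates the log-likelihood; the sample values are made up for illustration and the function names are hypothetical.

```python
import numpy as np

def gaussian_mle(x):
    """ML estimates of the mean and variance of a Gaussian sample."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.sum() / x.size
    sigma2_hat = ((x - mu_hat) ** 2).sum() / x.size   # divisor m, not m - 1
    return float(mu_hat), float(sigma2_hat)

def gaussian_log_likelihood(x, mu, sigma2):
    """ln L(mu, sigma2) for independent Gaussian observations."""
    x = np.asarray(x, dtype=float)
    return float(-0.5 * x.size * np.log(2.0 * np.pi * sigma2)
                 - ((x - mu) ** 2).sum() / (2.0 * sigma2))

samples = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample
mu_hat, sigma2_hat = gaussian_mle(samples)
best = gaussian_log_likelihood(samples, mu_hat, sigma2_hat)
print(mu_hat, sigma2_hat, best)
```

Evaluating `gaussian_log_likelihood` at any other pair (mu, sigma2) gives a value no larger than `best`, which is what defines the ML estimate.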

42 Maximum likelihood estimator for linear model

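43 Maximum likelihood estimator for linear model
  – Let a linear model be given as $y_i = a_i^T\theta + e_i$, $i = 1, \dots, m$, where $a_i^T = [f_1(u_i), \dots, f_n(u_i)]$
  – Then $e_i = y_i - a_i^T\theta$
  – Here the errors $e_i$ are independent with PDF $p_e$. The likelihood function is given by $L(\theta) = \prod_{i=1}^{m} p_e(y_i - a_i^T\theta)$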

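44 Maximum likelihood estimator for linear model
  – Assume a regression model where the errors are normally distributed with zero mean and variance $\sigma^2$.
  – The likelihood function is given by $L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - a_i^T\theta)^2}{2\sigma^2}\right)$, where $a_i^T$ is the $i$-th row of $A$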

45 Maximum likelihood estimator for linear model
• The maximum likelihood model
  – Any algorithm that maximizes the likelihood $L(\theta)$
  – gives the maximum likelihood model with respect to a given family of possible models

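46 Maximum likelihood estimator for linear model
  – Same as maximizing the log-likelihood $\ln L(\theta)$
  – Same as minimizing the sum of squared errors $\sum_{i=1}^{m} (y_i - a_i^T\theta)^2$, where $a_i^T$ is the $i$-th row of $A$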
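
The step from the Gaussian likelihood to the sum of squares can be written out explicitly. The sketch assumes independent errors with zero mean and a common variance $\sigma^2$ that does not depend on θ, and writes $a_i^T$ for the $i$-th row of $A$.

```latex
\begin{aligned}
\ln L(\theta)
  &= \sum_{i=1}^{m} \ln\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(y_i - a_i^T\theta)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{m}{2} \ln(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - a_i^T\theta)^2 \\
\arg\max_{\theta} \ln L(\theta)
  &= \arg\min_{\theta} \sum_{i=1}^{m} (y_i - a_i^T\theta)^2
\end{aligned}
```

The first term does not depend on θ and the factor $1/(2\sigma^2)$ is positive, so only the sum of squared errors matters for the maximization.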

47 Connection to Least Squares
• Conclusion
  – The least-squares fitting criterion can be understood as emerging from the use of the maximum likelihood principle for estimating a regression model where errors are distributed normally.
  – The applicability of the least-squares method is, however, not limited to the normality assumption.

48 LSE for Nonlinear Models

49 LSE for Nonlinear Models
• Nonlinear models are divided into 2 families
  – Intrinsically linear
  – Intrinsically nonlinear
• Through appropriate transformations of the input-output variables and fitting parameters, an intrinsically linear model can become a linear model
• After this transformation into a linear model, LSE can be used to optimize the unknown parameters

50 LSE for Nonlinear Models
• Examples of intrinsically linear systems
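
One standard textbook case of an intrinsically linear model, chosen here as an assumption rather than taken from the slides, is y = a * exp(b * u). Taking logarithms gives ln y = ln a + b * u, which is linear in the transformed parameters (ln a, b). The function name `fit_exponential` is hypothetical.

```python
import numpy as np

def fit_exponential(u, y):
    """Fit y = a * exp(b * u) by linear least squares on ln(y).

    Requires y > 0 so that the logarithm is defined.
    """
    A = np.column_stack([np.ones_like(u), u])   # regressors for (ln a, b)
    (ln_a, b), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return float(np.exp(ln_a)), float(b)

# Noise-free illustrative data generated with a = 2 and b = 0.5.
u = np.linspace(0.0, 4.0, 9)
y = 2.0 * np.exp(0.5 * u)

a_hat, b_hat = fit_exponential(u, y)
print(a_hat, b_hat)
```

Note that the transformed problem minimizes squared errors in ln y rather than in y, so with noisy data the result generally differs from a direct nonlinear least-squares fit.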

51 Developing Dynamic models from Data

52 Dynamical System?
[Figure: block diagram with input u(t), a System block, and output y(t)]

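53 The ARX model
• In dynamic systems analysis, the independent variable is often time (k)
  – An ARX model (AutoRegressive with eXogenous input model) is often used: $y(k) + a_1 y(k-1) + \dots + a_{n_a} y(k-n_a) = b_1 u(k-1) + \dots + b_{n_b} u(k-n_b) + e(k)$
  – where y(k) is the output, u(k) is the input and e(k) is the error term at time k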

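54 The ARX model
• Or equivalently
  – writing $A(q) = 1 + a_1 q^{-1} + \dots + a_{n_a} q^{-n_a}$ and $B(q) = b_1 q^{-1} + \dots + b_{n_b} q^{-n_b}$, with $q^{-1}$ the backward shift operator: $A(q)\,y(k) = B(q)\,u(k) + e(k)$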

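55 The ARX model as a linear regressor
• The input-output relationship can take the form $y(k) = \varphi^T(k)\,\theta + e(k)$
  – where the regression vector is $\varphi(k) = [-y(k-1), \dots, -y(k-n_a),\ u(k-1), \dots, u(k-n_b)]^T$
  – and the parameter vector to estimate is $\theta = [a_1, \dots, a_{n_a},\ b_1, \dots, b_{n_b}]^T$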

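56 Prediction error model estimation
• The problem
  – Assume input-output data $\{u(k), y(k)\}$, $k = 1, \dots, N$
  – Build the predictor $\hat{y}(k \mid \theta) = \varphi^T(k)\,\theta$
  – such that θ minimizes the prediction error $\varepsilon(k, \theta) = y(k) - \hat{y}(k \mid \theta)$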

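57 Prediction error model estimation
  – The model is fitted to the data by minimizing the criterion function $V_N(\theta) = \frac{1}{N} \sum_{k=1}^{N} \varepsilon^2(k, \theta)$
  – which gives the least squares criterion $V_N(\theta) = \frac{1}{N} \sum_{k=1}^{N} \big(y(k) - \varphi^T(k)\,\theta\big)^2$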

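58 Prediction error model estimation
• Solution
  – Normal equation: $\left[\sum_{k=1}^{N} \varphi(k)\,\varphi^T(k)\right] \hat{\theta} = \sum_{k=1}^{N} \varphi(k)\,y(k)$
  – Estimates: $\hat{\theta} = \left[\sum_{k=1}^{N} \varphi(k)\,\varphi^T(k)\right]^{-1} \sum_{k=1}^{N} \varphi(k)\,y(k)$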

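59 Prediction error model estimation
• In matrix form, the solution is the standard linear least squares formula $\hat{\theta} = (\Phi^T \Phi)^{-1} \Phi^T y$, where the rows of Φ are the regression vectors $\varphi^T(k)$ and y stacks the outputs y(k)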
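
A sketch of ARX estimation by linear least squares, assuming the sign convention y(k) + a_1 y(k-1) + ... = b_1 u(k-1) + ... + e(k). The simulated first-order system, its coefficients and the function names `simulate_arx` and `estimate_arx` are hypothetical and serve only to exercise the estimator.

```python
import numpy as np

def simulate_arx(a, b, u, noise_std=0.0, seed=0):
    """Simulate y(k) = -sum_i a_i y(k-i) + sum_j b_j u(k-j) + e(k)."""
    rng = np.random.default_rng(seed)
    na, nb = len(a), len(b)
    y = np.zeros(len(u))
    for k in range(len(u)):
        ar = sum(a[i] * y[k - 1 - i] for i in range(na) if k - 1 - i >= 0)
        ex = sum(b[j] * u[k - 1 - j] for j in range(nb) if k - 1 - j >= 0)
        y[k] = -ar + ex + noise_std * rng.standard_normal()
    return y

def estimate_arx(u, y, na, nb):
    """Least-squares estimate of (a_1..a_na, b_1..b_nb) from input-output data."""
    start = max(na, nb)            # first k with a complete regression vector
    rows = []
    for k in range(start, len(y)):
        past_y = [-y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)          # regression vector phi(k)
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    return theta[:na], theta[na:]

# Hypothetical first-order system: y(k) - 0.8 y(k-1) = 0.5 u(k-1) + e(k).
rng = np.random.default_rng(1)
u = rng.standard_normal(500)
y = simulate_arx([-0.8], [0.5], u, noise_std=0.01)

a_hat, b_hat = estimate_arx(u, y, na=1, nb=1)
print(a_hat, b_hat)
```

A random input is used so that the regression matrix is well conditioned; with a constant input the columns of Phi become nearly dependent once the output settles.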

60 Example: Tank level modeling

61 Example: Tank level modeling

62 Example: Tank level modeling
• The identification goal
  – To explain how the voltage u(t) (the input) affects the water level h(t) (the output) of the tank
• Experimental data

63 Simple ARX modeling
• A plausible first identification attempt is to try a simple linear regression model
  – The parameters can easily be estimated using linear least squares, resulting in

64 ARX model results
  – The simulated water level follows the true level, but at levels close to zero the linear model produces negative levels.

65 Semiphysical modeling
• The model equation is based on dynamic conservation of mass
  – The accumulation of mass in the tank is equal to the mass flow rate into the tank minus the mass flow rate out.

66 Semiphysical modeling
• While the inflow is roughly proportional to u(t), the outflow can be approximated using Bernoulli's law
  – The parameters can easily be estimated using linear least squares, resulting in
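
The idea can be illustrated with a sketch in which the square root of the level is added as a regressor, since Bernoulli's law makes the outflow roughly proportional to sqrt(h). Everything specific below is an assumption and not the model or data from the slides: the model form h(t) = θ_1 h(t-1) + θ_2 u(t-1) + θ_3 sqrt(h(t-1)), the coefficient values in `TRUE_THETA` and the simulated tank.

```python
import numpy as np

# Hypothetical tank: h(t) = h(t-1) + 0.1*u(t-1) - 0.2*sqrt(h(t-1)).
TRUE_THETA = np.array([1.0, 0.1, -0.2])

def simulate_tank(u, h0=1.0):
    """Simulate the water level for a given voltage sequence u."""
    h = np.empty(len(u))
    h[0] = h0
    for t in range(1, len(u)):
        h[t] = (TRUE_THETA[0] * h[t - 1]
                + TRUE_THETA[1] * u[t - 1]
                + TRUE_THETA[2] * np.sqrt(h[t - 1]))
    return h

def fit(regressors, target):
    """Linear least-squares fit; returns parameters and one-step RMS error."""
    theta, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    rms = float(np.sqrt(np.mean((target - regressors @ theta) ** 2)))
    return theta, rms

rng = np.random.default_rng(0)
u = np.repeat(rng.uniform(1.0, 3.0, size=20), 25)   # piecewise-constant voltage
h = simulate_tank(u)

h_prev, u_prev, h_now = h[:-1], u[:-1], h[1:]
linear_regressors = np.column_stack([h_prev, u_prev])
semiphysical_regressors = np.column_stack([h_prev, u_prev, np.sqrt(h_prev)])

theta_lin, rms_lin = fit(linear_regressors, h_now)
theta_semi, rms_semi = fit(semiphysical_regressors, h_now)
print(rms_lin, rms_semi, theta_semi)
```

The model stays linear in the parameters, so the same least-squares machinery applies; only the regressors change. The comparison here uses one-step prediction errors, whereas the slides discuss simulated output.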

67 Semiphysical model results
• The RMS error of this model is lower and, more importantly, no simulated output is negative, which indicates that the model is physically sound.

68 Sources
• J.-S. Roger Jang, Chuen-Tsai Sun and Eiji Mizutani, Slides for Ch. 5 of "Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence", First Edition, Prentice Hall.
• Djamel Bouchaffra. Soft Computing. Course materials. Oakland University. Fall 2005.
• Henrik Melgaard, Identification of Physical Models. Institute of Mathematical Modelling, Technical University of Denmark. Ph.D. thesis.
• Lucidi delle lezioni, Soft Computing. Materiale Didattico. Dipartimento di Elettronica e Informazione. Politecnico di Milano.
• Peter Lindskog, Fuzzy Identification from a Grey Box Modeling Point of View. Department of Electrical Engineering, Linköping University.
• Jacob Roll, Local and Piecewise Affine Approaches to System Identification. Department of Electrical Engineering, Linköping University, Linköping, Sweden. 2003.