Chapter 12: Data Analysis by linear least squares

Presentation transcript:

Chapter 12: Data Analysis by linear least squares
Overview:
Formulate problem as an over-determined linear system of equations
Solve the linear system by optimization

Example: fit y = ax + b to m data points (x_k, y_k)
Find unknowns a and b that minimize the sum of squared residuals
What is the objective function to be minimized?
What are the equations for a and b?

Example: fit y = ax + b to m data points
Find unknowns a and b that minimize the sum of squared residuals
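
A sketch of the standard answer: the objective function is the sum of squared residuals, and setting its partial derivatives with respect to a and b to zero gives a 2x2 linear system.

S(a,b) = \sum_{k=1}^{m} (y_k - a x_k - b)^2

\partial S/\partial a = -2 \sum_k x_k (y_k - a x_k - b) = 0, \qquad \partial S/\partial b = -2 \sum_k (y_k - a x_k - b) = 0

a \sum_k x_k^2 + b \sum_k x_k = \sum_k x_k y_k, \qquad a \sum_k x_k + b\,m = \sum_k y_k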

If the uncertainty in the data is known, (x_k, y_k, s_k), find the line that goes through the error bars (i.e., the tightest fit, where the error is smallest).
What is the objective function to be minimized?
What are the equations for a and b?
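
A sketch of the standard weighted answer, assuming s_k denotes the uncertainty (standard deviation) of y_k: each residual is divided by its uncertainty before squaring, so every sum picks up a factor 1/s_k^2.

\chi^2(a,b) = \sum_{k=1}^{m} \left( \frac{y_k - a x_k - b}{s_k} \right)^2

a \sum_k \frac{x_k^2}{s_k^2} + b \sum_k \frac{x_k}{s_k^2} = \sum_k \frac{x_k y_k}{s_k^2}, \qquad a \sum_k \frac{x_k}{s_k^2} + b \sum_k \frac{1}{s_k^2} = \sum_k \frac{y_k}{s_k^2}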

m+1 data points indexed from zero (k = 0, 1, …, m)

Data fitting formulated as an over-determined linear system:
Given the dataset {(t_k, y_k), k = 1,...,m} and functions {f_j(t), j = 1,...,n}, find the linear combination of functions that best represents the data.
Let a_kj = f_j(t_k) (the jth function evaluated at the kth data point)
Let b = [y_1, y_2,...,y_m]^T (column vector of the measured values)
Let x = [x_1, x_2,...,x_n]^T (column vector of unknown parameters)
If n = m, then Ax = b might have a solution that gives a combination of functions passing through all of the data points. Even if this solution exists, it is probably not the result that we want.
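
Collecting the definitions above into matrix form (a restatement; A is the m-by-n matrix of basis functions evaluated at the data points):

A x = \begin{bmatrix} f_1(t_1) & \cdots & f_n(t_1) \\ \vdots & & \vdots \\ f_1(t_m) & \cdots & f_n(t_m) \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \approx \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = b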

Usually, we are seeking a model of the data as “noise” superimposed on a smooth variation described by a linear combination of functions {f_j(t), j = 1,...,n}, with n << m.
Under the condition n << m, Ax = b generally does not have a solution because the system is over-determined (i.e., more equations than unknown parameters).
To get the “dimensional reduction” we want, find an approximate solution that minimizes a norm of the residual vector r = b – Ax.
Choosing the Euclidean norm for minimization gives linear least squares.
In this example, the model is a parabola.

Let y = Ax (like b, y is a column vector with m components).
y ∈ span(A) (y is a linear combination of the columns of A).
For a given data set, b is a constant vector and || b – y ||_2 = f(y).
f(y) is continuous and strictly convex, so f(y) is guaranteed to have a minimum.
Find y such that f(y) is a minimum.
The vector y ∈ span(A) that is closest to b is unique; however, this unique vector may not correspond to a unique set of unknown parameters x.
If Ax_1 = Ax_2 = y, then A(x_2 – x_1) = o with z = (x_2 – x_1) ≠ o, so the columns of A must be linearly dependent. This condition is called rank deficiency.
Unless otherwise stated, assume A has full rank.
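
A minimal MATLAB illustration of rank deficiency, using a hypothetical design matrix whose third column equals the sum of the first two:

% Hypothetical 4x3 matrix with linearly dependent columns:
% column 3 = column 1 + column 2
A = [1 1 2;
     1 2 3;
     1 3 4;
     1 4 5];
rank(A)      % returns 2 (< 3), so A is rank deficient
% A'*A is singular here, and the normal equations do not determine x uniquely.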

Normal Equations:
Let r = b – Ax and define f(x) = (||r||_2)^2 = r^T r.
f(x) = (b – Ax)^T (b – Ax) = b^T b – 2x^T A^T b + x^T A^T A x
A necessary condition for x_0 to be a minimum of f(x) is ∇f(x_0) = o, where ∇f is a vector with n components equal to the partial derivatives of f(x) with respect to the unknown parameters.
∇f(x) = 2A^T A x – 2A^T b = o, so the optimal set of parameters is a solution of the n×n symmetric system A^T A x = A^T b, called the “normal” equations of the linear least squares problem.
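
A minimal MATLAB sketch, assuming the over-determined A (m-by-n, full rank) and b have already been built:

% Solve the normal equations A'*A*x = A'*b
x = (A'*A) \ (A'*b);     % optimal parameters
r = b - A*x;             % residual vector
chisq = r'*r;            % minimized sum of squared residuals
% Note: x = A\b on an over-determined system also returns the least-squares
% solution; MATLAB uses an orthogonal factorization, which is better
% conditioned than forming A'*A explicitly.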

Example: Use the normal equations to fit a line y = c_2 x + c_1 to m data points (no error bars).
What are the functions {f_j(t), j = 1, 2} that determine the matrix elements a_kj = f_j(t_k) (jth function evaluated at kth data point)?
What is the A matrix? What is the b vector? What is the vector of unknowns?
Construct the normal equations.

This gives the same set of equations as obtained by the objective-function method.
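
A sketch of the answers, taking f_1(t) = 1 and f_2(t) = t so that the vector of unknowns is x = [c_1, c_2]^T:

A = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \quad A^T A = \begin{bmatrix} m & \sum_k x_k \\ \sum_k x_k & \sum_k x_k^2 \end{bmatrix}, \quad A^T b = \begin{bmatrix} \sum_k y_k \\ \sum_k x_k y_k \end{bmatrix}

so A^T A x = A^T b reproduces the two equations for the intercept c_1 and slope c_2 obtained earlier from the objective function.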

Fit a polynomial of degree p–1 to m data points (x_k, y_k)
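
A minimal MATLAB sketch, assuming x and y are column vectors holding the data and p is the number of coefficients (degree p–1):

m = length(x);
V = ones(m, p);              % column j holds x.^(j-1)
for j = 2:p
    V(:,j) = V(:,j-1) .* x;
end
c = (V'*V) \ (V'*y);         % solve the normal equations for the coefficients
% c(j) multiplies x^(j-1), i.e., f_j(x) = x^(j-1)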

Given the parameters c_0, c_1, …, c_{p-1} that minimize the sum of squared deviations, evaluate the fit at all the data points x_k (e.g., Yfit = V*c, with V the matrix of basis functions evaluated at the data points).
r = Yfit – y are the residuals at the data points.
||r||_2^2 = r^T r is the sum of squared residuals.

Fit a parabola to data from the text, p. 495: surface tension as a function of temperature.
T = [0, 10, 20, 30, 40, 80, 90, 95]
S = [68.0, 67.1, 66.4, 65.6, 64.6, 61.8, 61.0, 60.0]
Write pseudo-code to solve this problem by normal equations.

Input the data
Construct the V matrix
Construct the normal equations
Solve the normal equations
Evaluate the fit
Calculate the sum of squared residuals
Plot the data and fit on the same set of axes

Fit a parabola to data from the text, p. 495: surface tension as a function of temperature.
T = [0, 10, 20, 30, 40, 80, 90, 95]';                   % column vector of temperatures
S = [68.0, 67.1, 66.4, 65.6, 64.6, 61.8, 61.0, 60.0]';  % column vector of surface tensions
npts = 8;
V = ones(npts,3);      % basis functions 1, T, T^2 evaluated at the data points
V(:,2) = T;
T2 = T.*T;
V(:,3) = T2;
A = V'*V;              % (3x8)*(8x3) = 3x3
b = V'*S;              % (3x8)*(8x1) = 3x1
x = A\b;               % solve the normal equations
fit = V*x;             % evaluate the fit at the data points
r = fit - S;           % residuals
chisq = r'*r;          % sum of squared residuals
plot(T,S,'*'); hold on
plot(T,fit); hold off

Matrix formulation of weighted linear least squares
Let dy be a column vector of uncertainties in the measured values.
w = 1./dy is a column vector of the square roots of the weights.
W = diag(w) is a diagonal matrix.
V = coefficient matrix of the un-weighted least-squares problem (Vandermonde matrix in the case of polynomial fitting).
y = column vector of observations.
A = W V = weighted coefficient matrix.
b = W y = weighted column vector of measured values.
Ax = b is the over-determined linear system for weighted linear least squares.
The normal equations A^T A x = A^T b become (WV)^T W V x = (WV)^T W y (note that w = 1./dy gets squared at this point).
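
Writing out the diagonal weight matrix makes the squaring explicit (a restatement of the last line, with \Delta y_k the uncertainty in the kth measurement):

W = \mathrm{diag}(1/\Delta y_1, \ldots, 1/\Delta y_m), \qquad (WV)^T (WV)\,x = (WV)^T W\,y \iff V^T W^2 V\,x = V^T W^2 y,

where W^2 = \mathrm{diag}(1/\Delta y_1^2, \ldots, 1/\Delta y_m^2), i.e., the weights 1/\Delta y_k^2 appear squared.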

Fit a weighted parabola to the data on surface tension vs. temperature
dS = [6, 2, 5, 3, 7, 8, 4, 1]';   % note the small uncertainty in the last point
weights = 1./dS;
W = diag(weights);
weighted = W*S;
Construct the V matrix
weightedV = W*V;
Set up and solve the normal equations
Evaluate the fit
Calculate the sum of squared residuals
Plot the data and fit on the same set of axes

T = [0, 10, 20, 30, 40, 80, 90, 95]';                   % column vector
S = [68.0, 67.1, 66.4, 65.6, 64.6, 61.8, 61.0, 60.0]';  % column vector
dS = [6, 2, 5, 3, 7, 8, 4, 1]';   % note the small uncertainty in the last point
npts = 8;
V = ones(npts,3);
V(:,2) = T;
T2 = T.*T;
V(:,3) = T2;
weights = 1./dS;
W = diag(weights);
wS = W*S;              % weighted observations
wV = W*V;              % weighted coefficient matrix
A = wV'*wV;            % weights squared
b = wV'*wS;            % weights squared
x = A\b;               % solve the weighted normal equations
fit = V*x;             % evaluate the fit at the data points
r = fit - S;           % residuals
chisq = r'*r;          % sum of squared residuals
errorbar(T,S,dS,'*'); hold on
plot(T,fit); hold off

Plot using MatLab's errorbar. The uncertainties were chosen to show how a small uncertainty draws the fit toward the data point at T = 95.

Assignment 17, due 4/18/19
Use the normal equations to fit a parabola to the data set
t = linspace(0,10,21)
y = [2.9, 2.7, 4.8, 5.3, 7.1, 7.6, 7.7, 7.6, 9.4, 9, 9.6, 10, 10.2, 9.7, 8.3, 8.4, 9, 8.3, 6.6, 6.7, 4.1]
with weights that are the reciprocal of the square of the uncertainty in y, which is 10% of the value of y.
Plot the data with error bars and the fit on the same set of axes. Use MatLab's errorbar(t,y,dy,'*') function.
Show the optimum values of the parameters and calculate the sum of squared deviations between fit and data.
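
A minimal sketch of the setup, assuming the weight construction follows the weighted surface-tension example above (only the pieces specific to this assignment are shown; 10% relative uncertainty gives dy = 0.1*y):

t = linspace(0,10,21)';   % column vector of 21 points
y = [2.9, 2.7, 4.8, 5.3, 7.1, 7.6, 7.7, 7.6, 9.4, 9, 9.6, 10, ...
     10.2, 9.7, 8.3, 8.4, 9, 8.3, 6.6, 6.7, 4.1]';
dy = 0.1*y;               % uncertainty is 10% of each y value
W = diag(1./dy);          % square roots of the weights on the diagonal
% Build V with columns 1, t, t.^2, form wV = W*V and wy = W*y, and solve
% (wV'*wV)*c = wV'*wy as in the weighted example above.
errorbar(t, y, dy, '*')   % plot the data with error bars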