Environmental Data Analysis with MatLab Lecture 5: Linear Models

SYLLABUS
Lecture 01: Using MatLab
Lecture 02: Looking At Data
Lecture 03: Probability and Measurement Error
Lecture 04: Multivariate Distributions
Lecture 05: Linear Models
Lecture 06: The Principle of Least Squares
Lecture 07: Prior Information
Lecture 08: Solving Generalized Least Squares Problems
Lecture 09: Fourier Series
Lecture 10: Complex Fourier Series
Lecture 11: Lessons Learned from the Fourier Transform
Lecture 12: Power Spectra
Lecture 13: Filter Theory
Lecture 14: Applications of Filters
Lecture 15: Factor Analysis
Lecture 16: Orthogonal Functions
Lecture 17: Covariance and Autocorrelation
Lecture 18: Cross-correlation
Lecture 19: Smoothing, Correlation and Spectra
Lecture 20: Coherence; Tapering and Spectral Analysis
Lecture 21: Interpolation
Lecture 22: Hypothesis Testing
Lecture 23: Hypothesis Testing continued; F-Tests
Lecture 24: Confidence Limits of Spectra, Bootstraps

purpose of the lecture: develop and apply the concept of a linear model

data, d: what we measure
model parameters, m: what we want to know
quantitative model: links the model parameters to the data

data, d: carats, color, clarity
model parameters, m: dollar value, celebrity value
quantitative model: an economic model for diamonds
(Photo credit: Wikipedia Commons)

general case

N = number of observations in d
M = number of model parameters in m
usually (but not always) N > M: many data, few model parameters

special case of a linear model: d = Gm

The matrix G is called the data kernel. It embodies the quantitative model: the relationship between the data and the model parameters.

because of observational noise, no m can exactly satisfy this equation; it can only be satisfied approximately: d ≈ Gm
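
A minimal MatLab sketch of this idea, assuming G, a true model vector mtrue, and N are already defined; the noise level sd is an illustrative assumption:

dtrue = G*mtrue;               % noise-free prediction of the data
dobs = dtrue + sd*randn(N,1);  % observations: truth plus Normal noise
% in general no m reproduces dobs exactly, because the noise need not
% lie in the range of G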

data, d^pre: a prediction of the data
model parameters, m^est: an estimate of the model parameters
quantitative model: evaluate the equation (compute d^pre from m^est)

data, d^obs: an observation of the data
model parameters, m^est: an estimate of the model parameters
quantitative model: solve the equation (infer m^est from d^obs)

because of observational noise, m^est ≠ m^true (the estimated model parameters differ from the true model parameters) and d^pre ≠ d^obs (the predicted data differ from the observed data)

the simplest of linear models

fitting a straight line to data: d_i = m_1 + m_2 x_i

interpretation of the x_i: the model is linear only when the x_i are neither data nor model parameters. We will call them auxiliary variables; they are assumed to be exactly known, and they specify the geometry of the experiment.

MatLab script for G in the straight-line case:

M=2;
G=zeros(N,M);
G(:,1)=1;   % first column: ones, multiplies the intercept m(1)
G(:,2)=x;   % second column: the auxiliary variable, multiplies the slope m(2)

fitting a quadratic curve to data: d_i = m_1 + m_2 x_i + m_3 x_i^2

MatLab script for G in the quadratic case:

M=3;
G=zeros(N,M);
G(:,1)=1;     % constant term
G(:,2)=x;     % linear term
G(:,3)=x.^2;  % quadratic term (element-wise square)

fitting a sum of known functions: d_i = m_1 f_1(x_i) + m_2 f_2(x_i) + … + m_M f_M(x_i), where the f_j(x) are known functions of the auxiliary variable (a MatLab sketch follows below)
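
A hedged sketch of G for this case, assuming a column vector x of length N is defined; the particular function handles chosen here are illustrative assumptions, not part of the lecture:

f = { @(x) ones(size(x)), @(x) exp(-x), @(x) sin(x) };  % any known functions
M = length(f);
G = zeros(N,M);
for j = 1:M
    G(:,j) = f{j}(x);   % column j holds f_j evaluated at every x_i
end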

fitting a sum of cosines and sines (Fourier series)
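
A minimal sketch of the Fourier-series data kernel, assuming a column vector x of length N and a period T are defined; the number of harmonics K is an illustrative choice:

K = 3;          % number of cosine/sine pairs (illustrative)
M = 1 + 2*K;    % a constant column plus K cosines and K sines
G = zeros(N,M);
G(:,1) = 1;     % constant (mean) term
for k = 1:K
    G(:,2*k)   = cos(2*pi*k*x/T);   % cosine of the k-th harmonic
    G(:,2*k+1) = sin(2*pi*k*x/T);   % sine of the k-th harmonic
end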

[Figure: grey-scale images of data kernels G, with rows i = 1…N and columns j = 1…M. A) Polynomial. B) Fourier series.]

[Figure: G displayed as its columns c^(1), c^(2), c^(3), c^(4), …, c^(M).] Any data kernel can be thought of as a concatenation of its columns.

thought of this way, the equation d = Gm means d = m_1 c^(1) + m_2 c^(2) + … + m_M c^(M): the data is a mixture (a weighted sum) of the columns of G
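
A one-line check of the column view in MatLab, assuming G, m, N, and M are defined:

d = zeros(N,1);
for j = 1:M
    d = d + m(j)*G(:,j);   % accumulate the weighted columns
end
% d now equals G*m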

sometimes models do represent literal mixing, but more often the mixing is more abstract

any data kernel can also be thought of as a concatenation of its rows

thought of this way, the equation d = Gm means that each datum is a weighted average of the model parameters: d_i = Σ_j G_ij m_j. For example, if a row of G were [0.25, 0.5, 0.25, 0, …, 0] (say), the corresponding datum would be a weighted average of the first three model parameters.
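
In MatLab the row view is just a dot product; a minimal sketch, assuming G, m, and a row index i are defined:

di = G(i,:) * m;   % the i-th datum: row i of G weighting the model parameters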

sometimes the model represents literal averaging
[Figure: data kernels for running averages, rows i = 1…N, columns j = 1…M. A) three points. B) five points. C) seven points.]
but more often the averaging is more abstract

MatLab script for the data kernel of a three-point running average (assumes N and M are defined):

w = [2, 1]';        % center weight and one side weight of the pattern [1 2 1]
Lw = length(w);
n = 2*sum(w)-w(1);  % normalization so that each row sums to one
w = w/n;            % normalized weights: [0.5, 0.25]
r = zeros(M,1);
c = zeros(N,1);
r(1:Lw)=w;          % first row of G
c(1:Lw)=w;          % first column of G
G = toeplitz(c,r);  % each interior row of G is [... 0 0.25 0.5 0.25 0 ...]
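
A usage sketch: applying this kernel smooths a model vector (assuming m of length M is defined):

d = G*m;  % interior d(i) = 0.25*m(i-1) + 0.5*m(i) + 0.25*m(i+1); edge rows are truncated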

averaging doesn’t have to be symmetric: with this data kernel, each d_i is a weighted average of the m_j with i ≥ j, that is, of just “past and present” model parameters (a sketch follows below)
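
A hedged sketch of such a causal kernel, built the same toeplitz way as above and assuming N and M are defined; the weights are illustrative assumptions:

w = [0.5, 0.3, 0.2]';   % illustrative causal weights, heaviest on the present
Lw = length(w);
c = zeros(N,1);
c(1:Lw) = w;            % weights run down the first column, into the past
r = zeros(M,1);
r(1) = w(1);            % first row is zero past the diagonal
G = toeplitz(c,r);      % lower triangular: d(i) depends only on m(j) with j <= i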

the prediction error: the error vector, e = d^obs − d^pre

[Figure: prediction error in the straight-line case. Horizontal axis: auxiliary variable, x; vertical axis: data, d. The error e_i is the vertical distance between the observed datum d_i^obs and the predicted datum d_i^pre.]

total error, E: a single number summarizing the error, the sum of squares of the individual errors: E = Σ_i e_i^2 = e^T e

principle of least squares: choose as the estimate m^est the m that minimizes E(m)

MatLab script for the total error:

dpre = G*mest;  % predicted data for the estimate mest
e = dobs-dpre;  % error vector
E = e'*e;       % total error: sum of squared individual errors

grid search: a strategy for finding the m that minimizes E(m). Try lots of combinations of (m_1, m_2, …), a grid of combinations, and pick the combination with the smallest E as m^est (see the sketch below).
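
A minimal grid-search sketch for the straight-line case, assuming dobs and the straight-line G above are defined; the grid limits of 0 to 4 are illustrative and match the figure that follows:

L = 101;                 % number of grid points per axis
m1 = linspace(0, 4, L);  % trial values of the intercept
m2 = linspace(0, 4, L);  % trial values of the slope
E = zeros(L, L);
for i = 1:L
    for j = 1:L
        e = dobs - G*[m1(i); m2(j)];   % error vector for this trial m
        E(i,j) = e'*e;                 % total error on the grid
    end
end
[Emin, k] = min(E(:));         % smallest total error on the grid
[i, j] = ind2sub(size(E), k);  % its grid indices
mest = [m1(i); m2(j)];         % grid-search estimate of m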

[Figure: the error surface E(m_1, m_2) over a grid with m_1 and m_2 each ranging from 0 to 4, showing the point of minimum error, E_min, at (m_1^est, m_2^est), surrounded by a region of low error, E.]

the best m is at the point of minimum E; choose that as m^est. But, actually, any m in the region of low E is almost as good as m^est, especially since E is affected by measurement error: if the experiment were repeated, the results would be slightly different anyway

the shape of the region of low error is related to the covariance of the estimated model parameters (more on this in the next lecture)

thinking about error surfaces leads to important insights, but actually calculating an error surface with a grid search so as to locate m^est is not very practical. In the next lecture we will develop a solution to the least-squares problem that doesn’t require a grid search.