Linear Regression
Andy Jacobson
July 2006

Statistical Anecdotes: Do hospitals make you sick? Student's story. Etymology of “regression”.



Outline

1. Discussion of yesterday’s exercise
2. The mathematics of regression
3. Solution of the normal equations
4. Probability and likelihood
5. Sample exercise: Mauna Loa CO2
6. Sample exercise: TransCom3 inversion

sara/statistics_course/andy/R/

corr_exer.r: 18 July practical
mauna_loa.r: Today’s first example
transcom3.r: Today’s second example
dot-Rprofile: Rename to ~/.Rprofile (i.e., home dir)
hclimate.indices.r: Get SOI, NAO, PDO, etc. from CDC
cov2cor.r: Convert covariance to correlation
ferret.palette.r: Use nice ferret color palettes
geo.axes.r: Format degree symbols, etc., for maps
load.ncdf.r: Quickly load a whole netCDF file
svd.invert.r: Multiple linear regression using SVD
mat4.r: Read and write Matlab .mat files (v4 only)
svd_invert.m: Multiple linear regression using SVD (Matlab)
atm0_m1.mat: Data for the TransCom3 example
R-intro.pdf: Basic R documentation
faraway_pra_book.pdf: Julian Faraway’s “Practical Regression and ANOVA in R” book

Multiple Linear Regression

The model is b = Ax: b is the vector of data, x is the vector of parameters, and the columns of A are the basis set.

Basis Functions

The “design matrix” A gives the values of each basis function at each observation location: each row of A corresponds to one observation, and each column to one basis function. Note that one column of A (e.g., a_{i1}) may be all ones, to represent the “intercept”.
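A minimal sketch (with hypothetical data) of building such a design matrix in R, for a quadratic fit with an intercept column:

t <- seq(0, 10, length.out = 50)  # observation locations
A <- cbind(1, t, t^2)             # columns: intercept, linear, and quadratic basis functions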

From the Cost Function to the Normal Equations

“Least squares” optimization minimizes the sum of squared residuals (misfits to data). For the time being, we assume that the residuals are IID:

J(x) = (b - Ax)^T (b - Ax)

Expanding terms:

J(x) = b^T b - 2 x^T A^T b + x^T A^T A x

Cost is minimized when the derivative w.r.t. x vanishes:

\frac{\partial J}{\partial x} = -2 A^T b + 2 A^T A x = 0

Rearranging gives the normal equations:

A^T A \hat{x} = A^T b

Optimal parameter values (note that A^T A must be invertible):

\hat{x} = (A^T A)^{-1} A^T b
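Continuing the hypothetical quadratic sketch above, the normal equations can be solved directly in R (crossprod(A) computes A^T A; the data b here are synthetic):

set.seed(1)
b <- 2 - 0.5*t + 0.1*t^2 + rnorm(length(t), sd = 0.3)  # synthetic observations
xhat <- solve(crossprod(A), crossprod(A, b))           # solves (A^T A) x = A^T b
coef(lm(b ~ t + I(t^2)))                               # cross-check against R's built-in fit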

x-hat is BLUE

BLUE = Best Linear Unbiased Estimate: \hat{x} is linear in the data and unbiased. (Not shown here: “best”, i.e., minimum variance among all linear unbiased estimators.)
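A one-line check of unbiasedness, assuming the model b = Ax + \epsilon with E[\epsilon] = 0:

\hat{x} = (A^T A)^{-1} A^T (Ax + \epsilon) = x + (A^T A)^{-1} A^T \epsilon, \quad \text{so} \quad E[\hat{x}] = x.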

Practical Solution of the Normal Equations using SVD

If we could pre-multiply the forward equation b = Ax by A^{-1}, the “pseudo-inverse” of A, we could get our answer directly:

\hat{x} = A^{-1} b

For every M x N matrix A there exists a singular value decomposition (SVD):

A = U S V^T

where (in the economy form) U is M x N, S is N x N, and V is N x N. S is diagonal and contains the singular values s_i. The columns of U and V are orthogonal to one another:

U^T U = V^T V = I

The pseudo-inverse is thus

A^{-1} = V S^{-1} U^T, \quad \text{where } S^{-1} = \mathrm{diag}(1/s_i),

so that \hat{x} = V S^{-1} U^T b. And the parameter uncertainty covariance matrix is

C_x = V S^{-2} V^T = (A^T A)^{-1}, \quad \text{with } S^{-2} = \mathrm{diag}(1/s_i^2).
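A minimal sketch of the SVD route in R (cf. svd.invert.r above), continuing the same hypothetical A and b:

s <- svd(A)                               # returns u, d (singular values), and v
xhat <- s$v %*% (crossprod(s$u, b) / s$d) # V S^{-1} U^T b
Cx <- s$v %*% diag(1/s$d^2) %*% t(s$v)    # V S^{-2} V^T = (A^T A)^{-1}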

Gaussian Probability and Least Squares

Residuals vector (observations minus predictions):

r = b - A\hat{x}

Probability of r_i:

p(r_i) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{r_i^2}{2\sigma^2} \right)

Likelihood of r:

L(r) = \prod_{i=1}^{N} p(r_i)

N.B.: Only true if residuals are uncorrelated (independent).

Maximum Likelihood

Log-likelihood of r:

\ln L(r) = -\frac{1}{2} \sum_{i=1}^{N} \frac{r_i^2}{\sigma_i^2} + \mathrm{const}

so maximizing the likelihood is the same as minimizing the goodness-of-fit statistic:

\chi^2 = \sum_{i=1}^{N} \frac{r_i^2}{\sigma_i^2}

\chi^2 for N-M degrees of freedom has a known distribution, so regression models such as this can be judged on the probability of getting a given value of \chi^2.
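Continuing the sketch, with sigma = 0.3 taken as the assumed data error:

r <- b - A %*% xhat                     # residuals
chisq <- sum((r / 0.3)^2)               # goodness-of-fit statistic
pchisq(chisq, df = nrow(A) - ncol(A), lower.tail = FALSE)  # P(chi^2 >= observed)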

Probability and Least Squares

Why should we expect Gaussian residuals?

Random Processes

z1 <- runif(5000)  # 5000 draws from a uniform distribution on [0,1]

Random Processes

hist(z1)  # the histogram is flat: uniform density

Random Processes

z1 <- runif(5000)
z2 <- runif(5000)

What is the distribution of (z1 + z2)?

Triangular Distribution

hist(z1 + z2)  # triangular: the convolution of two uniform densities

Central Limit Theorem

There are more ways to get a central value than an extreme one.
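Extending the runif() experiment, a sum of twelve uniforms is already very nearly Gaussian:

z <- matrix(runif(5000 * 12), ncol = 12)  # 5000 experiments, 12 uniform draws each
hist(rowSums(z), breaks = 50)             # close to a Gaussian shape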

Probability and Least Squares

Why should we expect Gaussian residuals?

(1) Because the Central Limit Theorem is on our side.
(2) Note that the LS solution is always a minimum-variance solution, which is useful by itself. The “maximum-likelihood” interpretation is more of a goal than a reality.

Weighted Least Squares: More General “Data” Errors

Minimizing the \chi^2 is equivalent to minimizing a cost function containing a covariance matrix C of data errors:

J = (b - Ax)^T C^{-1} (b - Ax)

The data error covariance matrix is often taken to be diagonal, C = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2). This means that you put different levels of confidence on different observations (confidence assigned by assessing both measurement error and the amount of trust in your basis functions and linear model). Note that this structure still assumes independence between the residuals.
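A minimal sketch of diagonal-C weighting via lm()'s weights argument; the per-observation errors sigma_i here are hypothetical:

sigma_i <- runif(length(t), 0.2, 0.5)               # hypothetical per-observation errors
fit <- lm(b ~ t + I(t^2), weights = 1 / sigma_i^2)  # weight each observation by 1/sigma_i^2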

Covariate Data Errors

Recall the cost function:

J = (b - Ax)^T C^{-1} (b - Ax)

Now allow off-diagonal covariances in C. N.B. \sigma_{ij} = \sigma_{ji} and \sigma_{ii} = \sigma_i^2.

Multivariate normal PDF:

p(r) = (2\pi)^{-N/2} \, |C|^{-1/2} \exp\left\{ -\tfrac{1}{2} r^T C^{-1} r \right\}

J propagates without trouble into the likelihood expression; minimizing J still maximizes the likelihood.

Fundamental Trick for Weighted and Generalized Least Squares

Transform the system (A, b, C), with data covariance matrix C, into a system (A', b', C') where C' is the identity matrix. The Cholesky decomposition computes a “matrix square root” such that if R = chol(C), then C = R^T R. Pre-multiplying by R^{-T} does the job:

A' = R^{-T} A, \quad b' = R^{-T} b, \quad C' = R^{-T} C R^{-1} = I

You can then solve the Ordinary Least Squares problem A'x = b', using for instance the SVD method. Note that x remains in regular, untransformed space.
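A minimal sketch of this whitening trick in R, assuming A and b from the running example and an illustrative AR(1)-style error covariance C:

n <- length(b)
C <- 0.09 * 0.8^abs(outer(1:n, 1:n, "-"))  # hypothetical correlated-error covariance
R <- chol(C)               # upper triangular, so C = t(R) %*% R
Ap <- solve(t(R), A)       # A' = R^{-T} A
bp <- solve(t(R), b)       # b' = R^{-T} b
xhat <- qr.solve(Ap, bp)   # ordinary least squares on the whitened system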