
L.M. McMillin, NOAA/NESDIS/ORA

Regression Retrieval Overview
Larry McMillin
Climate Research and Applications Division
National Environmental Satellite, Data, and Information Service
Washington, D.C.

Pick one - this is:
- All you ever wanted to know about regression
- All you never wanted to know about regression

Overview
- What is regression?
- How correlated predictors affect the solution
- Synthetic regression or real regression?
- Regression with constraints
- Theory and applications
- Classification
- Normalized regression
- AMSU sample
- Recommendations

Regression - what are we trying to do?
- Obtain the estimate with the lowest RMS error, or
- Obtain the true relationship

Which line do we want? (figure)

Considerations
- Single predictors: easy.
- Multiple uncorrelated predictors: easy.
- Multiple predictors with correlations:
  - Suppose two predictors are highly correlated and each carries noise.
  - Their difference is small but has larger noise than either predictor alone - and that is the problem.
- A theoretical approach is hard for these cases:
  - If the second predictor has an independent component, you want the difference.
  - If the two are perfectly correlated, you want the average, to reduce noise.
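A minimal numpy sketch (synthetic data, not from the talk) of why correlated predictors are troublesome: with two nearly identical predictors, ordinary least squares pins down the sum of their coefficients but splits it between them almost arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)                  # shared component
x1 = signal + 0.01 * rng.normal(size=n)      # two highly correlated,
x2 = signal + 0.01 * rng.normal(size=n)      # individually noisy predictors
y = signal + 0.1 * rng.normal(size=n)

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

r = np.corrcoef(x1, x2)[0, 1]
# The individual coefficients are poorly determined (they can be large
# and of opposite sign), but their sum stays close to 1.
print(r, coef, coef.sum())
```

Rerunning with a different seed moves each coefficient substantially while leaving the sum nearly unchanged, which is exactly the instability described above.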

Considerations continued
- Observational approach: stepwise regression.
  - Stability depends on the ratio of predictors to predictands.
- Stepwise steps:
  1. Find the predictor with the highest correlation with the predictand.
  2. Generate the regression coefficient.
  3. Make the predictand orthogonal to the selected predictor.
  4. Make all remaining predictors orthogonal to the selected predictor.
- Problem:
  - When two predictors are highly correlated and one is selected and removed, computing the correlation of the other involves dividing by a residual variance that is essentially zero.
  - That other predictor is therefore selected next.
  - The two predictors end up with large coefficients of opposite sign.
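The four stepwise steps can be sketched directly (a schematic illustration with made-up data, not the operational code):

```python
import numpy as np

def stepwise_select(X, y, n_steps):
    """Forward stepwise selection by successive orthogonalization.

    X: (n_cases, n_predictors); y: (n_cases,).
    Returns the column indices chosen, in order."""
    Xr, yr = X.astype(float).copy(), y.astype(float).copy()
    chosen = []
    for _ in range(n_steps):
        # 1. find the predictor with the highest correlation with the predictand
        corrs = [abs(np.corrcoef(Xr[:, j], yr)[0, 1]) if Xr[:, j].std() > 1e-10 else 0.0
                 for j in range(Xr.shape[1])]
        j = int(np.argmax(corrs))
        chosen.append(j)
        p = Xr[:, j]
        # 2. generate the regression coefficient for that predictor
        c = (p @ yr) / (p @ p)
        # 3. make the predictand orthogonal to the selected predictor
        yr = yr - c * p
        # 4. make all remaining predictors orthogonal to the selected predictor
        Xr = Xr - np.outer(p, (p @ Xr) / (p @ p))
    return chosen
```

The correlation computed in step 1 divides by the residual standard deviation of each predictor; after orthogonalization, a near-duplicate predictor has a residual variance near zero, which is where the instability noted above enters.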

Considerations continued
- Consider two predictands with the same predictors:
  - Stable case (temperature, for example): the correlation with the predictand is high.
  - Unstable case (water vapor, for example): the correlation with the predictand is low.
- Essentially, a selected predictor is removed from both the predictand and the predictors:
  - If the residual variance of the predictand decays at least as fast as the residual variance of the predictors, the solution remains stable.

Considerations continued
- Desire: damp the small eigenvectors but don't damp the regression coefficients.
  - C = Y X^T (X X^T)^-1
  - But when removing the variable, we want to use (X X^T + γI)^-1.
- Solutions:
  - Decrease the contributions from the smaller eigenvectors.
    - This damps the slope of the regression coefficients and forces the solution towards the mean value.
  - Alternative: increase the constraint with each step of the stepwise regression - but no theory exists for this.
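In eigen-coordinates the ridge constraint acts as a per-direction filter e / (e + γ) on each eigenvalue e of X X^T, which is how it damps the small eigenvectors. A small numeric sketch with hypothetical eigenvalues:

```python
import numpy as np

gamma = 2.0
eigvals = np.array([100.0, 10.0, 0.1])   # hypothetical eigenvalues of X X^T
filt = eigvals / (eigvals + gamma)       # ridge filter factor per eigen-direction
# Large-eigenvalue directions pass almost unchanged; the smallest direction
# is damped nearly to zero, pulling the solution toward the mean value.
print(filt)
```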

Regression Retrievals
- T = T_guess + C (R - R_guess)
- R is measured.
- R_guess can be:
  - Measured: apples subtracted from apples (measured - measured).
  - Calculated: apples subtracted from oranges (measured - calculated). This leads to a need for bias adjustment (tuning).
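The retrieval update is a one-liner; the names and shapes here are illustrative, not from the talk:

```python
import numpy as np

def retrieve(T_guess, C, R, R_guess):
    """T = T_guess + C (R - R_guess): correct the guess profile with the
    regression coefficients applied to the radiance departure.
    Shapes: T_guess (m,), C (m, k), R and R_guess (k,)."""
    return T_guess + C @ (R - R_guess)

# Toy numbers: identity coefficients map the radiance departure straight in.
T = retrieve(np.zeros(2), np.eye(2), np.array([1.0, 2.0]), np.array([0.0, 1.0]))
```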

Synthetic or Real
- Synthetic regression: uses calculated radiances to generate the regression coefficients.
  - Errors can be controlled, but need to be realistic.
  - The sample needs to be representative.
  - Systematic errors result if measurements and calculations are not perfectly matched.
- Real regression: uses matches with radiosondes.
  - Compares measured to measured - no bias adjustment needed.
  - Sample size issues - an adequate sample can be hard to achieve.
  - Sample consistency across scan spots - different samples for each angle.
  - Additional errors - match errors, "truth" errors.

Regression with constraints
- Why add constraints? The problem is often singular, or nearly so.
- Possible regressions:
  - Normal regression
  - Ridge regression
  - Shrinkage
  - Rotated regression
  - Orthogonal regression
  - Eigenvector regression
  - Stepwise regression
  - Stagewise regression
  - Search all combinations for a subset

Definitions
- Y = predictands
- X = predictors
- C = coefficients
- C_normal = normal coefficients
- C_0 = initial coefficients
- C_ridge = ridge coefficients
- C_shrinkage = shrinkage coefficients
- C_rotated = rotated coefficients
- C_orthogonal = orthogonal coefficients
- C_eigenvector = eigenvector coefficients

Definitions continued
- γ = a constant
- ε = errors in Y
- δ = errors in X
- X_t = true value, when known

Equations
- Y = C X
- C_normal = Y X^T (X X^T)^-1
- C_ridge = Y X^T (X X^T + γI)^-1
- C_shrinkage = (Y X^T + γ C_0) (X X^T + γI)^-1
- C_rotated = (Y Y^T C_0 + Y X^T + γ C_0) (C_0^T Y X^T + X X^T + γI)^-1
- C_orthogonal = rotated regression applied repeatedly until the solution converges.
- Note that many of these differ only in the directions used to calculate the components:
  - The first three minimize differences along the Y direction.
  - Rotated minimizes differences perpendicular to the previous solution.
  - Orthogonal minimizes differences perpendicular to the final solution.
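The first three coefficient forms translate directly into numpy (a sketch, not the operational code; data matrices are arranged with variables in rows and cases in columns, matching the equations above):

```python
import numpy as np

def c_normal(Y, X):
    # C_normal = Y X^T (X X^T)^-1
    return Y @ X.T @ np.linalg.inv(X @ X.T)

def c_ridge(Y, X, gamma):
    # C_ridge = Y X^T (X X^T + gamma I)^-1
    return Y @ X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(X.shape[0]))

def c_shrinkage(Y, X, C0, gamma):
    # C_shrinkage = (Y X^T + gamma C0)(X X^T + gamma I)^-1,
    # which shrinks the solution toward the initial coefficients C0.
    return (Y @ X.T + gamma * C0) @ np.linalg.inv(X @ X.T + gamma * np.eye(X.shape[0]))
```

Setting gamma = 0 in either constrained form recovers the normal solution, and shrinkage with C0 = 0 reduces to ridge.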

Regression examples (figures; plots not reproduced in the transcript)

Constraint summary
- True relationship: Y = 1.2 X
- Guess: Y = 1.0 X, ss =
- Ordinary least squares: Y = 0.71 X, ss =
- Ridge (gamma = 2): Y = 0.64 X, ss =
- Shrinkage (gamma = 2): Y = 0.74 X, ss =
- Rotated (gamma = 4, equivalent to gamma = 2): Y = 1.15 X, ss = ; ss = 7.35 in rotated space
- Orthogonal (gamma = 4): Y = 1.22 X, ss = ; ss = 7.29 in orthogonal space

Regression examples (figures; plots not reproduced in the transcript)

Popular myth - or the devil is in the details
- Claim: two regressions can be replaced by a single one.
  - Y = C X, X = D Z, and Y = E Z
  - Then Y = C D Z, so E = C D.
- This is true for normal regression but false for any constrained regression.
- In particular, if X is a predicted value of Y from Z using an initial set of coefficients, and C is obtained using a constrained regression, then the constraint acts in a direction determined by D. If this is iterated, it becomes rotated regression.
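A quick numpy check of the myth (synthetic, noiseless data): for normal regression the two stages compose exactly, E = C D, but chaining the same stages under a ridge constraint does not reproduce the one-stage ridge solution.

```python
import numpy as np

def c_normal(Y, X):
    return Y @ X.T @ np.linalg.inv(X @ X.T)

def c_ridge(Y, X, gamma):
    return Y @ X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(X.shape[0]))

rng = np.random.default_rng(2)
Z = rng.normal(size=(3, 100))         # cases as columns
D = rng.normal(size=(2, 3))
X = D @ Z                             # X = D Z
Y = np.array([[0.5, -1.0]]) @ X       # Y = C X exactly

E_direct = c_normal(Y, Z)             # one regression of Y on Z
E_chained = c_normal(Y, X) @ D        # E = C D from the two stages

R_direct = c_ridge(Y, Z, 2.0)         # constrained, one stage
R_chained = c_ridge(Y, X, 2.0) @ D    # constrained chain: the constraint
                                      # now acts in directions set by D
```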

Regression with Classification
- Pro:
  - Starts with a good guess.
- Con:
  - Decreases the signal-to-noise ratio.
  - Can produce a series of mean values.
  - With noise, adjacent groups have jumps at the boundaries.

Normalized regression
- Subtract the mean from both X and Y.
- Divide by the standard deviation.
- Theoretically this makes no difference, but numerical precision is not theory.
- Good for variables with a large dynamic range.
- Recent experience with eigenvectors suggests dividing radiances by the noise.
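A sketch of the normalization step (generic, not the talk's code): work in standardized units, keeping the means and standard deviations so results can be mapped back.

```python
import numpy as np

def standardize(A):
    """Subtract each row's mean and divide by its standard deviation
    (rows are variables, columns are cases)."""
    mu = A.mean(axis=1, keepdims=True)
    sigma = A.std(axis=1, keepdims=True)
    return (A - mu) / sigma, mu, sigma

A = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
An, mu, sigma = standardize(A)      # An * sigma + mu recovers A
```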

Example - tuning AMSU on AQUA
- Predictors are the channel values.
- Predictands are the observed minus calculated differences.

Measured minus calculated (figure)

The predictors (figure)

Ordinary Least Squares (figure)

Ridge Regression (figure)

Shrinkage (figure)

Rotated Regression (figure)

Orthogonal Regression (figure)

Results summarized
- "Maximum" means maximum absolute value.
- Ordinary least squares: max coefficient =
- Ridge regression: max coefficient =
- Shrinkage: max coefficient =
  - Shrinkage toward 0 (guess coefficients of zero) is the same as ridge regression.
- Rotated regression: max coefficient =
  - Rotated toward the ordinary least squares solution.
- Orthogonal regression

Recommendations
- Think.
- Know what you are doing.