Seasonal Forecasting Using the Climate Predictability Tool


Seasonal Forecasting Using the Climate Predictability Tool: Principal Components Regression. Simon Mason (simon@iri.columbia.edu)

Linear Regression in CPT In CPT, linear regression is performed using the MLR (multiple linear regression) option, which allows for more than one predictor: ŷ = b0 + b1x1 + b2x2 + … + bkxk. But what happens when we have lots of predictors (k is large)?
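The MLR model above can be sketched numerically. Below is a minimal illustration of multiple linear regression fitted by ordinary least squares on synthetic data (the predictor and "rainfall" values are invented for illustration; this is not CPT's internal code):

```python
import numpy as np

# Minimal multiple linear regression sketch with synthetic data.
rng = np.random.default_rng(0)
n, k = 50, 3                        # 50 years, 3 predictors (e.g. NINO indices)
X = rng.normal(size=(n, k))         # predictor anomalies (simulated)
beta_true = np.array([1.5, -0.8, 0.3])
y = X @ beta_true + rng.normal(scale=0.5, size=n)   # simulated "rainfall"

# Ordinary least squares: add an intercept column and solve the
# least-squares problem y ≈ b0 + b1*x1 + ... + bk*xk.
A = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, beta_hat = coefs[0], coefs[1:]
```

With 50 years of data and only 3 uncorrelated predictors, the estimated coefficients land close to the true ones; the problems described next arise when that stops being the case.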

Problems with Multiple Linear Regression (MLR) Multicollinearity: predictors are strongly correlated. Predicting MAM 1961 – 2010 rainfall for Thailand from NIÑO4 SSTs, the correlation between the January and February NIÑO4 indices is 0.97, and the correlation between June and July SSTs is 0.88. Because of this, the regression coefficients estimated from the first half of the data (1961 – 1985) only can differ substantially from those for the full period.
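The instability described here can be reproduced with simulated data: two predictors correlated at about 0.97 (like consecutive monthly NIÑO4 values) give individual coefficients that swing between subsamples, even though their sum, the predictable signal, stays stable. The series below are simulated, not the actual NIÑO4 data:

```python
import numpy as np

# Sketch of the multicollinearity problem with two predictors
# correlated at roughly 0.97 (simulated data).
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 0.97 * x1 + np.sqrt(1 - 0.97**2) * rng.normal(size=n)  # corr ~ 0.97
y = x1 + x2 + rng.normal(size=n)    # true coefficients are 1 and 1

def ols_coefs(X, y):
    """Return slope coefficients from an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0][1:]

full = ols_coefs(np.column_stack([x1, x2]), y)
first_half = ols_coefs(np.column_stack([x1[:25], x2[:25]]), y[:25])
second_half = ols_coefs(np.column_stack([x1[25:], x2[25:]]), y[25:])
# The individual coefficients jump around between halves, while their
# sum remains close to 2 in every fit.
```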

Problems with Multiple Linear Regression Multiplicity: too many predictors from which to choose. With more than a handful of candidate predictors, the probability of including at least one spurious predictor (and therefore of subsequently making a bad prediction) becomes very high.
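The multiplicity problem is easy to demonstrate: screen 100 purely random candidate "predictors" against 30 years of random "rainfall", and the best candidate will usually correlate impressively by chance alone (all values below are simulated):

```python
import numpy as np

# Sketch of the multiplicity problem: the best of many random
# candidate predictors looks skilful purely by accident.
rng = np.random.default_rng(2)
n_years, n_candidates = 30, 100
y = rng.normal(size=n_years)                    # random "rainfall"
X = rng.normal(size=(n_years, n_candidates))    # random "predictors"

correlations = np.array(
    [np.corrcoef(X[:, j], y)[0, 1] for j in range(n_candidates)]
)
best = np.abs(correlations).max()
# best is typically well above the nominal significance threshold
# despite zero real predictive skill.
```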

Exercise Using the NINO indices, how well can we predict rainfall over Thailand at increasing lead-times? Create a file combining 2 or more lead-times as separate predictors. Repeat the calculations using this new file. Does the skill improve? Compare the regression equation for the three predictors with the equations for the three months individually. Now try calculating a seasonal average of the predictors for lead-times of interest. Which gives the best results: one month predictors, predictors for multiple months, or seasonal predictors?

Principal Components The principal components are defined as a weighted sum of the original data: if the “weights” summed to 1.0, each principal component would be a true weighted average. Instead, the squares of the weights are constrained to sum to 1.0, so that the variance of the original data is retained.
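This property can be checked directly: the weights (loadings) of a principal component form a unit-length eigenvector of the covariance matrix, so their squares, not the weights themselves, sum to 1.0 (synthetic data for illustration):

```python
import numpy as np

# Leading principal component of a small synthetic field:
# 40 "years" by 5 "gridboxes".
rng = np.random.default_rng(3)
data = rng.normal(size=(40, 5))
anoms = data - data.mean(axis=0)            # remove the climatology

cov = np.cov(anoms, rowvar=False)           # 5 x 5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvectors are unit length
loadings = eigvecs[:, -1]                   # weights of the leading component
pc_scores = anoms @ loadings                # the weighted-sum time series
# (loadings**2).sum() == 1.0, while loadings.sum() generally does not.
```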

Principal Components Regression Instead of using the original data as predictors, we can use the principal components as predictors in the same kind of regression model. Each component combines the information in many of the original predictors, and so a complex MLR model can be simplified considerably: ŷ = b0 + b1PC1 + b2PC2 + … + bmPCm, with m much smaller than the original number of predictors.
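A minimal sketch of principal components regression, with synthetic data standing in for a real gridded SST file: compute the principal component scores of the predictor field, retain the leading few modes, and regress the predictand on those scores.

```python
import numpy as np

# Principal components regression sketch: 40 years, 20 gridboxes,
# 3 retained modes (all data simulated for illustration).
rng = np.random.default_rng(4)
n, p, n_modes = 40, 20, 3
X = rng.normal(size=(n, p))                     # predictor field
y = X[:, 0] + rng.normal(scale=0.5, size=n)     # predictand

Xa = X - X.mean(axis=0)                         # anomalies
U, s, Vt = np.linalg.svd(Xa, full_matrices=False)
scores = Xa @ Vt[:n_modes].T                    # PC scores as new predictors

# Ordinary least squares on the retained scores.
A = np.column_stack([np.ones(n), scores])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coefs
```

Because the PC scores are exactly uncorrelated with one another, the coefficient-instability problem of correlated predictors disappears by construction.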

Principal Components A principal component is a weighted sum of a set of original variables, with the weights set so that the principal component has maximum variance. Instead of using all the gridboxes as individual predictors, we define a spatial pattern (or “mode”, or “principal component”) of, in this case, SSTs, and then calculate how similar the observed pattern of SSTs is to this mode. We then use this measure of similarity as the predictor.

In the example shown, the predictor indicates whether we have large-scale warming in the central Pacific Ocean (i.e., something akin to El Niño). If there is large-scale warming, the predictor scores strongly positive (e.g., as in 1997); if the observed SSTs are the opposite of the mode, the predictor scores strongly negative; and if the observed SSTs do not resemble the mode at all, the predictor scores zero. A completely different spatial pattern can then be defined to give another predictor, and so on.

The modes are defined to have as much variance as possible, which essentially means that they are defined so that in as many of the years as possible the observed SSTs resemble the modes. We can therefore use only a few modes to represent the total variability in the original gridded dataset. In representing the original data with only a few modes we have a small number of predictors in our prediction model, which considerably reduces the multiplicity problem; and because the modes are uncorrelated with each other (the patterns of SST are defined to be completely different), we eliminate the multicollinearity problem.

[Figure: scores and loadings for the first principal component of February 1961 – 2000 sea-surface temperatures.]

Principal Components The score indicates how strongly the loading pattern is expressed in each year.

Principal Components Separate patterns (“modes”) of variability can be defined. We can use just a few of these modes to represent the SST variability throughout the domain. Scores and loadings for second principal component of February 1961 – 2000 sea-surface temperatures.

Why PCR? Principal components of sea-surface temperatures have desirable features as predictors: they explain maximum amounts of variance, and therefore are representative of sea-temperature variability over large areas; they are uncorrelated, and so errors in estimating the regression parameters are minimized; and only a few need be retained, so the dangers of fishing for predictors are minimized.

Summary Multiple regression has two serious problems. Multicollinearity: if predictors are correlated, the coefficients become difficult to interpret, and can be very sensitive to the sample. Multiplicity: if there are lots of predictors, the chances of one or more of them working well by accident become very large. Principal components regression can resolve the multicollinearity problem, and it can reduce the multiplicity problem.

Exercise Use gridded SSTs to predict Thailand rainfall. What considerations apply when selecting an appropriate SST domain, and when setting the number of modes?

CPT Help Desk
web: iri.columbia.edu/cpt/
email: cpt@iri.columbia.edu
@climatesociety
…/climatesociety