III-1 WMO course - “Statistics and Climatology” - Lecture III Dr. Bertrand Timbal Regional Meteorological Training Centre, Tehran, Iran December 2003
III-2 Statistics of the Climate system --- Spatio-temporal linkages within the system. Overview: 1. Links within the system: the example of ENSO 2. Regression and correlation of variables 3. Spatial structures: reduction of the degrees of freedom. Review some classical statistical tools.
III-3 Schematic of summer La Niña conditions across the Equatorial Pacific Ocean El Niño / La Niña : a large scale feature
III-4 Schematic of summer El Niño conditions across the Equatorial Pacific Ocean El Niño / La Niña: a large scale feature
III-5 El Niño: a large scale feature. Temperature along an equatorial longitude-depth section, observed with the TAO array of buoys in the Tropical Pacific. Anomalies are relevant for interannual variability. Thermocline: layer of strong temperature gradient around 20 °C. Thermocline movements are important for seasonal forecasting.
III-6 El Niño: sub-surface ocean anomalies. El Niño formation: anomalous warm water accumulates at depth in the West Pacific and travels across the basin along the thermocline. The predictability comes from the slow-moving ocean anomalies.
III-7 Transition to the La Niña
III-8 El Niño: air-sea interactions
III-9 El Niño: air-sea interactions
III-10 El Niño: Global Teleconnections. Courtesy of NOAA
III-11 La Niña: Global Teleconnections. Courtesy of NOAA
III-12 El Niño: impact on Australian rainfall Stratification of the mean climate based on ENSO phases
III-13 La Niña: impact on Australian rainfall Stratification of the mean climate based on ENSO phases
III-14 Probability of exceeding median rainfall for Cold, Neutral and Warm conditions in the Equatorial Pacific Ocean (Data for ) El Niño: global impact on rainfall Stratification of the mean climate based on ENSO phases.
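The stratification described on the three slides above can be sketched in a few lines of Python. This is only an illustration: the function name `prob_exceeding_median`, the phase labels and the nine rainfall values are hypothetical and are not the data behind the maps.

```python
import numpy as np

def prob_exceeding_median(rainfall, phases):
    """Fraction of years above the all-years median, for each ENSO phase."""
    rainfall = np.asarray(rainfall, dtype=float)
    phases = np.asarray(phases)
    median = np.median(rainfall)  # climatological median over all years
    return {
        str(p): float(np.mean(rainfall[phases == p] > median))
        for p in np.unique(phases)
    }

# Hypothetical seasonal rainfall totals (mm) and the ENSO phase of each year.
rain_demo = [10, 20, 30, 40, 50, 60, 70, 80, 90]
phase_demo = ["warm"] * 3 + ["neutral"] * 3 + ["cold"] * 3

probs_demo = prob_exceeding_median(rain_demo, phase_demo)
for phase, prob in probs_demo.items():
    print(f"{phase:8s} P(rain > median) = {prob:.2f}")
```

With real data, the same calculation would be repeated at every grid point or station to produce one probability map per phase.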
III-15 El Niño: impact on Australian Wheat Yields
III-16 Links within the climate system exist: El Niño is a planetary-scale phenomenon; several variables exhibit coherent variations (correlation); distant teleconnections are observed (lag correlation); probabilities are shifted by ENSO phases (predictable). How best to express these relationships?
III-17 Statistics of the Climate system --- Spatio-temporal linkages within the system Overview: 1.Links within the system: the example of ENSO 2.Regression and correlation of variables 3.Spatial structures: reduction of the degree of freedom
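III-18 Simple model: Least-Squares Regression. Regression: $\hat{Y} = a + bX$, where a is the intercept (the value of $\hat{Y}$ for X=0) and b is the slope: $b = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$. Correlation: Pearson ordinary correlation (r): $r = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}$. $r^2$ is the fraction of variance explained.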
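A minimal sketch of the least-squares fit and the Pearson correlation, assuming the usual textbook estimators. The function name `least_squares_fit` and the synthetic data are illustrative only. The last lines check numerically that r squared equals the fraction of variance explained by the fitted line.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return intercept a, slope b and Pearson correlation r for y ~ a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xa = x - x.mean()  # anomalies of the predictor
    ya = y - y.mean()  # anomalies of the predictand
    b = np.sum(xa * ya) / np.sum(xa ** 2)
    a = y.mean() - b * x.mean()
    r = np.sum(xa * ya) / np.sqrt(np.sum(xa ** 2) * np.sum(ya ** 2))
    return float(a), float(b), float(r)

# Synthetic example: a linear relation plus Gaussian noise (hypothetical data).
rng = np.random.default_rng(42)
x_demo = np.linspace(0.0, 10.0, 50)
y_demo = 2.0 + 3.0 * x_demo + rng.normal(0.0, 2.0, size=50)

a_demo, b_demo, r_demo = least_squares_fit(x_demo, y_demo)
residuals = y_demo - (a_demo + b_demo * x_demo)
explained = 1.0 - residuals.var() / y_demo.var()  # fraction of variance explained

print(f"a = {a_demo:.2f}, b = {b_demo:.2f}, r = {r_demo:.3f}")
print(f"r^2 = {r_demo ** 2:.3f}, explained variance = {explained:.3f}")
```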
III-19 Role of outliers (figure: two scatter plots, each labelled with its correlation coefficient r; courtesy of J. Stockburger). Outlier detection methods find observations with large influence. The problem often arises from either erroneous data or a small sample. Graphical visualisation is essential. In this example, out of 100 points, only one data point is different!
III-20 Graphical visualisation of correlation. In all cases, the correlation is r=0.816, but: in one, a false correlation is based on one erroneous data point; in another, a perfect relationship is affected by one data point; in another, the relationship is not linear. Correlation is neither robust nor resistant. Instead we can use the rank correlation: correlation based on ranked data.
III-21 An example of a non-linear relation: rainfall and river flow. (Figure: annual SW WA rainfall with a 20-period moving average.) Courtesy of S. Power
III-22 Correlations between seasonal rain and SOI. Correlation does not imply causation: it only shows simultaneous evolution. Other techniques are needed: path analysis (Blalock, 1971), temporal precedence. Is ENSO forced by Australian rainfall? Or is Australian rainfall affected by ENSO? Courtesy of W. Drosdowsky
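The lack of resistance of the Pearson coefficient, and the behaviour of the rank correlation, can be illustrated with a small sketch. The data are synthetic (a perfect linear relation with one corrupted value) and are not the data shown in the figure. The helper names are hypothetical, and the simple ranking used here assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Ordinary (Pearson) correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xa = x - x.mean()
    ya = y - y.mean()
    return float(np.sum(xa * ya) / np.sqrt(np.sum(xa ** 2) * np.sum(ya ** 2)))

def ranks(v):
    """Rank of each value (1 = smallest); assumes no ties."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    return r

def rank_correlation(x, y):
    """Rank correlation: Pearson correlation of the ranked data."""
    return pearson(ranks(x), ranks(y))

# Perfect linear relation, then one erroneous value (hypothetical data).
x_out = np.arange(1.0, 21.0)
y_out = x_out.copy()
y_out[-1] = 200.0  # a single corrupted observation

r_pearson = pearson(x_out, y_out)
r_rank = rank_correlation(x_out, y_out)
print(f"Pearson r = {r_pearson:.3f}, rank correlation = {r_rank:.3f}")
```

Because the corrupted value keeps its rank, the rank correlation is unchanged while the Pearson coefficient drops sharply. An outlier that changes the ordering would also affect the rank correlation, although usually less.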
III-23 Lag Correlation and auto-correlation. Example: lagged correlation between the SOI and cyclone formation. The lag correlation of a series with itself is the auto-correlation at lag k, $r_k$. Meteorological variables are auto-correlated (persistence). This violates the assumption of independent data, so the effective sample size, hypothesis testing and variance estimates (Prior) are affected. Lag correlations exhibit the dependence between variables. Predictability arises from lag correlation.
III-24 Correlation in the climate system: Correlation coefficients express the part of the variation of two variables which is linked (no causality). Correlation assumes normality (!) and a linear relation (!). A more robust coefficient is the rank correlation. Lag correlation is useful for causality and predictability. Auto-correlation of meteorological data has serious consequences for the use of statistics in climate.
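A sketch of the lag-k auto-correlation and of its effect on the sample size. Two assumptions are made here that the slide does not state. The estimator is one common form, $r_k = \sum_{i=1}^{n-k}(x_i - \bar{x})(x_{i+k} - \bar{x}) / \sum_{i=1}^{n}(x_i - \bar{x})^2$. The effective sample size uses the first-order autoregressive approximation $n_{eff} \approx n(1 - r_1)/(1 + r_1)$. The test series is synthetic.

```python
import numpy as np

def autocorrelation(x, k):
    """Lag-k auto-correlation of a series (one common estimator)."""
    x = np.asarray(x, dtype=float)
    xa = x - x.mean()
    if k == 0:
        return 1.0
    return float(np.sum(xa[:-k] * xa[k:]) / np.sum(xa ** 2))

def effective_sample_size(x):
    """Approximate number of independent values, assuming AR(1) persistence."""
    r1 = autocorrelation(x, 1)
    return len(x) * (1.0 - r1) / (1.0 + r1)

# Synthetic persistent series: x[t] = phi * x[t-1] + noise (hypothetical data).
rng = np.random.default_rng(0)
phi, n = 0.7, 5000
noise = rng.standard_normal(n)
series = np.empty(n)
series[0] = noise[0]
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]

r1_demo = autocorrelation(series, 1)
n_eff_demo = effective_sample_size(series)
print(f"lag-1 auto-correlation = {r1_demo:.2f}")
print(f"n = {n}, effective sample size = {n_eff_demo:.0f}")
```

A test that treats all n values as independent would be far too confident for such a series, which is why persistence matters for hypothesis testing.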
III-25 Statistics of the Climate system --- Spatio-temporal linkages within the system Overview: 1.Links within the system: the example of ENSO 2.Regression and correlation of variables 3.Spatial structures: reduction of the degree of freedom
III-26 Spatial structure in climate data. Several motivations to identify large-scale spatial features: data are not spatially independent (spatial correlation); large-scale structures are more coherent and predictable; extract the large-scale climate signal; reduce the weather noise associated with small scales; fewer degrees of freedom and a reduced data set; identify useful relationships to exploit for climate forecasting.
III-27 Principal-Component (EOF) Analysis. Objective: to reduce the original data set to a new data set of (much) fewer variables; to condense a large fraction of the variance of the original data set; to explore large multivariate data sets (spatial and temporal variation). Calculation: PCA is done on anomalies, based on the covariance [S] or the correlation [R] matrix of a vector X: $X^T X$. The principal components are the projections of X on the eigenvectors $e_i$ of [S], which are orthogonal to one another (a new coordinate system) and maximise the variance, measured by the eigenvalues ($\lambda_i$).
III-28 Principal-Component (EOF) Analysis. Eigenvectors (PCA) are orthogonal: a strong constraint for small domains (Jolliffe, 1989). Typically the 2nd PC is a dipole (not necessarily meaningful). The number of PCs to be considered is based on the eigenvalues.
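The calculation above can be sketched with NumPy. This is a minimal illustration on the covariance matrix, assuming the data are arranged as a (time, space) array. The function name `eof_analysis` and the synthetic field are hypothetical.

```python
import numpy as np

def eof_analysis(data):
    """EOFs (eigenvectors), eigenvalues and PCs of a (time, space) array."""
    anomalies = data - data.mean(axis=0)        # remove the time mean
    n_time = anomalies.shape[0]
    cov = anomalies.T @ anomalies / (n_time - 1)  # covariance matrix [S]
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    pcs = anomalies @ eigvecs                   # projection on the eigenvectors
    explained = eigvals / eigvals.sum()         # fraction of variance per mode
    return eigvals, eigvecs, pcs, explained

# Synthetic field: one coherent large-scale signal plus small-scale noise.
rng = np.random.default_rng(1)
n_time, n_space = 200, 5
signal = 3.0 * rng.standard_normal(n_time)
field = signal[:, None] * np.ones(n_space) + 0.5 * rng.standard_normal((n_time, n_space))

eigvals_demo, eofs_demo, pcs_demo, explained_demo = eof_analysis(field)
print("fraction of variance per mode:", np.round(explained_demo, 3))
```

Using the correlation matrix [R] instead amounts to dividing each column of anomalies by its standard deviation before forming the matrix.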
III-29 EOFs of combined fields (figure: panels at two pressure levels, one of them 850 hPa). Courtesy of M. Wheeler
III-30 The phase-space representation of the MJO: M(t) = [RMM1(t), RMM2(t)]. Vector M traces: - large anti-clockwise circles about the origin when the MJO is strong; - random jiggles around the origin when the MJO is weak. For compositing, we define the 8 equal-angle phases as labeled, described by the angle $\Phi = \tan^{-1}[\mathrm{RMM2}(t)/\mathrm{RMM1}(t)]$. Southern Summer = DJFMA. Courtesy of M. Wheeler
III-31 MJO propagation based on vector M in the two-dimensional phase space. OLR contour interval = 4 W m⁻² (blue negative); 850 hPa wind, max vector = 4.5 m s⁻¹. Courtesy of M. Wheeler
III-32 Rotated PCs. First two rotated PCs of Indian/Pacific SSTAs using data from Jan 1949 to Dec; courtesy of W. Drosdowsky. Rotation facilitates physical interpretation: see the reviews by Richman (1986) and by Jolliffe (1989, 2002). It gives a new set of variables: RPCs. Varimax is a very classic rotation technique (there are many others).
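The amplitude and phase of M can be computed as in the sketch below. The phase numbering is an assumption, since the labels are only given in the figure: phase 1 is taken to start at 180° and the numbers increase anti-clockwise in 45° sectors. `atan2` is used instead of a plain inverse tangent so that the quadrant is resolved. The function name `mjo_phase` is hypothetical.

```python
import math

def mjo_phase(rmm1, rmm2):
    """Amplitude, angle (degrees) and phase number (1-8) of M = [RMM1, RMM2]."""
    amplitude = math.hypot(rmm1, rmm2)
    angle = math.degrees(math.atan2(rmm2, rmm1))  # in (-180, 180]
    # Assumed numbering: phase 1 covers -180..-135 degrees, phase 2 covers
    # -135..-90 degrees, and so on anti-clockwise up to phase 8 (135..180).
    phase = int((angle + 180.0) // 45.0) % 8 + 1
    return amplitude, angle, phase

for rmm1, rmm2 in [(-1.0, -0.5), (0.5, -1.0), (1.0, 0.5), (-1.0, 0.5)]:
    amp, ang, ph = mjo_phase(rmm1, rmm2)
    print(f"RMM1={rmm1:+.1f} RMM2={rmm2:+.1f} amplitude={amp:.2f} phase={ph}")
```

Composites for each phase are then averages of the field of interest over all days falling in that phase, usually restricted to days when the amplitude is large.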
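A sketch of a varimax rotation, using a widely used SVD-based iteration. It is not necessarily the procedure behind the figure, and the loadings matrix is synthetic. The rotation matrix stays orthogonal, so the variance carried by each variable across the retained modes is unchanged; only its distribution between modes changes.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables, modes) loadings matrix with the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                 # closest orthogonal matrix
        new_criterion = np.sum(s)
        if criterion != 0.0 and new_criterion < criterion * (1.0 + tol):
            break                         # no further improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Synthetic loadings: a simple structure mixed by a 30 degree rotation.
simple = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                   [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
theta = np.radians(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
mixed = simple @ mix

rotated_demo, rotation_demo = varimax(mixed)
print(np.round(rotated_demo, 2))
```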
III-33 Other multivariate analyses. Extended EOFs and Complex (Hilbert) EOFs are two classical extensions of PCs. Canonical Correlation Analysis: extension of PCA to two multivariate data sets, forecasting one variable with the other (book by Wilks, 1995). Principal Oscillation Patterns (POP), Principal Interaction Patterns (PIP) and SVD are other techniques used (books by von Storch and Navarra, 1995 and von Storch and Zwiers, 1999). Discriminant analysis (e.g. the operational seasonal forecast of the BoM): the conditioning is on the predictand, so in a sense the reverse conditional probabilities are estimated from the data, and Bayes' theorem is used to invert these (article by Drosdowsky and Chambers, 2001). Analogue (lecture 7), clustering (book by Wilks, 1995) and NHMM (next slide) are other techniques dealing with classification. All techniques can be used for forecasting and downscaling.
III-34 Another downscaling approach. Non-homogeneous Hidden Markov Model: makes use of non-observed “hidden” weather states which are related to observed rainfall structures. Courtesy of S. Charles
III-35 Summary: Many interactions in the system: correlation. Many issues with correlation: robustness, causality. Large-scale structures exist: multivariate analyses, useful for filtering, organizing and reducing the noise in data. Forecasting uses many of these statistical tools. A tool box to analyse our dynamic climate system …. and … a basis for climate forecasting.
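The inversion by Bayes' theorem mentioned for discriminant analysis can be illustrated with a small contingency table. The counts below are hypothetical and the calculation is only a sketch of the idea, not the operational BoM scheme: the data give P(predictor state | rainfall category), and the forecast needs P(rainfall category | predictor state).

```python
import numpy as np

# Hypothetical counts of years. Rows: rainfall category (below, near, above
# normal). Columns: predictor state (SOI negative, neutral, positive).
counts = np.array([[12.0, 6.0, 2.0],
                   [7.0, 6.0, 7.0],
                   [2.0, 7.0, 11.0]])

def posterior_given_predictor(counts, predictor_index):
    """P(rainfall category | predictor state) from Bayes' theorem."""
    prior = counts.sum(axis=1) / counts.sum()  # P(category)
    likelihood = counts[:, predictor_index] / counts.sum(axis=1)  # P(state | category)
    joint = prior * likelihood
    return joint / joint.sum()  # normalise over the categories

posterior_neg = posterior_given_predictor(counts, 0)
print("P(below, near, above | SOI negative) =", np.round(posterior_neg, 3))
```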