Statistical analysis of global temperature and precipitation data Imre Bartos, Imre Jánosi Department of Physics of Complex Systems Eötvös University
Outline: The GDCN database; Correlation properties of temperature data (short-term, long-term, nonlinear); Cumulants; Extreme value statistics; Recent results; Degrees of Freedom estimation
Global Daily Climatology Network (GDCN) [Maps: temperature stations; precipitation stations] stations…
Correlation properties. Short-term correlation | Long-term correlation. [Figure: temperature series $T_i$]
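Correlation properties. Short-term correlation | Long-term correlation. [Figure: temperature series $T_i$] Temperature anomaly: $a_{i+1} = T_{i+1} - \langle T_{i+1} \rangle = F(a_i) + \xi_i$. Short-term memory: exponential decay. Autoregressive process, linear case (AR1): $a_{i+1} = A a_i + \xi_i$, with $C_1(\tau) = \langle a_i a_{i+\tau} \rangle \sim A^{\tau}$.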
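The exponential decay of the AR1 autocorrelation can be checked numerically. The sketch below is only an illustration on synthetic data: the value A = 0.7, the unit-variance Gaussian noise, and the helper names `simulate_ar1` and `autocorr` are assumptions, not taken from the talk.

```python
import random


def simulate_ar1(A, n, seed=0):
    """Generate n values of a_{i+1} = A * a_i + xi_i with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    a = [0.0]
    for _ in range(n - 1):
        a.append(A * a[-1] + rng.gauss(0.0, 1.0))
    return a


def autocorr(a, lag):
    """Sample autocorrelation <a_i a_{i+lag}> / <a_i^2> after removing the mean."""
    n = len(a)
    mean = sum(a) / n
    d = [x - mean for x in a]
    var = sum(x * x for x in d) / n
    cov = sum(d[i] * d[i + lag] for i in range(n - lag)) / (n - lag)
    return cov / var


A = 0.7  # assumed value, for illustration only
series = simulate_ar1(A, n=50000)
for lag in (1, 2, 3):
    # Measured autocorrelation next to the AR1 prediction A**lag.
    print(lag, round(autocorr(series, lag), 3), round(A ** lag, 3))
```

For a long enough series the measured values should lie close to A, A^2, A^3.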
Short-term correlation. $a_{i+1} = A a_i + \xi_i$ in terms of the temperature change: $\Delta a_{i+1} = a_{i+1} - a_i \approx T_{i+1} - T_i = (A-1) a_i + \xi_i$; thus the response function one measures is $\langle \Delta a_{i+1} \rangle = (A-1) a_i + 0$. The fitted curve: $\langle \Delta a_{i+1} \rangle = c_1 a_i + c_0$. Király, Jánosi, PRE (2002).
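A minimal sketch of how the response function could be fitted, assuming an ordinary least-squares line through the raw (a_i, a_{i+1} - a_i) pairs; the cited papers may have used binned averages instead. The helper name `fit_response` and the synthetic AR1 input are hypothetical.

```python
import random


def fit_response(a):
    """Least-squares fit of (a_{i+1} - a_i) = c1 * a_i + c0; returns (c1, c0)."""
    x = a[:-1]
    y = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    return c1, c0


# Synthetic AR1 anomalies with an assumed A = 0.7 and symmetric Gaussian noise.
rng = random.Random(1)
A = 0.7
a = [0.0]
for _ in range(20000):
    a.append(A * a[-1] + rng.gauss(0.0, 1.0))

c1, c0 = fit_response(a)
print("c1 =", round(c1, 3), "expected about", round(A - 1, 3))
print("c0 =", round(c0, 3), "expected about 0 for symmetric noise")
```

For a pure AR1 process with symmetric noise the fit should return c1 close to A - 1 and c0 close to 0, so a significantly nonzero c0 in real data points to something beyond this model.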
Short-term correlation. $\langle \Delta a_{i+1} \rangle = c_1 a_i + c_0$. [Maps: $c_0$; $c_1$] Bartos, Jánosi, Geophys. Res. Lett. (2005). $|c_1|$ increases toward the south-east. $c_0 \neq 0$ significantly, so $\Delta a_i$ has an asymmetric distribution.
Short-term correlation. Asymmetric distribution: there are more warming steps ($N_m$) than cooling steps ($N_h$), but the average cooling step ($S_h$) is bigger than the average warming step ($S_m$). Warming index: $W = (N_m S_m) / (N_h S_h)$. Do these two effects compensate each other? Global warming (?) Bartos, Jánosi, Geophys. Res. Lett. (2005).
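The warming index can be computed directly from a daily series. This is a sketch under the assumption that a "step" is the difference between consecutive daily values and that zero steps are ignored; the function name `warming_index` and the toy series are illustrative only.

```python
def warming_index(temps):
    """W = (N_m * S_m) / (N_h * S_h) from consecutive daily differences.

    N_m, S_m: number and mean size of warming steps.
    N_h, S_h: number and mean size (absolute value) of cooling steps.
    """
    steps = [b - a for a, b in zip(temps, temps[1:])]
    warm = [s for s in steps if s > 0]
    cool = [-s for s in steps if s < 0]
    if not warm or not cool:
        raise ValueError("need at least one warming and one cooling step")
    n_m, n_h = len(warm), len(cool)
    s_m, s_h = sum(warm) / n_m, sum(cool) / n_h
    return (n_m * s_m) / (n_h * s_h)


# Toy series: three small warming steps followed by one large cooling step.
toy = [10.0, 11.0, 12.0, 13.0, 10.0]
print(warming_index(toy))
```

In the toy series the three warming steps of size 1 are exactly balanced by one cooling step of size 3, so the two asymmetries compensate and W equals 1.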
Correlation properties. Short-term correlation | Long-term correlation. [Figure: temperature series $T_i$] Long-term memory: power-law decay, $C(\tau) = \langle a_i a_{i+\tau} \rangle \sim \tau^{-\gamma}$.
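Long-term correlation. Measurement: Detrended Fluctuation Analysis (DFA). $F(n) \sim n^{\delta}$, $\gamma = 2(1 - \delta)$, where $C(\tau) = \langle a_i a_{i+\tau} \rangle \sim \tau^{-\gamma}$. DFA curve: initial gradient ($\delta_0$), asymptotic gradient ($\delta$). $\delta$ ~ long-term memory; $\delta_0$ ~ short-term memory.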
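A compact sketch of DFA, assuming first-order (linear) detrending in non-overlapping windows; the talk does not state the detrending order or the window sizes used, so both are assumptions here, as are the function names. It is run on white noise, for which the exponent should be near 0.5.

```python
import math
import random


def dfa_fluctuation(a, n):
    """DFA-1 fluctuation F(n): RMS residual of the profile around window-wise linear fits."""
    mean = sum(a) / len(a)
    profile, total = [], 0.0
    for x in a:
        total += x - mean
        profile.append(total)
    # Time indices 0..n-1 are the same in every window, so precompute their moments.
    t_mean = (n - 1) / 2.0
    t_var = sum((t - t_mean) ** 2 for t in range(n))
    sq_sum, count = 0.0, 0
    for start in range(0, len(profile) - n + 1, n):
        w = profile[start:start + n]
        w_mean = sum(w) / n
        slope = sum((t - t_mean) * (w[t] - w_mean) for t in range(n)) / t_var
        for t in range(n):
            resid = w[t] - (w_mean + slope * (t - t_mean))
            sq_sum += resid * resid
            count += 1
    return math.sqrt(sq_sum / count)


def dfa_exponent(a, window_sizes):
    """Least-squares slope of log F(n) versus log n."""
    xs = [math.log(n) for n in window_sizes]
    ys = [math.log(dfa_fluctuation(a, n)) for n in window_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2)
white = [rng.gauss(0.0, 1.0) for _ in range(20000)]
delta = dfa_exponent(white, [16, 32, 64, 128, 256])
print("DFA exponent:", round(delta, 3), "-> gamma =", round(2 * (1 - delta), 3))
```

Fitting the slope over small windows only would give the initial gradient, and over large windows the asymptotic gradient; the single fit above does not separate the two.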
Detrended Fluctuation Analysis (DFA). Király, Bartos, Jánosi, Tellus A (2006). All time series are long-term correlated.
Nonlinear correlation. Two-point correlation: $C_2 = \langle a_i a_j \rangle$; q-point correlation: $C_q = F(a_i a_j a_k \ldots)$. Linear (Gaussian) process: $C_{q>2} = f(C_2)$ (3rd and higher cumulants are 0), so $C_2$ completely describes the process. Nonlinear (multifractal) process: 3rd or higher cumulants are NOT 0, so the 2-point correlation doesn't give the full picture. One needs to measure the nonlinear correlations for the full description.
Nonlinear correlation. The 2-point correlation of the volatility time series captures the nonlinear correlation properties of the anomaly time series. "Volatility" time series: $a_i \to |a_{i+1} - a_i|$. [Map: volatility, DFA exponent]
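The volatility series is a one-line transformation of the anomalies. The function name `volatility` and the sample values below are illustrative only.

```python
def volatility(a):
    """Absolute day-to-day change |a_{i+1} - a_i| of an anomaly series."""
    return [abs(a[i + 1] - a[i]) for i in range(len(a) - 1)]


anomalies = [1.0, 3.0, 2.0, 2.0]
print(volatility(anomalies))
```

The resulting series is one element shorter than the input and can be fed to the same correlation analysis as the anomalies themselves.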
Nonlinear correlation. The volatility time series also has both short- and long-term memory. [Map: volatility, initial DFA exponent]
In short… Daily temperature values are correlated on both short and long time scales, both linearly and nonlinearly. We constructed the geographic distributions of these properties, and described or explained some of them in detail.
Cumulants. [Maps: skewness; kurtosis] Both are nonuniform, which can affect the extreme value statistics (EVS).
Extreme value statistics. We want to use: temperature time series; temperature anomaly; normalized anomaly.
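The talk does not define the three series explicitly. The sketch below assumes the common convention that the anomaly is the temperature minus the multi-year mean for the same calendar day, and the normalized anomaly is that anomaly divided by the standard deviation for the same calendar day. The function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def normalized_anomalies(days_of_year, temps):
    """Return (anomalies, normalized anomalies) using per-calendar-day mean and std."""
    groups = defaultdict(list)
    for d, t in zip(days_of_year, temps):
        groups[d].append(t)
    stats = {}
    for d, vals in groups.items():
        m = sum(vals) / len(vals)
        var = sum((v - m) ** 2 for v in vals) / len(vals)
        stats[d] = (m, math.sqrt(var))
    anomalies = [t - stats[d][0] for d, t in zip(days_of_year, temps)]
    # Days with zero spread get a normalized anomaly of 0 to avoid division by zero.
    normalized = [
        (t - stats[d][0]) / stats[d][1] if stats[d][1] > 0 else 0.0
        for d, t in zip(days_of_year, temps)
    ]
    return anomalies, normalized


# Toy input: two calendar days observed in three different years.
days = [1, 2, 1, 2, 1, 2]
temps = [0.0, 10.0, 2.0, 14.0, 4.0, 12.0]
anom, norm = normalized_anomalies(days, temps)
print(anom)
print([round(x, 3) for x in norm])
```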
Extreme value statistics. We try to get rid of the spatial correlation: let's use one station in every 4x4 grid cell.
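One possible way to thin the station list is sketched here. It assumes that "4x4" means cells of 4 by 4 degrees in latitude and longitude and that the first station found in each cell is kept; the talk specifies neither, and the station tuples are made up for illustration.

```python
import math


def thin_stations(stations, cell_deg=4.0):
    """Keep the first station encountered in each cell_deg x cell_deg lat/lon cell.

    stations: iterable of (name, latitude, longitude) tuples.
    """
    kept = {}
    for name, lat, lon in stations:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        kept.setdefault(cell, (name, lat, lon))
    return list(kept.values())


# Hypothetical stations: A and B fall into the same cell, C into a different one.
stations = [("A", 47.5, 19.0), ("B", 46.1, 18.2), ("C", 40.7, -74.0)]
print(thin_stations(stations))
```

Choosing the station with the longest or most complete record in each cell would be an equally plausible rule.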
Dangers in filtering for extreme value statistics. After filtering out the flagged (bad) data there is a cutoff at 3.5, and the daily normalized distribution looks exactly like a Weibull distribution. Explanation: preliminary filtering of "outliers".
Extreme value statistics. Then how can we filter out bad data? There are certainly bad data in the series. The usual way to filter them out is to flag the suspicious values, but it seems we cannot use the flags. One attempt to find real outliers: the temperature difference distribution. Impossible to validate.
Extreme value statistics. Another possible way: try to isolate unreliable stations. Now we use all the data, without filtering out spatial correlations. Also notice the two peaks.
Extreme value statistics. New problem: the two peaks. What makes the average maximum values differ for some stations? Why two peaks? Skewness: depends | kurtosis: depends | correlation: doesn't depend.
Extreme value statistics. New problem: the two peaks. [Map: average yearly maximum] One can spatially separate the different peaks.
Extreme value statistics. Separate one peak by using US stations only: finally we get to the Gumbel distribution.
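How the Gumbel fit was done is not stated in the talk. As an illustration only, the sketch below fits a Gumbel distribution to a set of maxima by the method of moments, using mean = mu + gamma_E * beta and variance = pi^2 * beta^2 / 6. The synthetic maxima and the parameter values 30 and 2 are assumptions chosen for the demonstration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(maxima):
    """Method-of-moments estimate of the Gumbel location mu and scale beta."""
    n = len(maxima)
    mean = sum(maxima) / n
    var = sum((x - mean) ** 2 for x in maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def sample_gumbel(mu, beta, rng):
    """Draw one Gumbel variate by inverting the CDF exp(-exp(-(x - mu) / beta))."""
    u = max(rng.random(), 1e-300)  # guard against log(0)
    return mu - beta * math.log(-math.log(u))


rng = random.Random(3)
synthetic_maxima = [sample_gumbel(30.0, 2.0, rng) for _ in range(20000)]
mu_hat, beta_hat = fit_gumbel_moments(synthetic_maxima)
print("mu =", round(mu_hat, 2), "beta =", round(beta_hat, 2))
```

A maximum-likelihood fit is the usual alternative when the sample of yearly maxima is small.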
Degrees of Freedom. Why does the average maximum value not depend on the correlation exponent? One can calculate the degrees of freedom (DOF) of $N$ variables with long-term correlation characterized by the correlation exponent $\gamma$: $\mathrm{DOF} = N^2 / \sum_i \lambda_i^2$, where $\lambda_i$ is the $i$th eigenvalue of the covariance matrix, which contains the covariance of each pair of days of the year. Long-term correlation: $C(|x-y|) = c\,|x-y|^{-\gamma}$. Short-term correlation: $T_{i+1} = A\,T_i + \text{noise}$. Variables determining the DOF: $c$, $\gamma$, $A$.
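The DOF formula can be evaluated without an eigenvalue solver, because for a symmetric matrix the sum of squared eigenvalues equals the sum of squared matrix entries (the trace of C squared). The sketch below assumes unit-variance variables, so that the eigenvalues sum to N, and a stationary correlation that depends only on the lag. The parameter values and function names are illustrative, and how the talk combines the short- and long-term parts is not specified.

```python
def dof_from_correlation(r, n):
    """DOF = n^2 / sum_ij C_ij^2 for C_ij = r(|i - j|) with r(0) = 1.

    A lag k occurs 2 * (n - k) times off the diagonal of the n x n matrix.
    """
    sum_sq = n * r(0) ** 2 + 2 * sum((n - k) * r(k) ** 2 for k in range(1, n))
    return n * n / sum_sq


def long_term(c, gamma):
    """Power-law correlation c * k^(-gamma) for lag k >= 1, and 1 at lag 0."""
    return lambda k: 1.0 if k == 0 else c * k ** (-gamma)


def short_term(A):
    """AR1 correlation A^k."""
    return lambda k: A ** k


N = 365  # days of the year
print("uncorrelated:", dof_from_correlation(lambda k: 1.0 if k == 0 else 0.0, N))
print("fully correlated:", dof_from_correlation(lambda k: 1.0, N))
print("AR1, A = 0.5:", round(dof_from_correlation(short_term(0.5), N), 1))
print("power law, c = 1, gamma = 0.5:", round(dof_from_correlation(long_term(1.0, 0.5), N), 1))
```

The two limiting cases give N for uncorrelated days and 1 for perfectly correlated days, which is a quick sanity check on the formula.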
Degrees of Freedom – Dependence on correlation. [Figure panels: C = 1; C = 0.25; C = …; short-term]
Degrees of Freedom – measurement and calculation. Estimation with $c = 1$. Measurement: chi-square method (underestimation).
Degrees of Freedom – difficulties. The estimation uses $c = 1$: this causes the difference. It is hard to measure anything due to the bad signal-to-noise ratio. To say something about $c$: correlation between consecutive years.