1
Data analyses 2008, Lecture 2, 16-10-2008
2
Last lecture: basic statistics, testing, linear regression parameters, skill
3
What is a time series / random data? A time series may be seen as a randomly selected finite section of an infinitely long sequence of random numbers (Storch and Zwiers). A time series is a stochastic (random) process, ordered as random samples X_t. A single time history is called a sample function or record. The simplest example is white noise (random data with no deterministic signal).
4
Stationary process. Definition: all stochastic properties are independent of time. Classification: a random process is either stationary or non-stationary; a stationary process is either ergodic or non-ergodic.
5
Figure: ensemble (collection of sample functions), with the time lag τ indicated.
6
Stationarity (2). Ensemble mean: sum all values at time t_1 across the ensemble and divide by the ensemble size (the first moment); the autocovariance is formed analogously (see the sketch below).
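The equation images of this slide are not preserved in the transcript; as a sketch, the standard ensemble definitions matching the description above are:

\mu_x(t_1) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} x_k(t_1)

C_{xx}(t_1, t_1+\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \bigl[ x_k(t_1) - \mu_x(t_1) \bigr] \bigl[ x_k(t_1+\tau) - \mu_x(t_1+\tau) \bigr]

where x_k is the k-th sample function and N the ensemble size.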
7
Stationarity (3). If μ_x(t_1) and C_xx(t_1, t_1+τ) vary as a function of time t_1, then X_t is non-stationary. If μ_x(t_1) and C_xx(t_1, t_1+τ) do not vary as a function of t_1, then X_t is weakly stationary. Note that the autocovariance then depends only on the time lag τ, so C_xx(t_1, t_1+τ) = C_xx(τ).
8
Strongly stationary. Definition: an infinite collection of higher-order moments (the complete probability distribution function) is time invariant. Also called: stationary in the strict sense. This is usually unknown; for most analysis purposes weak stationarity is sufficient.
9
Ergodic. So far we used ensembles to calculate the mean and autocovariance, but we can also consider the k-th sample function alone and form time averages μ_x(k) and C_xx(τ, k). If μ_x(k) and C_xx(τ, k) are independent of k, the time series is ergodic.
10
Ergodic (2). We can then write μ_x(k) = μ_x and C_xx(τ, k) = C_xx(τ) (note that only a stationary process can be ergodic). Advantage: only one single sample function is needed. In practice we have to assume ergodicity; this is sometimes tested by splitting the data set into subsamples.
11
Time series: X_t = D_t + N_t, where X_t is the time series, D_t the deterministic component and N_t the stochastic component (noise). The purpose of time series analysis is to detect and describe the deterministic component.
12
Time series (2). Figure: deterministic oscillation D_t, white noise N_t, and the resulting series X_t.
13
Time series (3). Figure: quasi-oscillatory signal D_t, white noise N_t, and the resulting series X_t; the dynamical component is changed by the noise.
14
Autocorrelation (1) (ACF). Purpose: to see whether there are repetitions in a time series. Each point in time can be compared with a previous point in time, or with any set of previous points, and the similarity can be studied. The data must be available at regular time spacing!
15
Recalling lecture 1: focus on the mutual variability of pairs of properties. Covariance is the joint variation of two variables about their common mean. Now we take both variables from the same series, at times t and t+τ; for τ = 0 this reduces to the ordinary variance (because then t = t+τ).
16
Autocorrelation (2). If τ = 0 then C_xx = s², the variance. Autocovariance and autocorrelation are defined as sketched below; the subscript xx refers to the same variable, but it can be replaced by x_1 and x_2 if two variables are considered. For stationary processes the definitions depend on the lag only.
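As a sketch of the missing formulas (standard definitions for a stationary process; the slide's own notation may differ slightly):

C_{xx}(\tau) = E\bigl[ (X_t - \mu)(X_{t+\tau} - \mu) \bigr], \qquad \rho_{xx}(\tau) = \frac{C_{xx}(\tau)}{C_{xx}(0)} = \frac{C_{xx}(\tau)}{s^2}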
17
Recalling lecture 1: the correlation coefficient (r) is defined as the ratio of the covariance to the product of the standard deviations. It is a scaled quantity: 1 is a perfect correlation, 0 is no correlation, -1 is a perfect inverse correlation. The autocorrelation is the time-series analogue of the correlation coefficient.
18
Autocorrelation (2b). Properties: ρ(0) = 1, ρ(τ) = ρ(-τ), and |ρ(τ)| ≤ 1. Note that the autocorrelation function is not unique: many processes can have similar autocorrelation functions, so the autocorrelation is not invertible!
19
Autocorrelation (6) (figure: example autocorrelation functions).
20
Autocorrelation (7). In most cases the autocorrelation cannot be solved analytically; it is retrieved from a simple calculation, basically a summation over the time domain (see the sketch below).
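A minimal sketch of this calculation in Python (not from the lecture; the array and function names are illustrative):

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Sample autocorrelation r(tau) of an equally spaced series x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    a = x - x.mean()                      # anomalies about the sample mean
    denom = np.sum(a ** 2)                # lag-0 term (n times the sample variance)
    return np.array([np.sum(a[: n - tau] * a[tau:]) / denom
                     for tau in range(max_lag + 1)])

# White noise: r(0) = 1, all other lags close to zero
rng = np.random.default_rng(0)
print(sample_autocorrelation(rng.standard_normal(1000), max_lag=5))
```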
21
Cross-correlation (1). Two time series of different variables. Cross-correlation: the covariance divided by the product of the standard deviations.
22
Partial autocorrelation function: an autocorrelation function that gives the magnitude of the lag-k autocorrelation between X_t and X_{t-k}, controlling for all intervening autocorrelations. From regression analysis: X_1 partially varies because of variation in X_2 and partially because of variation in X_3.
23
Partial autocorrelation (2). In the partial-correlation notation from regression analysis, the subscript after the dot (variable 2 here) is the variable kept constant. In time series analysis we find the analogue for the intervening lags. The partial autocorrelation is used for assessing the order of stochastic models.
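In practice both functions can be obtained from a library; a sketch assuming the statsmodels package is available (the series x is a placeholder):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf   # assumes statsmodels is installed

rng = np.random.default_rng(1)
x = rng.standard_normal(500)       # placeholder series; use your own data here

print(acf(x, nlags=10))            # autocorrelation up to lag 10
print(pacf(x, nlags=10))           # partial autocorrelation up to lag 10
```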
24
Autoregressive models and moving average models: stochastic models, partly deterministic and partly random. Tools: the autocorrelation and the partial autocorrelation.
25
White Noise (figure). Non-stationary process because the variance increases.
26
Random walk (1). The random walk is an example of an autoregressive model: an autoregressive model of order 1 with φ_1 = 1 (see the sketch below).
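Written out (a standard sketch, with Z_t white noise of variance \sigma_Z^2; the slide's own notation may differ):

X_t = X_{t-1} + Z_t = X_0 + \sum_{i=1}^{t} Z_i, \qquad \operatorname{Var}(X_t - X_0) = t \, \sigma_Z^2

so the variance grows with time and the random walk is non-stationary.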
27
Random walk (3). Distribution of air pollution resulting from advection plus a random contribution (Storch and Zwiers, 1999).
28
Basic idea of time series analysis. Autoregressive model: the value at t depends on previous values at t-i plus some random perturbation. Moving average model: the value at t depends on the random perturbations of previous values at t-i plus some random perturbation. The aim is to see if you can learn something from a data set which looks noisy.
29
Basic formulation of autoregressive integrated moving average models. ARIMA(p,d,q) includes an autoregressive process, an integrated process and a moving average process. If d = 0 the series is stationary; if d = 1 it is non-stationary and has first to be transformed to stationarity (stationary: mean and variance constant).
30
AR(1) process. AR(1) process ~ ARIMA(1,0,0), a Markov process. General formulation: X_t = φ_0 + φ_1 X_{t-1} + Z_t, where φ_1 is the first-order autoregressive coefficient and Z_t is white noise (X_t on one side, the rest on the other side). A simulation sketch follows.
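A short simulation sketch of this formulation in Python (hypothetical parameter values, not from the lecture):

```python
import numpy as np

def simulate_ar1(phi1, n, sigma=1.0, seed=0):
    """Simulate X_t = phi1 * X_{t-1} + Z_t with Gaussian white noise Z_t."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi1 * x[t - 1] + rng.normal(scale=sigma)
    return x

red  = simulate_ar1(0.8, 1000)    # 0 < phi1 < 1: red noise (slow variations)
blue = simulate_ar1(-0.8, 1000)   # phi1 < 0: blue noise (frequent sign changes)
walk = simulate_ar1(1.0, 1000)    # phi1 = 1: random walk (non-stationary)
```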
31
AR(2) process. AR(2) ~ ARIMA(2,0,0), dependent on the previous two time steps: X_t = φ_0 + φ_1 X_{t-1} + φ_2 X_{t-2} + Z_t. In ARIMA(p,0,0), p is the order of the process.
32
MA(1) process. Autoregressive models depend on previous observations; moving average models depend on innovations. General formulation for an MA(1) model ~ ARIMA(0,0,1) (sketched below): Z_t is the innovation or shock and θ_1 is the first-order moving average coefficient.
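As a sketch of the missing formula, in the Box-Jenkins sign convention (which is consistent with the spike signs in the summary tables later in this lecture):

X_t = \mu + Z_t - \theta_1 Z_{t-1}

and, analogously, the MA(2) model on the next slide reads X_t = \mu + Z_t - \theta_1 Z_{t-1} - \theta_2 Z_{t-2}.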
33
MA(2) process ~ ARIMA(0,0,2): the current observation is a function of the mean, the current innovation and two past innovations. In ARIMA(0,0,q), q is the order of the moving average process; ARIMA(p,0,q) is an autoregressive moving average model. So the order tells us the number of previous observations (or innovations) of which the series is a significant function.
34
The (1) process in a different notation: there is no need to bother about the mean (it does not influence the autocorrelation), so the expression can be regarded as a normalized version of the time series.
35
AR process as differential equation. The time series should be weakly stationary (constant mean and variance). "Autoregressive" indicates that the process evolves by regressing past values towards the mean and then adding noise. An AR process corresponds to a discretized differential equation a_2 d²x/dt² + a_1 dx/dt + a_0 x = z(t), where a_0, a_1, a_2 are constants and z_t is the external forcing.
36
AR process as differential equation (2). Discretized version (backward in time): this yields an AR process if z_t is white noise. Exercise: write the AR(1) process as a first-order differential equation (one possible sketch follows below).
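One possible sketch for the exercise, assuming the first-order equation a_1 \, dx/dt + a_0 x = z(t) and a backward difference with time step \Delta t:

a_1 \frac{x_t - x_{t-\Delta t}}{\Delta t} + a_0 x_t = z_t
\quad\Longrightarrow\quad
x_t = \frac{a_1}{a_1 + a_0 \Delta t} \, x_{t-\Delta t} + \frac{\Delta t}{a_1 + a_0 \Delta t} \, z_t

which has the AR(1) form x_t = \phi_1 x_{t-\Delta t} + Z_t when z_t is white noise.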
37
Red noise: an autoregressive model with p = 1 and 0 < φ_1 < 1. Very common in climate research; it describes gradual changes (a discretized first-order differential equation).
38
Blue noise: an autoregressive model with p = 1 and φ_1 < 0. Characteristic: many sign changes. Not very common in climate research, with the exception of ice cores.
39
Unstable AR process: for φ_1 > 1 the process is unstable, with explosive growth of the variance; for φ_1 = 1 we obtain the random walk.
40
AR process mean and variance
41
AR process mean
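The equations of this slide are not preserved in the transcript; a sketch of the standard result, taking expectations of X_t = \phi_0 + \phi_1 X_{t-1} + Z_t for a stationary process (E[X_t] = E[X_{t-1}] = \mu, E[Z_t] = 0):

\mu = \phi_0 + \phi_1 \mu \quad\Longrightarrow\quad \mu = \frac{\phi_0}{1 - \phi_1}

so μ = 0 when φ_0 = 0.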
42
Autocorrelation AR process
43
AR process autocorrelation ρ(τ). Recalling the general form of the AR(1) process (φ_0 = 0, p = 1), sketched below:
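A sketch of the standard derivation: multiply X_t = \phi_1 X_{t-1} + Z_t by X_{t-\tau} and take expectations (Z_t is uncorrelated with past values):

C(\tau) = \phi_1 C(\tau - 1) \quad\Longrightarrow\quad \rho(\tau) = \phi_1^{\tau}, \qquad \tau \ge 0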
44
Variance of AR process
45
Variance (2)
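A sketch of the standard result, taking the variance of X_t = \phi_1 X_{t-1} + Z_t and using stationarity:

\sigma_X^2 = \phi_1^2 \sigma_X^2 + \sigma_Z^2 \quad\Longrightarrow\quad \sigma_X^2 = \frac{\sigma_Z^2}{1 - \phi_1^2}, \qquad |\phi_1| < 1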
46
AR(1) process autocorrelation: the autocorrelation for different values of φ_1. Note that positive values of φ_1 give the same autocorrelation as negative values at even time lags.
47
MA(1) processes (figure).
48
Autocorrelation of an MA process. Characteristic pattern: sharp spikes up to and including the lag of its order. Consider the MA(1) process (ARIMA(0,0,1)) and its covariance C(k), in particular C(1).
49
Autocorrelation of the MA(1) process (2). Given the autocovariance, we need the variance to obtain the autocorrelation: the variance of the MA(1) process.
50
Autocorrelation of the MA(1) process (3). Combining the autocovariance and the variance gives the autocorrelation (see the sketch below).
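A sketch of the three missing expressions, using the MA(1) form X_t = \mu + Z_t - \theta_1 Z_{t-1} assumed earlier:

C(1) = -\theta_1 \sigma_Z^2, \qquad \operatorname{Var}(X_t) = C(0) = (1 + \theta_1^2)\, \sigma_Z^2, \qquad \rho(1) = \frac{-\theta_1}{1 + \theta_1^2}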
51
Autocorrelation of the MA(1) process (4). The autocovariance for lag 2 (and beyond) vanishes, so the autocorrelation is 0! In other words, the autocorrelation spikes only up to the lag of its order, in this case 1. This implies a finite memory for the process: after the shock has passed, the autocorrelation drops to zero.
52
Summary identification of ARIMA processes (1):
White noise, ARIMA(0,0,0): autocorrelation has no spikes; partial autocorrelation has no spikes.
Random walk, ARIMA(0,1,0): autocorrelation shows slow attenuation; partial autocorrelation has a spike at the order of differencing.
53
Summary identification of ARIMA processes (2), autoregressive processes:
ARIMA(1,0,0), φ_1 > 0: autocorrelation shows exponential decay of positive spikes; partial autocorrelation has 1 positive spike at lag 1.
ARIMA(1,0,0), φ_1 < 0: autocorrelation shows oscillating decay beginning with a negative spike; partial autocorrelation has 1 negative spike at lag 1.
ARIMA(2,0,0), φ_1, φ_2 > 0: autocorrelation shows exponential decay of positive spikes; partial autocorrelation has 2 positive spikes at lags 1 and 2.
ARIMA(2,0,0), φ_1 < 0, φ_2 > 0: autocorrelation shows oscillating exponential decay; partial autocorrelation has 1 negative spike at lag 1 and 1 positive spike at lag 2.
54
Summary identification of ARIMA processes (3), moving average processes:
ARIMA(0,0,1), θ_1 > 0: autocorrelation has 1 negative spike at lag 1; partial autocorrelation shows exponential decay of negative spikes.
ARIMA(0,0,1), θ_1 < 0: autocorrelation has 1 positive spike at lag 1; partial autocorrelation shows oscillating decay of positive and negative spikes.
ARIMA(0,0,2), θ_1, θ_2 > 0: autocorrelation has 2 negative spikes at lags 1 and 2; partial autocorrelation shows exponential decay of negative spikes.
ARIMA(0,0,2), θ_1, θ_2 < 0: autocorrelation has 2 positive spikes at lags 1 and 2; partial autocorrelation shows oscillating decay of positive and negative spikes.
55
Summary identification of ARIMA processes (4), mixed processes:
ARIMA(1,0,1), φ_1 > 0, θ_1 > 0: autocorrelation shows exponential decay of positive spikes; partial autocorrelation shows exponential decay of positive spikes.
ARIMA(1,0,1), φ_1 > 0, θ_1 < 0: autocorrelation shows exponential decay of positive spikes; partial autocorrelation shows oscillating decay of positive and negative spikes.
ARIMA(1,0,1), φ_1 < 0, θ_1 > 0: autocorrelation shows oscillating decay; partial autocorrelation shows exponential decay of negative spikes.
ARIMA(1,0,1), φ_1 < 0, θ_1 < 0: autocorrelation shows oscillating decay of negative and positive spikes; partial autocorrelation shows oscillating decay of positive and negative spikes.
56
Example the other way around: assume we measured the following series (figure); can we describe it by a stochastic process?
57
Strategy for time series analysis:
-plot the data
-test for stationarity
-calculate the autocorrelation and partial autocorrelation
-identify the order (requires expertise and is somewhat subjective)
-solve recursively for the parameters
-check whether the residuals are white noise
-further analysis (forecasting, extending the series)
58
Autocorrelation & partial correlation: results from our time series. The partial autocorrelation has only two non-zero lags, one positive and one negative; the autocorrelation decays gradually with an oscillation superimposed.
59
Recursive solution of the AR parameters via the Yule-Walker equations (sketched below).
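For reference, the Yule-Walker equations for an AR(2) process in their standard form (the slide's own equations are not preserved in this transcript):

\rho(1) = \phi_1 + \phi_2 \, \rho(1), \qquad \rho(2) = \phi_1 \rho(1) + \phi_2

and more generally \rho(k) = \sum_{i=1}^{p} \phi_i \, \rho(k - i) for k > 0.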
60
Solution for φ_1 and φ_2: from the autocorrelation we arrive at the estimates φ_1 = 0.894 and φ_2 = -0.841 (the series was generated with 0.9 and -0.8). As a result the noise variance and the spectra also come out correctly.
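A sketch of this procedure in Python (synthetic data generated here for illustration, not the lecture's actual series):

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi1, phi2 = 5000, 0.9, -0.8          # generating coefficients quoted above
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + rng.standard_normal()

# Sample autocorrelations at lags 1 and 2
a = x - x.mean()
r1, r2 = (np.sum(a[:-k] * a[k:]) / np.sum(a ** 2) for k in (1, 2))

# Yule-Walker for AR(2): rho(1) = phi1 + phi2*rho(1), rho(2) = phi1*rho(1) + phi2
phi_hat = np.linalg.solve(np.array([[1.0, r1], [r1, 1.0]]),
                          np.array([r1, r2]))
print(phi_hat)   # close to (0.9, -0.8) for a long series
```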
61
Parameter estimation: don't bother too much, just use brute-force least-squares fitting.
62
What did we learn today? General concepts of: backward differencing, which provides the relation between a differential equation and an ARIMA process; autoregressive models of order 1 and 2; moving average models of order 1 and 2; the autocorrelation for these models; estimating the order; estimating the coefficients. Special cases: white, red and blue noise, and the random walk.