Data analyses 2008, Lecture 2, 16-10-2008

Last Lecture
- Basic statistics
- Testing
- Linear regression parameters
- Skill

What is a time series / random data?
- A time series may be seen as a randomly selected, finite section of an infinitely long sequence of random numbers (Storch and Zwiers).
- A time series is a realization of a stochastic (random) process, ordered as random samples X_t.
- A single time history is called a sample function or record.
- The simplest example is white noise (random data, no deterministic signal).
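A small illustration (a sketch, not from the lecture; the record length and ensemble size are arbitrary choices) of generating white-noise sample functions:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# an ensemble: several sample functions (records) of pure white noise
n_records, n_steps = 5, 200
ensemble = rng.standard_normal((n_records, n_steps))  # no deterministic signal

# each row is one sample function X_t; each column is one time t
print(ensemble.shape)            # (5, 200)
print(ensemble[0, :5].round(2))  # first values of the first record
```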

Stationary process
Definition: all stochastic properties are independent of time.
Classification: a random process is either stationary or nonstationary; a stationary process is further either ergodic or nonergodic.

Ensemble (collection of sample functions) [figure]; τ denotes the time lag.

Stationarity (2)
Mean: sum all values at time t1 over the ensemble and divide by the ensemble size (first moment); similarly for the autocovariance.
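The formulas on this slide are images in the original; a plausible reconstruction of the ensemble estimates, assuming an ensemble of N sample functions x_k(t):

\[
\mu_x(t_1) = \frac{1}{N}\sum_{k=1}^{N} x_k(t_1), \qquad
C_{xx}(t_1, t_1+\tau) = \frac{1}{N}\sum_{k=1}^{N}
\bigl[x_k(t_1)-\mu_x(t_1)\bigr]\,\bigl[x_k(t_1+\tau)-\mu_x(t_1+\tau)\bigr].
\]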

Stationarity (3)
If μ_x(t1) and C_xx(t1, t1+τ) vary as a function of time, then X_t is non-stationary.
If μ_x(t1) and C_xx(t1, t1+τ) do not vary as a function of time, then X_t is weakly stationary.
Note that the autocovariance then depends only on the time lag τ: C_xx(t1, t1+τ) = C_xx(τ).

Strongly stationary
Definition: an infinite collection of higher-order moments (the complete probability distribution function) is time invariant.
Also called: stationary in the strict sense.
This is often unknown; for most analysis purposes weak stationarity is sufficient.

Ergodic
So far we considered ensembles to calculate the mean and autocovariance. But we can also average over time within the k-th sample function. If μ_x(k) and C_xx(k) are independent of k, then the time series is ergodic.

Ergodic (2)
So time averages can replace ensemble averages (note that only a stationary process can be ergodic).
Advantage: only one single sample function is needed.
In practice we have to assume ergodicity; it is sometimes tested by splitting up the data set into subsamples.
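A sketch of that subsample check (the variable names and the number of subsamples are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(6000)  # stand-in for a single measured record

# split the single record into subsamples and compare their statistics;
# if means and variances agree, assuming ergodicity is more plausible
for k, sub in enumerate(np.split(x, 4)):
    print(f"subsample {k}: mean={sub.mean():+.3f}  var={sub.var():.3f}")
```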

Time series
X_t is the time series, D_t is the deterministic component and N_t is the stochastic component (noise): X_t = D_t + N_t.
Purpose of time series analysis: detect and describe the deterministic component.

Time series (2) [figure: D_t a deterministic oscillation, N_t white noise, and the resulting X_t]

Time series (3) [figure: D_t a quasi-oscillatory signal, N_t white noise, X_t with the dynamical component changed by the noise]

Autocorrelation (1) (ACF)
Purpose: to see whether there are repetitions in a time series. Each point in time can be compared with a previous point in time, or with any number of previous points, and the similarity can be studied. The data must be available at regular time spacing!

Recalling lecture 1
Focus on the mutual variability of pairs of properties. Covariance is the joint variation of two variables about their common mean. Now the two "variables" are the same series at times t and t+τ (for τ = 0 we have t = t+τ and recover the variance).

Autocorrelation (2)
If τ = 0 then C_xx = s² (the variance).
Autocovariance: C_xx(τ); autocorrelation: the autocovariance scaled by the variance, ρ_xx(τ) = C_xx(τ)/C_xx(0).
The subscript xx refers to the same variable, but it can be replaced by x1 and x2 if two variables are considered.
For stationary processes the autocovariance depends only on the lag τ.

Recalling lecture 1
The correlation coefficient (r) is defined as the ratio of the covariance to the product of the standard deviations. It is a scaled quantity: 1 is a perfect correlation, 0 is no correlation, -1 is a perfect inverse correlation. The autocorrelation is the analogue of the correlation coefficient.

Autocorrelation (2b)
Properties: ρ(0) = 1, ρ(-τ) = ρ(τ), |ρ(τ)| ≤ 1.
Note that the autocorrelation function is not unique: many processes might have a similar autocorrelation function. It is not invertible!

Autocorrelation (6) [figure]

Autocorrelation (7)
In most cases the autocorrelation cannot be solved analytically. It is retrieved from a simple calculation: basically five summations over the time domain.
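A minimal numpy sketch of such a calculation (this uses the common single-mean estimator; the function and variable names are my own):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r(k) = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    denom = np.sum(xm * xm)
    return np.array([np.sum(xm[:len(x) - k] * xm[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
print(sample_acf(rng.standard_normal(500), max_lag=5).round(2))  # white noise: ~0 beyond lag 0
```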

Cross-correlation (1)
Two time series of different variables. Cross-correlation: the cross-covariance divided by the product of the standard deviations.

Partial autocorrelation function
An autocorrelation function that identifies the magnitude of the autocorrelation between X_t and X_{t-k} at lag k, controlling for all intervening autocorrelations.
From regression analysis: X_1 partially varies because of variation in X_2 and partially due to variation in X_3.

Partial autocorrelation (2)
In partial-correlation notation, the variable written after the dot is the one kept constant. In time series analysis we find the analogue: the partial autocorrelation, used for assessing the order of stochastic models.
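In practice the ACF and PACF used for order assessment can be computed with statsmodels; a sketch (the AR(1) toy series with φ1 = 0.7 is my own assumption):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(4)
x = np.zeros(2000)
for t in range(1, 2000):                   # toy AR(1) series with phi1 = 0.7
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

print("ACF :", acf(x, nlags=5).round(2))   # decays gradually
print("PACF:", pacf(x, nlags=5).round(2))  # single spike at lag 1 -> AR order 1
```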

Autoregressive models and moving average models
Stochastic models: partly deterministic, partly random. Tools: autocorrelation and partial autocorrelation.

White noise [figure]. By contrast, a process whose variance increases with time t is non-stationary (compare the random walk below).

Random walk (1)
The random walk is an example of an autoregressive model: an autoregressive model of order 1 with φ1 = 1.
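A sketch of a random walk simulated as an AR(1) process with φ1 = 1 (the length and the unit-variance innovations are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.standard_normal(1000)   # white-noise innovations
x = np.cumsum(z)                # X_t = X_{t-1} + z_t, i.e. AR(1) with phi1 = 1

# the variance grows with time, so the random walk is non-stationary
print(np.var(x[:100]), np.var(x))
```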

Random walk (3)
Example: the distribution of air pollution, which results from advection plus a random contribution (Storch and Zwiers, 1999).

Basic idea of time series analysis
Autoregressive model: the value at t depends on the previous values at t-i plus some random perturbation.
Moving average model: the value at t depends on the random perturbations at the previous times t-i plus some random perturbation.
See if you can learn something from a data set which looks noisy.

Basic formulation of autoregressive integrated moving average models
ARIMA(p,d,q) includes an autoregressive process, an integrated process and a moving average process.
If d = 0 the series is stationary. If d = 1 it is non-stationary and first has to be transformed to stationarity (stationary: constant mean and variance).

AR(1) process
An AR(1) process ~ ARIMA(1,0,0) is a Markov process. General formulation: X_t = φ1 X_{t-1} + z_t (X_t to one side, the rest to the other side), where φ1 is the first-order autoregressive coefficient and z_t is white noise.

AR(2) process
AR(2) ~ ARIMA(2,0,0): dependent on the previous two time steps, X_t = φ1 X_{t-1} + φ2 X_{t-2} + z_t.
In ARIMA(p,0,0), p is the order of the autoregressive process.

MA(1) process
Autoregressive models depend on previous observations; moving average models depend on innovations. General formulation for an MA(1) model ~ ARIMA(0,0,1): X_t = μ + z_t - θ1 z_{t-1}, where z_t is the innovation or shock and θ1 is the first-order moving average coefficient.

MA(2) process
MA(2) ~ ARIMA(0,0,2): the current observation is a function of the mean, the current innovation and the two past innovations.
If ARIMA(0,0,q), then q is the order of the moving average process. If ARIMA(p,0,q), then we have an autoregressive moving average (ARMA) model.
So the order tells us the number of previous observations (or innovations) of which the series is a significant function.
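The model equations on these slides are images in the original; a sketch of the general forms, using the Box-Jenkins sign convention for the MA coefficients (the convention implied by the identification tables later in the lecture):

\[
\mathrm{AR}(p):\; X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + z_t, \qquad
\mathrm{MA}(q):\; X_t = \mu + z_t - \theta_1 z_{t-1} - \dots - \theta_q z_{t-q},
\]
\[
\mathrm{ARMA}(p,q):\; X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + z_t - \theta_1 z_{t-1} - \dots - \theta_q z_{t-q}.
\]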

Process in a different notation
There is no need to bother about the mean (it doesn't influence the autocorrelation); the expression can be regarded as a normalized version of the time series.

AR process as differential equation
The time series should be weakly stationary (constant mean and variance). Autoregressive indicates that the process evolves by regressing past values towards the mean and then adding noise. An AR process can be viewed as a discretized differential equation with constants a_0, a_1, a_2 and external forcing z_t.

AR process as differential equation (2)
Discretized version (backward in time): this gives an AR process if z_t is white noise.
Exercise: write the AR(1) process as a first-order differential equation.
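A sketch of the backward-difference step involved (assuming a first-order equation a_1 dx/dt + a_0 x = z(t) and time step Δt; the lecture's own notation may differ):

\[
a_1 \frac{X_t - X_{t-1}}{\Delta t} + a_0 X_t = z_t
\;\Longrightarrow\;
X_t = \frac{a_1}{a_1 + a_0 \Delta t}\, X_{t-1} + \frac{\Delta t}{a_1 + a_0 \Delta t}\, z_t,
\]

i.e. an AR(1) process with 0 < φ1 < 1 when a_0, a_1 > 0.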

Red noise
An autoregressive model with p = 1 and 0 < φ1 < 1. Very common in climate research; it describes gradual changes (compare the first-order differential equation above).

Blue noise
An autoregressive model with p = 1 and φ1 < 0. Characteristic: many sign changes. Not very common in climate research; an exception is ice cores.
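A short sketch contrasting red and blue noise as AR(1) processes (the coefficients ±0.8 and the series length are arbitrary illustrative choices):

```python
import numpy as np

def ar1(phi1, n, seed=0):
    """Simulate X_t = phi1 * X_{t-1} + z_t with white-noise innovations z_t."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi1 * x[t - 1] + rng.standard_normal()
    return x

red = ar1(0.8, 500)     # 0 < phi1 < 1: gradual changes, few sign changes
blue = ar1(-0.8, 500)   # phi1 < 0: many sign changes
print("sign changes red :", int(np.sum(np.diff(np.sign(red)) != 0)))
print("sign changes blue:", int(np.sum(np.diff(np.sign(blue)) != 0)))
```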

AR process unstable
If φ1 > 1 the process is unstable: explosive growth of the variance. If φ1 = 1 we have the random walk.

AR process mean and variance

AR process mean

Autocorrelation AR process

AR process autocorrelation
Recalling the general form of the AR(1) process (with φ0 = 0, p = 1): X_t = φ1 X_{t-1} + z_t.
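The derivation on this slide is image-based in the original; the standard result for the AR(1) autocorrelation, as a sketch:

\[
C_{xx}(\tau) = \phi_1\, C_{xx}(\tau - 1)
\;\Longrightarrow\;
\rho(\tau) = \phi_1^{|\tau|}, \qquad \tau = 0, \pm 1, \pm 2, \dots
\]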

Variance of AR process

Variance (2)
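Likewise, the mean and variance derivations are image-based; the standard AR(1) results (with φ0 = 0 and innovation variance σ_z²), as a sketch:

\[
E[X_t] = 0, \qquad
\operatorname{Var}(X_t) = C_{xx}(0) = \frac{\sigma_z^2}{1 - \phi_1^2} \quad (|\phi_1| < 1).
\]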

AR(1) process autocorrelation
[figure: autocorrelation for different values of φ1]
Note that the autocorrelation for positive φ1 equals that for negative φ1 at even time lags.

MA(1) processes [figure: example realizations]

Autocorrelation MA process
Characteristic pattern: sharp spikes up to and including the lag of the order. Consider the MA(1) process (ARIMA(0,0,1)) and its covariance C(k); consider C(1).

Autocorrelation MA(1) process (2)
Given the autocovariance we need the variance to obtain the autocorrelation: the variance of the MA(1) process.

Autocorrelation MA(1) process (3) [equations: autocovariance, variance and autocorrelation of the MA(1) process]

Autocorrelation MA(1) process (4)
For lag 2 the autocovariance, and hence the autocorrelation, is 0! In other words, the autocorrelation only spikes up to the lag of its order, in this case 1. This implies a finite memory for the process: after the shock, the autocorrelation drops to zero.
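A reconstruction of the image-based formulas on the previous MA(1) slides, assuming the convention X_t = μ + z_t - θ1 z_{t-1} (the sign convention implied by the identification tables below):

\[
C(0) = (1+\theta_1^2)\,\sigma_z^2, \qquad
C(1) = -\theta_1\,\sigma_z^2, \qquad
C(k) = 0 \;\;(k \ge 2),
\]
\[
\rho(1) = \frac{-\theta_1}{1+\theta_1^2}, \qquad \rho(k) = 0 \;\;(k \ge 2).
\]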

Summary identification ARIMA (1)
Process | Autocorrelation | Partial autocorrelation
White noise, ARIMA(0,0,0) | no spikes | no spikes
Random walk, ARIMA(0,1,0) | slow attenuation | spike at order of differencing

Summary identification ARIMA (2): autoregressive processes
Process | Autocorrelation | Partial autocorrelation
ARIMA(1,0,0), φ1 > 0 | exponential decay of positive spikes | 1 positive spike at lag 1
ARIMA(1,0,0), φ1 < 0 | oscillating decay, begins with negative spike | 1 negative spike at lag 1
ARIMA(2,0,0), φ1, φ2 > 0 | exponential decay of positive spikes | 2 positive spikes at lags 1 and 2
ARIMA(2,0,0), φ1 < 0, φ2 > 0 | oscillating exponential decay | 1 negative spike at lag 1, 1 positive spike at lag 2

Summary identification ARIMA (3): moving average processes
Process | Autocorrelation | Partial autocorrelation
ARIMA(0,0,1), θ1 > 0 | 1 negative spike at lag 1 | exponential decay of negative spikes
ARIMA(0,0,1), θ1 < 0 | 1 positive spike at lag 1 | oscillating decay of positive and negative spikes
ARIMA(0,0,2), θ1, θ2 > 0 | 2 negative spikes at lags 1 and 2 | exponential decay of negative spikes
ARIMA(0,0,2), θ1, θ2 < 0 | 2 positive spikes at lags 1 and 2 | oscillating decay of positive and negative spikes

Summary identification ARIMA (4): mixed processes
Process | Autocorrelation | Partial autocorrelation
ARIMA(1,0,1), φ1 > 0, θ1 > 0 | exponential decay of positive spikes | exponential decay of positive spikes
ARIMA(1,0,1), φ1 > 0, θ1 < 0 | exponential decay of positive spikes | oscillating decay of positive and negative spikes
ARIMA(1,0,1), φ1 < 0, θ1 > 0 | oscillating decay | exponential decay of negative spikes
ARIMA(1,0,1), φ1 < 0, θ1 < 0 | oscillating decay of negative and positive spikes | oscillating decay of positive and negative spikes

Example the other way around
Assume we measured the following time series [figure]. Can we describe it by a stochastic process?

Strategy for time series analysis (a sketch of this workflow in code follows below):
- plotting the data
- testing for stationarity
- calculating the autocorrelation and partial autocorrelation
- identifying the order (expertise and subjective)
- recursive solution of the parameters
- checking whether the residuals are white noise
- further analysis (forecasting, extending the series)
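A minimal sketch of that strategy with statsmodels, assuming a toy AR(2) series generated with the lecture's coefficients 0.9 and -0.8 (the variable names, series length and test choices are my own):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
x = np.zeros(1000)
for t in range(2, 1000):  # toy AR(2) data so the script runs end to end
    x[t] = 0.9 * x[t - 1] - 0.8 * x[t - 2] + rng.standard_normal()

print("ADF p-value:", adfuller(x)[1])        # test for stationarity
print("ACF :", acf(x, nlags=5).round(2))     # autocorrelation
print("PACF:", pacf(x, nlags=5).round(2))    # partial autocorrelation -> two spikes, AR(2)

res = ARIMA(x, order=(2, 0, 0)).fit()        # fit the identified ARIMA(2,0,0) model
print(res.params)                            # estimated coefficients
print(acorr_ljungbox(res.resid, lags=[10]))  # are the residuals white noise?
```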

Autocorrelation & partial autocorrelation
Results from our time series [figure]: the partial autocorrelation has only two non-zero lags, one positive and one negative; the autocorrelation decays gradually with an oscillation superimposed.

Recursive solution of the AR parameters via the Yule-Walker equations

Solution of φ1 and φ2
From the autocorrelation we arrive at estimates of the AR parameters: φ1 = 0.894 and φ2 = … (the series was generated with 0.9 and -0.8). As a result the noise variance and spectra are correct.
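A small numpy sketch of the Yule-Walker step for an AR(2) process (the lag-1 and lag-2 autocorrelations 0.5 and -0.35 are the theoretical values I computed for φ1 = 0.9, φ2 = -0.8, not numbers quoted from the lecture):

```python
import numpy as np

def yule_walker_ar2(r1, r2):
    """Solve r1 = phi1 + phi2*r1 and r2 = phi1*r1 + phi2 for (phi1, phi2)."""
    R = np.array([[1.0, r1],
                  [r1, 1.0]])
    rhs = np.array([r1, r2])
    return np.linalg.solve(R, rhs)

phi1, phi2 = yule_walker_ar2(r1=0.5, r2=-0.35)
print(phi1, phi2)  # recovers 0.9 and -0.8
```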

Parameter estimation
Don't bother too much: just brute-force least-squares fitting.

What did we learn today?
Special cases:
- white, red and blue noise, random walk
General concepts of:
- backward differencing, which provides the relation between a differential equation and an ARIMA process
- autoregressive models of order 1 and 2
- moving average models of order 1 and 2
- autocorrelation for these models
- estimating the order
- estimating the coefficients