1
Model selection, model weights, and forecasting for time series models
Eric Ward FISH 507 – Applied Time Series Analysis 21 January 2015
2
Topics, Week 3: Model selection; Prediction & evaluating forecasts
3
Model selection tools: how good is our model?
Several candidate models might be built based on (1) hypotheses / mechanisms, or (2) diagnostics / summaries of fit. Models can be evaluated by their ability to explain data, OR by the tradeoff between their ability to explain data and their ability to predict future data, OR just by their predictive abilities: hindcasting, forecasting.
4
Regression
library(MARSS)
data(harborSealWA)
plot(harborSealWA[,1], harborSealWA[,3], xlab = "Time", ylab = "log(Abundance)",
     main = "San Juan Islands", cex = 2, lwd = 3)
5
R-squared (linear regression)
For simple regression, R-squared is the square of the correlation coefficient between the predictor and the response. Adjusted R-squared is adjusted for the number of predictors: does adding more parameters improve results more than expected by chance?
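For example (a minimal sketch, not from the original slides, assuming the harborSealWA data from the regression slide is loaded):
mod = lm(harborSealWA[,3] ~ harborSealWA[,1])   # log abundance vs. year, San Juan Islands
summary(mod)$r.squared                          # R-squared
summary(mod)$adj.r.squared                      # adjusted R-squared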
6
Cross validation Split data into test and training set.
Evaluate predictive accuracy on the test set. We can choose anywhere from 1 to (n-1) points as the test set. Ecological datasets are often short, making cross validation difficult.
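A minimal sketch of leave-one-out cross validation, not from the original slides; the simple trend regression on the harbor seal series is just an illustration:
library(MARSS)
data(harborSealWA)
d = na.omit(data.frame(year = harborSealWA[,1], logN = harborSealWA[,3]))
pred = rep(NA, nrow(d))
for (i in 1:nrow(d)) {
  fit = lm(logN ~ year, data = d[-i, ])                     # train on all but one point
  pred[i] = predict(fit, newdata = d[i, , drop = FALSE])    # predict the held-out point
}
mean((d$logN - pred)^2)                                     # leave-one-out mean squared prediction error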
7
Alternatives to cross validation
Information theoretic approaches (*IC): widely used in ecology, fisheries, and other fields; convenient to use, with many available in R. Examples:
AIC (Akaike's Information Criterion)
AICc (small-sample AIC)
QAIC (quasi-likelihood AIC)
BIC (Bayesian Information Criterion)
SIC (Schwarz Information Criterion) = BIC
Burnham & Anderson (2002), Ward (2008), Murtaugh (2009)
8
All *IC criteria based on deviance
Deviance = minus twice the log-likelihood (equivalently, twice the negative log-likelihood). A measure of model fit to the data; lower values are better. Maximizing the likelihood = minimizing the negative log-likelihood.
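A quick illustration (a sketch with hypothetical count data, not from the slides):
counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12)   # hypothetical data
mod = glm(counts ~ 1, family = poisson)
logLik(mod)                     # log-likelihood: higher is better
-2 * as.numeric(logLik(mod))    # minus twice the log-likelihood
deviance(mod)                   # glm deviance: differs from -2*logLik by a constant
                                # (twice the saturated model's log-likelihood)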
9
BUT we can’t just select a model based on fit
One objective of model selection is parsimony: choose a model that minimizes both bias AND variance. (Bias-variance tradeoff figure; image: cell.com)
10
AIC and AICc probably most common *IC tools
AIC = -2 log(L) + 2k, where k = the number of parameters and 2k is the penalty. AIC converges to leave-one-out cross validation. AICc (small-sample AIC) adds a correction that depends on the sample size n: AICc = AIC + 2k(k + 1) / (n - k - 1). Burnham & Anderson (2002)
11
AIC is easy to extract for many models in R
lm, glm, and arima models: use the AIC() function in the stats library, e.g. AIC(mod); extractAIC() is also in stats. Be aware: the two can sometimes give different answers (they differ by scaling constants). deviance() and logLik() are also useful. What about AICc? Calculate it by hand, OR use the AICc() function in the AICcmodavg library.
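For example (a sketch using the harbor seal regression from earlier; the AICc formula is the one given on the previous slide):
mod = lm(harborSealWA[,3] ~ harborSealWA[,1])
AIC(mod)                                   # from stats
logLik(mod); deviance(mod)
k = length(coef(mod)) + 1                  # parameters, counting the residual variance
n = nobs(mod)
AIC(mod) + 2 * k * (k + 1) / (n - k - 1)   # AICc by hand
# or: library(AICcmodavg); AICc(mod)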
12
Alternatives to AIC. AIC aims to find the best model to predict data generated from the same process that generated your observations. A downside: AIC tends to be liberal and favor overly complex models; it is equivalent to a significance test with alpha = 0.16. Instead, we can use BIC, where alpha effectively becomes a function of sample size.
13
BIC (or SIC) Not at all Bayesian
Laplace approximation to the posterior; assumes normality. A measure of explanatory power (rather than balancing explanation and prediction). The penalty is a function of both the sample size and the number of parameters: BIC = -2 log(L) + k log(n). BIC has a tendency to underfit (favor overly simple models).
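A short sketch of BIC by hand for the same kind of lm fit (assumes the harborSealWA data is loaded):
mod = lm(harborSealWA[,3] ~ harborSealWA[,1])
BIC(mod)                                            # from stats
k = length(coef(mod)) + 1                           # parameters, counting the residual variance
-2 * as.numeric(logLik(mod)) + k * log(nobs(mod))   # BIC = -2 log(L) + k log(n)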
14
Philosophical differences
AIC / AICc try to choose a model that approximates reality, but do not assume that reality exists in your set of candidate models. BIC assumes that one of your models is truth; this model will tend to be favored more and more as sample size increases.
15
Example 1: breakpoints. Q: Do these data (annual Nile river flow) support a breakpoint in the mean flow?
16
Model 1: single mean (intercept) model
mod1 = lm(log(Nile) ~ 1)
AIC(mod1)

Model 2: break-point model
b = c(rep(0, 43), rep(1, 57))
mod2 = lm(log(Nile) ~ b)
AIC(mod2)
17
Example 2: breakpoint in intercept and trend
Q: How would we write the equation for this model in lm using our indicator 'b'? b = c(rep(0,43),rep(1,57))
18
Model 3: break – point in trend and intercept
b = c(rep(0, 43), rep(1, 57))
mod3 = lm(log(Nile) ~ b + b*seq(1, 100))
AIC(mod3)
19
Model weights
We often have a large number (> 2) of plausible models, and their AIC / BIC values may be very similar. How do we account for model uncertainty? One option: model weights.
20
AIC model weights (Burnham and Anderson 2002)
1. Calculate delta-AIC values for each model relative to the best (lowest-AIC) model.
2. Normalize these delta values into weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2).
3. Present roughly the top 90% of weights in a table.

Model   AIC   delta-AIC
1       3.4   2.4
2       2.3   1.3
3       1.0   0.0
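A minimal sketch of steps 1-2 in R, using the toy AIC values above:
aic = c(3.4, 2.3, 1.0)
delta = aic - min(aic)                      # delta-AIC relative to the best model
w = exp(-delta / 2) / sum(exp(-delta / 2))  # normalized model weights
round(w, 2)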
21
Example 2: model weights & arima()
Fitted arima() models return objects with $aic. Fit ARMA models to the gray whale time series:
library(MARSS)
data(graywhales)
whales = log(graywhales[-c(1:17), 2])
# interpolate because of missing values
set.seed(1)
x = 1:length(whales)   # time index (assumed here; its definition was not shown on this slide)
whales[29:32] = predict.lm(lm(whales ~ x), newdata = list(x = 12:15)) + rnorm(4, 0, 0.1)
plot(whales)
22
We’ll compare 4 models (drift / trend estimated in all 4)
ARIMA(0, 0, 0)
ARIMA(0, 0, 1): MA(1) model
ARIMA(1, 0, 0): AR(1) model
ARIMA(1, 0, 1): AR(1) + MA(1) model
23
4 candidate models Fit with Arima(order=c(p,d=0,q),include.drift=TRUE)
Use $aic for the fitted model object. It's common to see people present delta-AIC values relative to the best model instead of raw AIC values. Note: even though the weights sum to 1, they're *not* probabilities.

AIC      delta-AIC   weight
-28.50   0.00        0.47
-28.11   0.39        0.39
-25.35   3.14        0.10
-23.99   4.51        0.05
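A sketch of fitting the four candidate models (with the Arima() call given above) and converting their AIC values to weights; it assumes the whales series from the previous slide and the forecast package:
library(forecast)
mods = list(Arima(whales, order = c(0, 0, 0), include.drift = TRUE),
            Arima(whales, order = c(0, 0, 1), include.drift = TRUE),
            Arima(whales, order = c(1, 0, 0), include.drift = TRUE),
            Arima(whales, order = c(1, 0, 1), include.drift = TRUE))
aic = sapply(mods, function(m) m$aic)
delta = aic - min(aic)
w = exp(-delta / 2) / sum(exp(-delta / 2))
round(cbind(aic, delta, w), 2)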
24
AIC weights in R
akaike.weights() in the qpcR library: pass in a vector of AIC values and it calculates the weights.
aictab() in the AICcmodavg library: pass in a list of models and it returns a table of AIC values, delta-AIC values, and weights.
dredge() in the MuMIn library: exhaustive model selection, with model (and variable) weights included.
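For instance, aictab() applied to the Nile break-point models from Example 1 (a sketch; the model names are just labels chosen for this example):
library(AICcmodavg)
aictab(cand.set = list(mod1, mod2, mod3),
       modnames = c("single mean", "break in mean", "break in mean and trend"))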
25
Bayesian model selection
Largely beyond the scope of this class, BUT:
The Deviance Information Criterion (DIC) is often described as a Bayesian analog of AIC, and is easily available for the Bayesian models we'll fit; DIC is returned by the jags() function in R2jags.
Bayes factors (what BIC approximates) can be very difficult to calculate for complex models.
Predictive model selection is another option.
26
Topics, Week 3: Model selection; Prediction & evaluating forecasts
27
Forecasting with arima()
Let's fit an ARMA(1,1) model to the global temperature data, after first differencing to remove the trend. You can use either the arima() function or the Arima() function; Arima() (in the forecast package) is a wrapper for arima().
library(forecast)
# for simplicity, we won't include a separate ARMA model for seasonality
ar.global.1 = Arima(Global, order = c(1, 1, 1), seasonal = list(order = c(0, 0, 0), period = 12))
f1 = forecast(ar.global.1, h = 10)   # h = number of time steps to forecast into the future
28
What does f1 contain?
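One way to look (a sketch; these components are the standard ones returned by forecast()):
names(f1)      # includes "mean", "lower", "upper", "fitted", "residuals", "model"
f1$mean        # the 10 point forecasts
f1$lower       # lower prediction interval bounds (80% and 95% by default)
f1$upper       # upper prediction interval bounds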
29
plot fitted arima() object
In our arima() fit, we used all the data, and forecast 10 steps forward – so we don’t have remaining data to evaluate the predictions
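For example, using the fitted object from the previous slides:
plot(f1)   # plots the observed series plus the 10-step forecast and its prediction intervals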
30
Quantifying forecast performance
One of the most common measures is mean square error, MSE = E[(x_t - xhat_t)^2]: the expected squared difference between an observation and its forecast.
31
Bias – Variance tradeoff
Principle of model parsimony: MSE can be decomposed as MSE = variance + bias^2, so a smaller MSE means lower bias and lower variance.
32
MSE and other criterion can be calculated over longer time period
If our forecast is over n time steps, the MSE for that period can be represented as MSE = (1/n) * sum over t = 1..n of (x_t - xhat_t)^2. Do you care about the final outcome, or the entire path to get there?
33
Variants of MSE Root mean square error, RMSE (quadratic score)
RMSE = sqrt(MSE): on the same scale as the data; also referred to as RMSD, root mean square deviation. Mean absolute error, MAE (linear score): the mean of |e_t|. Median absolute error, MdAE: the median of |e_t|, where e_t = x_t - xhat_t is the forecast error.
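A sketch of these statistics by hand, with hypothetical observed and predicted values:
observed = c(10.1, 11.2, 12.0, 12.8)    # hypothetical observations
predicted = c(10.0, 11.5, 11.8, 13.1)   # hypothetical forecasts
err = observed - predicted
mse = mean(err^2)         # mean square error
rmse = sqrt(mse)          # root mean square error
mae = mean(abs(err))      # mean absolute error
mdae = median(abs(err))   # median absolute error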
34
Scale independent measures
Better when applying statistics of model(s) to multiple datasets: MSE or RMSE summed across datasets will be dominated by the time series that are largest in magnitude.
36
Percent Error Statistics
Mean Absolute Percent Error (MAPE): the mean of |100 * (x_t - xhat_t) / x_t|. Root Mean Square Percent Error (RMSPE): the square root of the mean of (100 * (x_t - xhat_t) / x_t)^2.
37
Issues with percent error statistics
What happens when Y = 0? Distribution of percent errors tends to be highly skewed / long tails MAPE tends to put higher penalty on positive errors See Hyndman & Koehler (2006)
38
Scaled error statistics
Define the scaled error as q_t = e_t / ( (1/(n-1)) * sum over i = 2..n of |x_i - x_{i-1}| ). The absolute scaled error (ASE) is |q_t|, and the mean absolute scaled error (MASE) is the mean of |q_t|. Hyndman & Koehler (2006) note: (1) the denominator is the MAE of one-step naive random walk forecasts, so performance is gauged relative to the random walk model; (2) no missing data are allowed. Note: the expectation can be taken over datasets, over time, or both.
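A sketch of MASE by hand, with hypothetical training data, held-out data, and forecasts:
train = c(10.0, 10.5, 11.2, 11.9, 12.1, 12.8)   # hypothetical training series
test = c(13.0, 13.4, 13.9)                      # hypothetical held-out observations
pred = c(13.2, 13.5, 13.7)                      # hypothetical forecasts
scale = mean(abs(diff(train)))   # MAE of one-step naive (random walk) forecasts on the training data
q = (test - pred) / scale        # scaled errors
mean(abs(q))                     # MASE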
39
Interpreting ASE & MASE
All values are relative to the naïve random walk model Values < 1 indicate better performance than RW model Values > 1 indicate worse performance than RW model
40
Implementing in R Fit an ARIMA model to ‘airmiles’
library(forecast)
# fit an arima model to the airmiles data, holding out 3 data points
n = length(airmiles)
air.model = auto.arima(log(airmiles[1:(n-3)]))
41
Forecast the model 3 steps ahead
# forecast 3 steps ahead
air.forecast = forecast(air.model, h = 3)
plot(air.forecast)
42
Use holdout or “test” data to evaluate accuracy
Use of accuracy():
# evaluate RMSE / MASE statistics for 3 holdouts
accuracy(air.forecast, log(airmiles[(n-2):n]), test = 3)
# (output columns: ME, RMSE, MAE, MPE, MAPE, MASE)
# evaluate RMSE / MASE statistics for only the last holdout
accuracy(air.forecast, log(airmiles[(n-2):n]), test = 1)
43
Ecological examples (Ward et al. 2014)
44
Summary
Raw statistics (e.g. MSE, RMSE) shouldn't be compared across data on different scales. Percent error metrics (e.g. MAPE) may be skewed and are undefined for real zeroes. Scaled error metrics (ASE, MASE) have been shown to be more robust in meta-analyses of many datasets. Hyndman & Koehler (2006)