1
Model selection, model weights, and forecasting for time series models
Eric Ward FISH 507 – Applied Time Series Analysis 21 January 2015
2
Topics, Week 3: Model selection; Prediction & evaluating forecasts
3
Model selection tools: how good is our model?
Several candidate models might be built based on (1) hypotheses / mechanisms, or (2) diagnostics / summaries of fit. Models can be evaluated by their ability to explain data, OR by the tradeoff between their ability to explain data and their ability to predict future data, OR just by their predictive abilities: hindcasting, forecasting.
4
Regression
library(MARSS)
data(harborSealWA)
plot(harborSealWA[,1], harborSealWA[,3], xlab = "Time", ylab = "log(Abundance)",
     main = "San Juan Islands", cex = 2, lwd = 3)
5
R-squared (linear regression)
For simple regression, R-squared is the square of the correlation coefficient between the predictor and the response. Adjusted R-squared is adjusted for the number of predictors: does adding more parameters improve results more than expected by chance?
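For example (a minimal sketch, not from the original slides, assuming the harborSealWA data from the regression slide is loaded):
mod = lm(harborSealWA[,3] ~ harborSealWA[,1])   # log abundance vs. year, San Juan Islands
summary(mod)$r.squared                          # R-squared
summary(mod)$adj.r.squared                      # adjusted R-squared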
6
Cross validation Split data into test and training set.
Evaluate predictive accuracy on the test set. We can choose anywhere from 1 to (n-1) points as the test set. Ecological datasets are often short, making cross validation difficult.
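A minimal sketch of leave-one-out cross validation, not from the original slides; the simple trend regression on the harbor seal series is just an illustration:
library(MARSS)
data(harborSealWA)
d = na.omit(data.frame(year = harborSealWA[,1], logN = harborSealWA[,3]))
pred = rep(NA, nrow(d))
for (i in 1:nrow(d)) {
  fit = lm(logN ~ year, data = d[-i, ])                     # train on all but one point
  pred[i] = predict(fit, newdata = d[i, , drop = FALSE])    # predict the held-out point
}
mean((d$logN - pred)^2)                                     # leave-one-out mean squared prediction error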
7
Alternatives to cross validation
Information theoretic approaches (*IC): widely used in ecology, fisheries, and other fields; convenient to use, with many available in R. Examples:
AIC (Akaike's Information Criterion)
AICc (small-sample AIC)
QAIC (quasi-likelihood AIC)
BIC (Bayesian Information Criterion)
SIC (Schwarz Information Criterion) = BIC
Burnham & Anderson (2002), Ward (2008), Murtaugh (2009)
8
All *IC criteria based on deviance
Deviance = minus twice the log-likelihood (equivalently, twice the negative log-likelihood). A measure of model fit to the data; lower values are better. Maximizing the likelihood = minimizing the negative log-likelihood.
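A quick illustration (a sketch with hypothetical count data, not from the slides):
counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12)   # hypothetical data
mod = glm(counts ~ 1, family = poisson)
logLik(mod)                     # log-likelihood: higher is better
-2 * as.numeric(logLik(mod))    # minus twice the log-likelihood
deviance(mod)                   # glm deviance: differs from -2*logLik by a constant
                                # (twice the saturated model's log-likelihood)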
9
BUT we can’t just select a model based on fit
One objective of model selection is parsimony: choose a model that minimizes both bias AND variance. (Bias-variance tradeoff figure; image: cell.com)
10
AIC and AICc probably most common *IC tools
AIC = -2 log(L) + 2k, where k = the number of parameters and 2k is the penalty. AIC converges to leave-one-out cross validation. AICc (small-sample AIC) adds a correction that depends on the sample size n: AICc = AIC + 2k(k + 1) / (n - k - 1). Burnham & Anderson (2002)
11
AIC is easy to extract for many models in R
lm, glm, and arima models: use the AIC() function in the stats library, e.g. AIC(mod); extractAIC() is also in stats. Be aware: the two can sometimes give different answers (they differ by scaling constants). deviance() and logLik() are also useful. What about AICc? Calculate it by hand, OR use the AICc() function in the AICcmodavg library.
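For example (a sketch using the harbor seal regression from earlier; the AICc formula is the one given on the previous slide):
mod = lm(harborSealWA[,3] ~ harborSealWA[,1])
AIC(mod)                                   # from stats
logLik(mod); deviance(mod)
k = length(coef(mod)) + 1                  # parameters, counting the residual variance
n = nobs(mod)
AIC(mod) + 2 * k * (k + 1) / (n - k - 1)   # AICc by hand
# or: library(AICcmodavg); AICc(mod)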
12
Alternatives to AIC. AIC aims to find the best model to predict data generated from the same process that generated your observations. A downside: AIC tends to be liberal and favor overly complex models; it is equivalent to a significance test with alpha = 0.16. Instead, we can use BIC, where alpha effectively becomes a function of sample size.
13
BIC (or SIC) Not at all Bayesian
Laplace approximation to the posterior; assumes normality. A measure of explanatory power (rather than balancing explanation and prediction). The penalty is a function of both the sample size and the number of parameters: BIC = -2 log(L) + k log(n). BIC has a tendency to underfit (favor overly simple models).
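A short sketch of BIC by hand for the same kind of lm fit (assumes the harborSealWA data is loaded):
mod = lm(harborSealWA[,3] ~ harborSealWA[,1])
BIC(mod)                                            # from stats
k = length(coef(mod)) + 1                           # parameters, counting the residual variance
-2 * as.numeric(logLik(mod)) + k * log(nobs(mod))   # BIC = -2 log(L) + k log(n)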
14
Philosophical differences
AIC / AICc try to choose a model that approximates reality, but do not assume that reality exists in your set of candidate models. BIC assumes that one of your models is truth; this model will tend to be favored more and more as sample size increases.
15
Example 1: breakpoints. Q: Do these data (annual Nile river flow) support a breakpoint in the mean flow?
16
Model 1: single mean (intercept) model
mod1 = lm(log(Nile) ~ 1)
AIC(mod1)

Model 2: break-point model
b = c(rep(0, 43), rep(1, 57))
mod2 = lm(log(Nile) ~ b)
AIC(mod2)
17
Example 2: breakpoint in intercept and trend
Q: How would we write the equation for this model in lm using our indicator 'b'? b = c(rep(0,43),rep(1,57))
18
Model 3: break – point in trend and intercept
b = c(rep(0, 43), rep(1, 57))
mod3 = lm(log(Nile) ~ b + b*seq(1, 100))
AIC(mod3)
19
Model weights
We often have a large number (> 2) of plausible models, and their AIC / BIC values may be very similar. How do we account for model uncertainty? One option: model weights.
20
AIC model weights (Burnham and Anderson 2002)
1. Calculate delta-AIC values for each model relative to the best (lowest-AIC) model.
2. Normalize these delta values into weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2).
3. Present roughly the top 90% of weights in a table.

Model   AIC   delta-AIC
1       3.4   2.4
2       2.3   1.3
3       1.0   0.0
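A minimal sketch of steps 1-2 in R, using the toy AIC values above:
aic = c(3.4, 2.3, 1.0)
delta = aic - min(aic)                      # delta-AIC relative to the best model
w = exp(-delta / 2) / sum(exp(-delta / 2))  # normalized model weights
round(w, 2)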
21
Example 2: model weights & arima()
Fitted arima() models return objects with $aic. Fit ARMA models to the gray whale time series:
library(MARSS)
data(graywhales)
whales = log(graywhales[-c(1:17), 2])
# interpolate because of missing values
set.seed(1)
x = 1:length(whales)   # time index (assumed here; its definition was not shown on this slide)
whales[29:32] = predict.lm(lm(whales ~ x), newdata = list(x = 12:15)) + rnorm(4, 0, 0.1)
plot(whales)
22
We’ll compare 4 models (drift / trend estimated in all 4)
ARIMA(0, 0, 0)
ARIMA(0, 0, 1): MA(1) model
ARIMA(1, 0, 0): AR(1) model
ARIMA(1, 0, 1): AR(1) + MA(1) model
23
4 candidate models Fit with Arima(order=c(p,d=0,q),include.drift=TRUE)
Use $aic for the fitted model object. It's common to see people present delta-AIC values relative to the best model instead of raw AIC values. Note: even though the weights sum to 1, they're *not* probabilities.

AIC      delta-AIC   weight
-28.50   0.00        0.47
-28.11   0.39        0.39
-25.35   3.14        0.10
-23.99   4.51        0.05
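A sketch of fitting the four candidate models (with the Arima() call given above) and converting their AIC values to weights; it assumes the whales series from the previous slide and the forecast package:
library(forecast)
mods = list(Arima(whales, order = c(0, 0, 0), include.drift = TRUE),
            Arima(whales, order = c(0, 0, 1), include.drift = TRUE),
            Arima(whales, order = c(1, 0, 0), include.drift = TRUE),
            Arima(whales, order = c(1, 0, 1), include.drift = TRUE))
aic = sapply(mods, function(m) m$aic)
delta = aic - min(aic)
w = exp(-delta / 2) / sum(exp(-delta / 2))
round(cbind(aic, delta, w), 2)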
24
AIC weights in R
akaike.weights() in the qpcR library: pass in a vector of AIC values and it calculates the weights.
aictab() in the AICcmodavg library: pass in a list of models and it returns a table of AIC values, delta-AIC values, and weights.
dredge() in the MuMIn library: exhaustive model selection, with model (and variable) weights included.
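For instance, aictab() applied to the Nile break-point models from Example 1 (a sketch; the model names are just labels chosen for this example):
library(AICcmodavg)
aictab(cand.set = list(mod1, mod2, mod3),
       modnames = c("single mean", "break in mean", "break in mean and trend"))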
25
Bayesian model selection
Largely beyond the scope of this class, BUT:
The Deviance Information Criterion (DIC) is often described as a Bayesian analog of AIC, and is easily available for the Bayesian models we'll fit; DIC is returned by the jags() function in R2jags.
Bayes factors (what BIC approximates) can be very difficult to calculate for complex models.
Predictive model selection is another option.
26
Topics, Week 3: Model selection; Prediction & evaluating forecasts
27
Forecasting with arima()
Let's fit an ARMA(1,1) model to the global temperature data, after first differencing to remove the trend. You can use either the arima() function or the Arima() function; Arima() (in the forecast package) is a wrapper for arima().
library(forecast)
# for simplicity, we won't include a separate ARMA model for seasonality
ar.global.1 = Arima(Global, order = c(1, 1, 1), seasonal = list(order = c(0, 0, 0), period = 12))
f1 = forecast(ar.global.1, h = 10)   # h = number of time steps to forecast into the future
28
What does f1 contain?
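One way to look (a sketch; these components are the standard ones returned by forecast()):
names(f1)      # includes "mean", "lower", "upper", "fitted", "residuals", "model"
f1$mean        # the 10 point forecasts
f1$lower       # lower prediction interval bounds (80% and 95% by default)
f1$upper       # upper prediction interval bounds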
29
plot fitted arima() object
In our arima() fit, we used all the data, and forecast 10 steps forward – so we don’t have remaining data to evaluate the predictions
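For example, using the fitted object from the previous slides:
plot(f1)   # plots the observed series plus the 10-step forecast and its prediction intervals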
30
Quantifying forecast performance
One of the most common measures is mean square error, MSE = E[(x_t - xhat_t)^2]: the expected squared difference between an observation and its forecast.
31
Bias – Variance tradeoff
Principle of model parsimony: MSE can be decomposed as MSE = variance + bias^2, so a smaller MSE means lower bias and lower variance.
32
MSE and other criterion can be calculated over longer time period
If our forecast is over n time steps, the MSE for that period can be represented as MSE = (1/n) * sum over t = 1..n of (x_t - xhat_t)^2. Do you care about the final outcome, or the entire path to get there?
33
Variants of MSE Root mean square error, RMSE (quadratic score)
RMSE = sqrt(MSE): on the same scale as the data; also referred to as RMSD, root mean square deviation. Mean absolute error, MAE (linear score): the mean of |e_t|. Median absolute error, MdAE: the median of |e_t|, where e_t = x_t - xhat_t is the forecast error.
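A sketch of these statistics by hand, with hypothetical observed and predicted values:
observed = c(10.1, 11.2, 12.0, 12.8)    # hypothetical observations
predicted = c(10.0, 11.5, 11.8, 13.1)   # hypothetical forecasts
err = observed - predicted
mse = mean(err^2)         # mean square error
rmse = sqrt(mse)          # root mean square error
mae = mean(abs(err))      # mean absolute error
mdae = median(abs(err))   # median absolute error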
34
Scale independent measures
Better when applying statistics of model(s) to multiple datasets: MSE or RMSE summed across datasets will be dominated by the time series that are largest in magnitude.
36
Percent Error Statistics
Mean Absolute Percent Error (MAPE): the mean of |100 * (x_t - xhat_t) / x_t|. Root Mean Square Percent Error (RMSPE): the square root of the mean of (100 * (x_t - xhat_t) / x_t)^2.
37
Issues with percent error statistics
What happens when Y = 0? Distribution of percent errors tends to be highly skewed / long tails MAPE tends to put higher penalty on positive errors See Hyndman & Koehler (2006)
38
Scaled error statistics
Define the scaled error as q_t = e_t / ( (1/(n-1)) * sum over i = 2..n of |x_i - x_{i-1}| ). The absolute scaled error (ASE) is |q_t|, and the mean absolute scaled error (MASE) is the mean of |q_t|. Hyndman & Koehler (2006) note: (1) the denominator is the MAE of one-step naive random walk forecasts, so performance is gauged relative to the random walk model; (2) no missing data are allowed. Note: the expectation can be taken over datasets, over time, or both.
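A sketch of MASE by hand, with hypothetical training data, held-out data, and forecasts:
train = c(10.0, 10.5, 11.2, 11.9, 12.1, 12.8)   # hypothetical training series
test = c(13.0, 13.4, 13.9)                      # hypothetical held-out observations
pred = c(13.2, 13.5, 13.7)                      # hypothetical forecasts
scale = mean(abs(diff(train)))   # MAE of one-step naive (random walk) forecasts on the training data
q = (test - pred) / scale        # scaled errors
mean(abs(q))                     # MASE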
39
Interpreting ASE & MASE
All values are relative to the naïve random walk model Values < 1 indicate better performance than RW model Values > 1 indicate worse performance than RW model
40
Implementing in R Fit an ARIMA model to ‘airmiles’
library(forecast)
# fit an arima model to the airmiles data, holding out 3 data points
n = length(airmiles)
air.model = auto.arima(log(airmiles[1:(n-3)]))
41
Forecast the model 3 steps ahead
# forecast 3 steps ahead
air.forecast = forecast(air.model, h = 3)
plot(air.forecast)
42
Use holdout or “test” data to evaluate accuracy
Use of accuracy():
# evaluate RMSE / MASE statistics for 3 holdouts
accuracy(air.forecast, log(airmiles[(n-2):n]), test = 3)
# (output columns: ME, RMSE, MAE, MPE, MAPE, MASE)
# evaluate RMSE / MASE statistics for only the last holdout
accuracy(air.forecast, log(airmiles[(n-2):n]), test = 1)
43
Ecological examples (Ward et al. 2014)
44
Summary
Raw statistics (e.g. MSE, RMSE) shouldn't be compared across data on different scales. Percent error metrics (e.g. MAPE) may be skewed and are undefined for real zeroes. Scaled error metrics (ASE, MASE) have been shown to be more robust in meta-analyses of many datasets. Hyndman & Koehler (2006)