1
Looking Ahead of the Curve: An ARIMA Modeling Approach to Enrollment Forecasting
John G. Zhang, Ph.D.
Harper College
jzhang@harpercollege.edu
2
Topics
Why forecast
How to forecast
Why ARIMA
What is ARIMA
How to ARIMA
How ARIMA did
Discussion
47th AIR Annual Forum
3
Why Forecast
Queries and reports: what was
Dashboard: what is
Forecasts: what will be
Forecast for enrollment: more valuable for resource planning
4
How to Forecast
Naïve forecast: random walk, moving average
Exponential smoothing
Markov chain
Regression
ARIMA
Others
Combining methods
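As a point of comparison for the methods listed above, the three simplest ones can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function names, the smoothing weight `alpha`, and the enrollment values are all made up for the example.

```python
def naive_forecast(series):
    """Random-walk forecast: the next value equals the last observed value."""
    return series[-1]


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        # New level = weighted mix of the latest value and the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical enrollment values, for illustration only.
enrollment = [100.0, 104.0, 102.0, 108.0]
print(naive_forecast(enrollment))                       # 108.0
print(moving_average_forecast(enrollment, 2))           # 105.0
print(exponential_smoothing_forecast(enrollment, 0.5))  # 105.0
```

All three produce a single one-step-ahead number and ignore any trend or seasonal structure in the series.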
5
Why ARIMA
Naïve forecast: best guess if no patterns
Exponential smoothing: usually designed for one-step-ahead forecasts
Markov chain: see reference
Regression: frequently violates the assumption of uncorrelated errors
ARIMA: worked well, more later
Others: see reference
Combining methods: non-directional
6
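What is ARIMA
AutoRegressive Integrated Moving Average
Generally, the model is given by
$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d (X_t - \theta_0) = \left(1 - \sum_{i=1}^{q} \theta_i B^i\right)\varepsilon_t$$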
7
where $X_t$ is the time series value at time $t$, $\theta_0$ is a constant, $B$ is the backshift (lag) operator, $i$ is the number of lags or spans, $\varepsilon_t$ is the error term at time $t$, $\phi$ and $\theta$ are the AR and MA parameters, and $p$, $d$, and $q$ are the orders of the AR, I, and MA components.
8
If $p = 1$, $d = 0$, $q = 1$, ARMA(1, 1): $(1 - \phi_1 B)(X_t - \theta_0) = (1 - \theta_1 B)\varepsilon_t$
If $p = 1$, $d = 0$, $\theta_1 = 0$, AR(1) model: $(1 - \phi_1 B)(X_t - \theta_0) = \varepsilon_t$
If $p = 1$, $\phi_1 = 1$, $d = 0$, $\theta_1 = 0$, random walk: $(1 - B)(X_t - \theta_0) = \varepsilon_t$
If $\phi_1 = 0$, $d = 0$, $\theta_1 = 0$, constant: $(X_t - \theta_0) = \varepsilon_t$
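The special cases above all come from one recursion. Writing the AR coefficient as `phi1`, the MA coefficient as `theta1`, and the constant as `theta0` (names chosen here for illustration), the ARMA(1, 1) equation rearranges to X_t = theta0 + phi1 (X_{t-1} - theta0) + e_t - theta1 e_{t-1}. The sketch below applies that recursion to a fixed list of shocks. The starting value and the shock values are assumptions made for the example.

```python
def simulate_arma11(phi1, theta1, theta0, shocks, x0=None):
    """Generate X_t from (1 - phi1*B)(X_t - theta0) = (1 - theta1*B) e_t.

    Rearranged: X_t = theta0 + phi1*(X_{t-1} - theta0) + e_t - theta1*e_{t-1}.
    The series is assumed to start at theta0 with a zero prior shock
    unless x0 is given.
    """
    prev_x = theta0 if x0 is None else x0
    prev_e = 0.0
    out = []
    for e in shocks:
        x = theta0 + phi1 * (prev_x - theta0) + e - theta1 * prev_e
        out.append(x)
        prev_x, prev_e = x, e
    return out


# Hypothetical shocks, for illustration only.
shocks = [1.0, -2.0, 0.5, 3.0]

random_walk = simulate_arma11(1.0, 0.0, 10.0, shocks)  # phi1 = 1, theta1 = 0
ar1 = simulate_arma11(0.5, 0.0, 10.0, shocks)          # theta1 = 0
constant = simulate_arma11(0.0, 0.0, 10.0, shocks)     # phi1 = 0, theta1 = 0

print(random_walk)  # [11.0, 9.0, 9.5, 12.5]: shocks accumulate
print(ar1)          # [11.0, 8.5, 9.75, 12.875]: shocks decay toward theta0
print(constant)     # [11.0, 8.0, 10.5, 13.0]: theta0 plus the current shock
```

With `phi1 = 1` each shock stays in the series permanently, which is why a random walk has no fixed level to return to.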
9
How to ARIMA
Box and Jenkins (1976) notation: (p d q)(P D Q)s, where the second set gives the seasonal orders and s is the seasonal period
Four stages:
  Identification
  Estimation
  Validation
  Forecasting
10
How to ARIMA
SPSS Trends module:
  version 12 worked well
  versions 13 and 14: algorithms changed; same data, same program, different forecast
SAS ETS module: ARIMA procedure
  more flexible
  forecast consistent
  automation possible thanks to macros
11
Identification
Series plot
Autocorrelation plot
Dickey-Fuller test of the unit root hypothesis
AR models to compare the log likelihood values for a series and its transformed series
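The autocorrelation plot used at this stage is a chart of the sample autocorrelation at each lag. A minimal sketch of that calculation follows, using the common estimator that divides the lag-k sum of cross-products by the lag-0 sum of squares. The function name and the test series are illustrative, not taken from the presentation.

```python
def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k periods apart.
        ck = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(ck / c0)
    return acf


# A short, steadily increasing series (made up for illustration).
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2)
print(acf)
```

On a long trending series the autocorrelations die out slowly as the lag grows, which is the usual visual cue that differencing is needed.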
12
Identification
Degree of differencing
Order of AR
Order of MA
Seasonality, if any
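Regular and seasonal differencing can be sketched with one helper that applies (1 - B^lag). The example series is hypothetical: it assumes three terms per year (s = 3), a linear trend, and a fixed seasonal pattern, so the effect of each differencing step is easy to see.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): subtract the value `lag` periods earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series: level 100, trend of 2 per period, and a seasonal
# pattern that repeats every 3 periods.
season = [5.0, 3.0, -8.0]
series = [100.0 + 2.0 * t + season[t % 3] for t in range(9)]

seasonal_diff = difference(series, lag=3)  # removes the seasonal pattern
both = difference(seasonal_diff, lag=1)    # removes what is left of the trend

print(seasonal_diff)  # [6.0, 6.0, 6.0, 6.0, 6.0, 6.0]
print(both)           # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Each differencing pass shortens the series by `lag` observations, one reason to difference no more than the data require.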
13
Estimation
Q statistics
Goodness-of-fit criteria:
  variance estimate
  Akaike information criterion
  Schwarz Bayesian criterion
Significance of parameters
Residual analysis
Mean absolute percent error
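The two information criteria listed above share the same first term and differ only in the penalty for the number of estimated parameters. The sketch below uses the standard definitions, AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n. The log likelihood values and parameter counts are made-up numbers for comparing two hypothetical candidate models.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Two hypothetical candidate models fitted to the same 100 observations.
small_aic = aic(-100.0, 3)
large_aic = aic(-99.5, 6)
small_sbc = sbc(-100.0, 3, 100)
large_sbc = sbc(-99.5, 6, 100)

# Lower is better for both criteria.
print(small_aic, large_aic)
print(small_sbc, large_sbc)
```

In this made-up case the larger model fits slightly better but loses on both criteria, and SBC penalizes the extra parameters more heavily than AIC.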
14
Data
Time series data
Date variable: year, quarter, month, week, day, hour, minute, second
Enrollment data: FTE, headcount, seat count
Data points
Nature of the series determines the forecast
15
Patterns of Data
Trend: steady increase or decrease in the values of a time series
Cycle: long-term patterns of rising and falling data
Seasonality: regular change in the data values that occurs at the same time in a given period
16
FTE
17
FTE Pattern
Trend: FTE increased from 1998 to 2006, suggesting that the series is nonstationary and that differencing is necessary
Seasonality: FTE is higher in the Fall and Spring and lower in the Summer every year, implying that a seasonal factor should be part of the model-building process
18
Autocorrelations and Partial Autocorrelations (ACF and PACF)
[Figure: ACF and PACF plots, each showing the correlation at successive lags]
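Partial autocorrelations can be derived from the autocorrelations. The sketch below uses the Durbin-Levinson recursion, which is one standard way to do this and is not necessarily what the software behind these plots does. The input is a list of autocorrelations with `acf[0] = 1`, and the example values are the theoretical autocorrelations of an AR(1) process with coefficient 0.5, chosen for illustration.

```python
def pacf_from_acf(acf, max_lag):
    """Durbin-Levinson recursion: returns [phi_11, phi_22, ..., phi_kk]."""
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5**k.
ar1_acf = [0.5 ** k for k in range(4)]
ar1_pacf = pacf_from_acf(ar1_acf, 3)
print(ar1_pacf)  # lag 1 is 0.5; lags 2 and 3 are zero
```

A PACF that cuts off after lag p while the ACF decays gradually is the textbook signature of an AR(p) process, which is how these plots inform the choice of AR order.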
19
Q Statistics
Autocorrelation Check of Residuals
[Table values not recovered; columns: To Lag, Chi-Square, DF, Pr > ChiSq, Autocorrelations]
Q statistics show that the autocorrelations at various lags are highly statistically significant
Autocorrelations were very high
Further action needed
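A Q statistic of this kind can be sketched as follows, assuming it takes the Ljung-Box form Q = n(n + 2) Σ r_k² / (n - k), summed over lags 1 to m. The function name and the residual values are illustrative. The p-value step, which compares Q with a chi-square distribution, is left out because it needs a statistics library.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Made-up residuals, for illustration only.
q1 = ljung_box_q([1.0, 2.0, 3.0, 4.0, 5.0], 1)
print(q1)
```

A large Q relative to the chi-square reference value means the residuals are still autocorrelated, so the model has not yet captured all the structure in the series.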
20
FTE Forecast
21
How ARIMA Did
Accuracy: what matters most
2-period ahead: 0.74% (FTE), 0.50% (HC)
6-period ahead: 1.43% (FTE), 1.65% (HC)
10-period ahead: 1.40% (FTE), 2.52% (HC)
Forecast error grows further into the future
Eleanor S. Fox (2005): 1.2% (4), 4.1% (8)
NCES (2003): 1.9% (2), 3.6% (6)
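One common way to score forecasts like these is the absolute percent error against held-out actual values, averaged into a mean absolute percent error (MAPE). Whether the figures above were computed exactly this way is an assumption. The function names and the numbers are made up for illustration.

```python
def absolute_percent_errors(actual, forecast):
    """Absolute percent error for each forecast period."""
    return [100.0 * abs(a - f) / abs(a) for a, f in zip(actual, forecast)]


def mape(actual, forecast):
    """Mean absolute percent error over all forecast periods."""
    errors = absolute_percent_errors(actual, forecast)
    return sum(errors) / len(errors)


# Hypothetical held-out values and forecasts, 1 to 3 periods ahead.
actual = [200.0, 400.0, 500.0]
forecast = [198.0, 408.0, 490.0]

errors = absolute_percent_errors(actual, forecast)
overall = mape(actual, forecast)
print(errors)   # [1.0, 2.0, 2.0]
print(overall)
```

Scoring on periods that were held out of model fitting measures forecast accuracy, as opposed to how well the model fits the data it was estimated on.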
22
Discussion
In theory, other factors can be included along with the time series itself, as in regression:
  Unemployment rate
  Consumer Price Index (CPI)
  High school student population
  District population
  Tuition
Forecasts used for forecasting?
23
Discussion
Stationarity and homogeneity
Scarcity and spuriousness
Seasonality and outliers
Raw or cooked data
Data mining and stepwise
Fit and accuracy
Additive or multiplicative (subset/factored)
24
Discussion
Science and art
Objective and subjective
Quantitative and qualitative
Over-differencing and over-fitting
Parsimony and uncertainty
Simple or complex