1
Time Series: evaluating forecast accuracy
Lecturers: Agostino Nuzzolo, Antonio Comi
2
Bibliography
Forecasting: Principles and Practice, by Rob J. Hyndman and George Athanasopoulos
3
The R Project for Statistical Computing
4
Time series: forecasting
At time T, we forecast the future values taken by the variable y over the next h realizations, on the basis of the values y1,…,yT observed up to time T.
5
Forecasting with classical decomposition
Example of forecasting with classical decomposition
The time series consists of 8 successive Monday–Friday periods of bus travel times on line 343, recorded in 30-minute intervals (see the following slides). We want to forecast the values between 14:15 on Wednesday and 22:45 on Friday of the last period and compare them with the observed data.
To forecast the decomposed time series, we forecast the seasonal component $\hat{S}_t$ and the trend/cyclic component $\hat{T}_t$ separately. The forecasted seasonal component is taken equal to the seasonal component of the last Monday–Friday period. The forecasted trend/cyclic component is held constant, equal to the decomposed trend value at 14:15 on Wednesday.
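A minimal sketch of this scheme in R (base stats only). The names tt, m and h are assumptions, not from the slides: tt is the vector of observed travel times, m the number of 30-minute intervals per Monday–Friday period, h the forecast horizon.

m <- 170                        # assumed intervals per Mon-Fri period (illustrative)
y <- ts(tt, frequency = m)      # 8 successive Mon-Fri periods of travel times
dec <- decompose(y)             # classical additive decomposition

h <- 22                         # horizon: time intervals 1171-1192
n <- length(y)
# seasonal forecast: repeat the seasonal component of the last period (h <= m)
S.hat <- as.numeric(dec$seasonal)[(n - m + 1):(n - m + h)]
# trend/cyclic forecast: hold the last decomposed trend value constant
T.hat <- as.numeric(tail(na.omit(dec$trend), 1))
fc <- T.hat + S.hat             # forecasts for the next h intervals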
6
Forecasting with classical decomposition Example – classical decomposition
[Figure: travel times of line 343 from Ponte Mammolo to Conca d’Oro, with the interval to be forecasted highlighted]
7
Forecasting with classical decomposition Example: forecasted and observed values
time interval   trend component   seasonal component   forecast   observed data
1171            2515,7            -49,6                2415,0     2802,0
1172                              31,5                 2496,6     2589,0
1173                              62,6                 2529,0     2446,0
1174                              271,2                2738,0     2549,0
1175                              482,9                2954,2     2643,0
1176                              550,4                3028,7     2612,0
1177                              592,6                3075,0     2572,0
1178                              725,0                3212,3     2551,0
1179                              532,0                3024,8     2433,0
1180                              415,1                2911,8     2552,0
1181                              281,8                2781,0     2453,0
1182                              -45,4                2454,9     2257,0
1183                              -434,0               2066,7     1845,0
1184                              -559,1               1941,5     1843,0
1185                              -656,5               1671,4     1730,0
1186                              -738,6               1587,8     1617,0
1187                              -797,7               1528,7     1575,0
1188                              -858,8               1467,6     1531,0
1189                              -938,4               1388,6     1712,0
1190                              -846,3               1482,5     1696,0
1191                              -483,5               1848,0     1905,0
1192                              -183,8               2150,5     2272,0
8
Forecasting with classical decomposition Example – forecasted and observed values
[Figure: forecasted and observed travel times, line 343 from Ponte Mammolo to Conca d’Oro]
9
Evaluation of forecast accuracy
Forecast accuracy measures
Training and test sets
Cross-validation (not covered)
10
Evaluating forecast accuracy [1/2] Forecast accuracy measures
Let $y_i$ denote the i-th observation and $\hat{y}_i$ a forecast of $y_i$.
Scale-dependent errors. The forecast error is simply $e_i = y_i - \hat{y}_i$, which is on the same scale as the data. Accuracy measures that are based on $e_i$ are therefore scale-dependent and cannot be used to make comparisons between series that are on different scales.
The two most commonly used scale-dependent measures are based on the absolute or squared errors:
Mean absolute error: $\text{MAE} = \operatorname{mean}(|e_i|)$
Root mean squared error: $\text{RMSE} = \sqrt{\operatorname{mean}(e_i^2)}$
When comparing forecast methods on a single data set, the MAE is popular as it is easy to understand and compute.
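In R both measures follow directly from the definitions; y.obs and y.fc below are illustrative vectors of observed and forecast values, not names from the slides:

e    <- y.obs - y.fc       # forecast errors e_i = y_i - yhat_i
MAE  <- mean(abs(e))       # mean absolute error
RMSE <- sqrt(mean(e^2))    # root mean squared error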
11
Evaluating forecast accuracy [2/2] Forecast accuracy measures
Percentage errors. The percentage error is given by $p_i = 100\,e_i / y_i$. Percentage errors have the advantage of being scale-independent, and so are frequently used to compare forecast performance between different data sets. The most commonly used measure is the mean absolute percentage error:
$\text{MAPE} = \operatorname{mean}(|p_i|)$
Measures based on percentage errors have the disadvantage of being infinite or undefined if $y_i = 0$ for any i in the period of interest, and of taking extreme values when any $y_i$ is close to zero. Another problem with percentage errors that is often overlooked is that they assume a meaningful zero. For example, a percentage error makes no sense when measuring the accuracy of temperature forecasts on the Fahrenheit or Celsius scale.
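A sketch of the computation, with a guard against the zero-value problem just mentioned (same illustrative names as before):

stopifnot(all(y.obs != 0))   # MAPE is undefined if any observation is zero
p    <- 100 * e / y.obs      # percentage errors p_i = 100 e_i / y_i
MAPE <- mean(abs(p))         # mean absolute percentage error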
12
Evaluating forecast accuracy Example: forecast errors
Travel time line 343 from Ponte Mammolo to Conca d’Oro
Forecasted trend/cyclic component: 2515,7 (constant over the whole horizon)

time interval   seasonal comp.   forecast   observed    e_i      |e_i|    e_i^2       p_i     |p_i|
1171            -49,6            2466,1     2802,0      335,9    335,9    112798,6    12%     12%
1172            31,5             2547,2     2589,0      41,8     41,8     1747,5      2%      2%
1173            62,6             2578,4     2446,0      -132,4   132,4    17518,6     -5%     5%
1174            271,2            2786,9     2549,0      -237,9   237,9    56611,4     -9%     9%
1175            482,9            2998,7     2643,0      -355,7   355,7    126514,3    -13%    13%
1176            550,4            3066,2     2612,0      -454,2   454,2    206283,4    -17%    17%
1177            592,6            3108,4     2572,0      -536,4   536,4    287686,3    -21%    21%
1178            725,0            3240,7     2551,0      -689,7   689,7    475732,2    -27%    27%
1179            532,0            3047,8     2433,0      -614,8   614,8    377953,0    -25%    25%
1180            415,1            2930,8     2552,0      -378,8   378,8    143493,7    -15%    15%
1181            281,8            2797,5     2453,0      -344,5   344,5    118703,0    -14%    14%
1182            -45,4            2470,4     2257,0      -213,4   213,4    45521,6     -9%     9%
1183            -434,0           2081,7     1845,0      -236,7   236,7    56035,9     -13%    13%
1184            -559,1           1956,6     1843,0      -113,6   113,6    12911,2     -6%     6%
1185            -656,5           1859,3     1730,0      -129,3   129,3    16709,1     -7%     7%
1186            -738,6           1777,2     1617,0      -160,2   160,2    25656,4     -10%    10%
1187            -797,7           1718,0     1575,0      -143,0   143,0    20455,1     -9%     9%
1188            -858,8           1656,9     1531,0      -125,9   125,9    15859,7     -8%     8%
1189            -938,4           1577,3     1712,0      134,7    134,7    18143,7     8%      8%
1190            -846,3           1669,4     1696,0      26,6     26,6     706,6       2%      2%
1191            -483,5           2032,3     1905,0      -127,3   127,3    16200,8     -7%     7%
1192            -183,8           2332,0     2272,0      -60,0    60,0     3598,0      -3%     3%

MAE 234,4    RMSE 284,8    MAPE 10%
13
Evaluating forecast accuracy Training and test sets
When choosing models, it is common to use a portion of the available data for fitting, and use the rest of the data for testing the model. Then the testing data can be used to measure how well the model is likely to forecast on new data. The size of the test set is typically about 20% of the total sample, although this value depends on how long the sample is and how far ahead you want to forecast. The size of the test set should ideally be at least as large as the maximum forecast horizon required.
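An illustrative split for a ts object y, holding out roughly the last 20% of the observations (all names are assumptions):

n      <- length(y)
n.test <- ceiling(0.2 * n)                            # size of the test set
train  <- window(y, end   = time(y)[n - n.test])      # data used for fitting
test   <- window(y, start = time(y)[n - n.test + 1])  # held-out test data
# fit the model on 'train', forecast length(test) steps ahead,
# then compute MAE/RMSE/MAPE against 'test'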
14
Evaluating forecast accuracy Cross-validation
Not covered.
15
Residual diagnostics [1/5]
A residual in forecasting is the difference between an observed value and its forecast based on other observations: $e_t = y_t - \hat{y}_t$. For time series forecasting, a residual is based on one-step forecasts; that is, $\hat{y}_t$ is the forecast of $y_t$ based on the observations $y_1, \dots, y_{t-1}$.
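For instance, with the naïve method the one-step forecast of $y_t$ is $y_{t-1}$, so the residuals are simply the first differences of the series (y is an illustrative numeric vector):

res <- y[-1] - y[-length(y)]   # e_t = y_t - y_{t-1}, t = 2, ..., T
# equivalently: res <- diff(y)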
16
Residual diagnostics [2/5]
A good forecasting method will yield residuals with the following properties:
1. The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should be used in computing forecasts.
2. The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased.
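Both properties can be checked numerically with base R, e.g. on the residual vector res from the previous sketch:

mean(res)                                    # should be close to 0, else the forecasts are biased
acf(res, lag.max = 10, plot = FALSE)         # autocorrelations r_k, should all be small
Box.test(res, lag = 10, type = "Ljung-Box")  # portmanteau test for residual autocorrelation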
17
Residual diagnostics [3/5]
Any forecasting method that does not satisfy these properties can be improved. That does not mean that forecasting methods satisfying these properties cannot be improved: it is possible to have several forecasting methods for the same data set, all of which satisfy these properties. Checking these properties is important in order to see whether a method is using all of the available information, but it is not a good way to select a forecasting method.
18
Residual diagnostics [4/5]
If either of these two properties is not satisfied, then the forecasting method can be modified to give better forecasts. Adjusting for bias is easy: if the residuals have mean m, simply add m to all forecasts and the bias problem is solved (see the sketch below). Fixing the correlation problem is harder and is not addressed here.
In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties:
3. The residuals have constant variance.
4. The residuals are normally distributed.
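The bias adjustment mentioned above is a one-liner; fc is an illustrative vector of forecasts, and the sketch assumes the residual mean persists out of sample:

m.hat  <- mean(res)    # estimated bias m of the residuals
fc.adj <- fc + m.hat   # add m to all forecasts to remove the bias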
19
Residual diagnostics [5/5]
These two properties make the calculation of prediction intervals easier (see the next section for an example). However, a forecasting method that does not satisfy these properties cannot necessarily be improved. Sometimes applying a transformation such as a logarithm or a square root may assist with these properties, but otherwise there is usually little you can do to ensure your residuals have constant variance and have a normal distribution. Instead, an alternative approach to finding prediction intervals is necessary.
20
Residual diagnostics Example of residual diagnostics for the previous forecast
Mean of the errors e_i: -61,1     Standard deviation s: 278,1

k      1      2      3      4
r_k    0,80   0,55   0,23   0,15
21
Residual diagnostics Example of residual diagnostics for the previous forecast
In this example the error bias is -61,1 seconds and the errors are strongly correlated (r_1 = 0,80). Therefore, a better forecasting method could potentially be found.
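The statistics in the table can be reproduced from the error vector e of the example with base R (note that acf returns the lag-0 autocorrelation, which is always 1, as its first element):

mean(e)                                     # bias: -61,1 s in this example
sd(e)                                       # standard deviation: 278,1 s
r <- acf(e, lag.max = 4, plot = FALSE)$acf  # r[1] is lag 0; r[2:5] are r_1, ..., r_4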
22
Prediction intervals
A prediction interval gives an interval within which we expect $y_i$ to lie with a specified probability. For example, assuming the forecast errors are uncorrelated and normally distributed, a simple 95% prediction interval for the next observation in a time series is $\hat{y}_i \pm 1.96\,\hat{\sigma}$, where $\hat{\sigma}$ is an estimate of the standard deviation of the forecast distribution. In forecasting, it is common to calculate 80% and 95% intervals, although any percentage may be used. In the previous example the 95% prediction interval is the forecast +/- 545,1 sec (1,96 × 278,1), but consider that the errors are correlated.
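The half-width quoted above can be checked directly; sigma.hat is the residual standard deviation from the diagnostics, and fc is a single illustrative forecast:

sigma.hat  <- 278.1                      # estimated std. dev. of the forecast errors (s)
half.width <- qnorm(0.975) * sigma.hat   # 1.96 * 278.1 = 545.1 s
c(lower = fc - half.width, upper = fc + half.width)  # 95% prediction interval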
23
Other forecast accuracy measures
APPENDIX A: Other forecast accuracy measures
24
Evaluating forecast accuracy Forecast accuracy measures
Percentage errors also have the disadvantage of putting a heavier penalty on negative errors than on positive errors. This observation led to the use of the so-called "symmetric" MAPE (sMAPE), defined by
$\text{sMAPE} = \operatorname{mean}\left(\dfrac{200\,|y_i - \hat{y}_i|}{y_i + \hat{y}_i}\right)$
However, if $y_i$ is close to zero, $\hat{y}_i$ is also likely to be close to zero. Thus the measure still involves division by a number close to zero, making the calculation unstable. Also, the value of sMAPE can be negative, so it is not really a measure of "absolute percentage errors" at all.
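A direct computation from the definition, with the same illustrative vectors as before; note the instability when observations are near zero:

sMAPE <- mean(200 * abs(y.obs - y.fc) / (y.obs + y.fc))  # in percent; unstable near y = 0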
25
Evaluating forecast accuracy [1/4] Forecast accuracy measures
Scaled errors
Scaled errors were proposed by Hyndman and Koehler (2006) as an alternative to using percentage errors when comparing forecast accuracy across series on different scales. They proposed scaling the errors based on the training MAE from a simple forecast method. For a non-seasonal time series, a useful way to define a scaled error uses naïve forecasts:
$q_j = \dfrac{e_j}{\frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|}$
26
Forecast accuracy measures [2/4] Scaled errors
Because the numerator and denominator both involve values on the scale of the original data, $q_j$ is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naïve forecast computed on the training data. Conversely, it is greater than one if the forecast is worse than the average naïve forecast computed on the training data. For seasonal time series, a scaled error can be defined using seasonal naïve forecasts:
$q_j = \dfrac{e_j}{\frac{1}{T-m}\sum_{t=m+1}^{T} |y_t - y_{t-m}|}$
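Both scaling factors are easy to compute from the training data; train and m are the illustrative names used earlier, and e holds the test-set errors to be scaled:

scale.ns <- mean(abs(diff(train)))           # non-seasonal: mean |y_t - y_{t-1}| on training data
scale.s  <- mean(abs(diff(train, lag = m)))  # seasonal: mean |y_t - y_{t-m}|
q <- e / scale.ns                            # scaled errors (use scale.s for seasonal series)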
27
Forecast accuracy measures [3/4] Scaled errors
For cross-sectional data, a scaled error can be defined as
$q_j = \dfrac{e_j}{\frac{1}{N}\sum_{i=1}^{N} |y_i - \bar{y}|}$
In this case, the comparison is with the mean forecast. (This doesn't work so well for time series data, as there may be trends and other patterns in the data, making the mean a poor comparison. Hence, the naïve forecast is recommended when using time series data.)
28
Forecast accuracy measures [4/4] Scaled errors
The mean absolute scaled error is simply
$\text{MASE} = \operatorname{mean}(|q_j|)$
Similarly, the mean squared scaled error (MSSE) can be defined, where the errors (on both the training data and the test data) are squared instead of using absolute values.
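Continuing the previous sketch; the MSSE line assumes the squared-error convention described above (naïve differences squared in the denominator as well):

MASE <- mean(abs(q))               # mean absolute scaled error
q2   <- e^2 / mean(diff(train)^2)  # errors and naive differences both squared
MSSE <- mean(q2)                   # mean squared scaled error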