Some more issues of time series analysis

Time series regression with modelling of error terms

In a time series regression model the error terms are tentatively assumed to be independent and identically distributed. Is this wise?
By performing e.g. the Durbin-Watson test we can quite easily check whether they are. What if D-W gives evidence of serial correlation in the error terms? Apply an AR(p) model to the error terms at the same time as the rest of the model is fitted.

Standard procedure:
1. Study the residuals from an ordinary regression fit.
2. Identify which order p of the AR model may be the most appropriate for the error terms.
3. Fit the combined regression-AR model.
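The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm used by any particular package: the data are hypothetical, there is a single regressor, the error order is fixed at p = 1, and the combined fit is done by a simple grid search over the AR coefficient (a Hildreth-Lu style search that minimises the sum of squared innovations conditional on the first observation). The function names are made up for this sketch.

```python
def ols_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid


def durbin_watson(resid):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; values well below 2
    point to positive lag-1 correlation in the residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def fit_regression_ar1(x, y, grid_steps=99):
    """Regression with AR(1) errors via grid search over phi in (-1, 1).

    For each phi, regress y_t - phi*y_{t-1} on x_t - phi*x_{t-1} and keep
    the phi giving the smallest sum of squared innovations.
    """
    best = None
    for i in range(-grid_steps, grid_steps + 1):
        phi = i / (grid_steps + 1)
        xs = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
        a_star, b, innov = ols_line(xs, ys)
        sse = sum(e * e for e in innov)
        if best is None or sse < best[0]:
            # a_star estimates intercept * (1 - phi), so rescale it
            best = (sse, phi, a_star / (1 - phi), b)
    sse, phi, a, b = best
    return {"phi": phi, "intercept": a, "slope": b, "sse": sse}


# Hypothetical series: linear trend plus an error term that decays as an
# AR(1) process with coefficient 0.6 (no new noise after the first value).
x = list(range(30))
y = [2 + 0.5 * t + 5 * 0.6 ** t for t in x]

_, _, resid = ols_line(x, y)      # step 1: residuals from the ordinary fit
d = durbin_watson(resid)          # step 2: check for serial correlation
fit = fit_regression_ar1(x, y)    # step 3: combined regression-AR(1) fit
print(d, fit)
```

For this artificial series the Durbin-Watson statistic is far below 2, and the grid search recovers the AR coefficient together with the trend parameters. With real data one would also compare p = 1 and p = 2, which this sketch does not do.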
Estimation can no longer be done using ordinary least squares. Instead the conditional least-squares method is used. Procedures are not currently available in Minitab, but they are in more comprehensive computer packages such as SAS and SPSS.

Example

Consider again the Hjälmaren month data set (used in the assignments for weeks 36, 39 and 41).
Minitab output from an ordinary time series regression (numerical values of the coefficient table not reproduced): response Discharge.m; predictors Constant, Time.m and the monthly indicators Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov; R-Sq = 18.8%, R-Sq(adj) = 18.1%.
Residual plots: the residuals seem to follow an AR model of order 1 or 2.
SPSS output of a regression analysis with the error term modelled as AR(1) (numerical values not reproduced): FINAL PARAMETERS with number of residuals 1284, standard error, log likelihood, AIC and SBC; variables in the model (columns B, SEB, T-RATIO, APPROX. PROB.): AR, JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, TIME, CONSTANT.

The variance of the pure error term is smaller than the variance of the error term in the ordinary regression!
Non-parametric tests for trend

All models taken up so far in the course are parametric models. Parametric models assume that
- a specific probability distribution governs the obtained observations (e.g. the normal distribution), and
- the population mean value of each observation can be expressed in terms of the parameters of the model.

What if we cannot specify this probability distribution?
- Least-squares fitting of time series regression models can still be done, but none of the significance tests are valid. We cannot test for the presence of a trend (nor for the presence of seasonal variation).
- Classical decomposition is still possible, but it has no significance tests built in (its methods are all descriptive analysis tools).
- Conditional least-squares estimation in ARIMA models is not valid, as it emerges from the assumption that the observations are normally distributed. As a consequence the significance tests are not valid.
The Mann-Kendall test for a monotonic trend

Example: Look again at the data set of sales values from lecture 3, but with restriction to the years

[Table: Year, Sales values]

Could there be a trend in the data?
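If there is a trend, we do not assume that it has a specific functional form, such as linear or quadratic; we just assume it is monotonic, i.e. decreasing or increasing. In this case it would be a decreasing trend.

The sign function:

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$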
Now define the Mann-Kendall test statistic as

$$T = \sum_{i<j} \operatorname{sgn}(y_j - y_i)$$

i.e. the statistic is a sum of +1s, −1s and 0s depending on whether $y_j$ is higher than, lower than or equal to $y_i$ for each pair of time points $(i, j)$ with $i < j$.
- Large positive values of T would be consistent with an upward trend.
- Large negative values of T would be consistent with a downward trend.
- Values of T around 0 would be consistent with no trend.
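The statistic is easy to compute directly from its definition as a sum of signs over all pairs of time points. The sketch below uses a short hypothetical series, not the sales data from the example; the function names are chosen for this illustration only.

```python
def sgn(x):
    """Sign function: 1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


def mann_kendall_T(y):
    """Sum of sgn(y[j] - y[i]) over all pairs of time points with i < j."""
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))


# Hypothetical series with a mostly decreasing pattern and one tied pair.
y = [5, 3, 4, 2, 2]
print(mann_kendall_T(y))  # 8 decreasing pairs, 1 increasing pair, 1 tie -> -7
```

The double loop visits n(n − 1)/2 pairs, which is unproblematic for series of the lengths met in this course.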
For the current data set, T = −43. Is this sufficiently large and negative to provide evidence of a trend?
The non-parametric initial “fashion”:
- Calculate all possible values of T by letting each difference $y_j - y_i$, $i < j$, take in turn the signs −1, 0 and 1. (Put these values in ascending order.)
- For the test of $H_0$: No trend vs. $H_A$: Negative monotonic trend at the level of significance $\alpha$, calculate the $(100\alpha)$th percentile, $T_\alpha$, of the (ordered) values of T.
- If the observed T is $< T_\alpha$, reject $H_0$; otherwise “accept” $H_0$.

For a fairly long time series this procedure is quite tedious.
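To give a feel for why the exact approach becomes tedious, here is a small sketch that builds a null distribution of T by brute force. It rests on an assumption that goes beyond the description above: the possible values of T are generated by treating every ordering of the observed values as equally likely under $H_0$, which is one common way to enumerate the distribution. The data are hypothetical and the function names are made up for the illustration.

```python
from itertools import permutations


def sgn(x):
    return (x > 0) - (x < 0)


def mann_kendall_T(y):
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))


def exact_lower_p_value(y):
    """P(T <= observed T) when all n! orderings of y are equally likely.

    Assumption of this sketch: the null distribution is enumerated over
    permutations of the observed values. Only feasible for short series.
    """
    t_obs = mann_kendall_T(y)
    all_T = [mann_kendall_T(p) for p in permutations(y)]
    return sum(t <= t_obs for t in all_T) / len(all_T)


# Hypothetical series of length 5 with T = -8.
y = [9, 7, 8, 5, 3]
print(mann_kendall_T(y), exact_lower_p_value(y))
```

Already for n = 11 there are 11! = 39 916 800 orderings to evaluate, which is why an approximation is attractive.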
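Approximate solution: The variance of T can be shown to be

$$\operatorname{Var}(T) = \frac{1}{18}\left[\, n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2 t_p + 5) \right]$$

where n is the length of the time series, g is the number of so-called ties (a tie is a value that has duplicates) and $t_p$ is the number of duplicates for tie p. Then for fairly large n, under $H_0$,

$$\frac{T}{\sqrt{\operatorname{Var}(T)}} \ \text{is approximately } N(0, 1)$$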
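For the current time series of sales values:
n = 11
g = 3 (the values 143, 145 and 151 each occur twice)
$t_1 = t_2 = t_3 = 2$

$$\operatorname{Var}(T) = \frac{1}{18}\left(11 \cdot 10 \cdot 27 - 3 \cdot 2 \cdot 1 \cdot 9\right) = 162$$

The P-value is approximately $\Phi\left(-43/\sqrt{162}\right) \approx \Phi(-3.38) \approx 0.0004$. Thus $H_0$ may be rejected at any reasonable level of significance.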
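The calculation for the sales series can be checked numerically. The sketch below takes n = 11, three ties of size 2 and T = −43 from the example; it assumes a one-sided test against a negative trend and applies no continuity correction. The function names are chosen for this illustration.

```python
import math


def mk_variance(n, tie_sizes):
    """Var(T) = [n(n-1)(2n+5) - sum over ties of t(t-1)(2t+5)] / 18."""
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


var_T = mk_variance(11, [2, 2, 2])  # (2970 - 54) / 18 = 162.0
z = -43 / math.sqrt(var_T)          # standardised test statistic
p_lower = normal_cdf(z)             # one-sided P-value for a negative trend
print(var_T, round(z, 2), p_lower)
```

The standardised value is about −3.38, so the one-sided P-value is well below 0.001.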
For time series with seasonal variation, Hirsch & Slack have developed a modification of the Mann-Kendall test with test statistic

$$T_S = \sum_{k} T_k$$

where $T_k$ is the Mann-Kendall test statistic for the time series consisting of values from season k only (e.g. for monthly data we consider the series of January values, the series of February values, etc.) and the sum runs over all seasons. Expressions for the variance of $T_S$ can be derived and, analogously to the Mann-Kendall test, $T_S / \sqrt{\operatorname{Var}(T_S)}$ is approximately N(0, 1) for fairly long series.
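The seasonal statistic can be sketched by splitting the series into one sub-series per season and adding up the ordinary Mann-Kendall statistics. The example assumes that the observations are stored in time order with a fixed number of seasons per year and no missing values; the quarterly data are hypothetical, and the variance of the seasonal statistic is not computed here.

```python
def sgn(x):
    return (x > 0) - (x < 0)


def mann_kendall_T(y):
    n = len(y)
    return sum(sgn(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))


def seasonal_mann_kendall_T(y, period):
    """Sum of the Mann-Kendall statistics of the `period` seasonal sub-series.

    y[k::period] picks out season k: for monthly data with period=12 and a
    series starting in January, k=0 gives all January values, and so on.
    """
    return sum(mann_kendall_T(y[k::period]) for k in range(period))


# Hypothetical quarterly series: strong seasonal pattern, slow decline
# of one unit per year within every quarter.
y = [10, 20, 30, 40,
     9, 19, 29, 39,
     8, 18, 28, 38]
print(seasonal_mann_kendall_T(y, period=4))  # each quarter contributes -3 -> -12
```

Comparing only values from the same season keeps the seasonal pattern from masking the trend: every within-season comparison here points downward, although the series as a whole rises within each year.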