
1 USE OF GENERALIZED LINEAR MODEL IN FORECASTING OF AIR PASSENGERS CONVEYANCES FROM EU COUNTRIES Catherine Zhukovskaya Faculty of Transport and Mechanical Engineering Riga Technical University

2 The 8th Tartu Conference on Multivariate Statistics
Outline
1. Introduction
2. Informative base
3. Used models for analyzing and forecasting of the air passengers' conveyances
4. Elaboration of linear models
5. Elaboration of generalized linear models
6. Conclusion
7. References

3 1. Introduction
Most of the literature devoted to the forecasting of transport flows contains only simple forecasting models based on time series methods [Hünt (2003)] or on linear regression with a small number of explanatory variables [Butkevičius, Vyskupaitis (2005), Šliupas (2006)].
Two approaches to forecasting air passenger conveyances from EU countries were considered in this investigation:
- the classical method of linear regression;
- the generalized linear model (GLM).
The aim of this investigation is to illustrate the advantage of the GLM over simple linear regression models. Verification of the models and estimation of the unknown parameters are included as well. All calculations were done with Statistica 6.0 and with software developed in MathCad 12.

4 2. Informative base
The forecasted variable was the number of air passengers carried, expressed in millions of passengers.
Factors:
- $t_1$: total population of the country (TP), millions of inhabitants;
- $t_2$: area of the country (AREA), thousands of km^2;
- $t_3$: density of the country population (PD), number of inhabitants per km^2;
- $t_4$: monthly labour costs (MLC), thousands of euros;
- $t_5$: gross domestic product (GDP) per capita in Purchasing Power Standards (PPS) (GDP_PPS);
- $t_6$: gross domestic product (GDP), billions of euros;
- $t_7$: comparative price level (CPL);
- $t_8$: inflation rate (IR);
- $t_9$: unemployment rate (UR);
- $t_{10}$: labour productivity per hour worked (LPHW).

5 The following 25 EU countries were selected: Belgium, Czech Republic, Denmark, Germany, Estonia, Greece, Spain, France, Ireland, Italy, Cyprus, Latvia, Lithuania, Luxembourg, Hungary, Malta, Netherlands, Austria, Poland, Portugal, Slovenia, Slovakia, Finland, Sweden and United Kingdom. The considered period was from 1996 to 2005.
All data for this investigation were obtained from the electronic database of the Statistical Office of the European Communities (EUROSTAT), http://epp.eurostat.ec.europa.eu
The final number of observations was 161:
- data for the period from 1996 to 2004 were used for estimation and forecasting (140 observations);
- data for 2005 were used to check the quality of forecasting, the so-called cross-validation (CV) (21 observations).
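The split described above (earlier years for estimation, 2005 held out for cross-validation) can be sketched as follows. This is only an illustration: the `Observation` record, the `split_by_year` helper and the passenger figures are hypothetical and are not taken from the EUROSTAT data.

```python
from collections import namedtuple

# Hypothetical record type: one observation = one country in one year.
Observation = namedtuple("Observation", "country year passengers_mln")

def split_by_year(observations, cv_year=2005):
    """Use years before cv_year for estimation and cv_year itself for CV."""
    train = [o for o in observations if o.year < cv_year]
    cv = [o for o in observations if o.year == cv_year]
    return train, cv

# Made-up numbers, for illustration only.
data = [
    Observation("Latvia", 2003, 0.7),
    Observation("Latvia", 2004, 1.0),
    Observation("Latvia", 2005, 1.9),
    Observation("Estonia", 2004, 1.0),
    Observation("Estonia", 2005, 1.4),
]
train, cv = split_by_year(data)
```

Holding out the most recent year, rather than a random subset, matches the forecasting use case: the model is judged on a year it has not seen.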

6 3. Used models for analyzing and forecasting of the air passengers' conveyances
Main notions
The main object of consideration was the air passengers' conveyances from EU countries. The data for a particular country in a particular year were taken as one observation. All the considered models were group models [Andronov (1983)].
Classification of regression models according to their mathematical form:
- linear regression models;
- generalized linear regression models (GLM).

7 The linear regression model [Hardle (2004)]:
$E(Y^{(k)}(x)) = x^T \beta$, (1)
where:
- $Y^{(k)}$ is the dependent variable of the k-th considered model;
- $x = (x_1, x_2, \ldots, x_d)^T$ is the d-dimensional vector of explanatory variables;
- $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_d)^T$ is the coefficient vector that has to be estimated from observations of $Y^{(k)}$ and $x$.
The generalized linear regression model:
$E(Y^{(k)}(x)) = G\{x^T \beta\}$, (2)
where $G(\cdot)$ is a known function of a one-dimensional variable.
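The difference between models (1) and (2) can be shown with a minimal numerical sketch. Two things here are assumptions made only for illustration: the vector x is given a leading 1 so that $\beta_0$ acts as the intercept, and G is taken to be the exponential function, which the slides do not specify. The coefficients are made up.

```python
import numpy as np

def linear_mean(x, beta):
    """Model (1): E(Y) = x^T beta."""
    return float(x @ beta)

def glm_mean(x, beta, G=np.exp):
    """Model (2): E(Y) = G(x^T beta). G = exp is an assumed example only."""
    return float(G(x @ beta))

x = np.array([1.0, 0.5, 2.0])       # leading 1.0 stands in for the intercept term
beta = np.array([0.2, 1.0, -0.1])   # made-up coefficients

eta = linear_mean(x, beta)          # linear predictor x^T beta
mu = glm_mean(x, beta)              # the same predictor passed through G
```

Both models share the linear predictor $x^T \beta$; the GLM differs only in the known function G applied to it.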

8 4. Elaboration of linear models
The basic criteria for choosing the best model:
1. Multiple coefficient of determination ($R^2$);
2. Fisher criterion (F);
3. Sum of the squares of the residuals (SSRes);
4. Sum of the squares of the residuals for the cross-validation (CV SSRes).
All statistical hypotheses were tested at the significance level $\alpha = 0.05$.
MODEL #1
$Y^{(1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10}$,
where $Y^{(1)}$ is the total number of air passengers carried; $x_1 = t_1$, $x_2 = t_2$, $x_3 = t_3$, $x_4 = t_4$, $x_5 = t_5$, $x_6 = t_6$, $x_7 = t_7$, $x_8 = t_8$, $x_9 = t_9$, $x_{10} = t_{10}$.
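The four criteria listed above can be computed as in the following sketch, which uses ordinary least squares from NumPy on synthetic data. The helper names and the data are hypothetical; the original calculations were done in Statistica and MathCad, not with this code.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Least squares estimate of (beta_0, ..., beta_d)."""
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return beta

def ss_res(X, y, beta):
    """Sum of squared residuals on the given data (training or CV)."""
    r = y - add_intercept(X) @ beta
    return float(r @ r)

def r_squared(X, y, beta):
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res(X, y, beta) / ss_tot

def f_statistic(r2, n, d):
    """Fisher criterion for a model with d regressors and n observations."""
    return (r2 / d) / ((1.0 - r2) / (n - d - 1))

# Synthetic data with the same sample sizes as in the study (140 and 21).
rng = np.random.default_rng(0)
coef = np.array([1.0, -0.5, 0.3])
X_train = rng.normal(size=(140, 3))
y_train = 2.0 + X_train @ coef + rng.normal(scale=0.1, size=140)
X_cv = rng.normal(size=(21, 3))
y_cv = 2.0 + X_cv @ coef + rng.normal(scale=0.1, size=21)

beta = fit_ols(X_train, y_train)
r2 = r_squared(X_train, y_train, beta)
f = f_statistic(r2, n=140, d=3)
train_ss = ss_res(X_train, y_train, beta)
cv_ss = ss_res(X_cv, y_cv, beta)   # CV SSRes: same beta, held-out data
```

As a rough consistency check, `f_statistic(0.829, 140, 5)` comes out close to the F = 129.85 reported for Model #2, which suggests that the reported F values follow this standard formula.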

9 Results for the MODEL #1
$\hat{E}(Y^{(1)}(x)) = 14 - 0.77x_1 + 0.16x_2 + 185.8x_3 - 2.44x_4 + 0.53x_5 + 0.07x_6 + 0.05x_7 + 0.32x_8 - 1.2x_9 - 1.03x_{10}$.
$R^2 = 0.831$, Fisher criterion F = 63.49
Table 1
Variable   Factor    b        t(129)   p-level
Intercept            14.00    0.84     0.405
x_1        TP        -0.77    -1.56    0.121
x_2        AREA      0.16     5.60     0.000
x_3        PD        185.80   4.67     0.000
x_4        MLC       -2.44    -0.44    0.660
x_5        GDP_PPS   0.53     1.68     0.096
x_6        GDP       0.07     3.81     0.000
x_7        CPL       0.05     0.37     0.710
x_8        IR        0.32     0.29     0.771
x_9        UR        -1.20    -1.59    0.114
x_10       LPHW      -1.03    -3.75    0.000

10 MODEL #2
$Y^{(2)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
where $Y^{(2)} = Y^{(1)}$; $x_1 = t_2$, $x_2 = t_3$, $x_3 = t_6$, $x_4 = t_{10}$, $x_5 = t_{11}$.
New factor: $t_{11}$ (ON) = 0 if the considered country is an old member of the EU; 1 if it is a new one.
Results for the MODEL #2
$\hat{E}(Y^{(2)}(x)) = 13.56 + 0.09x_1 + 134.01x_2 + 0.05x_3 - 0.68x_4 + 29.36x_5$.
$R^2 = 0.829$, Fisher criterion F = 129.85
Table 2
Variable   Factor   b        t(134)   p-level
Intercept           13.56    2.45     0.016
x_1        AREA     0.09     4.45     0.000
x_2        PD       134.01   4.32     0.000
x_3        GDP      0.05     10.34    0.000
x_4        LPHW     -0.68    -5.12    0.000
x_5        ON       29.36    4.21     0.000

11 MODEL #3
$Y^{(3)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
where $Y^{(3)} = Y^{(1)}$.
Modifications of factors: sq(TP) and sqrt(AREA) enter the model in place of TP and AREA.
Results for the MODEL #3
$\hat{E}(Y^{(3)}(x)) = -6.34 + 113.26x_1 + 0.14x_2 - 0.52x_3 - 0.03x_4 + 3.03x_5$
$R^2 = 0.867$, Fisher criterion F = 174.08
Table 3
Variable   Factor       b        t(134)   p-level
Intercept               -6.34    -1.05    0.296
x_1        PD           113.26   4.00     0.000
x_2        GDP          0.14     10.66    0.000
x_3        LPHW         -0.52    -5.80    0.000
x_4        sq(TP)       -0.03    -7.56    0.000
x_5        sqrt(AREA)   3.03     5.74     0.000

12 Analysis of observed and predicted values for the MODEL #3
Figure 1. Plot of observed and predicted values.
Figure 2. Plot of observed and predicted values for the CV.

13 MODEL #4
$Y^{(4)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9$,
where $Y^{(4)} = Y^{(1)}/t_1$ is the ratio between the total number of air passengers carried and the number of inhabitants of the country.
Results for the MODEL #4
$\hat{E}(Y^{(4)}(x)) = 0.56 + 2.33x_1 - 1.04x_2 - 0.02x_3 + 0.001x_4 + 1.76x_5 - 0.0004x_6 + 0.04x_7 + 0.17x_8$.
$R^2 = 0.760$, Fisher criterion F = 45.81
Table 4
Variable   Factor          b       t(131)   p-level
Intercept                  -5.67   -6.25    0.000
x_1        AREA            -0.02   -6.73    0.000
x_2        PD              10.37   6.19     0.000
x_3        MLC             -0.73   -4.19    0.000
x_4        ON              0.83    8.30     0.000
x_5        sqrt(TP)        -1.02   -7.32    0.000
x_6        sqrt(AREA)      1.06    7.10     0.000
x_7        AREA/TP         -0.12   -6.98    0.000
x_8        sqrt(AREA)/TP   0.94    5.84     0.000
x_9        GDP/TP          0.15    6.28     0.000

14 MODEL #5
$Y^{(5)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8$,
where $Y^{(5)} = Y^{(4)}$.
New factor: $t_{12}$ (HL) = 0 if the value $y/t_1$ for the considered country is small (less than 2); 1 if the value $y/t_1$ is larger than 2.
Results for the MODEL #5
$\hat{E}(Y^{(5)}(x)) = 0.99 - 0.46x_1 - 0.02x_2 - 0.02x_3 - 0.02x_4 + 0.01x_5 + 1.27x_6 + 1.15x_7 + 0.07x_8$
$R^2 = 0.864$, Fisher criterion F = 104.174
Table 5
Variable   Factor    b       t(131)   p-level
Intercept            0.99    3.93     0.000
x_1        MLC       -0.46   -3.41    0.001
x_2        GDP_PPS   -0.02   -3.81    0.000
x_3        IR        -0.02   -1.33    0.187
x_4        UR        -0.02   -1.90    0.056
x_5        LPHW      0.01    3.72     0.000
x_6        ON        1.27    9.21     0.000
x_7        HL        1.15    15.30    0.000
x_8        GDP/TP    0.07    3.41     0.001
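The two indicator factors used in Models #2 to #5, ON and HL, could be coded as in the sketch below. The slides do not list which countries count as new members; the set used here assumes that "new" means the ten countries that joined the EU in 2004. The function names and the example numbers are hypothetical.

```python
# ASSUMPTION: "new member" = joined the EU in the 2004 enlargement.
NEW_MEMBERS_2004 = {
    "Czech Republic", "Estonia", "Cyprus", "Latvia", "Lithuania",
    "Hungary", "Malta", "Poland", "Slovenia", "Slovakia",
}

def on_indicator(country):
    """t_11 (ON): 0 for an old EU member, 1 for a new one."""
    return 1 if country in NEW_MEMBERS_2004 else 0

def hl_indicator(passengers_mln, population_mln, threshold=2.0):
    """t_12 (HL): 1 if passengers per inhabitant (y / t_1) exceeds the threshold."""
    return 1 if passengers_mln / population_mln > threshold else 0

# Made-up numbers, for illustration only.
example_on = [on_indicator(c) for c in ("Latvia", "Germany")]
example_hl = [hl_indicator(10.0, 4.0), hl_indicator(20.0, 40.0)]
```

Note that HL is built from the observed response itself, so for a genuine forecast its value has to be known or assumed in advance.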

15 Pivot results for the linear regression models
Table 6
Model   R^2     R1   F        R2   SSRes    R3   CV SSRes   R4   Sum R   Total R
#1      0.831   3    63.49    4    52 651   5    114 885    5    17      5
#2      0.829   4    129.85   2    53 344   5    109 723    4    15      3
#3      0.867   1    174.10   1    41 599   2    49 450     1    5       1
#4      0.760   5    45.81    5    35 064   3    57 310     3    16      4
#5      0.864   2    104.20   3    12 775   1    51 448     2    8       2
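The Sum R and Total R columns of Table 6 can be reproduced with a short sketch, assuming that R1 to R4 are the ranks of each model on the four criteria, that Sum R is their sum, and that Total R is the position of the model when sorted by Sum R. This reading is an interpretation of the table headers, although it is consistent with every row.

```python
# Ranks (R1, R2, R3, R4) copied from Table 6.
ranks = {
    "#1": (3, 4, 5, 5),
    "#2": (4, 2, 5, 4),
    "#3": (1, 1, 2, 1),
    "#4": (5, 5, 3, 3),
    "#5": (2, 3, 1, 2),
}

# Sum R: sum of the four ranks for each model.
sum_r = {model: sum(r) for model, r in ranks.items()}

# Total R: position of the model when sorted by Sum R (1 = best).
ordered = sorted(sum_r, key=sum_r.get)
total_r = {model: position + 1 for position, model in enumerate(ordered)}
```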

16 Analysis of observed and predicted values for the MODEL #5
Figure 3. Plot of recalculated observed and predicted values.
Figure 4. Plot of recalculated observed and predicted values for the CV.

17 5. Elaboration of generalized linear models
The best linear regression model (Model #5) was chosen for further investigation. Two different GLMs were considered. In both of them the value of the regressand is $Y^{(GLM)} = Y^{(5)}/t_1$ and the collection of regressors is the same as for Model #5.
GLM1, equation (3): here $h_i$ is the total population number, $x_i$ is the column vector of the independent variables, and $i$ is the observation number, $i = 1, 2, \ldots, n$.
GLM2, equation (4): here $a$ is an additional parameter (constant).

18 To estimate the unknown parameter vector $\beta$ we used the least squares criterion (5), where $Y_i$ and $\hat{Y}_i$ are the observed and calculated values of $Y$.
1. Linearization
The linearized models are LM1 (6) and LM2 (7), where $Y^* = Y/h$.

19 The models LM1 and LM2 give the following estimates for E(Y).
Table 7. The values of SSRes and CV SSRes for the Model #5 and LM
          SSRes                        CV SSRes
          Model #5   LM1      LM2      Model #5   LM1       LM2
R_0/n     12 775     27 447   21 834   51 448     676 576   229 554
Linearization alone gives poor results: both LM1 and LM2 have larger SSRes and CV SSRes than Model #5. To improve them, a two-stage estimation procedure was developed. The first stage is the linearization considered above. The second stage is a calibration, in which the estimates obtained at the first stage are refined with the well-known gradient method.

20 2. Calibration
Gradients for the least squares criterion: GLM1 (8), GLM2 (9).
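The two-stage procedure (linearization, then calibration by a gradient method) can be sketched as follows. Equations (3) to (9) are not reproduced in this transcript, so the model form used here, $E(Y_i) = h_i \exp(x_i^T \beta)$, is purely an assumption made for illustration and is not necessarily the authors' GLM1 or GLM2. The data are synthetic, and the backtracking step-size rule is a swapped-in generic choice rather than the original MathCad procedure.

```python
import numpy as np

def predict(beta, X, h):
    # ASSUMED model form (not from the slides): E(Y_i) = h_i * exp(x_i^T beta)
    return h * np.exp(X @ beta)

def ss_res(beta, X, h, y):
    """Least squares criterion: sum of squared residuals."""
    r = y - predict(beta, X, h)
    return float(r @ r)

def gradient(beta, X, h, y):
    """Gradient of ss_res with respect to beta for the assumed model."""
    y_hat = predict(beta, X, h)
    return -2.0 * (X.T @ ((y - y_hat) * y_hat))

def linearize(X, h, y):
    """Stage 1: ordinary least squares on log(Y / h) = x^T beta."""
    beta, *_ = np.linalg.lstsq(X, np.log(y / h), rcond=None)
    return beta

def calibrate(beta, X, h, y, n_iter=200):
    """Stage 2: gradient descent with step halving, started from stage 1."""
    beta = beta.copy()
    for _ in range(n_iter):
        g = gradient(beta, X, h, y)
        current = ss_res(beta, X, h, y)
        step = 1.0 / (1.0 + np.linalg.norm(g))   # keeps the first trial move bounded
        while step > 1e-15 and not ss_res(beta - step * g, X, h, y) < current:
            step *= 0.5                          # halve until the criterion decreases
        if step <= 1e-15:
            break                                # no improving step found
        beta = beta - step * g
    return beta

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
n = 140
X = np.column_stack([np.ones(n), rng.normal(scale=0.3, size=(n, 2))])
h = rng.uniform(0.5, 80.0, size=n)               # stand-in for population
beta_true = np.array([-1.0, 0.5, -0.3])
y = h * np.exp(X @ beta_true + rng.normal(scale=0.2, size=n))

beta_stage1 = linearize(X, h, y)
beta_stage2 = calibrate(beta_stage1, X, h, y)
ss_stage1 = ss_res(beta_stage1, X, h, y)
ss_stage2 = ss_res(beta_stage2, X, h, y)
```

Stage 1 minimizes the error on the transformed scale, which is generally not the minimizer of the squared error in Y itself; stage 2 starts from that estimate and only accepts steps that reduce the original criterion, so its SSRes cannot be worse than that of stage 1.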

21 The GLM1 and GLM2 have the following estimates for E(Y).
Table 8
          CV SSRes
          Model #5   GLM1     GLM2
R_0/n     51 447     47 807   34 567
For GLM2 the optimum value of $R_0$ was sought not only over the values of $\beta$ but also over the parameter $a$.

22 Analysis of observed and predicted values for the GLM
Figure 5. Plot of observed and predicted values.
Figure 6. Plot of observed and predicted values for the CV.

23 Dependence of the values of SSRes and CV SSRes on the value of parameter $a$ for GLM2
Figure 7. The values of SSRes and CV SSRes as a function of parameter $a$ for GLM2.
The optimal value of SSRes was obtained when $a = 2$. The best value of CV SSRes was obtained when $a = 6$.

24 6. Conclusion
Linear and generalized linear regression models for forecasting air passenger conveyances from EU countries were considered. These models contain a large number of explanatory factors and their combinations.
The unknown parameters of the linear regression models were estimated with standard procedures. For the unknown parameters of the GLM a special two-stage procedure was developed.
Cross-validation was used as the main procedure for checking the adequacy of all considered models and for choosing the best model for forecasting.
The advantage of applying the GLM has been shown.

25 7. References
1. Andronov A.M. etc. Forecasting of air passengers conveyances on the transport. // Transport, Moscow, 1983. (In Russian).
2. Butkevičius J., Vyskupaitis A. Development of passenger transportation by Lithuanian sea transport. // In Proceedings of International Conference RelStat'04, Transport and Telecommunication, Vol. 6, N 2, 2005.
3. Hardle W., Muller M., Sperlich S., Werwatz A. Nonparametric and Semiparametric Models. Springer, Berlin, 2004.
4. Hünt U. Forecasting of railway freight volume: approach of Estonian railway to arise efficiency. // In TRANSPORT – 2003, Vol. XXVIII, No 6, pp. 255-258.
5. Šliupas T. Annual average daily traffic forecasting using different techniques. // In TRANSPORT – 2006, Vol. XXI, No 1, pp. 38-43.
6. EUROSTAT YEARBOOK 2005. The statistical guide to Europe. Data 1993–2004. EU, EuroSTAT, 2005. URL: http://epp.eurostat.ec.europa.eu

26 THANK YOU FOR YOUR ATTENTION

