Stochastic Population Forecasting and ARIMA time series modelling. Lectures, QMSS Summer School, 2 July 2009. Nico Keilman, Department of Economics, University of Oslo.

Stochastic Population Forecasting and ARIMA time series modelling Lectures QMSS Summer School, 2 July 2009 Nico Keilman Department of Economics, University of Oslo

Stochastic
Stochastic (from the Greek "Στόχος" for "aim" or "guess") means random. A stochastic process is one whose behaviour is non-deterministic, in that a system's subsequent state is determined both by the process's predictable actions and by a random element. In a stochastic population forecast, uncertainty is made explicit: random variables are part of the forecast model.

Stochastic population forecast
Future population / births / deaths / migrations as probability distributions, not one number (perhaps three)

Why Stochastic Population Forecasts (SPF)?
Users should be informed about the expected accuracy of the forecast
- probability of alternative future paths?
- which forecast horizon is reasonable?
Traditional deterministic forecast variants (e.g. High, Medium, Low)
- do not quantify uncertainty → Prob(MediumPop) = 0 !!
- give a misleading impression of uncertainty (example later)
- leave room for politically motivated choices by the user

Outline
- Uncertainty of population forecasts
- Principles of SPF
- Time series models (selected examples)
- Alho’s scaled model for error
- Examples from UPE
- Using a SPF
Focus on national forecasts

How uncertain are population forecasts? Empirical findings – historical forecasts evaluated against actual population numbers (ex post facto)

Main findings for official forecasts in Western countries
- Uncertainty in forecasts of certain population variables surprisingly large
- Forecasts for the young and the old age groups are the least reliable
- Forecast errors increase as forecast interval lengthens
- Large uncertainty for small countries
- Large uncertainty for countries that are strongly affected by migration
- European forecasts have not become more accurate since WW2

Errors in age structure forecasts Europe

United Kingdom - men

United Kingdom - women

Why uncertain?
- Data quality (LDCs)
- Social science predictions, no accurate behavioural theory
- Rely on observed regularities instead → problems when sudden trend shifts occur
  - stagnation of life expectancy for men in the 1950s
  - baby boom / baby bust

Traditional population forecasts do not give a correct impression of uncertainty

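Example: Old Age Dependency Ratio (OADR) for Norway in 2060
Source: Statistics Norway population forecast of 2005
[Table: columns High, Middle, Low, |H-L|/M (%); rows POP (millions), POP (millions), OADR; the cell values and the age ranges of the two POP rows are not preserved in the transcript]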

Two major problems
- Wide margins for some variables, narrow margins for others
- Narrow margins in the short run, wide margins in the long run
- implicitly assumed perfect autocorrelation (and sometimes perfect correlation across components)

Coverage probabilities for H-L margin of total population in official forecasts
Statistics Norway: 47%, 78%
Statistics Sweden
- Fertility: 19%, 32%
- Mortality: 4%, 20%
- Migration: 1%, 34%
Sources: stochastic population forecasts from UPE; traditional forecasts from Statistics Norway and Statistics Sweden
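A coverage probability of this kind can be read as the share of stochastic sample paths that fall inside the High-Low margin of the deterministic forecast. The sketch below illustrates that idea only; the simulated values and the Low/High bounds are hypothetical numbers, not taken from UPE, Statistics Norway or Statistics Sweden.

```python
def coverage_probability(simulated_values, low, high):
    """Share of simulated outcomes that fall inside the [low, high] margin."""
    inside = sum(1 for v in simulated_values if low <= v <= high)
    return inside / len(simulated_values)


# Hypothetical simulated total populations (millions) for one forecast year.
simulated_pop = [5.1, 5.4, 5.6, 5.8, 5.9, 6.0, 6.2, 6.3, 6.7, 7.1]

# Hypothetical Low and High variants of a deterministic forecast.
print(coverage_probability(simulated_pop, low=5.5, high=6.5))
```

In a real application the list would hold thousands of simulated values per forecast year, and the calculation would be repeated for each year of interest.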

Cohort-component method
Deterministic population forecast
Needed for the country in question: annual assumptions on future
- Fertility → Total Fertility Rate
- Mortality → Life expectancy at birth M/F
- Migration → Net immigration
- as well as rates (fertility, mortality) & numbers (migration) by age & sex
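As a rough illustration of the bookkeeping behind the cohort-component method, the sketch below advances a single-sex population by one year. It is a toy version under strong assumptions (one sex, infant mortality ignored, migrants added at the end of the year, an open-ended last age group); the function name and all input numbers are hypothetical.

```python
def project_one_year(pop, survival, fertility, net_migration):
    """Advance a single-sex population by age one year (toy cohort-component step).

    pop[x]           : persons aged x at the start of the year
    survival[x]      : probability of surviving from age x to age x + 1
    fertility[x]     : births per person aged x during the year
    net_migration[x] : net migrants added at age x at the end of the year
    The last age group is open-ended.
    """
    n = len(pop)
    new_pop = [0.0] * n
    # Births enter age 0 (infant mortality is ignored in this toy version).
    new_pop[0] = sum(f * p for f, p in zip(fertility, pop))
    # Survivors move up one age.
    for x in range(n - 1):
        new_pop[x + 1] += pop[x] * survival[x]
    # The open-ended last group also keeps its own survivors.
    new_pop[n - 1] += pop[n - 1] * survival[n - 1]
    # Net migration is added in absolute numbers.
    return [p + m for p, m in zip(new_pop, net_migration)]


# Hypothetical three-age-group example.
projected = project_one_year(
    pop=[100.0, 100.0, 100.0],
    survival=[0.9, 0.8, 0.5],
    fertility=[0.0, 0.5, 0.0],
    net_migration=[1.0, 2.0, 0.0],
)
print(projected)
```

A deterministic forecast repeats this step year by year with the assumed rates; a stochastic forecast repeats the whole projection many times with randomly drawn rates.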

Stochastic Population Forecast: How?
Cohort-component method
Random rates for fertility and mortality, random numbers for net-migration
Normal distributions in the log scale (rates) or in the original scale (migration numbers)
- expected values (“point predictions”), cf. Medium variant in traditional deterministic forecast
- standard deviations
- correlations (age, time, sex, components, countries)
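The two distributional assumptions above can be sketched as follows: a rate is drawn as the point forecast times the exponential of a normal error (normal in the log scale), and a net-migration number as a normal variable in the original scale. The point forecasts, standard deviations and function names are hypothetical and chosen only for illustration.

```python
import math
import random


def draw_rate(point_forecast, sd_log, rng):
    """Draw one random rate: normal error in the log scale around the point forecast."""
    return point_forecast * math.exp(rng.gauss(0.0, sd_log))


def draw_net_migration(point_forecast, sd, rng):
    """Draw one random net-migration number: normal error in the original scale."""
    return rng.gauss(point_forecast, sd)


rng = random.Random(2009)  # fixed seed so the sketch is reproducible

# Hypothetical inputs: TFR point forecast 1.8 with log-scale sd 0.15,
# net migration point forecast 10,000 with sd 8,000.
tfr_draws = sorted(draw_rate(1.8, 0.15, rng) for _ in range(5000))
migration_draws = [draw_net_migration(10_000, 8_000, rng) for _ in range(5000)]

median_tfr = tfr_draws[len(tfr_draws) // 2]
print(round(median_tfr, 2), min(migration_draws) < 0)
```

The log scale keeps simulated rates positive, while net migration is left in the original scale because it may legitimately be negative.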

SPF: How? (continued)
Joint distribution of all random input variables (rates, migration numbers)
In practice: simplifications, e.g.
- independence of components (fertility, mortality, migration)
- correlation between male and female mortality (constant across ages, time)
One random draw from all probability distributions → one sample path
Repeated draws → thousands of sample paths
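One of the simplifications mentioned above, a constant correlation between male and female mortality errors, can be sketched by building the female error from the male error plus independent noise. The correlation value 0.8 and the function names are assumptions for illustration, not values from the lecture.

```python
import math
import random


def correlated_errors(rho, rng):
    """One joint draw of standard normal errors (male, female) with correlation rho."""
    z_male = rng.gauss(0.0, 1.0)
    z_female = rho * z_male + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return z_male, z_female


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
RHO = 0.8  # assumed male/female correlation, for illustration only

# Each joint draw feeds one sample path; repeated draws give many paths.
draws = [correlated_errors(RHO, rng) for _ in range(5000)]
print(round(sample_correlation(draws), 2))
```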

SPF: How? (continued)
Three main approaches: uncertainty parameters based on
- historical errors
- expert knowledge
- statistical model

SPF: Examples Multivariate time series models for all parameters of interest Examples for Norway , see and European countries , see Alho’s scaled model for error, implemented in PEP (Program for Error Propagation) Example for aggregate of 18 European countries , see

Time series example, Norway: log(TFR) = ARIMA(1,1,0)
$Z_t = 0.67\,Z_{t-1} + \varepsilon_t$, where $Z_t = \log(\mathrm{TFR}_t) - \log(\mathrm{TFR}_{t-1})$ (standard error of the coefficient: 0.10)
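To show how a model of this type turns into sample paths and prediction intervals, the sketch below simulates the first difference of log(TFR) as an AR(1) process with coefficient 0.67 and accumulates it. Only the coefficient comes from the slide; the innovation standard deviation, the starting TFR of 1.8, the starting difference of zero and the function names are assumptions.

```python
import math
import random

PHI = 0.67  # AR coefficient from the slide


def simulate_tfr_path(tfr_last, z_last, horizon, sigma, rng):
    """Simulate one TFR path from an ARIMA(1,1,0) model for log(TFR)."""
    path = []
    log_tfr, z = math.log(tfr_last), z_last
    for _ in range(horizon):
        z = PHI * z + rng.gauss(0.0, sigma)  # AR(1) for the log-difference
        log_tfr += z                         # integrate once to get log(TFR)
        path.append(math.exp(log_tfr))
    return path


def prediction_interval(paths, step, level=0.80):
    """Empirical central prediction interval at a given forecast step (0-based)."""
    values = sorted(p[step] for p in paths)
    lo_idx = int((1.0 - level) / 2.0 * (len(values) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(values) - 1))
    return values[lo_idx], values[hi_idx]


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
SIGMA = 0.03               # assumed innovation sd, not given on the slide
paths = [simulate_tfr_path(1.8, 0.0, 50, SIGMA, rng) for _ in range(2000)]

low_1, high_1 = prediction_interval(paths, step=0)
low_50, high_50 = prediction_interval(paths, step=49)
print(round(high_1 - low_1, 3), round(high_50 - low_50, 3))
```

Because the model is integrated, the interval keeps widening with the forecast horizon, which is the pattern seen in the prediction-interval charts.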

Prediction intervals, age-specific fertility rates, Norway 2050

Time series models for
- parameters of Gamma model for age-specific fertility (TFR, MAC, variance in age at childbearing)
- e0
- parameters of Heligman-Pollard model for age-specific mortality
- immigration numbers
- emigration numbers
(deterministic age patterns for both migration flows)
→ 5000 simulations

Population size, Norway

Time series models, two examples
1. Autoregressive model of order 1, AR(1)
$Z_t = \phi Z_{t-1} + \varepsilon_t$, with $|\phi| < 1$ and $\varepsilon_t$ i.i.d. random variables with zero expectation and constant variance (”white noise”)
$\mathrm{Var}(Z_t) = \mathrm{Var}(\varepsilon_t)\,(1 - \phi^{2t})/(1 - \phi^{2})$, which is constant in the long run (large t)
For large t, the k-step-ahead autocorrelation $\mathrm{Corr}(Z_t, Z_{t+k})$ equals $\phi^{k}$, independent of time
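The variance formula can be checked numerically. The sketch below compares the closed form with the recursion Var(Z_t) = φ² Var(Z_{t-1}) + Var(ε), assuming the process starts from a fixed value Z_0 (so Var(Z_0) = 0); the parameter values and function names are illustrative only.

```python
def ar1_variance(phi, sigma2, t):
    """Var(Z_t) for an AR(1) started at a fixed Z_0, using the closed form."""
    return sigma2 * (1.0 - phi ** (2 * t)) / (1.0 - phi ** 2)


def ar1_variance_recursive(phi, sigma2, t):
    """Same quantity built up step by step: Var(Z_t) = phi^2 Var(Z_{t-1}) + sigma2."""
    var = 0.0
    for _ in range(t):
        var = phi ** 2 * var + sigma2
    return var


def ar1_autocorrelation(phi, k):
    """Long-run k-step-ahead autocorrelation of an AR(1) process."""
    return phi ** k


phi, sigma2 = 0.67, 1.0  # illustrative values
for t in (1, 5, 50):
    print(t, round(ar1_variance(phi, sigma2, t), 4))
print("long-run limit:", round(sigma2 / (1.0 - phi ** 2), 4))
```

The variance levels off at Var(ε)/(1 - φ²), so prediction intervals from a stationary AR(1) model stop widening after some years.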

2. Random walk, RW
$Z_t = Z_{t-1} + \varepsilon_t$
$\mathrm{Var}(Z_t) = t\,\mathrm{Var}(\varepsilon_t)$, unbounded for large t
Independent increments (zero autocorrelation)
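For contrast with the AR(1) case, the sketch below shows that the random-walk variance grows linearly in t, so the half-width of a normal prediction interval grows with the square root of t. The simulation assumes normal increments and a fixed starting value of zero; the standard deviation 0.2 and the function names are hypothetical.

```python
import math
import random
import statistics


def rw_variance(sigma2, t):
    """Var(Z_t) for a random walk started at a fixed value."""
    return t * sigma2


def rw_interval_half_width(sigma, t, z=1.96):
    """Half-width of a normal prediction interval at step t (z = 1.96 for 95%)."""
    return z * sigma * math.sqrt(t)


def simulate_rw(sigma, horizon, rng):
    """One random-walk path with normal increments, started at zero."""
    z, path = 0.0, []
    for _ in range(horizon):
        z += rng.gauss(0.0, sigma)
        path.append(z)
    return path


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
finals = [simulate_rw(0.2, 25, rng)[-1] for _ in range(4000)]
sample_var = statistics.pvariance(finals)
print(round(sample_var, 2), round(rw_variance(0.2 ** 2, 25), 2))
```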

Forecasts and 95% prediction intervals for net migration. Data Outliers: 1989 AR(1) & const: Z t = Z t-1 +ε t Outliers: 1962, 1988 AR(1) & const: Z t = Z t-1 +ε t

Forecasts and 67%, 80%, and 95% prediction intervals for the TFR. Data Observed TFR-value for the year 2000 is given as “y2000” Model: AR(1) & constant Z t (=logTFR t ) = Z t-1 + ε t

Forecasts and 67%, 80%, and 95% prediction intervals for the TFR. Data Observed TFR-value for the year 2000 is given as “y2000” Model: AR(1) & constant Outliers 1920, 1942 Z t (=logTFR t ) = Z t-1 + ε t

Forecasts and 67%, 80%, and 95% prediction intervals for the TFR. Data Observed TFR-value for the year 2000 is given as “y2000” Model: AR(2) & constant Z t (=logTFR t ) = Z t Z t-2 + ε t

Forecasts and 67%, 80%, and 95% prediction intervals for the TFR. Data Observed TFR-value for the year 2000 is given as “y2000” Model: AR(2)-ARCH(1) Outliers 1919, 1920, 1940, 1941 Z t (=logTFR t ) = Z t-1 + v t + dummies v t = v t-2 + ε t, ε t = (√h t )e t, h t = 7E (ε t 2 )

Time series approach to SPF
+ conceptually simple
- inflexible
Alternative: Alho’s scaled model for error
Implemented in Program for Error Propagation (PEP)

Scaled model for error
Suppose the true age-specific rate in age j during forecast year t > 0 is of the form $R(j,t) = F(j,t)\exp(X(j,t))$, where $F(j,t)$ is the point forecast and $X(j,t)$ is the relative error

Suppose that the error processes are of the form $X(j,t) = \varepsilon(j,1) + \dots + \varepsilon(j,t)$, with error increments of the form $\varepsilon(j,t) = S(j,t)\,(\eta_j + \delta(j,t))$
- $S(j,t)$ are deterministic scales
- $\delta(j,t)$ are independent over time t
- $\delta(j,t)$ are independent of $\eta_j$ for all t and j
- $\eta_j \sim N(0, \kappa)$, $\delta(j,t) \sim N(0, 1 - \kappa)$, $0 \le \kappa \le 1$
Note that $\mathrm{Var}(\varepsilon(j,t)) = S(j,t)^2$
A positive kappa means that there is systematic error in the time trend of the rate.

$\kappa = \mathrm{Corr}[\varepsilon(j,t), \varepsilon(j,t+h)]$ for all h > 0, thus κ is the (constant) autocorrelation between the error increments.
Together, the autocorrelation κ and the scale S(j,t) determine the variance of the relative error X(j,t).
Ex. 1. Under a random walk model the error increments are uncorrelated, with κ = 0.
Ex. 2. The model with constant scales (S(j,t) = S(j)) can be interpreted as a random walk with a random drift. The relative importance of the two components is determined by κ.
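For constant scales S(j,t) = S, the definitions above give Var X(j,t) = S² t (1 + κ(t - 1)): the δ terms contribute S² t (1 - κ) and the shared η term contributes S² t² κ. The sketch below simulates the error process for a single age and compares the sample variance with that expression. This is not the PEP implementation; the scale 0.05, κ = 0.05, the horizon and the function names are chosen only for illustration.

```python
import math
import random
import statistics


def simulate_relative_error(scales, kappa, rng):
    """One path of X(j,1), ..., X(j,T) for a single age j under the scaled model."""
    eta = rng.gauss(0.0, math.sqrt(kappa))  # drawn once per path
    x, path = 0.0, []
    for s in scales:
        delta = rng.gauss(0.0, math.sqrt(1.0 - kappa))  # fresh draw every year
        x += s * (eta + delta)                          # add the error increment
        path.append(x)
    return path


def variance_constant_scale(s, kappa, t):
    """Var X(j,t) when S(j,t) = s for all t."""
    return s ** 2 * t * (1.0 + kappa * (t - 1))


rng = random.Random(2009)  # fixed seed so the sketch is reproducible
S, KAPPA, T = 0.05, 0.05, 20  # illustrative values
finals = [simulate_relative_error([S] * T, KAPPA, rng)[-1] for _ in range(4000)]

theory = variance_constant_scale(S, KAPPA, T)
print(round(statistics.pvariance(finals), 4), round(theory, 4))
```

With κ = 0 the expression reduces to the random-walk variance S² t; with κ = 1 it becomes S² t², the perfectly autocorrelated case.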

Migration
- Migration (net) is represented in absolute terms
- Dependence on age is deterministic, given by a fixed distribution g(j,x) over age x
- The error of net migration in age x, for sex j, during year t > 0, is additive and of the form $Y(j,x,t) = S(j,t)\,g(j,x)\,(\eta_j + \delta(j,t))$

Key properties of the scaled model
- The choice of the scales S(j,t) is unrestricted. Hence any sequence of non-decreasing error variances can be matched (e.g. heteroscedasticity)
- Any sequence of cross-correlations over ages can be majorized using the AR(1) models of correlation
- Any sequence of autocorrelations for the error increments can be majorized.

Scaled model for error
Used for UPE project: Uncertain Population of Europe
- 18 countries: EU15 + Iceland, Norway, Switzerland (EEA+)
- 2003 – 2050
- Probability distributions specified on the basis of time series analysis (TFR, e0, net-migr.), empirical forecast errors, and expert judgement
- 3000 simulations for each country, PEP

Population size EEA+ median (black), 80% prediction intervals (red) 77% chance > 400 million in 2050 (UN) 83% chance > 392 million in 2050 (2003)

median (black), 80% prediction intervals (red)

How to use SPF results? User’s Loss function What are the costs associated with underpredictions/ overpredictions of certain sizes?

Loss function, stylized example
F = forecast, O = observed
$\mathrm{Loss} = c\,(F - O)$ if $F > O$, and $\mathrm{Loss} = \lambda c\,(O - F)$ if $F < O$, with $c, \lambda > 0$
λ characterizes the degree of symmetry in the loss function
λ > 1: underprediction is more severe than overprediction

Forecast F is a stochastic variable with a predictive distribution
Hence Loss is a stochastic variable, which has a distribution
Compute expected Loss
Pick that value of F which minimizes expected Loss
→ The optimal F is that value of F at which the statistical distribution function equals λ/(λ + 1)
λ = 1: median value of F
λ > 1: optimal F is larger than the median
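The quantile rule can be illustrated for a normal predictive distribution. The sketch below computes the λ/(λ+1) quantile and evaluates the expected loss in closed form, so that the optimum can be compared with nearby forecasts. The mean 20 echoes the stylized example in the lecture; the standard deviation 2, the value λ = 3 and the function names are assumptions.

```python
from statistics import NormalDist


def optimal_forecast(dist, lam):
    """Forecast minimising expected loss: the lam/(lam+1) quantile of the predictive distribution."""
    return dist.inv_cdf(lam / (lam + 1.0))


def expected_loss(f, dist, lam, c=1.0):
    """Expected loss at forecast f for a normal predictive distribution of the outcome O."""
    std = NormalDist()
    z = (f - dist.mean) / dist.stdev
    over = dist.stdev * (z * std.cdf(z) + std.pdf(z))           # E[max(f - O, 0)]
    under = dist.stdev * (std.pdf(z) - z * (1.0 - std.cdf(z)))  # E[max(O - f, 0)]
    return c * over + lam * c * under


predictive = NormalDist(mu=20.0, sigma=2.0)  # sigma is an assumed value
f_sym = optimal_forecast(predictive, lam=1.0)   # symmetric loss: the median
f_asym = optimal_forecast(predictive, lam=3.0)  # underprediction three times as costly

print(round(f_sym, 2), round(f_asym, 2))
print(round(expected_loss(f_asym, predictive, 3.0), 3))
```

With λ = 3 the optimal forecast is the 75th percentile of the predictive distribution, so a user who fears underprediction should plan with a value above the median.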

e62 ~ Normal(20, stdev) λ > 1: underprediction is more severe than overprediction

Important Are overpredictions more/less harmful than underpredictions?

Challenges Multi-state forecasts (sub-national, household) Limited data Educate the users

Thank you!

Autocorrelation of error increments
The error processes are of the form $X(j,t) = \varepsilon(j,1) + \dots + \varepsilon(j,t)$, with error increments of the form $\varepsilon(j,t) = S(j,t)\,(\eta_j + \delta(j,t))$
- $S(j,t)$ are deterministic scales
- $\delta(j,t)$ are independent over time t
- $\delta(j,t)$ are independent of $\eta_j$ for all t and j
- $\eta_j \sim N(0, \kappa)$, $\delta(j,t) \sim N(0, 1 - \kappa)$, $0 \le \kappa \le 1$
A positive kappa means that there is systematic error in the time trend of the rate.

UPE: age specific fertility rates
We assumed that kappa = 0 → random walk, non-correlated error increments: $\varepsilon(j,t) = S(j,t)\,\delta(j,t)$, with $\delta(j,t)$ i.i.d. ~ N(0, 1)
Example Italy, Pop aged 0 in 2050:
- Expected value = 474,000
- Median = 420,000
- Standard deviation = 261,000
- Coefficient of variation = 0.55

Alternative assumption: kappa = 0.05
Italy, Pop aged 0 in 2050:
- Expected value = 678,000
- Median = 457,000
- Standard deviation = 794,000
- Coefficient of variation = 1.17
Kappa = 0.1 gives unrealistically wide prediction intervals for Pop aged 0 in 2050

EEA+
- 15 EU countries: Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Ireland, Luxembourg, Netherlands, Portugal, Spain, Sweden, United Kingdom
- Iceland, Norway, Switzerland

Net migration to the countries of the EEA+: upward trend

Net migration to Italy

UPE assumptions for net migration
- Increase to ca. 3.5 ‰ by 2050 for the whole of the EEA+
- Demand for labour (ageing, economic developments)
- North – South divide

UPE assumptions for mortality
- By 2030, mortality reductions in EEA+ countries will follow a common pattern
- Sex gap of life expectancy reduces to 4 years
- Life expectancy gains to 2050: 6.5 (NL) to 10 (Lux, Pt, E) years for men; 5.7 (NL) to 9.6 (EIR) years for women
- On average 2-3 years higher than Eurostat/UN

UPE life expectancies too high? under Historically, increases in European life expectancies have been under-estimated by - 2 years (15 years ahead) years (25 years ahead) Record life expectancy is higher & increases faster than UPE - ca years per calendar year

UPE assumptions for fertility
Mediterranean and German speaking countries: low
- little catching up
- problems with child care facilities, housing
- preference for one child
- Total Fertility Rate = 1.4 children per woman (c/w)
Western and Northern Europe
- Total Fertility Rate = 1.8 c/w
Similar to Eurostat, on average 0.2 c/w lower than UN

UPE: probabilistic forecast
Similar method as UN, Eurostat (cohort-component)
But parameters are drawn from assumed distributions (simulation)
- Volatility in fertility, mortality, migration
- Autocorrelations
- Correlations across ages, sexes, countries

Population size: medians (black) and 80% prediction intervals (red)
2050: SCB 10.5 mln, SSB 4.8 mln

Age pyramid 2050 medians & 80 % prediction intervals

UPE assumptions Sweden 2050
[Table: columns exp. value, 80% L, 80% H, SCB; rows TFR, e0 M, e0 F, migr.; the cell values are not preserved in the transcript]