Instructor: Prof. Louis Chauvel


Advanced Statistical Analysis: Advanced tools from epidemiologists and demographers: Poisson regressions, age-period-cohort models, etc. (Dec 14)

This session: Advanced tools from epidemiologists and demographers
Defining the fields of “Epidemiology” / “Biostats” / “Demo”: the study (description and search for causes) of diseases in populations
A set of specific tools, including count, aging and cohort models
References « as usual »: CHAPTERS 7 (glm) & 11 (“Some epidemiology”) in the STATA ADVANCED MANUAL: http://www.louischauvel.org/stata_manuel_advanced.pdf
Plus more recent …

Main references: find them online at http://www.a-z.lu/

Other references: find them online at http://www.a-z.lu/

SEE ALSO: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158538

This session:
Reminders on the glm “generalized linear model”
Examples of Poisson models
The age-period-cohort model in demography & epidemiology
New developments on the APC model

Reminders on the glm “generalized linear model”

Reminders on the glm “generalized linear model”
CHAPTER 7 (glm) in the STATA ADVANCED MANUAL: http://www.louischauvel.org/stata_manuel_advanced.pdf
Ordinary Least Squares (OLS), logit, Poisson and other models share the same general expression; only the distribution (“family” in Stata) and the link function change, depending on the nature of the outcome variable (OV):
OV = continuous: OLS (Gaussian family, identity link)
OV = binary: logit (binomial family, logit link)
OV = count: Poisson (Poisson family, log link)
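Written out in standard GLM notation (the symbols below are ours, not the manual's), the common expression links the expected outcome to a linear predictor through a link function g, and the three cases above differ only in g and in the family:

```latex
g\big(\mathrm{E}[y_i \mid x_i]\big) = x_i'\beta
\quad\text{with}\quad
\begin{cases}
g(\mu) = \mu & \text{continuous OV: Gaussian family, identity link (OLS)}\\
g(\mu) = \ln\dfrac{\mu}{1-\mu} & \text{binary OV: binomial family, logit link, } \mu = \Pr(y_i = 1 \mid x_i)\\
g(\mu) = \ln \mu & \text{count OV: Poisson family, log link}
\end{cases}
```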

Typical cases (see the do-file in the first part of: http://www.louischauvel.org/glm.do)
Ordinary Least Squares (OLS):
reg day i.gender i.ethnic i.class
glm day i.gender i.ethnic i.class, f(gauss) l(id)
Logit model:
logit absent i.gender i.ethnic i.class
glm absent i.gender i.ethnic i.class, f(bin) l(logit)
Poisson model:
poisson days i.gender i.ethnic i.class
glm days i.gender i.ethnic i.class, f(poisson) l(log)
See the options of glm: help glm

Poisson models

Examples of Poisson models on mortality (Siméon Denis Poisson: https://fr.wikipedia.org/wiki/Sim%C3%A9on_Denis_Poisson)
Why Poisson? When the outcome is a count variable (the number of times an event occurs during a time period), a suitable model is Poisson regression.
Count variables: days of absence, number of life events, deaths, etc.
In the case of mortality: counts of deaths and exposure to risk (population at risk). SEE https://www.mortality.org/
See the do-file in the second part of: http://www.louischauvel.org/glm.do
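With exposure, the death count D of a cell is treated as Poisson with mean E·λ, where E is the exposure (population at risk) and λ the death rate, so ln E[D] = ln E + xβ and ln E enters as an offset. In a saturated model with one dummy per age group, the maximum-likelihood rate of each group is simply D/E. A minimal Python sketch with hypothetical counts (not mortality.org data); the function name is ours:

```python
import math

# Hypothetical deaths and exposure (person-years at risk) by age group;
# illustrative numbers only, not taken from mortality.org.
deaths = {40: 120, 45: 190, 50: 310}
exposure = {40: 100_000, 45: 98_000, 50: 95_000}


def saturated_poisson(deaths, exposure):
    """Coefficients of a Poisson model with age dummies and a ln(exposure) offset.

    In a saturated model the fitted rate of each group equals deaths / exposure,
    so the constant is the log rate of the reference (youngest) group and each
    dummy is the log rate ratio relative to that group.
    """
    log_rates = {age: math.log(deaths[age] / exposure[age]) for age in deaths}
    reference = min(log_rates)  # youngest age group as the base category
    constant = log_rates[reference]
    dummies = {age: lr - constant for age, lr in log_rates.items() if age != reference}
    return constant, dummies


constant, dummies = saturated_poisson(deaths, exposure)
# constant + dummies[age] recovers ln(deaths[age] / exposure[age])
```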

Examples of Poisson models on mortality
Poisson coefficients by age group in 2010-2014 (constant + age coefficient = log of the death rate): the log death rate increases linearly with age, by about 10% per year, so the death rate doubles roughly every 7 years.
Exercise 1: for women? Exercise 2: across years?
keep if age>=40 & 90>age
glm dm i.age if ye==2010, f(poisson) l(log) exp(rm)
glm dm age if ye==2010, f(poisson) l(log) exp(rm)
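The second glm line fits ln E[dm] = ln(rm) + b0 + b1·age. A minimal Python sketch of that fit by iteratively reweighted least squares (IRLS, which for the log link is Newton-Raphson), run on synthetic Gompertz-like data where the log death rate rises by exactly 0.1 per year of age. The data and the name poisson_irls are illustrative assumptions, not the course do-file:

```python
import math


def poisson_irls(x, deaths, exposure, tol=1e-12, max_iter=100):
    """Fit ln E[deaths] = ln(exposure) + b0 + b1*x by IRLS.

    Like glm, it starts from mu = deaths (all counts must be > 0).
    """
    offset = [math.log(e) for e in exposure]
    mu = list(deaths)
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Working response (offset removed) and weights for the log link
        z = [(et - off) + (d - m) / m for et, off, d, m in zip(eta, offset, deaths, mu)]
        w = mu
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
        # Weighted least squares step
        new_b1 = sxz / sxx
        new_b0 = zbar - new_b1 * xbar
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [off + b0 + b1 * xi for off, xi in zip(offset, x)]
        mu = [math.exp(et) for et in eta]
        if converged:
            break
    return b0, b1


# Synthetic data: ages 40-89, equal exposure, expected (non-integer) death counts
ages = list(range(40, 90))
exposure = [10_000.0] * len(ages)
deaths = [e * math.exp(-9.0 + 0.1 * a) for a, e in zip(ages, exposure)]

b0, b1 = poisson_irls(ages, deaths, exposure)
annual_increase = math.exp(b1) - 1   # relative increase of the death rate per year of age
doubling_time = math.log(2) / b1     # years for the death rate to double
```

With b1 = 0.1 the death rate is multiplied by exp(0.1), about +10.5% per year of age, and doubles after ln 2 / 0.1 ≈ 6.9 years, which is the "roughly every 7 years" rule on the slide.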

Introduction to APC Age-Period-Cohort models See pp 230 sqq of

Introduction to APC Age-Period-Cohort models
Consider the effects of age, of period and of cohort
Collinearity: A = P - C
Non-linear effects: age thresholds, period fractures, cohort scars

Introduction to APC Age-Period-Cohort models
Methodology I: the base, A = P - C
[Figure: The Lexis diagram (1872). Axes: Period (1890 to 2030) and Age (20 to 80). The life line of the cohort born in 1948 crosses the isochron of the 1968 observation at age 20; cohorts 1918 and 1978 are also marked.]
But how can we distinguish durable scarring effects from fads? Hysteresis = stability (scars persist) versus resilience = resorption of scars.

Statistical background: Age-Period-Cohort models
Separate the effects of age, period of measurement and cohort.
Problematic collinearity: cohort (date of birth) = period (date of measurement) - age (Ryder 1965; Mason et al. 1973; Mason and Fienberg 1985; Mason and Smith 1985; Yang Yang et al. 2006, 2008; Smith 2008; Pampel 2012)
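The identification problem can be checked numerically: because cohort = period - age in every cell, adding t to the age and cohort slopes and subtracting t from the period slope leaves every linear prediction unchanged. A minimal Python sketch; the slope values and the Lexis grid are arbitrary choices for illustration:

```python
# Age, period and cohort (= period - age) for each cell of a small Lexis grid
cells = [(age, period, period - age)
         for age in range(20, 61, 10)
         for period in range(1970, 2011, 10)]


def linear_predictor(cell, b_age, b_period, b_cohort):
    age, period, cohort = cell
    return b_age * age + b_period * period + b_cohort * cohort


original = (0.02, 0.01, 0.03)
t = 0.5  # any value works
shifted = (original[0] + t, original[1] - t, original[2] + t)

# The shift adds t * (age - period + cohort) = 0 to every prediction
max_diff = max(abs(linear_predictor(c, *original) - linear_predictor(c, *shifted))
               for c in cells)
```

So the linear components of the three effects cannot all be estimated from the data; only their non-linear deviations are identified without an additional constraint.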

Our method A: APCD
APCD (detrended): are some cohorts above or below a linear trend of long-run economic growth? Basically, the APCD is a ‘bump detector’.
Stata: ssc install apcd (ado file available)
See more at www.louischauvel.org/apcdex.htm
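A minimal Python sketch of the ‘bump detector’ idea, assuming (as the "detrended" label suggests) that cohort effects are constrained to sum to zero and to have zero linear slope, so any steady trend goes to the long-run component and only bumps remain. It works on a given coefficient vector and is not the apcd estimator, which fits age, period and cohort effects together within a glm; the name detrend is ours:

```python
def detrend(coefs):
    """Remove the mean and the least-squares linear trend from a coefficient vector.

    The result sums to zero and has zero slope over its index.
    """
    n = len(coefs)
    xbar = (n - 1) / 2
    ybar = sum(coefs) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    slope = sum((i - xbar) * (c - ybar) for i, c in enumerate(coefs)) / sxx
    return [c - ybar - slope * (i - xbar) for i, c in enumerate(coefs)]


# Hypothetical cohort coefficients: steady growth plus one favoured cohort (index 4)
raw = [0.1 * i + (0.5 if i == 4 else 0.0) for i in range(9)]
bumps = detrend(raw)  # the steady growth disappears, the bump at index 4 remains
```

Here the favoured cohort comes out at about 0.44 and every other cohort at about -0.06: the steady growth is gone and only the bump stands out.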

apcd syntax is based on glm (ssc install apcd):
apcd depvar [controlvars] [if] [weight], age(varname) period(varname) [glm_options]
All glm options are allowed, including:
familyname    Description
gaussian      Gaussian (normal)
igaussian     inverse Gaussian
poisson       Poisson
etc.
linkname      Description
identity      identity
log           log
logit         logit
probit        probit

http://www.louischauvel.org/vet.do
A Stata example on veterans (CPS extracts, IPUMS), 1965-2015, N = 322,243
use "http://www.louischauvel.org/apcgoex.dta", clear
* race / 1=caucasian 2=AA
* a5 / age
* y5 / year
* labincome / medianized labor personal income
* pweight / sampling weight
* vet / 1=veteran 0=no veteran status
* ED / level of education 6=drop out 7=ged 8=community coll ... 11=Ba 12=Ma+
* female / male=0 female=1
* lnlab / ln of labincome
keep if fem==0 & a5<65
gen ba=ED==11 | ED==12
ssc install apcd
ssc install apcgo
tab a5 y5 [w=pwei], s(vet) nofr nost noobs w
* are there non-linear variations of veterans by cohort? (% points)
apcd vet [w=pwei], age(a5) period(y5)
drop *apc*
* are there non-linear variations of veterans by cohort? (logit coeff)
stop
* what is the share of veterans in a cohort? (% points)
apctlag vet [w=pwei], age(a5) period(y5)
* what is the share of veterans in a cohort? (logit coeff)
apctlag vet [w=pwei], age(a5) period(y5) f(bin) l(logit)
* what is the share of BA holders in a cohort? (% points)
apctlag ba [w=pwei], age(a5) period(y5)
* what is the share of BA holders in a cohort? (logit coeff)
apctlag ba [w=pwei], age(a5) period(y5) f(bin) l(logit)
* how did the veteran premium change?
apcgo lnlab [w=pwei], gap(vet) age(a5) period(y5)
* what is the role of education in the veteran premium change?
xi: apcgo lnlab i.ED if fem==0 [w=pwei], gap(vet) age(a5) period(y5)
* with bootstrap confidence intervals (time consuming! rep(10) is minimalist but you can change it)
apcgo lnlab [w=pwei], gap(vet) age(a5) period(y5) rep(10)

Ex: U.S. veterans as % of the male population (CPS IPUMS), 1965-2015
[Figure panels: Period, Age, Cohort]
Cohort 1965 = ? Cohort 1905 = ? Cohort 1925 = WWII. Cohort 1945 = Vietnam War.
See the story in: Alair MacLean and Meredith Kleykamp. 2016. “Income Inequality and the Veteran Experience.” Annals of the American Academy of Political and Social Science 663:99-116.

Ex: Veterans as % of the male population (CPS IPUMS), APCD model, 1965-2015
Cohort 1965 = ? Cohort 1905 = ? Cohort 1925 = WWII. Cohort 1945 = Vietnam War.

Our method B: the larger APC family (with Stata ssc install)
APCD (detrended): are some cohorts above or below a linear trend of long-run economic growth? Basically, the APCD is a ‘bump detector’. ssc install apcd
APCTLAG (trended by cohort once the average lagged age effect is fitted): which cohorts increased or declined? The program is part of ssc install apcgo
APCGO (gap / Oaxaca): once controlled for other covariates, did the gap between groups 0 and 1 change? ssc install apcgo
APCH (hysteresis): is the cohort APCD bump durable over time or not?
Refinements to come (faster bootstraps, better controls, simplification, etc.)

APCT-lag (trended with lag). See Paper Online: https://orbilu.uni.lu/bitstream/10993/35746/1/LIS%20WP%20gender%20gap%20final%20May%202018.pdf
APC-Detrended as an identifiable solution of age, period and cohort non-linear effects (Chauvel 2013; Chauvel and Schröder 2014; Chauvel et al. 2016):
$$u_{apc} = \beta_0 + \alpha_0 a + \pi_0 p + \alpha_a + \pi_p + \gamma_c + \varepsilon_{apc}$$
where $\beta_0$ is the constant, $\alpha_0 a + \pi_0 p$ is a two-dimensional linear (hyperplane) trend, and $\alpha_a$, $\pi_p$, $\gamma_c$ are 3 vectors of age, period and cohort fluctuations.
To solve the “identification problem” ($a = p - c$), a meaningful constraint is needed: the trend in $\alpha_a$ equals the average of the longitudinal shift observed in $u_{apc}$.

The APC-lag solution (see Paper Online):
$$\hat{\alpha} = \frac{\sum_{a=1}^{A-1}\sum_{p=1}^{P-1}\left(u_{a+1,\,p+1,\,c} - u_{apc}\right)}{(A-1)(P-1)}$$
is the average longitudinal age effect along cohorts (the average difference between $u_{a+1,p+1,c}$ and its cohort lag $u_{apc}$ across the table).
Trend operator for the age coefficients: $\mathrm{Trend}(\alpha_a) = \hat{\alpha}$
APC-lag delivers a unique estimate of the vector $\gamma_c$, a cohort-indexed measure of gaps:
The average of $\gamma_c$ is the general intensity of the gap
The trend of $\gamma_c$ measures increases/decreases of the gap within the observation window
The values of $\gamma_c$ show possible non-linearity
The $\gamma_c$ can be compared between countries
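A minimal Python sketch of the $\hat{\alpha}$ formula on a Lexis table stored as u[a][p] with 0-based indices (a layout assumed for illustration; the function name is ours). Adding an arbitrary cohort-specific term leaves $\hat{\alpha}$ unchanged, because each difference is taken within one cohort:

```python
def apc_lag_age_trend(u):
    """Average within-cohort change across a Lexis table u[a][p] (0-based).

    u[a+1][p+1] and u[a][p] belong to the same cohort, since the cohort index
    p - a (+ constant) is unchanged when age and period both increase by one.
    """
    A, P = len(u), len(u[0])
    total = sum(u[a + 1][p + 1] - u[a][p] for a in range(A - 1) for p in range(P - 1))
    return total / ((A - 1) * (P - 1))


A, P = 3, 4
u = [[2 * a + 3 * p for p in range(P)] for a in range(A)]

# Arbitrary cohort-specific shifts; 0-based cohort index c = p - a + A - 1
g = [0, 4, -1, 7, 2, 0]
u_with_cohort = [[u[a][p] + g[p - a + A - 1] for p in range(P)] for a in range(A)]
```

On the toy table u = 2a + 3p every within-cohort step adds 2 + 3 = 5, so $\hat{\alpha}$ = 5 for both u and u_with_cohort.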

Ex: Veterans as % of the male population (CPS IPUMS), APCTLAG model, 1965-2015
Cohort 1965 = ? Cohort 1905 = ? Cohort 1925 = WWII. Cohort 1945 = Vietnam War.

Ex: BA holders as % of the male population (CPS IPUMS), APCTLAG model, 1965-2015
Cohort 1925 = GI Bill of Rights
Cohort 1948: David Card and Thomas Lemieux, “Going to College to Avoid the Draft: The Unintended Legacy of the Vietnam War.” American Economic Review 91, May 2001.
Skyrocketing tuition and fees

APC-GO (Gap/Oaxaca) model. Now on Stata: ssc install apcgo
APC-GO is an APC model that provides a cohort analysis of gaps in outcomes between 2 groups after controlling for relevant explanatory variables, e.g., (gender) gaps in income net of education effects, or (racial) gaps in education net of State/county effects.
Ingredients:
Computation of the Oaxaca decomposition into unexplained/explained gaps by A x P cell
Estimation of APC-lag gaps with a focus on cohorts
Bootstrapping to obtain confidence intervals
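Within each A x P cell, the Oaxaca step splits the outcome gap between group 1 and group 0 into a part explained by the controls and an unexplained part. A minimal Python sketch of a textbook twofold decomposition with one control and group 0's coefficients as the reference; the reference choice, the data and the function names are assumptions, not necessarily what apcgo does:

```python
def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Simple least squares with an intercept; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def oaxaca_twofold(x0, y0, x1, y1):
    """Twofold Oaxaca split of the mean gap y1 - y0, with group 0 as reference.

    explained   = (mean x1 - mean x0) * slope of group 0
    unexplained = gap - explained (differences in intercepts and slopes)
    """
    _, slope0 = ols(x0, y0)
    gap = mean(y1) - mean(y0)
    explained = (mean(x1) - mean(x0)) * slope0
    return gap, explained, gap - explained


# Hypothetical cell: x = years of schooling, y = log income
x0 = [10, 12, 14, 16]
y0 = [1.0 + 0.1 * x for x in x0]
x1 = [12, 14, 16, 18]
y1 = [1.2 + 0.1 * x for x in x1]
gap, explained, unexplained = oaxaca_twofold(x0, y0, x1, y1)
```

Here the gap is 0.4, of which 0.2 comes from group 1's higher mean schooling and 0.2 is unexplained; the cell-level unexplained gaps are what the APC-lag step then analyses.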

Structure of data (see Paper Online)
Lexis table / diagram: age indexed by a from 1 to A; period by p from 1 to P; cohort by c = p - a + A from 1 to C
Cross-sectional surveys including one outcome y and controls x
Condition: a large sample with data for each (a, p) cell of the Lexis table
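The cohort index c = p - a + A runs from 1 (oldest age group in the first period) to A + P - 1 (youngest age group in the last period), so C = A + P - 1. A quick Python check on a hypothetical 5 x 4 table:

```python
def cohort_index(a, p, A):
    """Cohort index c = p - a + A for 1-based age index a and period index p."""
    return p - a + A


A, P = 5, 4  # hypothetical numbers of age groups and periods
cohorts = {cohort_index(a, p, A) for a in range(1, A + 1) for p in range(1, P + 1)}
C = len(cohorts)
oldest_first = cohort_index(A, 1, A)   # oldest age group, first period
youngest_last = cohort_index(1, P, A)  # youngest age group, last period
```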

Part II: APC-lag of the $u_{apc}$ (see Paper Online)

Summary: APC-GO combines the different steps:
The Oaxaca decomposition of the cells of the initial Lexis table generates an aggregated Oaxaca Lexis table of the gaps unexplained by controls
APC-lag of the Oaxaca Lexis table notably delivers the $\gamma_c$ coefficients
Bootstrapping to obtain confidence intervals
See the Stata ado file: ssc install apcgo
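The do-file sets the number of bootstrap replications with apcgo's rep() option; the resampling scheme apcgo uses is not described in these slides. As a generic illustration only, a minimal Python percentile bootstrap for one mean gap, with hypothetical data and names of our own:

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_gap_ci(y0, y1, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean gap y1 - y0."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    estimate = mean(y1) - mean(y0)
    draws = []
    for _ in range(reps):
        # Resample each group with replacement, keeping group sizes fixed
        s0 = [rng.choice(y0) for _ in y0]
        s1 = [rng.choice(y1) for _ in y1]
        draws.append(mean(s1) - mean(s0))
    draws.sort()
    lo = draws[round((1 - level) / 2 * (reps - 1))]
    hi = draws[round((1 + level) / 2 * (reps - 1))]
    return estimate, lo, hi


# Hypothetical outcomes for two groups in one cell
y0 = [float(i % 7) for i in range(30)]
y1 = [float(i % 7) + 1.0 for i in range(30)]
estimate, lo, hi = bootstrap_gap_ci(y0, y1)
```

As the do-file comment warns, more replications give more stable interval bounds at the cost of computing time.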

Implementation on different examples:
Veterans and the veteran premium: http://www.louischauvel.org/vet.do
Suicide rates in a comparative perspective: http://www.louischauvel.org/suicplosone.do
Obesity epidemic: http://www.louischauvel.org/apcobese.do and the ppt http://www.louischauvel.org/apc_obese.pptx

Ex: Veterans wage premium (difference of log labor income), APCGO model (GO = Gap Oaxaca), 1965-2015
WWII veterans: premium > 30%. Cohort 1955: premium < 0.
See the story in: Alair MacLean and Meredith Kleykamp. 2016. “Income Inequality and the Veteran Experience.” Annals of the American Academy of Political and Social Science 663:99-116.