Introduction to Cohort Analysis PRI SUMMER METHODS WORKSHOP June 16, 2008 Glenn Firebaugh.


Cohort Analysis

Objective -- To separate the effects of:
- Age (aging/maturation, life-cycle status)
- Period (historical conditions that affect everyone)
- Birth cohort (each cohort experiences "a distinctive slice of history" - Ryder)

Key notion: "imprinting" during impressionable years. Imprinting may result in period-age interaction effects that create cohort differences which persist over time.

Cohort Analysis

Problem: linear dependence.

  Age (years since birth) = Period (current year) - Cohort (year of birth)

So you cannot estimate the linear equation

  Y = α + β_A·Age + β_P·Period + β_C·Cohort + ε
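The identification problem can be seen directly in the design matrix. A minimal sketch (illustrative data, not from the workshop): because Age = Period - Cohort, the four columns of the regressor matrix are linearly dependent, so OLS cannot separate the three effects.

```python
import numpy as np

rng = np.random.default_rng(0)
period = rng.integers(1960, 2001, size=200)   # survey year
cohort = rng.integers(1900, 1961, size=200)   # birth year
age = period - cohort                         # exact linear dependence

# Design matrix [1, Age, Period, Cohort] has only 3 independent columns.
X = np.column_stack([np.ones(200), age, period, cohort])
print(np.linalg.matrix_rank(X))  # prints 3, not 4: one column is redundant
```

Any attempt to invert X'X here fails (or, in software that drops collinear columns, one of the three variables is silently discarded).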

Cohort Analysis

There are various ways to think about the problem. One useful way: as a problem of multicollinearity. The partial correlations are:
- r_AC.P = -1.0
- r_AP.C = +1.0
- r_PC.A = +1.0
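These partial correlations can be verified numerically. A sketch (hypothetical data; `partial_corr` is a helper defined here, not a library function) that residualizes each pair of variables on the third and correlates the residuals:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing each on z (and a constant)."""
    Z = np.column_stack([np.ones_like(z, dtype=float), z])
    ex = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ey = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(ex, ey)[0, 1]

rng = np.random.default_rng(1)
period = rng.integers(1960, 2001, size=500).astype(float)
cohort = rng.integers(1900, 1961, size=500).astype(float)
age = period - cohort

print(round(partial_corr(age, cohort, period), 6))  # r_AC.P = -1.0
print(round(partial_corr(age, period, cohort), 6))  # r_AP.C = +1.0
print(round(partial_corr(period, cohort, age), 6))  # r_PC.A = +1.0
```

Holding period fixed, age and cohort are perfect mirror images of each other, which is exactly the diagonal-line picture on the next slide.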

Cohort Analysis

r_AC.P = -1: at a given point in time, everyone lies on the diagonal line for age by birth year.

Cohort Analysis

What to do? Replace A with A*; then r_A*C.P ≠ -1.

Cohort Analysis

In effect, we have replaced A with A*, a nonlinear function of A, where r_A*C.P ≠ -1.
- The correlation r_A*C.P is still close to 1.0 → large standard errors, unless N is large.
- What we are assuming is that, for the Y of interest, A* captures the age effect as well as actual age A does.
- Possible example: age and voting. Voting increases with age until some age threshold where it levels off due to declining health and mobility. In this approach, some set of parameters is constrained to be equal.

Cohort Analysis

Strategy 1 -- Transformed-Variables Method: identification by assuming equivalence of adjacent categories of A, C, or P to create A*, C*, or P*, respectively.

Example: A (age in years) → A* (collapsed/recoded A) → Y

Because A has no direct effect on Y net of A*, to get the age effect we can simply estimate the effect of A* on Y (and A* is not linearly dependent with P and C).
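A sketch of the transformed-variables idea (the cutpoints and data are hypothetical): recoding age into broad groups makes A* a nonlinear function of A, so [A*, Period, Cohort] is no longer exactly linearly dependent and the model becomes estimable.

```python
import numpy as np

rng = np.random.default_rng(2)
period = rng.integers(1960, 2001, size=300).astype(float)
cohort = rng.integers(1900, 1961, size=300).astype(float)
age = period - cohort

# A*: collapse age into broad categories (illustrative cutpoints)
age_star = np.digitize(age, bins=[20, 35, 50, 65])

X_raw = np.column_stack([np.ones(300), age, period, cohort])
X_star = np.column_stack([np.ones(300), age_star, period, cohort])
print(np.linalg.matrix_rank(X_raw))   # 3: not identified
print(np.linalg.matrix_rank(X_star))  # 4: identified, though A* remains
                                      # highly correlated with P and C,
                                      # so standard errors stay large
```

The identification is bought entirely by the recoding assumption; the data themselves have not changed.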

Cohort Analysis

Observations about the transformed-variables method:
- Often C is the variable that is collapsed (e.g., "Depression cohort," "baby boomers," etc.).
- Extreme case: collapse all the categories of A, P, or C. That is what researchers do, in effect, when they omit A, P, or C (i.e., assume no effect for one of them).
- Collapsing adjacent categories to create A*, P*, and C* all goes back to "moving cases off the linear regression line" for r_AC.P, etc.

Cohort Analysis

Age-by-cohort figure where cohort categories are collapsed (r_A*C.P ≠ -1).

Cohort Analysis

Strategy 2 -- Proxy-Variables Method: avoid linear dependence by substituting A**, P**, or C** for A, P, or C, where the ** measures capture what it is about age, period, or cohort that matters.
- Common example: cohort size for C. Used in labor-market studies where, e.g., wage is thought to depend on one's age (hump-shaped), the period, and the size of one's birth cohort (C**).
- Unlike the A*-P*-C* measures, the A**-P**-C** measures are not recoded functions of A, P, and C.
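The proxy strategy can be sketched in the same way (all numbers hypothetical): cohort size C** is not a linear function of birth year, so age, period, and C** can all enter one regression.

```python
import numpy as np

rng = np.random.default_rng(3)
period = rng.integers(1980, 2001, size=400).astype(float)
cohort = rng.integers(1930, 1981, size=400).astype(float)
age = period - cohort

# Hypothetical cohort sizes: a hump (e.g., a baby boom) over birth years,
# standing in for C** -- a nonlinear, substantively meaningful proxy for C.
cohort_size = 100 + 50 * np.exp(-((cohort - 1955) ** 2) / 200)

# Wage-style specification: hump-shaped age, period, and cohort size.
X_proxy = np.column_stack([np.ones(400), age, age**2, period, cohort_size])
print(np.linalg.matrix_rank(X_proxy))  # 5: full rank, effects estimable
```

As with the transformed-variables method, identification rests on the assumption that the proxy carries the entire cohort effect.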

Same underlying assumption for both strategies

Assumption: that A has no effect on Y net of A*, or that C has no effect on Y net of C*, or that P has no effect on Y net of P*. Similarly for the proxy variables A**, C**, and P**. The idea in both cases is that at least one * or ** variable must mediate all the effect. (Note parallels with the Winship-Harding approach, for both the * and ** methods.)
- For example, C*: are we capturing all the cohort effect when we assume no effect within some range of birth years? Or, by collapsing birth years, are we simply identifying by adding measurement error?
- C**: are we capturing all the cohort effect when we use cohort size?

Natural experiments as a promising method (where possible)

Are there instances "in nature" where, say, age and cohort effects are uncoupled? Consider voting and the 19th Amendment ("enduring effect of disenfranchisement"):
- The best predictor of voting at time t is whether you voted at t-1, so learning to vote early matters.
- But women couldn't vote in most states before 1920 → different cohort experiences than men of the same age.
- Sex is effectively randomly assigned, so it is not confounded with family SES, etc.
- A "natural experiment": compare the voting rates of men and women who came of age pre- and post-19th Amendment (Firebaugh-Chen, AJS 1995).