CMGPD-LN Methodological Lecture Day 7 Health and Mortality.

Mortality outcomes
Until age 75, the recording of mortality appears plausible
– Age patterns resemble those of other historical populations and of model life tables
After age 75, the mortality record is problematic
– Many immortals were taoding at some point, so for mortality analysis it is perhaps safest to throw out all records of anyone who was ever taoding
Rates below age 5 appear normal, but the representativeness of registered children is unclear
Large numbers of deaths allow for fine-grained analysis of mortality determinants

. use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from
> ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
(China Multi-Generational Panel Dataset, Liaoning (CMGPD-LN)
>, , Liaoning)
. recode AGE_IN_SUI min/0=. 1/15=1 16/55=16 56/max=56
(AGE_IN_SUI: changes made)
. keep if NEXT_DIE >= 0 & NEXT_3 & PRESENT
( observations deleted)
. keep if SEX >= 1
(1 observation deleted)
. tab AGE_IN_SUI SEX if NEXT_DIE

           |          Sex
Age in Sui |    Female       Male |     Total
-----------+----------------------+----------
         1 |     1,189      5,132 |     6,321
        16 |    11,160     10,721 |    21,881
        56 |    11,342     11,923 |    23,265
-----------+----------------------+----------
     Total |    23,691     27,776 |    51,467

Analyzing mortality
Life tables
– Remember, ages are in sui
– Probability of death in the next three years ($_3q_x$)
– Needs to be converted to $m_x$ to put into a life table
– One crude conversion: $m_x = -\ln(1 - {}_3q_x)/3$
– More sophisticated conversions are appropriate at early ages, when rates are changing fast
Discrete-time event-history analysis
– Logistic regression
– Complementary log-log regression
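The crude conversion follows from assuming that the death rate is constant within the observation interval. A short derivation, writing the interval length generally as $n$ (a symbol introduced here for illustration; on the slide $n = 3$):

```latex
\begin{align*}
{}_nq_x &= 1 - e^{-n\,m_x}
  && \text{(constant rate $m_x$ over an interval of $n$ years)} \\
1 - {}_nq_x &= e^{-n\,m_x} \\
m_x &= -\frac{\ln\left(1 - {}_nq_x\right)}{n}
\end{align*}
```

With $n = 3$ this is the formula on the slide. As an illustration with a made-up value, ${}_3q_x = 0.06$ gives $m_x = -\ln(0.94)/3 \approx 0.0206$ per year, slightly above the naive $0.06/3 = 0.02$.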

Life tables
A crude approach
keep if AGE_IN_SUI > 0 & AGE_IN_SUI 0
* Divide into five year age groups
replace AGE_IN_SUI = 5*int((AGE_IN_SUI-1)/5)+1
tab AGE_IN_SUI SEX
collapse NEXT_DIE, by(AGE_IN_SUI SEX)
sort SEX AGE_IN_SUI

. tab AGE_IN_SUI SEX

           |          Sex
Age in Sui |    Female       Male |     Total
-----------+----------------------+----------
         1 |     5,026     37,223 |    42,249
         6 |     7,881     53,337 |    61,218
        11 |     8,334     51,932 |    60,266
        16 |    20,835     47,582 |    68,417
        21 |    35,747     46,067 |    81,814
        26 |    37,344     44,648 |    81,992
        31 |    34,870     40,533 |    75,403
        36 |    32,342     37,912 |    70,254
        41 |    30,347     35,131 |    65,478
        46 |    27,330     30,170 |    57,500
        51 |    24,282     26,714 |    50,996
        56 |    20,898     22,568 |    43,466
        61 |    16,949     17,566 |    34,515
        66 |    13,143     12,664 |    25,807
        71 |     9,014      8,072 |    17,086
-----------+----------------------+----------
     Total |   324,342    512,119 |   836,461

Example of a crude life table (females)
Columns: SEX, AGE_IN_SUI, NEXT_DIE, mx, 5px, lx (15 rows, one per five-year age group)

Example of a crude life table (males)
Columns: SEX, AGE_IN_SUI, NEXT_DIE, mx, 5px, lx, e (15 rows, one per five-year age group)
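The mx, 5px and lx columns of a crude life table of this kind can be derived from the collapsed NEXT_DIE means. The following is only a sketch under stated assumptions: the input rows are made-up numbers standing in for the collapsed file, the rate is treated as constant within each age group, and `p5` is a hypothetical variable name (Stata names cannot begin with a digit, so `5px` is not usable).

```stata
* Toy stand-in for the collapsed file: one row per sex and age group.
* The NEXT_DIE values are invented for illustration only.
clear
input byte SEX int AGE_IN_SUI double NEXT_DIE
1  1 0.090
1  6 0.030
1 11 0.024
2  1 0.120
2  6 0.036
2 11 0.027
end

* Crude conversion of the three-year probability to an annual rate
generate double mx = -ln(1 - NEXT_DIE)/3

* Probability of surviving a five-year age group at a constant rate
generate double p5 = exp(-5*mx)

* Survivors to the start of each age group, radix 1 within each sex.
* replace works through observations in order, so lx[_n-1] is already filled.
sort SEX AGE_IN_SUI
by SEX: generate double lx = 1 if _n == 1
by SEX: replace lx = lx[_n-1]*p5[_n-1] if _n > 1

list SEX AGE_IN_SUI NEXT_DIE mx p5 lx, sepby(SEX) noobs
```

Life expectancy would additionally need person-years lived in each age group and an assumption about the open-ended last interval, which this sketch leaves out.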

Event-history analysis
keep if AGE_IN_SUI > 0 & AGE_IN_SUI 0
replace AGE_IN_SUI = 5*int((AGE_IN_SUI-1)/5)+1
xi:logit NEXT_DIE i.AGE_IN_SUI i.SEX i.REGION

NEXT_DIE | Coef. Std. Err. z P>|z| [95% Conf. Interval]
_IAGE_IN_~_6 |
_IAGE_IN_~11 |
_IAGE_IN_~16 |
_IAGE_IN_~21 |
_IAGE_IN_~26 |
_IAGE_IN_~31 |
_IAGE_IN_~36 |
_IAGE_IN_~41 |
_IAGE_IN_~46 |
_IAGE_IN_~51 |
_IAGE_IN_~56 |
_IAGE_IN_~61 |
_IAGE_IN_~66 |
_IAGE_IN_~71 |
_ISEX_2 |
_IREGION_2 |
_IREGION_3 |
_IREGION_4 |
_cons |
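Logistic and complementary log-log regression, the two discrete-time specifications named for event-history analysis, take the same syntax. A minimal sketch on simulated data: the variables `age_group`, `male` and `died`, and every coefficient used to simulate them, are invented for illustration and are not CMGPD-LN values. It uses factor-variable notation (`i.`) rather than `xi:`, which assumes Stata 11 or later.

```stata
* Simulated person-period data, for illustration only
clear
set seed 12345
set obs 20000
generate int age_group = 1 + 5*floor(15*runiform())   // 1, 6, ..., 71
generate byte male = runiform() < 0.5
generate byte died = runiform() < invlogit(-3 + 0.03*age_group + 0.2*male)

* Logistic regression: coefficients are log odds ratios
logit died i.age_group i.male
estimates store m_logit

* Complementary log-log: coefficients are log hazard ratios
* under a proportional-hazards assumption
cloglog died i.age_group i.male
estimates store m_cloglog

* Side-by-side comparison of the two sets of estimates
estimates table m_logit m_cloglog, b(%9.3f) se
```

When the event is rare within each interval the two sets of coefficients tend to be close; they diverge as the per-interval probability of death rises.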

Accounting for age and sex
We generally analyze childhood, working ages, and old age separately
– Since the relevant variables vary, as do their effects
We often, but not always, analyze males and females separately
– Because the effects of key variables may vary by sex
Categorical variable for age group
– See previous example
Polynomial
generate age2 = age^2
generate age3 = age^3
logit NEXT_DIE age age2 age3
Hybrid
– Include age group categories and a linear term for age
– To capture variation in risks within age groups
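The polynomial and hybrid specifications can be compared side by side. This is a sketch on simulated data, not on the CMGPD-LN: `age`, `age_group` and `died` are hypothetical variables, and the simulated risk is simply log-linear in age.

```stata
* Simulated data, for illustration only
clear
set seed 2012
set obs 20000
generate int age = 1 + floor(75*runiform())          // single years, 1 to 75
generate int age_group = 5*int((age-1)/5) + 1         // 1, 6, ..., 71
generate byte died = runiform() < invlogit(-4 + 0.04*age)

* Polynomial: cubic in single years of age
generate age2 = age^2
generate age3 = age^3
logit died age age2 age3

* Hybrid: age-group dummies plus a linear term for age,
* so the linear term picks up the gradient within each group
logit died i.age_group age
```

The hybrid model is identified because age varies within each five-year group; if age were recorded only in five-year steps, the linear term would be collinear with the group dummies.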

Other notes on mortality analysis
Since many of the 'immortals' were tao at some point in their lives, it may be worthwhile to throw out observations of anyone who was ever tao, even if they are not tao right now.
Regional differences in mortality rates suggest including REGION as a basic control variable.

Using the disability variables
Basic contents
Time trends
Age patterns
Working with the original disabilities
– And positions…

Working with the original disabilities
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0003\ Data.dta"
merge m:1 DATASET DISABILITY_CODE using "C:\Users\Cameron Campbe\Documents\Baqi\extracts\CMGPD-LN Disability for SJTU class", keep(match master)
tab CONDITION_PINYIN, sort
run "C:\Users\Cameron Campbe\Documents\Dropbox\Lee-Campbell group (Dropbox shares)\SJTU Dongbei Zhongxin\SJTU Summer Class\strip_disability.do"
tab new_CONDITION_PINYIN, sort
generate byte lao_zheng = index(new_CONDITION_PINYIN,"lao zheng") > 0
tab lao_zheng

.do file to clean up
generate new_CONDITION_PINYIN = CONDITION_PINYIN
local for_removal " "
foreach x of local for_removal {
    replace new_CONDITION_PINYIN = subinstr(new_CONDITION_PINYIN,"`x'","",.)
}
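The loop in this do-file removes every token listed in `for_removal` from the pinyin string. A self-contained sketch of the same idea on a few toy rows; the removal list "1 2 3 4 5" is an assumption (the list in the transcript is blank), chosen because the tabulations before and after cleaning differ only in the tone digits. `strpos()` is used in place of `index()`; both return the position of a substring.

```stata
* Toy rows, for illustration only
clear
input str30 CONDITION_PINYIN
"chen2 tao2"
"lao2 zheng4"
"chen2 lao2 zheng4"
"xia1 zi5"
end

generate new_CONDITION_PINYIN = CONDITION_PINYIN

* Assumed removal list: the tone digits 1 to 5
local for_removal "1 2 3 4 5"
foreach x of local for_removal {
    * "." as the last argument means replace every occurrence
    replace new_CONDITION_PINYIN = subinstr(new_CONDITION_PINYIN, "`x'", "", .)
}

* Flag any condition that mentions lao zheng
generate byte lao_zheng = strpos(new_CONDITION_PINYIN, "lao zheng") > 0

list, noobs
```

Matching on the cleaned string lets a single indicator catch both "lao zheng" and "chen lao zheng".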

. tab CONDITION_PINYIN, sort
Disease | Freq. Percent Cum.
chen2 tao2 | 1,
lao2 zheng4 |
chen2 lao2 zheng4 |
yan3 xia1 |
chen2 xia1 |
chen2 tao2 you3 an4 |
can2 ji2 |
tu3 xie3 |
xia1 zi5 |
tui3 que2 |
tui3 tong4 |
chen2 tui3 que2 |
tui3 huai4 |
er3 long2 |
lao2 bing4 tu3 xie3 |
yan3 ji2 |
yao1 huai4 |
lou4 chuang1 |
lao3 tui4 |
chen2 tu3 xie3 |
xia1 yan3 yan3 ji2 |
yang2 gao1 feng1 |

. tab new_CONDITION_PINYIN, sort
new_CONDITION_PINYIN | Freq. Percent Cum.
chen tao | 1,
lao zheng |
chen lao zheng |
yan xia |
chen xia |
can ji |
chen tao you an |
tu xie |
xia zi |
tui que |
tui tong |
chen tui que |
tui huai |
er long |
lao bing tu xie |
yan ji |
yao huai |
lou chuang |
ge bo huai |
lao tui |

. generate byte lao_zheng = index(new_CONDITION_PINYIN,"lao zheng") > 0
. tab lao_zheng
lao_zheng | Freq. Percent Cum.
 | 1,511,
 | 1,
Total | 1,513,

Preceding birth interval
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
drop if MOTHER_ID == "-99" | BIRTHYEAR < 0 | (SEX == 1 & MARITAL_STATUS != 2)
bysort PERSON_ID: keep if _n == 1
bysort MOTHER_ID (BIRTHYEAR): generate pbi = BIRTHYEAR - BIRTHYEAR[_n-1]
bysort MOTHER_ID (BIRTHYEAR): generate firstborn = _n == 1
* Basically force firstborn and twin into separate categories represented by the dummy variables
bysort MOTHER_ID (BIRTHYEAR): replace pbi = 0 if firstborn
recode pbi 15/max=15
tab pbi
keep PERSON_ID pbi firstborn
save pbi

pbi | Freq. Percent Cum.
 | 76,
 | 4,
 | 10,
 | 9,
 | 7,
 | 6,
 | 5,
 | 4,
 | 3,
 | 3,
 | 2,
 | 2,
 | 1,
 | 1,
 | 1,
 | 7,
Total | 147,
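How the preceding birth interval, the firstborn flag and a short-interval indicator fit together is easiest to see on a handful of rows. A sketch with two invented mothers (the IDs and birth years are made up); the second mother has two children recorded with the same birth year.

```stata
* Toy sibling sets, for illustration only
clear
input str3 MOTHER_ID str3 PERSON_ID int BIRTHYEAR
"m1" "a" 1800
"m1" "b" 1802
"m1" "c" 1809
"m2" "d" 1811
"m2" "e" 1811
end

* Interval since the previous birth to the same mother;
* missing for the first child in each sibling set
bysort MOTHER_ID (BIRTHYEAR): generate pbi = BIRTHYEAR - BIRTHYEAR[_n-1]
bysort MOTHER_ID (BIRTHYEAR): generate byte firstborn = _n == 1

* Firstborns get pbi = 0, so pbi == 0 with firstborn == 0
* marks a child recorded with the same birth year as the previous sibling
replace pbi = 0 if firstborn

* Short interval: 0 to 2 years, excluding firstborns
generate byte short_pbi = firstborn == 0 & inlist(pbi, 0, 1, 2)

list, sepby(MOTHER_ID) noobs
```

Which of the two same-year children is treated as firstborn depends on the sort order among ties, so the assignment within such a pair is arbitrary.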

use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge m:1 PERSON_ID using pbi, keep(match master)
keep if SEX == 2
bysort PERSON_ID (YEAR): keep if AGE_IN_SUI[1] > 0 & AGE_IN_SUI[1] <= 10
keep if AT_RISK_DIE == 1 & NEXT_3 == 1 & PRESENT == 1
generate short_pbi = firstborn == 0 & (pbi == 0 | pbi == 1 | pbi == 2)
generate age_group = 1+5*int((AGE_IN_SUI-1)/5)
xi:clogit NEXT_DIE i.age_group firstborn short_pbi if age_group >= 56 & age_group <= 75, group(MOTHER_ID)

. xi:clogit NEXT_DIE i.age_group firstborn short_pbi if age_group >= 56 & age_group <= 75, group(MOTHER_ID)
i.age_group _Iage_group_1-166 (naturally coded; _Iage_group_1 omitted)
note: multiple positive outcomes within groups encountered.
note: 8860 groups (19131 obs) dropped because of all positive or all negative outcomes.
Conditional (fixed-effects) logistic regression
Number of obs = 9902
LR chi2(5) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =
NEXT_DIE | Coef. Std. Err. z P>|z| [95% Conf. Interval]
_Iage_gro~_6 | (omitted)
…
_Iage_gro~51 | (omitted)
_Iage_gr~_56 |
_Iage_gro~61 |
_Iage_gr~_66 |
_Iage_gro~71 | (omitted)
…
_Iage_gr~166 | (omitted)
firstborn |
short_pbi |

Age at which father last seen alive
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0003\ Data.dta"
keep if SEX == 2
bysort PERSON_ID (YEAR): keep if AGE_IN_SUI[1] = 1
drop if FATHER_ALIVE < 0
drop if AGE_IN_SUI < 0
bysort PERSON_ID (FATHER_ALIVE YEAR): generate father_last_alive = AGE_IN_SUI[_N]
bysort PERSON_ID (FATHER_ALIVE YEAR): replace father_last_alive = 0 if FATHER_ALIVE[_N] == 0
recode father_last_alive 1/5=1 6/10=6 11/15=11 16/max=16
generate ever_married = MARITAL_STATUS != 2
tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI <= 30, sum(ever_married)
tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI = 0, sum(HAS_POSITION)

. tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI <= 30, sum(ever_married)
father_last_alive | Summary of ever_married: Mean Std. Dev. Freq.
 |
 |
 |
 |
 |
Total |

. tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI = 0, sum(HAS_POSITION)
father_last_alive | Summary of Has Official Position: Mean Std. Dev. Freq.
 |
 |
 |
 |
 |
Total |

Another approach to identifying age at last time father was observed
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
keep if SEX == 2 & PRESENT == 1 & AGE_IN_SUI > 0
bysort PERSON_ID (YEAR): keep if _n == _N
keep PERSON_ID YEAR
rename PERSON_ID FATHER_ID
rename YEAR father_last_year
save father_last_year, replace
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
keep if FATHER_ID != "-99"
keep if SEX == 2
merge m:1 FATHER_ID using father_last_year, keep(match master)
drop if father_last_year == .
keep if BIRTHYEAR > 0
generate age_at_father_last_year = father_last_year - BIRTHYEAR
recode age_at_father_last_year min/-11= /0=0 1/5=1 6/10=6 11/15=11 16/max=16
tab age_at_father_last_year if HAS_POSITION >= 0 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(HAS_POSITION)
generate ever_married = MARITAL_STATUS != 2
tab age_at_father_last_year if MARITAL_STATUS >= 1 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(ever_married)

. tab age_at_father_last_year
age_at_father_last_year | Freq. Percent Cum.
 | 26,
 | 37,
 | 53,
 | 70,
 | 82,
 | 556,
Total | 828,

. tab age_at_father_last_year if HAS_POSITION >= 0 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(HAS_POSITION)
age_at_father_last_year | Summary of Has Official Position: Mean Std. Dev. Freq.
 |
 |
 |
 |
 |
 |
Total |

. tab age_at_father_last_year if MARITAL_STATUS >= 1 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(ever_married)
age_at_father_last_year | Summary of ever_married: Mean Std. Dev. Freq.
 |
 |
 |
 |
 |
 |
Total |

Prices around time of birth
use "C:\Users\Cameron Campbe\Documents\Baqi\prices\Annual logged low sorghum.dta"
rename YEAR BIRTHYEAR
sort BIRTHYEAR
generate allosorg5 = allosorg[_n-2]+allosorg[_n-1]+allosorg+allosorg[_n+1]+allosorg[_n+2]
save "Logged low sorghum prices around time of birthyear"
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge m:1 BIRTHYEAR using "C:\Users\Cameron Campbe\Documents\Baqi\prices\Logged low sorghum prices around time of birthyear", keep(match master)
generate age_group = 5*int((AGE_IN_SUI-1)/5)+1
keep if PRESENT == 1 & NEXT_3 == 1 & AT_RISK_DIE == 1 & AGE_IN_SUI >= 1
xi:logit NEXT_DIE i.age_group allosorg5 if SEX == 2 & AGE_IN_SUI >= 56 & AGE_IN_SUI <= 75
xi:logit NEXT_DIE i.age_group allosorg5 if SEX == 1 & AGE_IN_SUI >= 56 & AGE_IN_SUI <= 75
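The `allosorg5` variable is a five-year centered sum built with positional subscripts, so it is missing wherever a neighbouring row does not exist. A sketch on six invented years of prices (the values are made up) shows the behaviour at the edges of the series.

```stata
* Toy price series, for illustration only
clear
input int BIRTHYEAR double allosorg
1800 1.0
1801 1.2
1802 0.9
1803 1.1
1804 1.3
1805 1.0
end

sort BIRTHYEAR

* Sum over the two years before, the birth year, and the two years after.
* Subscripts outside the data return missing, and missing propagates
* through the sum.
generate double allosorg5 = allosorg[_n-2] + allosorg[_n-1] + allosorg ///
    + allosorg[_n+1] + allosorg[_n+2]

list, noobs
```

Only 1802 and 1803 have all five neighbours here, so only those two rows get a value. Positional subscripts also assume one row per consecutive year; if the series has gaps, declaring it with `tsset BIRTHYEAR` and using time-series operators such as `L2.allosorg` and `F1.allosorg` is one way to avoid silently summing across a gap.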

i.age_group _Iage_group_1-201 (naturally coded; _Iage_group_1 omitted)
Iteration 0: log likelihood =
Iteration 1: log likelihood =
Iteration 2: log likelihood =
Iteration 3: log likelihood =
Iteration 4: log likelihood =
Logistic regression
Number of obs =
LR chi2(4) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =
NEXT_DIE | Coef. Std. Err. z P>|z| [95% Conf. Interval]
_Iage_gro~_6 | (omitted)
_Iage_gr~_51 | (omitted)
_Iage_gr~_56 |
_Iage_gr~_61 |
_Iage_gr~_66 |
_Iage_gro~71 | (omitted)
_Iage_gr~201 | (omitted)
allosorg5 |
_cons |

. xi:logit NEXT_DIE i.age_group allosorg5 if SEX == 2 & AGE_IN_SUI >= 56 & AGE_IN_SUI <= 75
i.age_group _Iage_group_1-201 (naturally coded; _Iage_group_1 omitted)
Iteration 0: log likelihood =
Iteration 1: log likelihood =
Iteration 2: log likelihood =
Iteration 3: log likelihood =
Iteration 4: log likelihood =
Logistic regression
Number of obs =
LR chi2(4) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =
NEXT_DIE | Coef. Std. Err. z P>|z| [95% Conf. Interval]
_Iage_gr~_56 |
_Iage_gr~_61 |
_Iage_gr~_66 |
allosorg5 |
_cons |