CMGPD-LN Methodological Lecture, Day 7: Health and Mortality
Mortality outcomes
Until age 75, the recording of mortality appears plausible
– Age patterns resemble those of other historical populations and of model life tables
After age 75, the mortality record is problematic
– Many ‘immortals’ were taoding (absconded) at some point, so for mortality analysis it is probably safest to discard all records of anyone who was ever taoding
Rates below age 5 appear normal, but it is unclear how representative the registered children are
The large number of deaths allows fine-grained analysis of the determinants of mortality
. use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
(China Multi-Generational Panel Dataset, Liaoning (CMGPD-LN), …, Liaoning)

. recode AGE_IN_SUI min/0=. 1/15=1 16/55=16 56/max=56
(AGE_IN_SUI: … changes made)

. keep if NEXT_DIE >= 0 & NEXT_3 & PRESENT
(… observations deleted)

. keep if SEX >= 1
(1 observation deleted)

. tab AGE_IN_SUI SEX if NEXT_DIE

           |         Sex
Age in Sui |   Female       Male |     Total
-----------+---------------------+----------
         1 |    1,189      5,132 |     6,321
        16 |   11,160     10,721 |    21,881
        56 |   11,342     11,923 |    23,265
-----------+---------------------+----------
     Total |   23,691     27,776 |    51,467
Analyzing mortality
Life tables
– Remember, ages are in sui
– The outcome is the probability of death in the next three years, ${}_3q_x$
– It needs to be converted to a rate $m_x$ before it can go into a life table
– One crude conversion, which assumes a constant hazard over the three years: $m_x = -\ln(1 - {}_3q_x)/3$
– More sophisticated conversions are appropriate at early ages, when rates change rapidly
Discrete-time event-history analysis
– Logistic regression
– Complementary log-log regression
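The crude conversion can be sketched in Python. Python is used here only for illustration, since the lecture's own code is Stata, and the function names are hypothetical. Like the formula, the sketch assumes a constant hazard within the three-year window. In Stata, the same step after `collapse` would be `generate mx = -ln(1-NEXT_DIE)/3`.

```python
import math


def mx_from_3qx(q3: float) -> float:
    """Crude conversion of a three-year death probability (3qx) to a death
    rate mx, assuming the hazard is constant across the three years."""
    if not 0.0 <= q3 < 1.0:
        raise ValueError("3qx must lie in [0, 1)")
    return -math.log(1.0 - q3) / 3.0


def nqx_from_mx(mx: float, n: float) -> float:
    """Probability of dying within n years implied by a constant rate mx."""
    return 1.0 - math.exp(-n * mx)


if __name__ == "__main__":
    q3 = 0.06  # hypothetical three-year probability, not a CMGPD-LN estimate
    mx = mx_from_3qx(q3)
    print(f"mx = {mx:.5f}")
    print(f"round trip 3qx = {nqx_from_mx(mx, 3):.5f}")  # recovers 0.06000
    print(f"implied 5qx = {nqx_from_mx(mx, 5):.5f}")
```

The inverse function shows why the conversion is needed. Once $m_x$ is known, survival over the five-year life-table intervals is $\exp(-5 m_x)$ rather than a rescaled ${}_3q_x$.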
Life tables: a crude approach
keep if AGE_IN_SUI > 0 & AGE_IN_SUI … 0
* Divide into five-year age groups
replace AGE_IN_SUI = 5*int((AGE_IN_SUI-1)/5)+1
tab AGE_IN_SUI SEX
* The mean of NEXT_DIE by age group and sex is 3qx
collapse NEXT_DIE, by(AGE_IN_SUI SEX)
sort SEX AGE_IN_SUI
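The grouping expression 5*int((AGE_IN_SUI-1)/5)+1 labels each five-year group by its lowest age in sui. Here is a small Python equivalent for checking the mapping (the function name is hypothetical):

```python
def five_year_group(age_sui: int) -> int:
    """Python version of the Stata expression 5*int((AGE_IN_SUI-1)/5)+1.

    For ages of 1 sui and above, Stata's int() (truncation toward zero) and
    Python's floor division agree, so sui 1-5 map to 1, 6-10 to 6, and so on."""
    if age_sui < 1:
        raise ValueError("age in sui must be at least 1")
    return 5 * ((age_sui - 1) // 5) + 1


if __name__ == "__main__":
    for age in (1, 5, 6, 10, 11, 71, 75):
        print(age, "->", five_year_group(age))
```

Ages 71 to 75 fall into group 71, which is the last row of the tabulation that follows.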
. tab AGE_IN_SUI SEX

           |         Sex
Age in Sui |   Female       Male |     Total
-----------+---------------------+----------
         1 |    5,026     37,223 |    42,249
         6 |    7,881     53,337 |    61,218
        11 |    8,334     51,932 |    60,266
        16 |   20,835     47,582 |    68,417
        21 |   35,747     46,067 |    81,814
        26 |   37,344     44,648 |    81,992
        31 |   34,870     40,533 |    75,403
        36 |   32,342     37,912 |    70,254
        41 |   30,347     35,131 |    65,478
        46 |   27,330     30,170 |    57,500
        51 |   24,282     26,714 |    50,996
        56 |   20,898     22,568 |    43,466
        61 |   16,949     17,566 |    34,515
        66 |   13,143     12,664 |    25,807
        71 |    9,014      8,072 |    17,086
-----------+---------------------+----------
     Total |  324,342    512,119 |   836,461
Example of a crude life table, females: columns SEX, AGE_IN_SUI, NEXT_DIE, mx, 5px and lx for the 15 five-year age groups
Example of a crude life table, males: columns SEX, AGE_IN_SUI, NEXT_DIE, mx, 5px, lx and e for the 15 five-year age groups
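Here is a minimal Python sketch of how columns like mx, 5px, lx and e can be computed from the collapsed NEXT_DIE means. It is not necessarily the procedure behind the tables above, and the function name and input values are hypothetical. It assumes a constant hazard within each five-year group, so person-years are lx(1 - 5px)/mx. As a further assumption, it closes the table by treating the last group as open-ended (Lx = lx/mx) so that e can be computed.

```python
import math


def crude_life_table(q3_by_group, width=5, radix=1.0):
    """Build a crude life table from (age_group, 3qx) pairs for one sex,
    sorted by age group, e.g. the result of
    `collapse NEXT_DIE, by(AGE_IN_SUI SEX)`.

    Assumptions: constant hazard within each group, and the last group
    treated as open-ended so that ex can be computed."""
    rows = []
    lx = radix
    last = len(q3_by_group) - 1
    for i, (age, q3) in enumerate(q3_by_group):
        mx = -math.log(1.0 - q3) / 3.0       # crude conversion from 3qx
        npx = math.exp(-width * mx)           # survival across the group (5px)
        if i == last:
            Lx = lx / mx if mx > 0 else math.inf  # open-ended closure (assumption)
        elif mx > 0:
            Lx = lx * (1.0 - npx) / mx        # person-years under constant hazard
        else:
            Lx = width * lx
        rows.append({"age": age, "q3": q3, "mx": mx, "npx": npx,
                     "lx": lx, "Lx": Lx})
        lx *= npx                             # survivors at the next group
    Tx = 0.0
    for row in reversed(rows):
        Tx += row["Lx"]                       # person-years lived above age x
        row["ex"] = Tx / row["lx"]
    return rows


if __name__ == "__main__":
    # Hypothetical 3qx values for illustration only, not CMGPD-LN estimates
    example = [(56, 0.08), (61, 0.11), (66, 0.15), (71, 0.20)]
    for r in crude_life_table(example):
        print(f'{r["age"]:>3} mx={r["mx"]:.4f} 5px={r["npx"]:.4f} '
              f'lx={r["lx"]:.4f} ex={r["ex"]:.2f}')
```

A useful check: if every group has the same 3qx, the sketch returns ex = 1/mx at every age. This is the expectation of life under a constant hazard.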
Event-history analysis
keep if AGE_IN_SUI > 0 & AGE_IN_SUI … 0
replace AGE_IN_SUI = 5*int((AGE_IN_SUI-1)/5)+1
xi:logit NEXT_DIE i.AGE_IN_SUI i.SEX i.REGION
Output: logit estimates for NEXT_DIE (Coef., Std. Err., z, P>|z|, 95% Conf. Interval) on the age-group dummies _IAGE_IN_~_6 to _IAGE_IN_~71, _ISEX_2, _IREGION_2 to _IREGION_4, and _cons
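Logit coefficients are on the log-odds scale. exp(b) gives an odds ratio, and the linear predictor can be turned back into a predicted probability of death in the next three years. The Python sketch covers both discrete-time links mentioned earlier. Under the complementary log-log link, P = 1 - exp(-exp(xb)), which corresponds to a proportional-hazards model for the underlying continuous time. The coefficient names and values are hypothetical, not estimates from this model.

```python
import math


def inv_logit(xb: float) -> float:
    """Probability implied by a logit linear predictor."""
    return 1.0 / (1.0 + math.exp(-xb))


def inv_cloglog(xb: float) -> float:
    """Probability implied by a complementary log-log linear predictor."""
    return 1.0 - math.exp(-math.exp(xb))


def predicted_3q(coefs, covariates, link="logit"):
    """coefs: coefficient name -> estimate, including '_cons'.
    covariates: name -> value for one profile (0/1 for dummies)."""
    xb = coefs["_cons"] + sum(coefs[name] * value
                              for name, value in covariates.items())
    if link == "logit":
        return inv_logit(xb)
    if link == "cloglog":
        return inv_cloglog(xb)
    raise ValueError("link must be 'logit' or 'cloglog'")


if __name__ == "__main__":
    # Hypothetical coefficients and names, for illustration only
    coefs = {"_cons": -3.0, "age_group_56": 1.2, "male": 0.1}
    profile = {"age_group_56": 1, "male": 1}
    print("odds ratio for male:", math.exp(coefs["male"]))
    print("predicted 3qx (logit):", predicted_3q(coefs, profile))
```

When probabilities are small, the two links give similar predictions. They diverge as 3qx rises at older ages.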
Accounting for age and sex
We generally analyze childhood, working ages, and old age separately
– Because the relevant variables differ across these stages, and so do their effects
We often, but not always, analyze males and females separately
– Because the effects of key variables may differ by sex
Categorical variable for age group
– See the previous example
Polynomial
generate age = AGE_IN_SUI
generate age2 = age^2
generate age3 = age^3
logit NEXT_DIE age age2 age3
Hybrid
– Include age-group categories and a linear term for age
– To capture variation in risk within age groups
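The three ways of entering age differ only in which covariates each person-record gets. The Python sketch below builds them for a single record. The function and covariate names are hypothetical, and the first age group serves as the omitted reference category.

```python
def age_covariates(age_sui, group_starts, scheme="hybrid"):
    """Age terms for one person-record.

    group_starts: sorted lower bounds of the age groups, e.g. [56, 61, 66, 71];
    the first group is the omitted reference category.
    scheme: "groups" (dummies only), "polynomial" (age, age2, age3) or
    "hybrid" (dummies plus a linear age term)."""
    if scheme not in ("groups", "polynomial", "hybrid"):
        raise ValueError("unknown scheme")
    covs = {}
    if scheme in ("groups", "hybrid"):
        current = max(g for g in group_starts if g <= age_sui)
        for g in group_starts[1:]:
            covs[f"age_group_{g}"] = int(g == current)
    if scheme in ("polynomial", "hybrid"):
        covs["age"] = age_sui
    if scheme == "polynomial":
        covs["age2"] = age_sui ** 2
        covs["age3"] = age_sui ** 3
    return covs


if __name__ == "__main__":
    print(age_covariates(63, [56, 61, 66, 71]))
```

In the hybrid scheme, two records in the same group share their dummies but differ in age. The linear term therefore picks up how risk changes within the group.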
Other notes on mortality analysis
Since many of the ‘immortals’ were tao at some point in their lives, it may be worthwhile to throw out all observations of anyone who was ever tao, even if they are not tao in the current observation.
Regional differences in mortality rates suggest including REGION as a basic control variable.
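Dropping everyone who was ever tao is a person-level filter applied to person-year records. In Stata, this would typically be `bysort PERSON_ID: egen ever_tao = max(tao)` followed by `drop if ever_tao == 1`, where `tao` is a hypothetical 0/1 indicator derived from the registers. The Python sketch below applies the same logic and uses the same hypothetical field name.

```python
from collections import defaultdict


def drop_ever_tao(records, tao_field="tao"):
    """Keep only the records of people never recorded as tao.

    records: list of dicts with a 'PERSON_ID' key and a 0/1 tao indicator
    stored under tao_field (a hypothetical name)."""
    ever_tao = defaultdict(bool)
    for r in records:
        ever_tao[r["PERSON_ID"]] |= bool(r[tao_field])
    return [r for r in records if not ever_tao[r["PERSON_ID"]]]


if __name__ == "__main__":
    recs = [{"PERSON_ID": "a", "tao": 0}, {"PERSON_ID": "a", "tao": 1},
            {"PERSON_ID": "b", "tao": 0}]
    print(drop_ever_tao(recs))
```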
Using the disability variables
Basic contents
Time trends
Age patterns
Working with the original disabilities
– And positions…
Working with the original disabilities
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0003\ Data.dta"
* Drop the indicator from the first merge so that the second merge can create its own _merge
drop _merge
merge m:1 DATASET DISABILITY_CODE using "C:\Users\Cameron Campbe\Documents\Baqi\extracts\CMGPD-LN Disability for SJTU class", keep(match master)
tab CONDITION_PINYIN, sort
run "C:\Users\Cameron Campbe\Documents\Dropbox\Lee-Campbell group (Dropbox shares)\SJTU Dongbei Zhongxin\SJTU Summer Class\strip_disability.do"
tab new_CONDITION_PINYIN, sort
generate byte lao_zheng = index(new_CONDITION_PINYIN,"lao zheng") > 0
tab lao_zheng
.do file to clean up the pinyin by removing the tone numbers
generate new_CONDITION_PINYIN = CONDITION_PINYIN
local for_removal "1 2 3 4 5"
foreach x of local for_removal {
    replace new_CONDITION_PINYIN = subinstr(new_CONDITION_PINYIN,"`x'","",.)
}
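The cleanup and the lao_zheng flag can be mirrored in Python to check what the loop does to a few strings. This sketch assumes the removal list is the tone digits 1 to 5, which is consistent with the before-and-after tabulations. The function names are hypothetical.

```python
def strip_tones(pinyin: str, marks: str = "12345") -> str:
    """Remove tone numbers, like the subinstr() loop in the .do file."""
    for mark in marks:
        pinyin = pinyin.replace(mark, "")
    return pinyin


def has_condition(pinyin: str, target: str = "lao zheng") -> int:
    """1 if target occurs in the cleaned string, the analogue of
    index(new_CONDITION_PINYIN, "lao zheng") > 0."""
    return int(target in strip_tones(pinyin))


if __name__ == "__main__":
    for raw in ("chen2 lao2 zheng4", "yan3 xia1", "xia1 zi5"):
        print(raw, "->", strip_tones(raw), has_condition(raw))
```

Matching on a substring means that compound entries such as chen lao zheng are flagged as well. This is the reason for stripping tones first, so that one search string covers every variant.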
. tab CONDITION_PINYIN, sort
Output: frequency table of Disease (CONDITION_PINYIN), most frequent first: chen2 tao2, lao2 zheng4, chen2 lao2 zheng4, yan3 xia1, chen2 xia1, chen2 tao2 you3 an4, can2 ji2, tu3 xie3, xia1 zi5, tui3 que2, tui3 tong4, chen2 tui3 que2, tui3 huai4, er3 long2, lao2 bing4 tu3 xie3, yan3 ji2, yao1 huai4, lou4 chuang1, lao3 tui4, chen2 tu3 xie3, xia1 yan3 yan3 ji2, yang2 gao1 feng1, …
. tab new_CONDITION_PINYIN, sort
Output: frequency table of new_CONDITION_PINYIN, most frequent first: chen tao, lao zheng, chen lao zheng, yan xia, chen xia, can ji, chen tao you an, tu xie, xia zi, tui que, tui tong, chen tui que, tui huai, er long, lao bing tu xie, yan ji, yao huai, lou chuang, ge bo huai, lao tui, …
. generate byte lao_zheng = index(new_CONDITION_PINYIN,"lao zheng") > 0

. tab lao_zheng

  lao_zheng |      Freq.
------------+-----------
          0 |   1,511,…
          1 |       1,…
------------+-----------
      Total |   1,513,…
Preceding birth interval
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
drop if MOTHER_ID == "-99" | BIRTHYEAR < 0 | (SEX == 1 & MARITAL_STATUS != 2)
bysort PERSON_ID: keep if _n == 1
bysort MOTHER_ID (BIRTHYEAR): generate pbi = BIRTHYEAR - BIRTHYEAR[_n-1]
bysort MOTHER_ID (BIRTHYEAR): generate firstborn = _n == 1
* Firstborns get pbi = 0, the same value as twins; the firstborn dummy keeps the two categories separate
bysort MOTHER_ID (BIRTHYEAR): replace pbi = 0 if firstborn
recode pbi 15/max=15
tab pbi
keep PERSON_ID pbi firstborn
save pbi, replace
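The Python sketch below mirrors the interval logic for a toy set of children, along with the short_pbi flag defined in the next step of the code. The function names are hypothetical. One difference from Stata: this sketch breaks ties in birth year by person ID, whereas Stata does not guarantee any order among tied birth years.

```python
from collections import defaultdict


def preceding_birth_intervals(children, topcode=15):
    """children: iterable of (person_id, mother_id, birthyear), one row per child.

    Returns {person_id: (pbi, firstborn)}. As in the Stata code, firstborns get
    pbi = 0 and firstborn = 1, twins get pbi = 0 and firstborn = 0, and
    intervals above topcode are recoded to topcode."""
    by_mother = defaultdict(list)
    for person_id, mother_id, birthyear in children:
        by_mother[mother_id].append((birthyear, person_id))
    result = {}
    for kids in by_mother.values():
        kids.sort()  # by birth year; ties broken by person_id in this sketch
        for i, (birthyear, person_id) in enumerate(kids):
            if i == 0:
                result[person_id] = (0, 1)
            else:
                gap = birthyear - kids[i - 1][0]
                result[person_id] = (min(gap, topcode), 0)
    return result


def short_pbi(pbi, firstborn):
    """Mirror of: firstborn == 0 & (pbi == 0 | pbi == 1 | pbi == 2)."""
    return int(firstborn == 0 and pbi in (0, 1, 2))


if __name__ == "__main__":
    kids = [("a", "m1", 1800), ("b", "m1", 1803), ("c", "m1", 1803),
            ("d", "m1", 1825), ("e", "m2", 1810)]
    for pid, (pbi, first) in sorted(preceding_birth_intervals(kids).items()):
        print(pid, pbi, first, short_pbi(pbi, first))
```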
        pbi |      Freq.
------------+-----------
          0 |     76,…
          1 |      4,…
          2 |     10,…
          3 |      9,…
          4 |      7,…
          5 |      6,…
          6 |      5,…
          7 |      4,…
          8 |      3,…
          9 |      3,…
         10 |      2,…
         11 |      2,…
         12 |      1,…
         13 |      1,…
         14 |      1,…
         15 |      7,…
------------+-----------
      Total |    147,…
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge m:1 PERSON_ID using pbi, keep(match master)
keep if SEX == 2
bysort PERSON_ID (YEAR): keep if AGE_IN_SUI[1] > 0 & AGE_IN_SUI[1] <= 10
keep if AT_RISK_DIE == 1 & NEXT_3 == 1 & PRESENT == 1
generate short_pbi = firstborn == 0 & (pbi == 0 | pbi == 1 | pbi == 2)
generate age_group = 1+5*int((AGE_IN_SUI-1)/5)
xi:clogit NEXT_DIE i.age_group firstborn short_pbi if age_group >= 56 & age_group <= 75, group(MOTHER_ID)
. xi:clogit NEXT_DIE i.age_group firstborn short_pbi if age_group >= 56 & age_group <= 75, group(MOTHER_ID)
i.age_group       _Iage_group_1-166   (naturally coded; _Iage_group_1 omitted)
note: multiple positive outcomes within groups encountered.
note: 8860 groups (19131 obs) dropped because of all positive or all negative outcomes.

Conditional (fixed-effects) logistic regression     Number of obs = 9902
Output: LR chi2(5), Prob > chi2, log likelihood, pseudo R2, and estimates (Coef., Std. Err., z, P>|z|, 95% Conf. Interval) for _Iage_gr~_56, _Iage_gro~61, _Iage_gr~_66, firstborn and short_pbi; the dummies _Iage_gro~_6 to _Iage_gro~51 and _Iage_gro~71 to _Iage_gr~166 are omitted
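The note about dropped groups follows from how conditional logit works. If a mother's sons in the estimation window all die or all survive, her group contributes nothing to the conditional likelihood, so Stata drops it. This Python sketch (hypothetical function name, toy data) identifies which groups carry information and counts what is dropped.

```python
from collections import defaultdict


def informative_groups(rows):
    """rows: list of (group_id, outcome) with outcome 0/1.

    Conditional logit only uses groups whose outcomes vary.
    Returns (kept_rows, n_groups_dropped, n_obs_dropped)."""
    outcomes = defaultdict(set)
    for group_id, y in rows:
        outcomes[group_id].add(y)
    keep = {g for g, ys in outcomes.items() if len(ys) > 1}
    kept = [r for r in rows if r[0] in keep]
    return kept, len(outcomes) - len(keep), len(rows) - len(kept)


if __name__ == "__main__":
    toy = [("m1", 0), ("m1", 1), ("m2", 0), ("m2", 0), ("m3", 1)]
    print(informative_groups(toy))
```

This explains why the estimation sample (9,902 observations) is much smaller than the restricted data. Only sibling sets with at least one death and at least one survivor identify the within-mother effects of firstborn and short_pbi.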
Age at which father was last seen alive
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0003\ Data.dta"
keep if SEX == 2
bysort PERSON_ID (YEAR): keep if AGE_IN_SUI[1] … 1
drop if FATHER_ALIVE < 0
drop if AGE_IN_SUI < 0
* After sorting by FATHER_ALIVE and YEAR, the last row is the latest year in which the father is alive, if there is one
bysort PERSON_ID (FATHER_ALIVE YEAR): generate father_last_alive = AGE_IN_SUI[_N]
bysort PERSON_ID (FATHER_ALIVE YEAR): replace father_last_alive = 0 if FATHER_ALIVE[_N] == 0
recode father_last_alive 1/5=1 6/10=6 11/15=11 16/max=16
generate ever_married = MARITAL_STATUS != 2
tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI <= 30, sum(ever_married)
tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI … 0, sum(HAS_POSITION)
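The sort trick can be checked with a Python sketch that handles one son's records. It assumes FATHER_ALIVE is coded 0/1 once the negative codes are dropped. The function names are hypothetical.

```python
def father_last_alive(obs):
    """obs: list of (year, age_sui, father_alive) for one son, with
    father_alive coded 0/1 (negative codes already dropped).

    Mirrors sorting by FATHER_ALIVE and YEAR and taking AGE_IN_SUI from the
    last row: the son's age in the latest year his father is recorded alive,
    or 0 if the father is never recorded alive."""
    year, age, alive = max(obs, key=lambda r: (r[2], r[0]))
    return age if alive == 1 else 0


def recode_father_last_alive(age):
    """recode father_last_alive 1/5=1 6/10=6 11/15=11 16/max=16"""
    if age < 1:
        return age
    if age <= 5:
        return 1
    if age <= 10:
        return 6
    if age <= 15:
        return 11
    return 16


if __name__ == "__main__":
    son = [(1800, 1, 1), (1803, 4, 1), (1806, 7, 0)]
    age = father_last_alive(son)
    print(age, recode_father_last_alive(age))
```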
. tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI <= 30, sum(ever_married)
Output: mean, standard deviation and frequency of ever_married for each father_last_alive category and in total
. tab father_last_alive if SEX == 2 & AGE_IN_SUI >= 26 & AGE_IN_SUI … 0, sum(HAS_POSITION)
Output: mean, standard deviation and frequency of HAS_POSITION (Has Official Position) for each father_last_alive category and in total
Another approach to identifying the son's age when the father was last observed
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
keep if SEX == 2 & PRESENT == 1 & AGE_IN_SUI > 0
bysort PERSON_ID (YEAR): keep if _n == _N
keep PERSON_ID YEAR
rename PERSON_ID FATHER_ID
rename YEAR father_last_year
save father_last_year, replace
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
keep if FATHER_ID != "-99"
keep if SEX == 2
merge m:1 FATHER_ID using father_last_year, keep(match master)
drop if father_last_year == .
keep if BIRTHYEAR > 0
generate age_at_father_last_year = father_last_year - BIRTHYEAR
recode age_at_father_last_year min/-11= /0=0 1/5=1 6/10=6 11/15=11 16/max=16
tab age_at_father_last_year if HAS_POSITION >= 0 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(HAS_POSITION)
generate ever_married = MARITAL_STATUS != 2
tab age_at_father_last_year if MARITAL_STATUS >= 1 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(ever_married)
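This Python sketch mirrors the two steps: find the last year in which each man is observed, then attach that year to his sons through FATHER_ID. The function names are hypothetical. Note that father_last_year - BIRTHYEAR is a difference in calendar years. Because a person is 1 sui in the year of birth, this difference is one less than the son's age in sui in that year.

```python
def father_last_year(rows):
    """rows: (person_id, year) for men present with AGE_IN_SUI > 0.
    Returns {father_id: last year observed}, the file saved as father_last_year."""
    last = {}
    for person_id, year in rows:
        last[person_id] = max(year, last.get(person_id, year))
    return last


def age_at_father_last_year(son_birthyear, father_id, last_year):
    """Calendar-year difference, or None when the son would be dropped:
    missing father ("-99"), father never matched, or invalid BIRTHYEAR."""
    if father_id == "-99" or father_id not in last_year or son_birthyear <= 0:
        return None
    return last_year[father_id] - son_birthyear


if __name__ == "__main__":
    last = father_last_year([("f1", 1800), ("f1", 1815), ("f2", 1790)])
    print(last, age_at_father_last_year(1805, "f1", last))
```

A negative value means the father was last observed before the son's birth year.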
. tab age_at_father_last_year
Output: six categories with frequencies of 26,…; 37,…; 53,…; 70,…; 82,…; and 556,…; total 828,…
. tab age_at_father_last_year if HAS_POSITION >= 0 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(HAS_POSITION)
Output: mean, standard deviation and frequency of HAS_POSITION (Has Official Position) for each age_at_father_last_year category and in total
. tab age_at_father_last_year if MARITAL_STATUS >= 1 & AGE_IN_SUI >= 31 & AGE_IN_SUI <= 35, sum(ever_married)
Output: mean, standard deviation and frequency of ever_married for each age_at_father_last_year category and in total
Prices around time of birth
use "C:\Users\Cameron Campbe\Documents\Baqi\prices\Annual logged low sorghum.dta", clear
rename YEAR BIRTHYEAR
sort BIRTHYEAR
* Sum of logged prices over the five rows centered on each birth year (assumes one row per consecutive year)
generate allosorg5 = allosorg[_n-2]+allosorg[_n-1]+allosorg+allosorg[_n+1]+allosorg[_n+2]
save "C:\Users\Cameron Campbe\Documents\Baqi\prices\Logged low sorghum prices around time of birthyear", replace
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta", clear
merge m:1 BIRTHYEAR using "C:\Users\Cameron Campbe\Documents\Baqi\prices\Logged low sorghum prices around time of birthyear", keep(match master)
generate age_group = 5*int((AGE_IN_SUI-1)/5)+1
keep if PRESENT == 1 & NEXT_3 == 1 & AT_RISK_DIE == 1 & AGE_IN_SUI >= 1
xi:logit NEXT_DIE i.age_group allosorg5 if SEX == 2 & AGE_IN_SUI >= 56 & AGE_IN_SUI <= 75
xi:logit NEXT_DIE i.age_group allosorg5 if SEX == 1 & AGE_IN_SUI >= 56 & AGE_IN_SUI <= 75
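allosorg5 is a centered five-year sum, and it is missing for the first two and last two years of the price series. The Python sketch below (hypothetical function name) indexes by calendar year rather than by row position. It matches the Stata `_n-2` to `_n+2` lags only when the price file has exactly one row per consecutive year. If a year is absent, Stata's row-based lags would silently span it, whereas this sketch returns None.

```python
def centered_sum(series, half_width=2):
    """series: dict year -> logged price.
    Returns dict year -> sum over year-half_width .. year+half_width,
    or None when any year in that window is missing."""
    out = {}
    for year in series:
        window = [series.get(year + k) for k in range(-half_width, half_width + 1)]
        out[year] = None if any(v is None for v in window) else sum(window)
    return out


if __name__ == "__main__":
    # Hypothetical logged prices, for illustration only
    prices = {1800: 1.0, 1801: 2.0, 1802: 3.0, 1803: 4.0, 1804: 5.0, 1805: 6.0}
    print(centered_sum(prices))
```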
i.age_group       _Iage_group_1-201   (naturally coded; _Iage_group_1 omitted)
Output: iteration log (iterations 0 to 4); logistic regression with LR chi2(4), Prob > chi2, log likelihood and pseudo R2; estimates (Coef., Std. Err., z, P>|z|, 95% Conf. Interval) for _Iage_gr~_56, _Iage_gr~_61, _Iage_gr~_66, allosorg5 and _cons, with the other age-group dummies (_Iage_gro~_6 to _Iage_gr~_51 and _Iage_gro~71 to _Iage_gr~201) omitted
. xi:logit NEXT_DIE i.age_group allosorg5 if SEX == 2 & AGE_IN_SUI >= 56 & AGE_IN_SUI <= 75
i.age_group       _Iage_group_1-201   (naturally coded; _Iage_group_1 omitted)
Output: iteration log (iterations 0 to 4); logistic regression with LR chi2(4), Prob > chi2, log likelihood and pseudo R2; estimates (Coef., Std. Err., z, P>|z|, 95% Conf. Interval) for _Iage_gr~_56, _Iage_gr~_61, _Iage_gr~_66, allosorg5 and _cons