CMGPD-LN Methodological Lecture
Day 2
Strengths and Weaknesses of the CMGPD-LN
Historical population databases
  Parish registers
  Genealogies
  Censuses
  Household registers
Table A.1. Comparison of features of sources for historical demography
  Sources compared (columns): Parish registers, Vital statistics, Censuses, Genealogies, Household registers
  Features compared (rows): Longitudinal, Individual-level, Detail on households, Geographic specificity, Complete community, Population at risk, Timing of vital events
  [Cell marks not recoverable; a single X survives in the Longitudinal row.]
CMGPD-LN Relative Strengths
  Household and village of residence
    Not available in genealogies, parish registers
  Longitudinal
    Not available in censuses
  Complete recording of the at-risk population
    Not available in parish registers
  Time-depth/Multigenerational
    Not available in most household registers
  Kinship
    Genealogies typically only record a single descent group
  Prospective
    Genealogies are retrospective
CMGPD-LN Limitations
  Omission of boys who died in infancy and early childhood
    Can't really do infant or early child mortality
    Underestimate fertility
  Omission of daughters
  No non-state occupations, or landholding
    Landholding will be available in Shuangcheng (CMGPD-SC)
  Fate and Fortune in Rural China. Emphasize that only males are studied
Average numbers of boys and girls born in next 3 years to married men aged 15-50
CMGPD-LN Limitations
  Missing registers
    Event-history analysis is limited to registers for which the immediately following register is also available
  Unrecorded deaths
    A small percentage of individuals who were probably dead were carried forward from register to register as if they were alive
    Creates problems at advanced (80+) ages
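The restriction to registers with an available successor can be sketched on toy data. This is only an illustration under stated assumptions: TOY_SERIES (an identifier for a run of registers), the made-up years, and the 3-year spacing between consecutive registers are all hypothetical and are not taken from the CMGPD-LN files.

```stata
* Toy register-level data: one row per register that survives.
* TOY_SERIES and the years below are made up for illustration.
clear
input TOY_SERIES YEAR
1 1789
1 1792
1 1798
1 1801
2 1789
2 1792
2 1795
end

* Within each series, sorted by year, flag registers whose successor
* exists exactly 3 years later (assumed interval). For the last register
* in a series YEAR[_n+1] is missing, so the flag is 0.
bysort TOY_SERIES (YEAR): generate byte next_available = (YEAR[_n+1] == YEAR + 3)

list, sepby(TOY_SERIES)

* Number of registers usable as the start of an observation interval
count if next_available == 1
```

An event-history analysis would then keep only observations from registers where the flag equals 1, since the outcome over the following interval cannot be observed otherwise.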
Using the Data
RECORD_NUMBER
  RECORD_NUMBER identifies the same observation across the different datasets
  Use it as the basis for a one-to-one merge

    local cmgpd_ln_location "..\CMGPD-LN from ICPSR\ICPSR_27063"
    use "`cmgpd_ln_location'\DS0001\27063-0001-Data"
    merge 1:1 RECORD_NUMBER using "`cmgpd_ln_location'\DS0003\27063-0003-Data"
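A one-to-one merge is worth checking before moving on. The sketch below uses two small made-up files rather than the ICPSR data; TOY_FLAG and all values are hypothetical. It confirms that the key is unique and that every observation matched.

```stata
* Toy stand-ins for two CMGPD-LN files (values are made up)
clear
input RECORD_NUMBER YEAR SEX
1 1789 2
2 1789 1
3 1792 2
end
tempfile basic
save `basic'

clear
input RECORD_NUMBER TOY_FLAG
1 0
2 1
3 0
end
tempfile extra
save `extra'

use `basic', clear
* isid stops with an error if RECORD_NUMBER does not uniquely identify rows
isid RECORD_NUMBER

merge 1:1 RECORD_NUMBER using `extra'

* _merge == 3 marks observations found in both files
tabulate _merge
assert _merge == 3
drop _merge
```

Dropping _merge afterwards keeps it from blocking a later merge with another dataset.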
Using the Data
RECORD_NUMBER
  If the merged datasets won't fit into memory, use the options on use and merge to load only specific variables
  The merge key RECORD_NUMBER must be among the variables loaded

    use RECORD_NUMBER YEAR SEX using "`cmgpd_ln_location'\DS0001\27063-0001-Data"
    merge 1:1 RECORD_NUMBER using "`cmgpd_ln_location'\DS0003\27063-0003-Data", keepusing(NON_HAN_NAME)
    tab YEAR if SEX == 2, sum(NON_HAN_NAME)
Using the Data
Missing Values
  Following standard practice, missing values are coded as -98 or -99
    -98 is structural missing
    -99 is missing
  These are not the same as Stata missing (.), so observations carrying them will not be excluded automatically
  Especially in regressions, computations of means, etc., either exclude these manually or recode them to force exclusion

    recode ZHI_SHI_REN (-99 -98 = .)

  or

    summ ZHI_SHI_REN if ZHI_SHI_REN != -98 & ZHI_SHI_REN != -99
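To see how much the -98 and -99 codes can distort a simple mean, here is a minimal sketch on made-up data. TOY_VAR and its values are hypothetical. It uses mvdecode with Stata's extended missing values (.a, .b) as one possible alternative to recode, so the distinction between structural missing and ordinary missing is kept.

```stata
* Toy data: a 0/1 indicator with CMGPD-style missing codes (values made up)
clear
input RECORD_NUMBER TOY_VAR
1 1
2 0
3 -98
4 -99
5 1
end

* Naive summary: -98 and -99 are treated as real values and drag the mean down
summarize TOY_VAR

* Map the codes to extended missing values:
* .a for structural missing (-98), .b for missing (-99)
mvdecode TOY_VAR, mv(-98 = .a \ -99 = .b)

* These observations are now excluded automatically
summarize TOY_VAR

* missing() is true for ., .a and .b alike
count if missing(TOY_VAR)
```

After the conversion, conditions such as `if TOY_VAR > 0` still need care, because Stata sorts all missing values above every number; adding `& !missing(TOY_VAR)` is the usual guard.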