SJTU CMGPD 2012 Methodological Lecture Day 9 Kinship
Ancestry identifiers: specific patrilineal ancestors
In the Basic file…
–FATHER_ID
–GRANDFATHER_ID
In the Kinship file…
–F_ID_1: same as FATHER_ID
–F_ID_2: same as GRANDFATHER_ID
–F_ID_3: great-grandfather
–F_ID_4: great-great-grandfather
Ancestry identifiers: specific patrilineal ancestors
Wives of paternal ancestors
–M_ID_1: mother (same as MOTHER_ID in Basic)
–M_ID_2: paternal grandmother, i.e. father’s mother (fm)
–M_ID_3: paternal great-grandmother (ffm)
–M_ID_4: paternal great-great-grandmother (fffm)
Ancestry identifiers: inferred ancestors
Most identifiers refer to actual individuals observed in the dataset.
In some cases, the existence of a common ancestor whose death predated the earliest available register is inferred.
–Based on relationship codes
–Brothers in the earliest available register are inferred to have a common father.
–Cousins in the earliest available register are inferred to have a common grandfather.
For grouping purposes, an identifier is assigned that doesn’t refer to anyone observed in the dataset.
–No corresponding PERSON_ID
FATHER_ID_IMPUTED and GRANDFATHER_ID_IMPUTED are flags indicating that the IDs don’t refer to anyone observed in the dataset.
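The effect of the imputed flags can be sketched on a toy dataset. This is only an illustration: the five rows are made up, and it assumes the flag is coded 1 when the father identifier is inferred and 0 otherwise, with "-99" marking a missing identifier as in the lecture code.

```stata
* Toy data (hypothetical values, not taken from the CMGPD-LN)
clear
input str4 PERSON_ID str4 FATHER_ID byte FATHER_ID_IMPUTED
"p1" "f1" 0
"p2" "f1" 0
"p3" "x9" 1
"p4" "x9" 1
"p5" "-99" 0
end

* Drop men with no father identifier at all
drop if FATHER_ID == "-99"

* Sons sharing each father identifier, whether observed or inferred
bysort FATHER_ID: generate sons = _N

* Assumed coding: 0 means the father has records of his own
generate byte father_observed = FATHER_ID_IMPUTED == 0

* One row per father identifier
bysort FATHER_ID: keep if _n == 1
list FATHER_ID sons father_observed
```

Both father identifiers group two sons, but only the one with father_observed equal to 1 could be matched to a PERSON_ID in the registers.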
Distributions of men by numbers of descendants
* Sons: count male records by FATHER_ID
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep FATHER_ID
keep if FATHER_ID != "-99"
bysort FATHER_ID: generate sons = _N
bysort FATHER_ID: keep if _n == 1
rename FATHER_ID PERSON_ID
save Sons, replace

* Grandsons: same steps using GRANDFATHER_ID
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep GRANDFATHER_ID
keep if GRANDFATHER_ID != "-99"
bysort GRANDFATHER_ID: generate grandsons = _N
bysort GRANDFATHER_ID: keep if _n == 1
rename GRANDFATHER_ID PERSON_ID
save Grandsons, replace

* Great-grandsons: F_ID_3 comes from the Kinship file, merged on RECORD_NUMBER
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\ Data.dta", keepusing(F_ID_3) keep(match master)
keep F_ID_3
keep if F_ID_3 != "-99"
* Drop the first two characters so the identifier can be renamed to PERSON_ID
replace F_ID_3 = substr(F_ID_3,3,.)
bysort F_ID_3: generate ggrandsons = _N
bysort F_ID_3: keep if _n == 1
rename F_ID_3 PERSON_ID
save GGrandsons, replace
* Great-great-grandsons: same steps using F_ID_4
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\ Data.dta", keepusing(F_ID_4) keep(match master)
keep F_ID_4
keep if F_ID_4 != "-99"
replace F_ID_4 = substr(F_ID_4,3,.)
bysort F_ID_4: generate gggrandsons = _N
bysort F_ID_4: keep if _n == 1
rename F_ID_4 PERSON_ID
save GGGrandsons, replace

* Men whose first record is in 1810 or earlier, with their descendant counts attached
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\ Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID (YEAR): keep if _n == 1 & YEAR <= 1810
keep PERSON_ID
merge 1:1 PERSON_ID using Sons, keep(match master)
replace sons = 0 if sons ==.
drop _merge
merge 1:1 PERSON_ID using Grandsons, keep(match master)
replace grandsons = 0 if grandsons ==.
drop _merge
merge 1:1 PERSON_ID using GGrandsons, keep(match master)
replace ggrandsons = 0 if ggrandsons ==.
drop _merge
merge 1:1 PERSON_ID using GGGrandsons, keep(match master)
replace gggrandsons = 0 if gggrandsons ==.
drop _merge
* Top-code each count at 20, then tabulate the number of men at each count
replace sons = 20 if sons >= 20
bysort sons: generate first_in_sons = _n == 1
bysort sons: generate sons_number = _N
label variable sons_number "Sons"

replace grandsons = 20 if grandsons >= 20
bysort grandsons: generate first_in_grandsons = _n == 1
bysort grandsons: generate grandsons_number = _N
label variable grandsons_number "Grandsons"

replace ggrandsons = 20 if ggrandsons >= 20
bysort ggrandsons: generate first_in_ggrandsons = _n == 1
bysort ggrandsons: generate ggrandsons_number = _N
label variable ggrandsons_number "Great-grandsons"

replace gggrandsons = 20 if gggrandsons >= 20
bysort gggrandsons: generate first_in_gggrandsons = _n == 1
bysort gggrandsons: generate gggrandsons_number = _N
label variable gggrandsons_number "Great-great-grandsons"

* One line per generation, log scale on the y axis
twoway line sons_number sons if first_in_sons, sort yscale(log) ///
    || line grandsons_number grandsons if first_in_grandsons, sort ///
    || line ggrandsons_number ggrandsons if first_in_ggrandsons, sort ///
    || line gggrandsons_number gggrandsons if first_in_gggrandsons, sort ///
    ||, scheme(s1mono) xtitle("Number of descendants") ytitle("Number of men") ylabel( )
Kinship variables for grouping: uses
Controlling for kin group membership
–Via random-effects models
–Alongside village, household, other levels
–Multiple levels are computationally demanding; often need tricks to collapse observations or otherwise reduce the dataset
Computation of explanatory variables
–Aggregate measures of kin network status to use as right-hand side variables
Units of analysis in their own right
–See yesterday's lecture
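The first two uses can be sketched together on simulated data. Everything here is hypothetical: kin_group stands in for a numeric version of one of the grouping variables, has_position is an invented individual-level status indicator, and the outcome y is simulated. The random-effects specification via xtreg is one possible choice, not necessarily the one used in the lecture.

```stata
clear
set seed 2012
set obs 300

* 30 hypothetical kin groups with 10 records each
generate kin_group = ceil(_n / 10)

* Hypothetical individual-level status indicator
generate byte has_position = runiform() < 0.2

* Aggregate kin measure: number of OTHER group members with the status
bysort kin_group: egen group_positions = total(has_position)
generate kin_positions = group_positions - has_position

* Simulated outcome with a shared group-level component
generate u = rnormal()
by kin_group: replace u = u[1]
generate y = 0.5 * kin_positions + u + rnormal()

* Random-effects model with kin group as the grouping level
xtset kin_group
xtreg y kin_positions, re
```

With the real data, a string identifier such as UNIQUE_GROUP would first need a numeric version, for example via egen with the group() function, before xtset will accept it.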
Kinship variables for grouping
Ascending order of kin distance
FOUNDER_ID
–Descent from a common male ancestor in the registers
FOUNDER_INFERRED_ID
–Descent from a common male ancestor inferred from relationship codes in the earliest available register
UNIQUE_YI_HU
–Descent from members of the same yihu in the earliest available register
UNIQUE_GROUP
–Descent from members of adjacent yihu with the same surname in the earliest available register
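If the grouping variables are meant to be nested, so that each narrower group falls inside a single broader one, that can be checked directly. The sketch below uses made-up rows and only tests one pair of variables. Whether the real data are strictly nested is not stated in the slides, so treat this as a diagnostic rather than a guarantee.

```stata
* Toy data (hypothetical identifiers)
clear
input str2 FOUNDER_ID str2 UNIQUE_YI_HU
"a" "y1"
"a" "y1"
"b" "y1"
"c" "y2"
"c" "y3"
end

* Within each FOUNDER_ID, sort by UNIQUE_YI_HU and compare first and last values
bysort FOUNDER_ID (UNIQUE_YI_HU): generate byte nested = UNIQUE_YI_HU[1] == UNIQUE_YI_HU[_N]

* Records in founder groups that span more than one yihu group
list FOUNDER_ID UNIQUE_YI_HU if !nested
```

In this toy example only founder "c" is listed, because its records fall under two different UNIQUE_YI_HU values.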
Numbers and average sizes of units
Columns: Units; Obs. per unit; Individuals per unit
Rows: FOUNDER_ID; FOUNDER_INFERRED_ID*; UNIQUE_YI_HU; UNIQUE_GROUP
Kinship variables for grouping: FOUNDER_ID
PERSON_ID of the earliest male ancestor located in the registers.
Most narrowly defined grouping variable
–Based on descent from a single observed individual.
Many extinctions
–Within one or two generations
–Causes average size of groups defined by FOUNDER_ID to rise over time
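A toy example shows how extinction alone can raise the average group size. The rows are invented: group "a" survives and grows slightly, while "b" and "c" disappear after one or two registers.

```stata
* Toy data (hypothetical): one row per person-record
clear
input str2 FOUNDER_ID int YEAR
"a" 1780
"a" 1780
"a" 1783
"a" 1783
"a" 1786
"a" 1786
"a" 1786
"b" 1780
"b" 1783
"c" 1780
end

* Records per group in each register year, one row per group-year
bysort FOUNDER_ID YEAR: generate obs_year = _N
bysort FOUNDER_ID YEAR: keep if _n == 1

* Mean group size and number of surviving groups by year
collapse (mean) mean_obs = obs_year (count) groups = obs_year, by(YEAR)
list
```

The number of groups falls from 3 to 2 to 1 while the mean size rises from about 1.33 to 1.5 to 3, even though the surviving group added only one record.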
* Distribution of group sizes, top-coded at 200 observations
bysort FOUNDER_ID: generate founder_id_obs = _N
bysort FOUNDER_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) scheme(s1mono) xtitle("Number of observations with same FOUNDER_ID") fraction
* Mean number of observations per group, by register year
bysort FOUNDER_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ytitle("Mean number of observations per FOUNDER_ID") ylabel(0(2)12)
Kinship variables for grouping: FOUNDER_INFERRED_ID
Uses earliest available inferred ancestor
–Based on relationship codes in earliest available register
Useful for grouping records in earliest registers
–Until 1789, relationships were to head of yihu, not linghu.
–Allowed for inference of common ancestry
Average size of groups defined by FOUNDER_INFERRED_ID increases over time because of extinction of smaller groups
* Distribution of group sizes, top-coded at 200 observations
bysort FOUNDER_INFERRED_ID: generate founder_id_obs = _N
bysort FOUNDER_INFERRED_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) scheme(s1mono) xtitle("Number of observations with same FOUNDER_INFERRED_ID") fraction
* Mean number of observations per group, by register year
bysort FOUNDER_INFERRED_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_INFERRED_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ytitle("Mean number of observations per FOUNDER_INFERRED_ID") ylabel(0(2)12)
Kinship variables for grouping: UNIQUE_YI_HU
Descendants of members of the same yihu in the earliest available register.
Clusters are much larger than the ones defined by FOUNDER_ID or FOUNDER_INFERRED_ID.
* Mean number of observations per group, by register year
bysort UNIQUE_YI_HU YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_YI_HU YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ytitle("Mean number of observations per UNIQUE_YI_HU") ylabel(0(5)60)
Kinship variables for grouping: UNIQUE_GROUP
Descendants of members of consecutive yihu in the earliest available register who have the same surname.
Most stable over time in terms of size and number
–Ideal for analysis of change over the long term
* Mean number of observations per group, by register year
bysort UNIQUE_GROUP YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_GROUP YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ytitle("Mean number of observations per UNIQUE_GROUP") ylabel(0(5)60)