Data Quality Issues and Adjustments in the Human Mortality Database


1 Data Quality Issues and Adjustments in the Human Mortality Database
Longevity 12 conference, September 2016, Chicago
Magali Barbieri, HMD Associate Director, University of California, Berkeley, and French National Institute for Demographic Studies (INED)
Acknowledgement: this presentation is based on the work conducted over the years by many members of the HMD team, at both the University of California, Berkeley, and the Max Planck Institute for Demographic Research (MPIDR), Rostock.

2 What is the Human Mortality Database? www.mortality.org
A database of historical mortality series with:
Death counts, population counts and estimated population exposures (person-years lived) at the finest detail possible
Original estimates of age-specific death rates and other period and cohort life table functions in various formats (of age and time)
Covers 38 national populations with high-quality demographic data

3 The 38 HMD countries
Australia, Austria, Belarus, Belgium, Bulgaria, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Latvia, Lithuania, Luxembourg, Netherlands, New Zealand, Norway, Poland, Portugal, Russia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Taiwan, Ukraine, United States, United Kingdom

4 HMD series by country and time period
Shortest series = Chile and longest = Sweden

5 HMD input data
All HMD quantities are derived from:
Vital statistics: death counts, birth counts
Census data: population census, official population estimates

6 The HMD ideal raw data Detailed vital statistics and population data with complete and consistent coverage, perfect quality, timely publication and freely available Mortality data: death counts by year, sex, single year of age up to maximum age at death and year of birth, with timely registration Birth data: live births by year, sex, and month, with timely registration Population data: annual January 1st population estimates by sex and single year of age up to maximum age, with constant definition Perfect quality = perfect age reporting (age at last birthday) and no unknown age in the mortality and population data

7 Basic methods
Life table calculations based on: death counts by sex and Lexis triangle; exposure counts by sex and Lexis triangle
Methodological steps with ideal raw data:
Construct exposure counts by sex and Lexis triangle from births, deaths and annual population estimates
Compute death rates (deaths/exposure) by sex
Compute complete life tables from the death rates
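The last two steps can be sketched in a few lines of Python. This is a minimal illustration, not the HMD implementation: the function names are hypothetical, the average fraction of the interval lived by those who die (`ax`) is assumed to be 0.5 unless supplied, and the last age is treated as an open interval closed with Lx = lx / mx.

```python
def death_rates(deaths, exposures):
    """Age-specific death rates: deaths divided by person-years of exposure."""
    return [d / e for d, e in zip(deaths, exposures)]


def life_table(mx, ax=None, radix=100_000.0):
    """Single-year-of-age life table from death rates mx (last age is open)."""
    n = len(mx)
    if ax is None:
        ax = [0.5] * n  # assumption: deaths fall mid-interval on average
    rows = []
    lx = float(radix)
    for age, (m, a) in enumerate(zip(mx, ax)):
        is_open = age == n - 1
        # Standard conversion from a death rate to a death probability.
        qx = 1.0 if is_open else m / (1.0 + (1.0 - a) * m)
        dx = lx * qx
        # Person-years lived in the interval; the open interval uses lx / mx.
        Lx = lx / m if is_open else lx - (1.0 - a) * dx
        rows.append({"age": age, "mx": m, "qx": qx, "lx": lx, "dx": dx, "Lx": Lx})
        lx -= dx
    # Accumulate person-years from the oldest age down to get life expectancy.
    Tx = 0.0
    for row in reversed(rows):
        Tx += row["Lx"]
        row["ex"] = Tx / row["lx"]
    return rows
```

With a constant death rate of 0.1 at every age, `life_table([0.1, 0.1, 0.1])[0]["ex"]` comes out at 10 years, which is the expected 1/m for a constant rate.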

8 Few country/years with perfect data
Country: annual population estimates by single year of age up to maximum age; death counts by Lexis triangle up to maximum age
Sweden: since 1992
Denmark: since 1976; since 1943
Norway: since 1846; since 1980
Finland: since 1995; since 1917
Iceland: since 1840; since 1981
Northern European countries have the best data in terms of the detail provided, but even there, historical data are not as detailed as more recent data.

9 Data challenges
Availability, details, definitions, accuracy

10 Data challenges: Availability
Availability and timeliness of input data:
Publication delays (e.g. CAN)
Gaps in data series (e.g. missing deaths for BEL; no population for CHL after 2002)
Lack of annual population estimates for many countries/periods

11 Data challenges: Details
Granularity of available data:
Single-year, 5-year or 10-year age group mortality and population data rather than Lexis triangles
Diversity of mortality data “shapes” (LT/UT, RR, VH, VV, RV …)
HMD country specialists (CSs) have developed close connections with staff of national statistics offices, which allows us to obtain special tabulations of vital events or population data not available to the public.

12 A variety of shapes in the original data
[Figure: data shapes on a Lexis diagram; axes: age and time.] Source: Tim Riffe.

13 Data challenges: Details
Granularity of available data:
Single-year, 5-year or 10-year age group mortality and population data rather than Lexis triangles
Diversity of mortality data “shapes” (LT/UT, RR, VH, VV, RV …)
Open age interval

14 Data challenges: Definitions
Inconsistencies in definition over time:
Live births. WHO standard definition: “A live birth is the complete expulsion or extraction from its mother of a product of conception, irrespective of the duration of pregnancy, which, after such separation, breathes or shows any other evidence of life, such as beating of the heart, pulsation of the umbilical cord, or any definite movement of voluntary muscles, whether or not the umbilical cord has been cut or the placenta is attached”.
Territorial changes
De jure vs. de facto reference population (or permanent vs. usual residents of the country)

15 Territorial adjustment for changes in definition
Trends in the official population estimates (as of December 31st) by sex, Poland. Source: Domantas Jasilionis, in B&D file for POL.

16 Data challenges: Accuracy
Reliability of the information provided:
Under-registration (births, deaths, population)
Immortals/phantoms (especially in register-based censuses)
Unknown age
Age misstatement (attraction, overstatement)

17 Consequences of age-overstatement
Trends in male life expectancy at age 65 (left panel) and age 80 (right panel) in Costa Rica. Source: HMD.

18 Population counts below age 80
Redistribution of the population of unknown age
If January 1st population estimates by single year of age are not available: adjustment of other annual estimates to January 1st by linear interpolation; calculation of intercensal estimates using intercensal survival methods on census and vital statistics data (cohort component method)
To deal with both large open age intervals in population data and age overstatement, we use a fairly complicated set of demographic techniques (relying on work that has shown that age is more accurately reported in vital statistics than in censuses).
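The first two adjustments above can be illustrated with a short Python sketch. Both functions are hypothetical helpers written for this illustration: the redistribution is assumed to be proportional to the observed age distribution, and the interpolation is assumed to be linear in calendar days between two dated estimates for the same age.

```python
from datetime import date


def redistribute_unknown(counts_by_age, unknown):
    """Spread a count of unknown age proportionally over the known ages."""
    total = sum(counts_by_age)
    if total <= 0:
        raise ValueError("need a positive total of known-age counts")
    factor = 1.0 + unknown / total
    return [c * factor for c in counts_by_age]


def interpolate_to_jan1(date0, pop0, date1, pop1, year):
    """Linear interpolation of two dated estimates to January 1st of `year`."""
    target = date(year, 1, 1)
    if not date0 <= target <= date1:
        raise ValueError("January 1st must lie between the two estimates")
    weight = (target - date0).days / (date1 - date0).days
    return pop0 + weight * (pop1 - pop0)
```

For example, two mid-year estimates dated 1 July 2000 and 1 July 2001 can be passed with `year=2001` to obtain a value for 1 January 2001.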

19 Intercensal survival method An example for pre-existing cohorts
The simplest procedure merely consists of subtracting death counts from the initial census count to obtain cohort population estimates on January 1st of each succeeding year.
Unfortunately, the final step of such a computation usually yields an estimate of cohort size at time (t+5) that differs from the number given by the corresponding census.
This inconsistency is caused by two factors: migration and error. Although both are typically small for national populations, they should not be ignored.
The standard method is to distribute the implied migration/error uniformly over the parallelogram in the figure (that is, within each cohort). Estimates of cohort size for intercensal years are then found by subtracting from the initial census count both the observed death counts and an estimate of net migration/error.
The main problem arises when there are large undocumented swings in migration over the intercensal period.
Source: HMD.
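The procedure for a pre-existing cohort can be sketched as follows. This is a simplified illustration under stated assumptions, not the HMD code: both censuses are assumed to fall on January 1st, `deaths_by_year` is assumed to hold the cohort's deaths in each intercensal year, and the function name is hypothetical.

```python
def intercensal_cohort_estimates(census_start, census_end, deaths_by_year):
    """January 1st cohort sizes between two censuses.

    The gap between the second census and the count implied by deaths alone
    is treated as net migration/error and spread evenly over the years.
    """
    n_years = len(deaths_by_year)
    if n_years == 0:
        raise ValueError("need at least one intercensal year")
    implied_end = census_start - sum(deaths_by_year)
    residual = census_end - implied_end  # implied net migration/error
    estimates = [float(census_start)]
    cumulative_deaths = 0.0
    for k, deaths in enumerate(deaths_by_year, start=1):
        cumulative_deaths += deaths
        # Subtract observed deaths, then add the share of the residual
        # accumulated over the first k years.
        estimates.append(census_start - cumulative_deaths + residual * k / n_years)
    return estimates, residual
```

By construction the last estimate equals the second census count, so the series is consistent with both censuses.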

20 Population counts at ages 80+ years
Intercensal estimates for non-extinct cohorts aged by end of observation period (except for N. European countries)
Survivor ratio method (cohorts aged 90+ by end of observation period)
Extinct cohort method
Survivor ratios are computed on the reconstructed population (from deaths cumulated from the highest age in fully extinct cohorts) from one age x to another age (x-1); mortality decline is taken into account by using a multiplier C derived from recent mortality trends in the age group.
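The two reconstruction ideas named above can be sketched in Python. This is a minimal sketch that ignores migration; the function names are hypothetical, and the survivor-ratio step assumes the ratio R = P(x,t) / P(x-k,t-k) has already been estimated from older cohorts, with `c` standing in for the multiplier mentioned on the slide.

```python
def extinct_cohort_population(cohort_deaths):
    """January 1st cohort sizes for a fully extinct cohort.

    cohort_deaths[i] is the number of cohort deaths in year i, up to the year
    in which the last member died. With no migration, the population at the
    start of a year equals all deaths from that year onward.
    """
    sizes = []
    remaining = 0.0
    for deaths in reversed(cohort_deaths):
        remaining += deaths
        sizes.append(remaining)
    return sizes[::-1]


def survivor_ratio_population(deaths_last_k_years, ratio, c=1.0):
    """Size of an almost-extinct cohort from a survivor ratio.

    If R = P(x,t) / P(x-k,t-k) and P(x-k,t-k) = P(x,t) + D, where D is the
    cohort's deaths over the last k years, then P(x,t) = R / (1 - R) * D.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    return c * ratio / (1.0 - ratio) * deaths_last_k_years
```

A cohort with deaths of 50, 30, 15 and 5 in its last four years is reconstructed as 100, 50, 20 and 5 survivors at the start of each of those years.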

21 Methods for estimating population at ages 80+ years
A = Intercensal estimates; B = Extinct cohort; C = Survivor ratio Source: HMD Methods Protocol, V6.

22 Additional methodological issues
Large fluctuations due to small number of deaths at very high ages

23 Fitting a mortality curve at higher ages
Period rates at older ages are smoothed by fitting a logistic function with an asymptote at 1 (the Kannisto model), to account for the inherent randomness of mortality at older ages. This is the difference between Mx (unsmoothed) and life table mx (smoothed). Source: Tim Riffe.
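A small Python sketch of the Kannisto curve, written as a e^{b(x - x0)} / (1 + a e^{b(x - x0)}), may help fix ideas. The fitting step here is a deliberate simplification: it uses ordinary least squares on the logit of the observed rates rather than a likelihood-based fit on death counts and exposures, and the function names and the reference age `x0 = 80` are assumptions made for this illustration.

```python
import math


def kannisto(x, a, b, x0=80):
    """Kannisto logistic hazard with an upper asymptote of 1."""
    z = a * math.exp(b * (x - x0))
    return z / (1.0 + z)


def fit_kannisto_logit(ages, rates, x0=80):
    """Rough fit: logit(rate) = ln(a) + b * (x - x0), by least squares.

    Requires every rate to lie strictly between 0 and 1.
    """
    xs = [x - x0 for x in ages]
    ys = [math.log(m / (1.0 - m)) for m in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b
```

The smoothed rates are then `kannisto(x, a, b)` evaluated at each old age, which replaces the noisy observed values.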

24 Validation
[Figure: Sweden, 2000, females.] Source: Tim Riffe.

25 Additional methodological issues
Large fluctuations due to small number of deaths at very high ages
Non-uniform distribution of births during some calendar years

26 Births by month in France from 1912 to 1921
Source of the problem. [Figure: births by month in France from 1912 to 1921.] Source: Tim Riffe.

27 Additional methodological issues
Large fluctuations due to small number of deaths at very high ages
Non-uniform distribution of births during some calendar years
Obsolete estimation of the age at death of deceased infants (a0)

28 Unresolved issues
Changes in population definition
Undocumented large and sudden migration waves
Age overstatement in mortality data
Immortals and phantoms in register-based estimation of population

29 Many thanks … To our friends and colleagues around the world who have help us to build the database To the many users of the data who make our work worthwhile To the Max Planck Society in Germany, and to the National Institute on Aging in the United States for sponsoring the project from its inception To the Department of Demography at UC Berkeley and the Berkeley Center for the Economics and Demography of Aging for their continuing support To the Society of Actuaries and the Canadian Institute of Actuaries for their financial contributions to the HMD

30 Geographic location of HMD countries

31 A variety of shapes in the original mortality data
[Figure: data shapes on a Lexis diagram; axes: age and time.] Source: Tim Riffe.

32 Using a spline function to redistribute deaths from age groups to Lexis triangles
Source: Vladimir Shkolnikov.

33 An example of age overstatement

34 Ex (exposure) ratios from one year to the next, France
Map of mortality deviations: illustration of a situation where the assumption of a uniform distribution of births is violated (the French cohorts born around WWI).
The chart compares the raw mortality rate at age x and year t, m(x,t), to m(x,t-1) (comparison of a square with the one immediately to its left). E.g. a value of 1.2 indicates that m(x,t) is 20% larger than for the same age the year before.
Source: Tim Riffe.

35 Using birth-by-month data to compute exposures more precisely
Estimations for period rates (using simple rules of algebra): [equations]
We estimate exposures separately for each Lexis triangle (EL = exposures in the lower triangle, shown in blue; EU = exposures in the upper triangle, shown in red).
We use the average times at birth (b-bar 1 and b-bar 2) for births that occurred in years (t-x-1), for b-bar 1, and (t-x), for b-bar 2, expressed as a proportion of the year (with 0.5 = all births occurred exactly in the middle of the year, 1 = all births occurred at the very beginning of the year, and 0 = all births occurred at the very end of the year).
We also use the corresponding variances in the distribution of births by month (sigmas) within each pair of cohorts.
The coefficients s1, s2, u1 and u2 are calculated using the distribution of birthdays within each annual cohort: [equations]
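The two inputs described above, the mean time at birth and its variance within a calendar year, can be computed from monthly birth counts along the following lines. This is only a sketch: the function name is hypothetical, births are assumed to occur at the midpoint of each month (so within-month spread is ignored), and time is measured here from the start of the year, so the value on the slide's scale (1 = start of the year) would be one minus the mean returned here. The variance is the same on either scale.

```python
import calendar


def birth_time_moments(births_by_month, year):
    """Mean and variance of time at birth, as a fraction of the year.

    births_by_month holds 12 counts, January first. Each month's births are
    placed at the month's midpoint (a simplifying assumption).
    """
    if len(births_by_month) != 12:
        raise ValueError("expected 12 monthly counts")
    total = sum(births_by_month)
    if total <= 0:
        raise ValueError("need a positive number of births")
    days_in_year = 366 if calendar.isleap(year) else 365
    midpoints = []
    elapsed = 0
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]
        midpoints.append((elapsed + days / 2) / days_in_year)
        elapsed += days
    mean = sum(b * m for b, m in zip(births_by_month, midpoints)) / total
    variance = sum(b * (m - mean) ** 2 for b, m in zip(births_by_month, midpoints)) / total
    return mean, variance
```

When births are spread evenly over the days of the year the mean is 0.5; a year such as those around WWI in France, with births concentrated in some months, moves the mean away from 0.5 and changes the variance.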

36 Using birth-by-month data to compute exposures more precisely
Estimations for cohort rates (using simple rules of calculus): [equations]
For exposure estimates, E(x,t) is not simply equal to P(x,t+1) because, in most age intervals, people tend to die more at the end of the interval: the probability of dying increases with age, and people are older at the end than at the beginning of the interval [check that this is indeed the reason for the adjustment].
zL and zU are calculated using the distribution of birthdays within each annual cohort: [equations]

37 Re-estimation of a0 using Andreev and Kingkade, 2015
V5 of the HMD Methods Protocol (MP) estimated a0 (which determines the exposure for the infant mortality rate, i.e. the denominator of the rate) using the Coale and Demeny formula.
The C-D formula was determined more than 40 years ago, when the infant mortality rate (IMR) was still high almost everywhere. As the IMR has fallen, the formula has become obsolete.
Andreev and Kingkade have modeled a0 on q0 using more recent data, with a cut-point regression line.
Source: Tim Riffe.
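To show where a0 enters the calculation, here is a hedged Python sketch of the older approach. The coefficients are the Coale-Demeny "West" values as commonly quoted in demography textbooks (a linear function of the infant death rate below a threshold of 0.107, a constant above it); they are an assumption of this example and should be checked against the Methods Protocol before use. The Andreev-Kingkade coefficients are not reproduced here. Function names are hypothetical.

```python
# Assumed Coale-Demeny "West" coefficients: (intercept, slope, value above threshold).
CD_WEST = {"m": (0.045, 2.684, 0.330), "f": (0.053, 2.800, 0.350)}
CD_THRESHOLD = 0.107  # assumed cut-off on the infant death rate m0


def coale_demeny_a0(m0, sex):
    """Average age at death of infants dying before age 1, from m0."""
    intercept, slope, high_value = CD_WEST[sex]
    if m0 >= CD_THRESHOLD:
        return high_value
    return intercept + slope * m0


def infant_death_probability(m0, a0):
    """Convert the infant death rate m0 to the probability q0 using a0."""
    return m0 / (1.0 + (1.0 - a0) * m0)
```

At a low infant death rate such as 0.005, the formula gives an a0 of about 0.058 for males, which illustrates why a relationship calibrated on high-mortality populations deserves re-estimation on recent data.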

