Data checks: the debate


1 Data checks: the debate

2 Objectives
What are the different surveys doing by default?
Understand the debates around certain tests and their impact on the final results
International recommendations
Conclusions on the different tests used
Data interpretation

3 When to apply these data checks
During, after or long after the survey
Routine data vs other types of data
Notes: Data quality checks can be applied to survey data or to other types of data, and can be carried out during or after the survey.

4 Missing data and illegal values
Missing data — DHS: % of completeness for height and weight; MICS: % of completeness for all anthropometric indicators; SMART: detects them (they do not contribute to the score)
Duplicates — no specific tool in any of the surveys
Illegal values — several checks and rules during data entry, but not specifically afterwards
Exact date of birth — DHS: none; MICS: % of children with no birthdate (but does not differentiate an exact birthdate from an estimated one) and % of children with birth certificates; SMART: % of children with no birthdate

DHS, MICS, and SMART/NNS surveys include checks for data quality. For teams using computer-assisted personal interviewing (CAPI) or the Open Data Kit (ODK), data may be reviewed daily while teams are still in the field. DHS field-check tables for reviewing data quality include, per field team, information on the percent of eligible children a) measured, not present, or missing; b) with out-of-range length/height measures, z-scores, or an incomplete date of birth; and c) since early 2015, the distribution of last digits for height and weight. MICS field-check tables include the response rate, age distributions, flags, and information on heaping. SMART/NNS field teams use a cluster control form to review outcomes by household, such as missing or refused cases. Data are entered into the Emergency Nutrition Assessment (ENA) software either in the field or, if that is not feasible, when teams return to base. Concerns were raised that the feedback SMART/NNS supervisors provide to interviewers may result in over-editing of data in the field, and may suppress genuine variation within clusters or shift heaping from one digit to another during the course of data collection. Age is the most important and most difficult indicator, especially if the birth mother is no longer living or is not present.
DHS and MICS survey teams collect data on child age, and efforts are made to determine the year, month, and day of birth so that exact age in days can be calculated for the date of the household visit. SMART/NNS survey staff also collect data on child age; however, it was not clear from the presentations and discussions whether exact age or rounded age (up or down) is determined. Errors in age, other than a date of birth later than the date of visit, are difficult to detect in the field. Ideally, observers should be trained to probe as much as possible so that they obtain good and complete information on the date of birth and the date of visit. It is best for the actual age calculation to take place at the analysis stage. According to WHO, it is very important to determine age in days as accurately as possible.

5 Digit preference, age heaping, sex and age ratios
Digit preference — DHS: none; MICS: % distribution of each digit for weight and length/height; SMART: Digit Preference Score for weight, length/height and MUAC only
Age heaping — DHS: no specific tool; MICS: graphical display; SMART: no score
Sex ratio — DHS and MICS: sex ratio available but not compared with other data; SMART: chi-square test
Age ratio & structure — DHS and MICS: age distribution of children <5 y but not compared with other data

Harvard researchers found that digit preference was greater for height than for weight. Digit preference for height affected a higher percentage of cases for the DHS and MICS compared to SMART, but affected few cases for weight in any of the surveys. The Harvard researchers also conducted simulation exercises with a sample of DHS and MICS datasets to induce digit preference in the distributions of height and weight. These indicated that digit preference for height (0.1 centimetres) was relatively unimportant in terms of its impact on prevalence, but that inaccuracy in weight (0.1 kilograms) was more important and could result in a 2 percent over-estimation of the prevalence of underweight or wasting. Age heaping can be a problem in many surveys, especially for estimates of underweight and stunting. A simple histogram of survey data on age in months can be used to identify age heaping. Harvard found that digit preference for age was about the same for MICS and SMART/NNS and slightly lower for DHS. Simulation exercises conducted with a sample of DHS and MICS datasets to induce heaping/digit preference in the distributions for age found that inaccuracy in age could result in a 4.5 percent over-estimation in the prevalence of stunting and a 4.2 percent over-estimation in the prevalence of underweight.
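The Digit Preference Score mentioned above can be sketched in a few lines. This is a minimal illustration of the SMART-style DPS, 100·√(χ²/(N·(k−1))) over the k = 10 possible terminal digits, not the ENA software's implementation; the sample digit lists are invented.

```python
from collections import Counter
import math

def digit_preference_score(terminal_digits):
    """SMART-style DPS: 100 * sqrt(chi2 / (N * (k - 1))), where chi2 compares
    the observed terminal-digit counts with a uniform spread over k = 10 digits."""
    n = len(terminal_digits)
    expected = n / 10
    counts = Counter(terminal_digits)
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    return 100 * math.sqrt(chi2 / (n * 9))

# Perfectly uniform terminal digits: DPS = 0 (no preference at all).
uniform = [d for d in range(10) for _ in range(10)]
# Heaping on 0 and 5, as is common when heights are read to the nearest half cm.
heaped = [0] * 30 + [5] * 30 + [1, 2, 3, 4, 6, 7, 8, 9] * 5

dps_uniform = digit_preference_score(uniform)   # 0.0
dps_heaped = digit_preference_score(heaped)     # ~33.3, "problematic" on the SMART scale
```

A DPS near zero can itself be suspicious: real measurements always show a little digit noise.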

6 Effect of rounding
As the DPS increases, the prevalence varies in these simulations. Effect of rounding/digit preference of weight on the prevalence of GAM and SAM from 50 simulated surveys.

7 Summary statistics for age and sex ratios for children aged 0–59 months across MICS, DHS, and SMART (NNS) surveys in West and Central African countries (and NHANES in the US).
Comment on how the ratios varied across the different surveys (ranges). There is a lack of consensus around the "true" population parameter for the age ratio, given that a comparison is being made between groups spanning an unequal number of months (6–29 months versus 30–59 months). Age ratios may be sensitive within countries to demographic changes arising from changing fertility rates and rates of infant and child mortality. In addition, many SMART surveys do not cover children in the range of 0–5 months (ages which may be more difficult to assess). It is not clear whether this ratio should be 1.0 or some other value, or whether the true value may vary between countries and over time.
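The chi-square test applied to the sex ratio can be illustrated with a stdlib-only sketch. The 520/480 split is an invented example; for one degree of freedom the p-value follows directly from the normal CDF.

```python
import math
from statistics import NormalDist

def sex_ratio_chi2(boys, girls):
    """Chi-square test (df = 1) of the observed sex split against a 50:50 ratio."""
    expected = (boys + girls) / 2
    chi2 = (boys - expected) ** 2 / expected + (girls - expected) ** 2 / expected
    # For df = 1: p = 2 * (1 - Phi(sqrt(chi2))).
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value

chi2, p = sex_ratio_chi2(520, 480)  # chi2 = 1.6, p ~ 0.21: "excellent" (p > 0.1)
```

As the slide notes, passing or failing this test says nothing by itself: the expected ratio may genuinely differ from 1:1 in some populations.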

8 Flags, normality, SD and dispersion
Flags — DHS and MICS: WHO flags; SMART: WHO & SMART flags
Normality — DHS and MICS: none; SMART: graphical display, skewness and kurtosis statistics
SD — SMART: focusing on WHZ (results provided for HAZ and WAZ but not included in the classification)
Dispersion — SMART: index of dispersion (variance-to-mean ratio)

Note: SMART flags are more restrictive and may suppress true variation in the data, whereas WHO flags detect only the most extreme outliers. There is more consensus around using WHO flags. Harvard researchers applied the same WHO flags to all the survey data. They found that the percentages of flagged or implausible values for height and weight were higher in DHS and MICS surveys than in SMART. All use flags for WHZ, HAZ, and WAZ; SMART provides a score.
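The difference between the two flagging rules can be sketched as follows. The fixed limits are the WHO flag ranges (HAZ −6 to +6, WAZ −6 to +5, WHZ −5 to +5); the second rule uses SMART's default ±3 z around the observed survey mean. The sample z-scores are invented.

```python
from statistics import mean

# WHO fixed plausibility limits per index (lower, upper), in z-scores.
WHO_LIMITS = {"HAZ": (-6, 6), "WAZ": (-6, 5), "WHZ": (-5, 5)}

def who_flags(zscores, index):
    """Flag z-scores outside the fixed WHO limits: only extreme outliers."""
    lo, hi = WHO_LIMITS[index]
    return [z for z in zscores if z < lo or z > hi]

def smart_flags(zscores, width=3):
    """Flag z-scores more than `width` z from the observed survey mean
    (SMART's default rule, stricter than the WHO limits)."""
    m = mean(zscores)
    return [z for z in zscores if abs(z - m) > width]

whz = [-1.2, -0.4, 0.3, -2.5, -6.1, 1.0, 4.2]
who_out = who_flags(whz, "WHZ")   # only the single most extreme value
smart_out = smart_flags(whz)      # also catches 4.2, far from the survey mean
```

The example shows why SMART flags remove more observations: they adapt to the sample mean and cut closer to the bulk of the distribution.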

9 SMART's criteria and scores
TEST                   Excellent        Good             Acceptable       Problematic
DPS                    <7               8–12             13–20            >20
Sex ratio              p>0.1            p>0.05           p>0.001          p≤0.001
Age ratio & structure  p>0.1            p>0.05           p>0.001          p≤0.001
Flags                  0<N<2.5%         2.5%<N<5.0%      5.0%<N<7.5%      N>7.5%
Skewness               <±0.2            <±0.4            <±0.6            >±0.6
Kurtosis               <±0.2            <±0.4            <±0.6            >±0.6
Dispersion             p>0.1            p>0.05           p>0.001          p≤0.001
SD                     <1.10 & >0.90    <1.15 & >0.85    <1.20 & >0.80    ≥1.20 or ≤0.80

DHS and MICS don't have an overall data quality score. These criteria can be used as an indication but, as mentioned in the TEAM report, data quality is not a score but a judgement call made on objective criteria.
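The DPS row of the table can be turned into a small classifier. This is only a sketch: the slide's cells are partly garbled, so the exact boundary behaviour at 7, 12 and 20 is an assumption.

```python
def classify_dps(dps):
    """Classify a Digit Preference Score on the SMART scale
    (assumed bands: 0-7 excellent, 8-12 good, 13-20 acceptable, >20 problematic)."""
    if dps <= 7:
        return "excellent"
    if dps <= 12:
        return "good"
    if dps <= 20:
        return "acceptable"
    return "problematic"
```

As the slide stresses, such a label is an indication for judgement, not a verdict in itself.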

10 Standard deviation
Recommended by WHO in 1995 and by SMART. Some say: "keep the SD close to 1"; others: "an SD > 1 reflects heterogeneity". What affects the SD more: data quality, or population heterogeneity? The 1995 WHO Technical Report recommended assessing the quality of anthropometric data partly on the basis of the standard deviation of the z-scores; a standard deviation greater than expected was associated with poorer-quality data. The Harvard study reported that the standard deviations for height-for-age, weight-for-age, and weight-for-height z-scores were higher for DHS and MICS than for SMART. Does the shape of the distribution change when the population becomes more malnourished? Maybe not for height, but perhaps for weight; we don't know. The new "Recommendations for improving the quality of anthropometric data collection, analysis and reporting" from WHO (so far a draft version) also does not recommend thresholds for the SD.

11 Standard deviation
Poor data quality can inflate the SD of anthropometric measures, but some populations may have a naturally large SD, and when you aggregate data from different populations (e.g. subnational data) it is very difficult to obtain an SD of 1. While there was no agreement on what standard deviation of z-scores is reasonable to expect in heterogeneous populations, there was some agreement that 1 may be unrealistic in some situations and that very large standard deviations, for example greater than 2, might be a sign of poor quality. Further investigations are needed to (i) develop guidance on how to tease out the relative contribution of measurement error from the expected population-associated spread for any given survey; and (ii) ascertain a cut-off at which the SD might be more conclusively related to data quality for each anthropometric index.

12 Example mixture of two normal distributions yielding a non-normal distribution
Some anthropometric survey methods (e.g. SMART) use deviations from perfect normality as an indicator of poor data quality. But deviations from normality are not necessarily due to poor-quality data; they can be due to sampling a mixed population.
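A quick simulation illustrates the point: each subpopulation below has z-scores with SD 1, yet the pooled sample has SD ≈ 1.17. The subgroup means and sizes are invented for the illustration.

```python
import random
from statistics import stdev

random.seed(42)

# Two subpopulations (e.g. two livelihood zones), each internally normal
# with SD = 1 but centred on different means.
zone_a = [random.gauss(-0.8, 1.0) for _ in range(50_000)]
zone_b = [random.gauss(0.4, 1.0) for _ in range(50_000)]

pooled = zone_a + zone_b
sd_pooled = stdev(pooled)
# Analytically, an equal mixture of N(m1, 1) and N(m2, 1) has variance
# 1 + (m1 - m2)**2 / 4, i.e. SD ~ 1.166 here: inflated without any bad data.
```

No measurement error was introduced, yet a "keep the SD close to 1" rule would penalise this perfectly clean sample.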

13 Gaussian distribution curves illustrating how random measurement error moves data points from one segment to another, increasing the standard deviation and the prevalence. The red line has a standard deviation of 1.0; the dotted blue line has a standard deviation of 1.2. The grey areas show how mass from the centre of the curve has moved towards the tails, increasing the areas below −2.0 z and above +2.0 z. Both distributions remain normal, with no kurtosis or skewness. Thus random measurement errors tend to inflate prevalence estimates.
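The grey tail areas can be reproduced with the normal CDF: widening the SD from 1.0 to 1.2 roughly doubles the share of the distribution beyond ±2 z. This sketch uses Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def tail_beyond_2(sd):
    """Proportion of a N(0, sd) distribution below -2 z or above +2 z."""
    d = NormalDist(0, sd)
    return d.cdf(-2) + (1 - d.cdf(2))

share_sd_10 = tail_beyond_2(1.0)  # ~4.6% in the two tails combined
share_sd_12 = tail_beyond_2(1.2)  # ~9.6%: error alone roughly doubles "prevalence"
```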

14 Original GAM in DHS with an SD of 1.44 vs simulated GAM with an SD of 1
The GAM and SAM from 100 West African surveys, computed with the average observed SDs, alongside the prevalence that would be obtained if the SD had been either 1.0 z or 1.1 z. You can see how the prevalence of wasting (WHZ < −2) in the DHS surveys changes from 12.6% when the SD is 1.44 to 5% when it is 1; SAM is also reduced by half. In SMART surveys the SD is closer to 1 and the changes are less evident.
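The 12.6% → 5% change can be reproduced under a normality assumption: infer the mean WHZ implied by the observed prevalence and SD, then recompute the prevalence at SD 1. Only the 12.6% and 1.44 figures come from the slide; everything else is the normal model.

```python
from statistics import NormalDist

observed_gam, observed_sd = 0.126, 1.44

# Mean WHZ implied by P(WHZ < -2) = 12.6% when WHZ ~ N(mean, 1.44):
# (-2 - mean) / sd = Phi^{-1}(p)  =>  mean = -2 - sd * Phi^{-1}(p).
mean_whz = -2 - observed_sd * NormalDist().inv_cdf(observed_gam)

# Prevalence the same population would show if its SD were 1:
gam_at_sd_1 = NormalDist(mean_whz, 1.0).cdf(-2)  # ~5%, matching the slide
```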

15 Statistical significance tests
Thresholds and ranges are a better approach than relying on tests of statistical significance. Significance tests can be strongly affected by sample size: small samples can lead tests to miss large effects, and large samples can lead tests to flag small effects as highly significant. Care should therefore be exercised when using statistical significance tests to classify data as "problematic"; thresholds and ranges for the skewness and kurtosis statistics are usually a better approach.
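The sample-size effect is easy to demonstrate: the same mild terminal-digit heaping yields a chi-square statistic that grows linearly with N, so it is "not significant" at N = 100 but wildly so at N = 10,000. The proportions are invented; 16.92 is the chi-square critical value for df = 9 at α = 0.05.

```python
def chi2_uniform_digits(proportions, n):
    """Chi-square statistic for terminal-digit proportions at sample size n,
    against a uniform expectation of n/10 per digit."""
    expected = n / 10
    return sum((p * n - expected) ** 2 / expected for p in proportions)

# Mild heaping: digits 0 and 5 at 12% each, the remaining eight at 9.5% each.
props = [0.12, 0.095, 0.095, 0.095, 0.095, 0.12, 0.095, 0.095, 0.095, 0.095]
CRITICAL_DF9 = 16.92  # chi-square critical value, df = 9, alpha = 0.05

small = chi2_uniform_digits(props, 100)     # chi2 = 1.0: far from significant
large = chi2_uniform_digits(props, 10_000)  # chi2 = 100.0: "highly significant"
```

The underlying heaping is identical in both cases; only the sample size changed, which is exactly why fixed thresholds such as the DPS are preferred.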

16 Example: Shapiro-Wilk normality test (W = , p-value = ). Histograms of MUAC with normal curves superimposed, from data from a SMART survey in Kabul, Afghanistan. The test indicates that MUAC is significantly non-normal, yet examination of the histograms shows that the deviation from normality is not particularly marked: all indices have symmetrical, or nearly symmetrical, bell-shaped distributions. If a distribution appears to be normal (i.e. has a symmetrical, or nearly symmetrical, bell-shaped distribution), then it is usually safe to assume normality and to use statistical procedures that assume normality. It is important to remember that the normal distribution is a mathematical abstraction; there is nothing to compel the real world to conform to it.

17 Statistical significance tests
This applies not only to normality but to all the other tests (sex ratio, age, etc.). Failing a test does not mean the data are not representative of reality: take the context into account. If several tests fail, however, consider the possibility of fabricated data. It is extremely difficult to fabricate data for weight and height that form a normal distribution without skewness or kurtosis, with an acceptable SD, and without digit preference.

18 Survey procedures: dispersion index
Deviations from randomness can reflect the true distribution (e.g. more than one livelihood zone); interpret with caution. The idea behind using a measure of dispersion to judge data quality is the belief that cases of malnutrition should always be randomly distributed across primary sampling units; if they are not, the data are considered suspect. The problem with this approach is that deviations from randomness can reflect the true distribution of cases in the survey area. This may occur when the survey area comprises, for example, more than one livelihood zone. Randomness is also less likely for conditions, such as wasting and oedema, that are associated with infectious disease and so may be clumped rather than randomly distributed across primary sampling units. This can become a particular problem when proximity sampling is used to collect the within-cluster samples. Measures of dispersion are therefore problematic as measures of data quality and should be interpreted with caution. The exception to this rule is finding maximum, or almost maximum, uniformity or clumping: maximum uniformity is likely only when data have been fabricated, while maximum clumping may indicate poor data collection and/or poor data management.
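The variance-to-mean ratio itself is simple to compute. The cluster counts below are invented to contrast a fairly even spread (index < 1) with marked clumping (index ≫ 1); a Poisson-like random spread would sit near 1.

```python
from statistics import mean, variance

def index_of_dispersion(cases_per_cluster):
    """Variance-to-mean ratio of malnutrition cases across clusters:
    ~1 for a random (Poisson) spread, >1 clumped, <1 uniform."""
    return variance(cases_per_cluster) / mean(cases_per_cluster)

even_spread = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4]   # cases spread evenly: index ~0.33
clumped = [0, 0, 12, 1, 0, 14, 0, 1, 11, 1]    # cases piled into 3 clusters: index ~8.4
```

Per the slide's caveat, neither extreme proves bad data on its own, but near-maximum uniformity or clumping warrants scrutiny.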

19 Conclusions on the different tests
Each survey system (DHS, MICS, SMART) has strengths and opportunities for improvement. There is a need to develop guidance; analyses of anthropometric data should be encouraged; the 1995 WHO guidelines on assessing survey data quality should be updated; and public access to raw data should be ensured. Whether and how best to adjust existing survey data for imprecision should also be investigated: the shape of the distributions; heterogeneity across place, group, or time; and the implications of providing revised estimates. Guidance is needed on how to conduct good-quality anthropometric assessment, improve training and supervision, and ensure representative sampling of clusters and within-cluster selection of households and individuals across geographic areas and socio-economic groups and over time. Updating the 1995 WHO guidelines will ensure there are standardized approaches to assess data quality, with relevant indicators and thresholds, e.g., the number of missing cases, digit preference, standard deviation of z-scores, proportion of extreme values, and other measures of quality.

20 Prevalence-based data for children 0–59 months are commonly reported using a cut-off value, often <−2 SD and >+2 SD (<−3 for severe). The rationale for this is the statistical definition of staying within the central 95% of the international reference population. Using −2 SD and +2 SD as cut-offs implies that 2.3% of the reference population at each end of the population curve will be classified as malnourished even if they are truly "healthy" individuals with no growth impairment. Hence, 2.3% can be regarded as the baseline or expected prevalence at both ends of the spectrum in nutritional status calculations. Taken from: Recommendations for improving the quality of anthropometric data collection, analysis and reporting.
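The 2.3% figure follows directly from the normal CDF, since Φ(−2) ≈ 0.0228:

```python
from statistics import NormalDist

# Share of a perfectly "healthy" reference population falling below -2 z
# (and, symmetrically, above +2 z) purely by the statistical definition.
baseline = NormalDist().cdf(-2)  # ~0.0228, i.e. roughly 2.3%
```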

21 Should we subtract these 2.3% from prevalence calculations?
Debate: the 2.3% figure is customarily not subtracted from the observed value.

22 New prevalence thresholds
Released in January 2019. The revised prevalence thresholds presented here can be used by the international nutrition community for interpreting the data for the following purposes: classifying and mapping countries according to levels of severity of malnutrition; by donors and global actors to identify priority countries for action; and, most importantly, by governments for monitoring purposes and to trigger action and target programmes aimed at achieving "low" or "very low" levels.

23 Tracking trends
Pay attention to the denominators
What is the seasonality of the survey? Look at the confidence intervals, and at the prevalence of other types of malnutrition and at mortality. Attention needs to be paid to the denominators when reporting stunting, wasting, underweight, and overweight for all children under five years of age or for children 6–59 months of age. Various standard surveys occasionally do not include children aged 0–6 months in the anthropometry measurements; therefore, when comparing populations or studying trends, variation in the denominators can cause confusion in the interpretation of the estimates. WHO, in collaboration with UNICEF and the EC, has developed the Tracking Tool to help countries set their national targets and monitor progress on the WHA targets, three of which relate to tracking stunting, overweight, and wasting. This tool allows users to explore scenarios taking into account different rates of progress towards the targets and the time left to reach them. Information and tools related to the tool can be accessed on the WHO website. This tracking tool has been used as a means to review trends in the current data included and validated in the UNICEF-WHO-WB Joint Child Malnutrition review.

24 Exercise 1: Look at the data quality reports for the Nigeria surveys
DHS 2013, SMART 2015, MICS 2017. Which data quality tests are relevant for nutrition? Which survey's data quality is better? What additional tests would you perform? The full reports are also provided together with the data quality reports.

