Basic Anthropometric data quality checks

Objectives
Outliers and flags
WHO and SMART flags
Standard deviation

Outliers and flags
Outliers are values that fall outside of an acceptable range. Values outside of the plausible range are frequently due to poor measurement, inaccurate date of birth, or data recording errors, so they are an important indication of data quality. Flagged records can be checked and corrected, or censored.
The currently recommended flagging system for detecting implausible z-score values in the analysis of national surveys was defined in 2006, when the WHO Child Growth Standards were released, replacing the WHO/NCHS growth references. The cut-offs were defined on the basis of what is biologically implausible, in other words, incompatible with life. These flagging cut-offs have been challenged on the basis of observations of living children whose z-scores lie beyond the currently defined implausible values.

Flagging usually applied to
HAZ
WAZ
WHZ
BAZ (adults)
Flagging is a process of checking whether values of anthropometric indices are outside a given range. Flagged records can be checked and corrected, or censored. The flagging process can easily be applied to other variables, even routine data not coming from surveys.

Flagging criteria
Two flagging methods are in common use: WHO flags and SMART flags.
Flagging is a process of checking whether values of anthropometric indices are outside a given range. SMART and WHO have different flagging criteria. SMART flags are stricter, so they exclude more data, and SMART flagging criteria can therefore reduce the estimated prevalence.
The flagging process can easily be applied to other variables, even routine data not coming from surveys. Values outside the flagging limits are considered implausible, but note that such values may genuinely be observed in children admitted into therapeutic feeding programs. So be careful when using flags for routine data.

Flagging criteria
Both methods flag records in which one or more anthropometric indices are more than a certain distance either side of a reference value.
The WHO criteria are simple biologically plausible ranges around the reference mean of zero. If, for example, a value for WHZ is below −5 or above +5, then the record is flagged to indicate a likely problem with WHZ.
SMART criteria are more complicated. They require the mean value of the index to be calculated from the survey data. This mean is then used as the reference value, and 3 z-scores are added and subtracted to create a range. For example, a mean WHZ of −1.15 gives lower and upper SMART flagging limits of:
−1.15 − 3 = −4.15 and −1.15 + 3 = +1.85
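
The two rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the WHO or SMART software: the function names and the sample WHZ values are hypothetical, the WHO limits of −5 and +5 for WHZ and the SMART half-width of 3 z-scores are taken from the slide, and the SMART mean is computed here over all supplied values, which is an assumption.

```python
from statistics import mean

# WHO limits for WHZ as quoted on the slide: fixed range around zero.
WHO_WHZ_LIMITS = (-5.0, 5.0)
# SMART range: observed survey mean plus or minus 3 z-scores (from the slide).
SMART_HALF_WIDTH = 3.0


def who_flag(whz, limits=WHO_WHZ_LIMITS):
    """Return True if a single WHZ value falls outside the fixed WHO range."""
    lower, upper = limits
    return whz < lower or whz > upper


def smart_limits(whz_values, half_width=SMART_HALF_WIDTH):
    """Return (lower, upper) SMART limits centred on the observed mean.

    Assumption: the mean is taken over all values passed in.
    """
    centre = mean(whz_values)
    return centre - half_width, centre + half_width


def smart_flags(whz_values):
    """Return one boolean per value, True where the value is SMART-flagged."""
    lower, upper = smart_limits(whz_values)
    return [z < lower or z > upper for z in whz_values]


# Hypothetical WHZ values, for illustration only.
whz = [-1.0, -1.3, -5.2, 0.5, 2.4, -0.9, -1.6]
who = [who_flag(z) for z in whz]
smart = smart_flags(whz)
print("SMART limits:", smart_limits(whz))
print("WHO flagged:", sum(who), "SMART flagged:", sum(smart))
```

In this made-up sample the SMART rule flags every value the WHO rule flags plus one more, which matches the point that SMART flags are stricter.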

Flagging criteria
The WHO and SMART flagging criteria will flag different but overlapping sets of measurements. The SMART flagging criteria will usually flag more records than the WHO flagging criteria. This will act to reduce the estimated prevalence.

Flagging criteria
Prevalence is in the “tails” of the distribution.
In the plots, the estimated prevalence is shown for cases defined as z-scores more than 3 below the reference median (i.e. zero). The red bars show the cases remaining after “outliers” to the left have been censored, so the area covered by the red bars represents the estimated prevalence after flagged values have been censored. The estimated prevalence is reported below each plot as p(z < −3).
Flagging is about detecting outlier values. Only one set of flagging criteria, either WHO or SMART, should be used at any one time. The WHO and SMART flagging criteria are designed to be applied to samples of children measured in surveys. They should not be applied to samples of severely malnourished or sick children.
We recommend using WHO flags.

Flags
Calculate anthropometric indices from the anthropometric data, and then apply the flagging criteria to the data.
Present the percentage of implausible values for each indicator separately (HAZ, WHZ and WAZ) as well as for each field team. If the percentage of implausible values is greater than 1%, also present it by other disaggregations.
WHO recommends a cut-off of 1% for the percentage of implausible values that is indicative of poor data quality. SMART guidelines consider proportions above 7.5% to be problematic. The proportion of flagged records in a dataset should, ideally, be below 5%.
While a high percentage of flagged values is a good indication of poor data quality, a low percentage does not necessarily imply adequate data quality, as there can be inaccurate values that lie within the WHO flag range.
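
A hedged sketch of how the percentage of flagged values might be tabulated per index and per field team. The record layout (`team`, `haz`, `waz`, `whz` keys) and the sample values are hypothetical. Only the WHZ limits appear in these slides; the HAZ and WAZ limits used below are the commonly cited WHO ones and should be checked against the guidance you are following.

```python
from collections import defaultdict

# WHZ limits are from the slides; HAZ and WAZ limits are an assumption
# (commonly cited WHO values) and should be verified before use.
WHO_LIMITS = {"haz": (-6.0, 6.0), "waz": (-6.0, 5.0), "whz": (-5.0, 5.0)}


def is_flagged(value, index):
    """True if the value lies outside the WHO range for the given index."""
    lower, upper = WHO_LIMITS[index]
    return value < lower or value > upper


def percent_flagged(records, index):
    """Percentage of non-missing values of `index` that are flagged."""
    values = [r[index] for r in records if r.get(index) is not None]
    if not values:
        return 0.0
    return 100.0 * sum(is_flagged(v, index) for v in values) / len(values)


def percent_flagged_by_team(records, index):
    """Percentage flagged for `index`, computed separately for each team."""
    by_team = defaultdict(list)
    for record in records:
        by_team[record["team"]].append(record)
    return {team: percent_flagged(rows, index) for team, rows in by_team.items()}


# Hypothetical records, for illustration only.
records = [
    {"team": 1, "haz": -1.9, "waz": -1.4, "whz": -1.2},
    {"team": 1, "haz": -2.3, "waz": -3.0, "whz": -5.6},
    {"team": 2, "haz": -0.7, "waz": -0.2, "whz": 0.3},
    {"team": 2, "haz": -6.4, "waz": -1.1, "whz": -0.8},
]
for index in ("haz", "waz", "whz"):
    print(index, percent_flagged(records, index), percent_flagged_by_team(records, index))
```

The flagged records stay in `records`; only the summary is computed, so the same data can later be analysed with the flagged values censored rather than deleted.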

Flags
Be careful with datasets to which flagging criteria have already been applied and from which flagged records have been deleted. This is not good practice: all data should be shared. Flagged records should not be removed from the database, but they should be excluded (censored) from any analyses.
Flagging has a dual role:
1. It is a data-checking tool. If you have access to the data collection forms, you will often be able to check records and fix data-entry errors.
2. It is a measure of data quality. Flagged records can indicate problems with measurement, recording, data entry, and data checking.

Standard deviation
[Figure: three distributions, with SD = 1, SD < 1 and SD > 1]
The standard deviation (SD) is a statistical measure that quantifies the amount of variation in a dataset. The smaller the SD, the closer the data points tend to be to the mean. The higher the SD, the more spread out the data points are. SDs cannot be negative. The lowest possible value for an SD is zero, which would indicate that all data points are equal to the mean (i.e. there is only one value in the entire dataset, e.g. every child has exactly the same WHZ value, and thus zero variation).
The 2006 WHO growth standard reference sample has, by definition, a standard normal distribution with a mean of zero and an SD of 1 for each of the anthropometric indices, including WAZ, WHZ and HAZ. The growth standard is based on a sample of healthy children from six different countries (Brazil, Ghana, India, Norway, Oman, United States) with varying ethnic groups, living in an environment that did not constrain optimal growth.
But what is the SD in a malnourished population? We do not know! Thus setting limits is difficult.

Quick exercise
We want to compare two sets of exam results for a class of 30 students. The marks of the first exam vary between 31% and 98%, and those of the second between 82% and 93%. Given these ranges, which standard deviation will be higher?
Low standard deviation: your data are “close” to the mean.
High standard deviation: your data are dispersed over a large interval.

Quick exercise: answer
The standard deviation will be higher for the results of the first exam.
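
The answer can be checked numerically even without the individual marks. The sketch below builds two hypothetical mark lists that stack the odds against the expected answer: the first exam with the least spread compatible with a 31 to 98 range, and the second with the most spread compatible with an 82 to 93 range. The marks themselves are invented for illustration; only the class size and the ranges come from the exercise.

```python
from statistics import stdev

N_STUDENTS = 30

# Exam 1: least possible spread for a 31-98 range. One student at each
# extreme, everyone else exactly at the midpoint.
low1, high1 = 31, 98
midpoint1 = (low1 + high1) / 2
exam1_min_spread = [low1, high1] + [midpoint1] * (N_STUDENTS - 2)

# Exam 2: most possible spread for an 82-93 range. Half the class at each
# extreme.
low2, high2 = 82, 93
exam2_max_spread = [low2] * (N_STUDENTS // 2) + [high2] * (N_STUDENTS // 2)

sd_exam1 = stdev(exam1_min_spread)
sd_exam2 = stdev(exam2_max_spread)
print(f"Exam 1 (least spread): SD = {sd_exam1:.2f}")
print(f"Exam 2 (most spread):  SD = {sd_exam2:.2f}")
```

Even in this unfavourable pairing the first exam has the larger SD, so the ranges alone are enough to answer the question for a class of 30.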

Standard deviation (SD)
The higher the SD, the more likely it is that data quality is poor.
It is very difficult to set acceptable ranges.
SDs are typically wider for HAZ.
SDs for HAZ are largest for younger children.
There should be no difference between girls and boys.
The standard deviation is sometimes considered to be a useful measure of data quality when applied to z-scores. The 1995 WHO Technical Report on Anthropometry suggested a set of SD ranges outside of which data quality could be a concern, but these cut-offs need to be revised so that they better reflect nationally representative surveys and the WHO growth standard that is currently used.
SDs are typically wider for HAZ than they are for WAZ or WHZ. A portion of this is likely due to measurement error, as height is more difficult to measure accurately than weight with currently available equipment, and obtaining an accurate date of birth can also be an issue in some populations.
SDs for HAZ are largest for younger children and become tighter as the age of the children increases. A component of the larger SD is due to measurement error, as length is more difficult to measure than height.
SDs should not differ substantially between girls and boys, although there may be slight biological variation.

Standard deviation
[Figure: two distributions, with SD = 1 and SD > 1]
If the SD is > 1, the prevalence calculated with the observed SD is higher than the prevalence calculated with SD = 1.
The 1995 WHO Technical Report on Anthropometry recommended using the SD as a standard of quality, with acceptable ranges of 1.1 to 1.3 for HAZ, 1.0 to 1.2 for WAZ and 0.85 to 1.1 for WHZ. SMART states that the acceptable range for the standard deviation of the weight-for-height z-scores (WHZ) is 0.8 to 1.2. But there is no consensus.
More important than whether the SD falls outside a certain range is to identify and understand the causes of a large SD. It can be due to poor data and produce inflated prevalence estimates, but it may also be due to sampling from a mixed population rather than to poor data quality.
Random measurement errors tend to increase the SD but do not affect the mean z-score. Using the mean z-score instead of the prevalence of malnutrition is therefore a more reliable indicator when such errors are frequent. BUT a mean z-score is more difficult to communicate than a prevalence.
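
The effect of an inflated SD on prevalence can be illustrated with a normal model. This is a sketch under stated assumptions: the z-scores are treated as normally distributed, the mean of −1.0 and the SD of 1.2 are hypothetical, and the case definition z < −3 is the one used in the flagging plots earlier in the deck. The comparison holds when the cut-off lies below the mean.

```python
from statistics import NormalDist

CUTOFF = -3.0   # case definition z < -3, as in the earlier plots
MEAN_Z = -1.0   # hypothetical survey mean z-score


def prevalence(sd, mean_z=MEAN_Z, cutoff=CUTOFF):
    """Proportion of a normal distribution lying below the cut-off."""
    return NormalDist(mu=mean_z, sigma=sd).cdf(cutoff)


p_reference = prevalence(sd=1.0)   # SD of the growth standard
p_inflated = prevalence(sd=1.2)    # hypothetical inflated SD
print(f"SD = 1.0: p(z < -3) = {100 * p_reference:.1f}%")
print(f"SD = 1.2: p(z < -3) = {100 * p_inflated:.1f}%")
```

With the same mean, the wider distribution puts more of its area in the tail below the cut-off, so the estimated prevalence rises even though the centre of the distribution has not moved.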

Standard deviation
The SD is calculated by most software packages. It should be applied only to cleaned data, from which erroneous data and flagged records have been censored.
In the formula for the SD, n is the number of data points, Yi is each of the values in the dataset, and Ȳ is the mean of the Yi.
It is recommended to present the SD for each indicator separately (HAZ, WHZ and WAZ) as well as for different strata. Explanations for large SDs should be explored and included in the survey report. The SD for anthropometric indices in any given survey can also be compared with those from other surveys meant to be representative of the same population in and around the same time period.
Flagging and identifying the causes of large SDs are important for data quality assessment: if the SD is artificially inflated as a result of poor-quality data, the prevalence estimates are likely to be overestimated.
Definitively quantifying how much of the dispersion in z-scores can be attributed to heterogeneity in relation to environments that do not support optimal growth, and how much to measurement error, is a challenging research question.
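
The formula itself did not survive in this transcript. A minimal sketch, assuming the usual sample standard deviation with an n − 1 denominator (the slide may have used n instead), together with the censoring step described above. The function names and WHZ values are hypothetical, and the censoring limits are the WHO WHZ limits of −5 and +5 quoted earlier.

```python
from math import sqrt
from statistics import stdev


def sample_sd(values):
    """Sample SD: sqrt(sum((Yi - Ybar)^2) / (n - 1)).

    Assumption: n - 1 denominator, as in most statistical software.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    y_bar = sum(values) / n
    return sqrt(sum((y - y_bar) ** 2 for y in values) / (n - 1))


def sd_after_censoring(values, lower, upper):
    """SD of the values that remain after censoring those outside the limits."""
    kept = [v for v in values if lower <= v <= upper]
    return sample_sd(kept)


# Hypothetical WHZ values, for illustration only.
whz = [-1.0, -1.3, -5.2, 0.5, 2.4, -0.9, -1.6]
sd_all = sample_sd(whz)
sd_clean = sd_after_censoring(whz, lower=-5.0, upper=5.0)
print(f"SD, all records:           {sd_all:.2f}")
print(f"SD, flagged value censored: {sd_clean:.2f}")
print(f"statistics.stdev check:     {stdev(whz):.2f}")
```

A single implausible value noticeably widens the SD in this small sample, which is why the SD is best computed only after flagged records have been censored.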

Conclusions
For SD, further investigations are needed to:
(i) develop guidance on how to tease out the relative contribution of measurement error from the expected population-associated spread for any given survey;
(ii) ascertain a cut-off at which the SD might be more conclusively related to data quality for each anthropometric index.
Other approaches still need more testing.
January 2019, Addis Ababa

Exercise 4
Divide into 4 groups.
The file ex04.csv is a comma-separated-value (CSV) file containing anthropometric data from a recent SMART survey in Sudan.
Calculate WHO and SMART flags.
Team B: present on WHO flags
Team C: present on SMART flags