Day 2: introduction
Objectives
Learn different tests to assess data quality.
Practice the tests.
Understand the challenges of each test.
Evaluating data quality
Numerical summaries are useful for checking that data are within an expected range.
Graphical methods are often more informative than numerical summaries.
A key graphical method for examining the distribution of a variable is the histogram.
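As a rough illustration of the checks described on this slide, the Python sketch below computes a few numerical summaries, flags values outside an expected range, and prints a simple text histogram. The weights, the 2 to 30 kg "expected range" and the bin settings are all hypothetical, chosen only to show the idea; they are not taken from any survey or guideline.

```python
from statistics import mean, median

def summarize(values):
    """Return basic numerical summaries for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

def out_of_range(values, low, high):
    """Return the values that fall outside the expected range [low, high]."""
    return [v for v in values if v < low or v > high]

def histogram_counts(values, start, width, n_bins):
    """Count values in bins [start + i*width, start + (i+1)*width).

    Values outside all bins are not counted, so check the range first.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def print_histogram(counts, start, width):
    """Print a simple text histogram: one row of '#' characters per bin."""
    for i, c in enumerate(counts):
        lo = start + i * width
        print(f"{lo:5.1f}-{lo + width:5.1f} | {'#' * c}")

# Hypothetical child weights in kg (illustrative values, not survey data).
weights = [7.2, 8.1, 9.4, 10.0, 10.3, 11.8, 12.5, 12.9, 13.4, 85.0]

print(summarize(weights))
# Hypothetical expected range; 85.0 is flagged as a likely data-entry error.
print(out_of_range(weights, low=2.0, high=30.0))
print_histogram(histogram_counts(weights, start=6.0, width=2.0, n_bins=5), 6.0, 2.0)
```

The single extreme value pulls the mean well above the median, which the range check catches numerically; the histogram then shows the shape of the remaining values.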
Debate (5 min)
What tests on data quality do you know? Do you use them often?
Start the day by asking participants which data quality tests they know and what each one tests (sampling? measurement errors?). Then see which of them will be reviewed during Day 2.
Data files: CSV (comma-separated values) files
The CSV format is used because it is easy to read with different data analysis systems. It can also be read as a plain text file, and it can be opened in Excel.
If your Excel uses the comma (“,”) as the decimal separator, make sure to tell Excel that your file uses the comma as the field delimiter (column separator). This happens with Spanish-speaking users or computers set to Spanish.
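To see why the delimiter matters, here is a small Python sketch using the standard csv module. The sample text and its column names are hypothetical; the semicolon stands in for the list separator that software configured for a decimal-comma locale (such as Spanish) often expects by default.

```python
import csv
import io

# Hypothetical sample in CSV layout (column names and values are illustrative only).
sample = "sex,age,weight,height\n1,24,11.2,84.5\n2,36,13.0,93.1\n"

def read_rows(text, delimiter):
    """Parse CSV text with an explicit field delimiter and return a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

right = read_rows(sample, ",")  # comma as the field delimiter: 4 columns per row
wrong = read_rows(sample, ";")  # wrong delimiter: each whole line ends up in 1 column

print(len(right[0]), right[1])
print(len(wrong[0]), wrong[1])
```

When every record lands in a single column, as in the second case, the software has been told the wrong delimiter; this is the same problem Excel shows when it does not treat the comma as the column separator.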
Software
The NIPN data quality toolkit
WHO Anthro Survey Analyser
Excel
Stata
SPSS
Online calculators
We can use any of these!
Software: WHO Anthro Survey Analyser
New version coming soon.
This can be a quick solution for today's exercises, but you can also use Excel and a couple of online calculators.
Exercise: preparing your data
Open the file ex01.csv using Excel or other software. The file ex01.csv is a comma-separated values (CSV) file containing anthropometry data from a SMART survey in Angola.
How many variables does the file have? What are they?
What are the problems with the sex variable?
Amend the sex variable until it can be read by WHO Anthro.
Build a histogram in Excel for any variable you want.
Make sure participants can open this file with Excel and WHO Anthro (or ENA, SPSS or R).
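For participants working outside Excel, the Python sketch below shows one way to recode a messy sex variable. It assumes a column named "sex" and a target coding of 1 = male, 2 = female; both are assumptions for illustration, so check the WHO Anthro documentation for the codes it actually accepts. The raw values are hypothetical and are not taken from ex01.csv.

```python
import csv
import io

# Hypothetical mapping: assumes the target coding is 1 = male, 2 = female.
# Check the WHO Anthro documentation for the exact codes it accepts.
SEX_CODES = {"1": "1", "m": "1", "male": "1", "2": "2", "f": "2", "female": "2"}

def clean_sex(raw):
    """Normalise one raw sex value (trim spaces, ignore case); None if unknown."""
    return SEX_CODES.get(raw.strip().lower())

def recode_file(text, column="sex"):
    """Recode the sex column of CSV text; return (rows, problems).

    'rows' holds every record with the recoded value (None if unrecognised);
    'problems' lists (line_number, original_value) pairs needing a manual check.
    """
    rows, problems = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        code = clean_sex(row[column])
        if code is None:
            problems.append((line_no, row[column]))
        row[column] = code
        rows.append(row)
    return rows, problems

# Hypothetical raw data showing typical coding problems (not from ex01.csv).
raw = "sex,age\nM,24\n f ,36\nFemale,12\n1,48\nx,30\n"
rows, problems = recode_file(raw)
print([r["sex"] for r in rows])
print(problems)
```

Values listed in problems should be checked against the original questionnaire rather than guessed, since recoding them arbitrarily would introduce new errors.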