Data Cleaning Techniques Workshop on Emergency Information Management Neuhausen, Germany, 18-22 June, 2012 Christian Oxenbøll, Registration Officer, UNHCR.

Presentation transcript:

Data Cleaning Techniques
Workshop on Emergency Information Management, Neuhausen, Germany, June 2012
Christian Oxenbøll, Registration Officer, UNHCR
Tips and tricks for data management in Excel

Data Cleaning
Why is it important?
– Bad data leads to wrong results
– Operational and management decisions should not be based on wrong information
– Even “a few bad data” can make a whole dataset useless for statistics

What is data cleaning?
Existing data:
– Reviewing logical consistency of data
– Reviewing reliability of data
– Correction of wrong values
– Deletion or suppression of erroneous values
Subsequent data cleaning can be reduced by proper design of data collection:
– Make a data management strategy
– Make sure you know how you will process collected data
– Ensure consistency in design
– Use validation rules in Excel

What are we looking for?
Common errors include:
– 0 when it should be “N/A” (not available/not applicable)
– Totals that do not match the underlying data
– Typing errors (and use of different names for the same location)
– Wrong interpretation of questions
– Mismatch of units (cases/persons, days/months, square metres/hectares, pct/ratios, flow/stock, etc.)
– Missing data
– Impossible percentages, e.g. indicator values > 100%
– Ambiguous date formats (12/01/06 or 01/12/06)
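
Three of the errors listed above can be checked mechanically. The sketch below is in Python, not Excel, and is only an illustration: the function names and sample figures are made up and are not part of the workshop material.

```python
from datetime import datetime


def total_mismatch(parts, reported_total):
    """True when the reported total differs from the sum of its parts."""
    return sum(parts) != reported_total


def invalid_percentage(value):
    """True when a percentage falls outside the 0-100 range."""
    return not 0 <= value <= 100


def is_ambiguous_date(text):
    """True when a xx/xx/yy string is valid both day-first and month-first
    and the two readings give different dates."""
    try:
        day_first = datetime.strptime(text, "%d/%m/%y")
        month_first = datetime.strptime(text, "%m/%d/%y")
    except ValueError:
        return False  # at most one reading is a valid date
    return day_first != month_first
```

For instance, is_ambiguous_date("12/01/06") flags the date from the slide, because it can be read as 12 January or as 1 December, while "25/01/06" can only be read day-first and is not flagged.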

How do you clean data? Think logic!
– Look at the data
– Reflect on whether it makes sense
    Logical consistency (mathematical/statistical), e.g. total population vs. children < 18 years
    Meaningful (e.g. is it really true that refugees survive without water and the camp is 2 square metres?)
– Reliability of source
    Ask the data source how the data was collected
    What is covered?
    What was the methodology?
Note that logical consistency alone does not imply that the data is correct. Always check whether the data is meaningful.
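
The two kinds of check above (logical consistency and meaningfulness) can be sketched as rules applied to one record. This is a Python illustration under stated assumptions: the field names are hypothetical, and MIN_AREA_PER_PERSON is a placeholder threshold, not an official standard.

```python
# Placeholder plausibility threshold (assumption, not an official standard).
MIN_AREA_PER_PERSON = 1.0  # square metres


def check_record(record):
    """Return a list of problems found in one camp record (a dict)."""
    problems = []
    population = record["total_population"]

    # Logical consistency: a subgroup cannot be larger than the total.
    if record["children_under_18"] > population:
        problems.append("children under 18 exceed total population")

    if population > 0:
        # Meaningfulness: people cannot survive without water.
        if record["water_litres_per_day"] <= 0:
            problems.append("population reported without any water supply")
        # Meaningfulness: the camp must be big enough to hold its population.
        if record["area_sq_m"] / population < MIN_AREA_PER_PERSON:
            problems.append("camp area is implausibly small")

    return problems
```

A record that passes all three rules is only consistent and plausible; it still has to be checked against the reliability of its source.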

How do you clean data? Be creative!
– Use graphs
    To spot outliers (high/low values)
– Pivot tables
    To create summary tables of large datasets
– Filters
    Easy to spot outliers (note that the filter drop-down list shows only a limited number of unique entries: 1,000 in Excel 2003 and earlier, 10,000 from Excel 2007)
– Sorting
    To spot outliers and spelling variants
– Conditional formatting
    To spot invalid and dubious values or outliers
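
Outlier spotting can also be automated. The Python sketch below uses the interquartile-range rule, which is a choice made for this example and not a method named in the slides; the multiplier k = 1.5 is the conventional default for that rule.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values lying more than k interquartile ranges
    below the first quartile or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

With the made-up series 410, 395, 402, 388, 4150, 399, 405, 12 the function returns 4150 and 12, the same two values that would stand out at either end of a sorted column.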

Example from Uganda SIR

How do you clean data? Be creative!
– Lookup functions
    Easy to find non-existing codes (typos)
– Formulas
    Check mathematical and logical consistency
– Compare with other sources (triangulation)
    Validation of values/expected ranges (do we have approximately the same values?)
– Compare with previous years
    Validation of values/expected ranges (do we have approximately the same values?)
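
The lookup check and the comparison with previous years can be sketched as follows. This is Python, not Excel; the function names, the sample codes and the 25% tolerance are assumptions made for the illustration.

```python
def unknown_codes(entries, valid_codes):
    """Return the entered codes that are missing from the reference list."""
    valid = set(valid_codes)
    return sorted({code for code in entries if code not in valid})


def large_changes(current, previous, tolerance=0.25):
    """Return the relative change for every key whose value moved by more
    than `tolerance` (0.25 = 25%, an assumed threshold) since last year."""
    flagged = {}
    for key, new_value in current.items():
        old_value = previous.get(key)
        if not old_value:
            continue  # no baseline (missing or zero) to compare against
        change = (new_value - old_value) / old_value
        if abs(change) > tolerance:
            flagged[key] = round(change, 2)
    return flagged
```

A flagged value is not necessarily wrong. It is a prompt to go back to the source and ask why the figure differs.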

Useful Excel Tools
– Validation (allows only certain values)
– Auto filters
– Conditional Formatting
– Pivot Tables
– Formulas

Some useful Excel functions
Logic
– And
– Or
– If
– Not
Mathematical/Statistical
– Average
– Count
– CountA
– CountBlank
– CountIf
– Dsum
– SumIf
– Rank
Text, date, lookup and information
– Trim
– Concatenate
– Left
– Right
– Mid
– Len
– Find
– Proper
– Lower
– Upper
– IsBlank
– Vlookup
– Yearfrac
– Today
Use the help in Excel, which gives guidance on the use of each function.
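
Cleaning location names with Trim and Proper, then counting them as CountIf would, can be sketched outside Excel. The Python below is only a rough equivalent (str.title() approximates Proper) and the sample names are made up.

```python
from collections import Counter


def normalise_name(raw):
    """Strip surrounding spaces, collapse repeated spaces (like Trim)
    and capitalise each word (roughly like Proper)."""
    return " ".join(raw.split()).title()


def count_names(names):
    """Count how often each cleaned name occurs (similar to CountIf)."""
    return Counter(normalise_name(name) for name in names)
```

After cleaning, "north camp", " North  Camp" and "NORTH CAMP " are all counted as one location, "North Camp", and no longer as three.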

Data cleaning: some tips
– Design good data collection forms
– Check plausibility
– Look for outliers
– Analyse trends
– Use graphical views
– Triangulate
– Use filters, functions and formulas

Useful websites
– Google your questions
– Microsoft Online Help

Exercises
– Open the file “Excel Training.xls”
– Follow the instructions
– Ask your neighbour or the facilitators if you need assistance
– In the file “Excel Training Result.xls” you will find the results of the exercises, including the formulas