Data quality 1: Individual records


Data quality 1: Individual records Data analysis and Report writing workshop for Civil registration and vital statistics data. Social Statistician, SPC

"Statistics help decision makers in the Pacific develop informed policies that impact Pacific people." - Dr. Ofa Ketu’u, Director SDD

The importance of quality data
- Poor data can lead to misleading analysis and, subsequently, poor decisions and policies.
- Poor data costs money!
- We need to establish TRUST in our data. That does not mean the data needs to be perfect. It does mean that it should be the best of what we have available, and that we need to be honest about its limitations.

Reviewing data quality should be continual
During data collection:
- Review the systems to make sure that they are collecting data in the best possible way.
At the analysis stage:
- Review individual records (unit record data).
- Review tabulated data before further analysis.
- Review the plausibility of calculated measures, including comparing the measures to other sources of information.
We have talked about our data sources and systems. This session will focus on our individual records.

Data cleaning: Reviewing unit record data
Check to ensure that:
1. all required data fields and records have been carried over into our working spreadsheet
2. duplicate records have been removed
3. inappropriate records have been excluded (for example, stillbirths have been removed from live births data)
4. records use variables which are consistent and can therefore be readily aggregated
5. missing values are minimized wherever possible

Setting up unit record data One record per row and one field per column

Required data fields have been carried over into our working spreadsheet
Variables or fields required

Births:
- First name
- Surname
- Date of birth
- Sex
- Place of birth (Hospital, Village, Island)
- Place of residence (Village, Province, Island)
- Mother's first name
- Mother's surname
- Mother's age
- Live or still birth (or all live births)
- Birth weight (optional)
- Length (optional)
- Weeks gestation (optional)
- Ethnicity (optional)

Deaths:
- First name
- Surname
- Date of birth
- Sex
- Date of death
- Age (use separate fields for days, months, and years)
- Place of death (Hospital, Village, Province, Island)
- Place of residence (Village, Province, Island)
- Spouse’s first name
- Spouse’s surname
- Causes of death (by line of death certificate, 1 variable per line)
- Underlying cause of death (more on this in chapter 16)
- External cause (if applicable)
- Occupation (optional)
- Ethnicity (optional)

If these are not available in our data, is there a different data set we can use? Can we link this information from a different collection? [United Nations, 2001]

Records have been carried over into our working spreadsheet
- Do not work on the original data set. Carry the data over to a working spreadsheet.
- When extracting data, ensure that all records are transferred:
  - Check the totals against the original source (such as the database).
  - Look at the total number of records and make sure that it is within an expected range.
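The two checks above can be sketched in a few lines of Python. This is a minimal illustration only, not part of the workshop material: the function name `check_extract` and its arguments are hypothetical, and the expected range is whatever you judge plausible for your own collection.

```python
def check_extract(source_count, working_rows, expected_range):
    """Compare the working copy with the source total and a plausible range.

    source_count   -- number of records reported by the original source
    working_rows   -- the records carried over into the working copy
    expected_range -- (low, high) bounds you consider plausible (an assumption)
    Returns a list of problems; an empty list means both checks passed.
    """
    problems = []
    n = len(working_rows)
    if n != source_count:
        problems.append(f"working copy has {n} records, source has {source_count}")
    low, high = expected_range
    if not low <= n <= high:
        problems.append(f"{n} records is outside the expected range {low}-{high}")
    return problems
```

An empty result means the working copy can be used; anything else should be resolved before cleaning starts.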

2. Duplicate records have been removed
We need to find and remove duplicate records. Questions to ask:
- How do we know if it is a duplicate?
- Do all fields have to be an exact match?
- Is this the same person?
- Which record will we use if the data is not exactly the same?
Be careful when checking for duplicates that you don’t remove twins.

Example:
Name | Surname | Sex | DOB | DoD | Place of death | Residence | Province
April | Jones | F | April 2013 | 5/5/2013 | Hospital | Noumea | South
Baby | | | 4/4/2013 | 6/5/2013 | | |
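One way to approach these questions outside Excel is to flag candidate duplicates for manual review rather than deleting them automatically. The sketch below is an illustration under stated assumptions: records are dictionaries with hypothetical field names (`first_name`, `surname`, `sex`, `dob`), and records that share surname, sex and date of birth but have different first names are treated as possible twins.

```python
from collections import defaultdict


def candidate_duplicates(records):
    """Group records sharing surname, sex and date of birth for manual review."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["surname"].strip().lower(),
            rec["sex"].strip().upper(),
            rec["dob"],
        )
        groups[key].append(rec)

    flagged = []
    for recs in groups.values():
        if len(recs) < 2:
            continue
        first_names = {r["first_name"].strip().lower() for r in recs}
        # Different first names on the same surname/sex/DOB may be twins,
        # so they are labelled for review instead of being removed.
        note = "possible twins" if len(first_names) > 1 else "likely duplicate"
        flagged.append((note, recs))
    return flagged
```

Nothing is removed by this function. The decision about which record to keep stays with the analyst.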

Data matching
For deaths to be considered “matched”, they must match on 3 of the following criteria (if surname is included) or 4 if not:
- Surname (similar spelling or sound OK)
- Date of death/month of report (same month)
- First name (similar spelling or sound OK)
- Island (place of death, report, or residence)
- Age at death (within 1 year)
There is some possibility of under-matching when data quality is poor (i.e. insufficient data to match the criteria).
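The matching rule can be sketched as a small function. Several things here are assumptions made for illustration: the field names (`surname`, `first_name`, `death_month`, `island`, `age`) are hypothetical, "similar spelling" is approximated with `difflib.SequenceMatcher` and an arbitrary 0.8 threshold, island is reduced to a single field, and the rule is read as "3 matches are enough when surname is one of them, otherwise all 4 remaining criteria must match".

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Rough stand-in for 'similar spelling'; the threshold is an assumption."""
    if not a or not b:
        return False
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


def same_value(a, b):
    """True only when both values are present and equal."""
    return a is not None and b is not None and a == b


def is_match(rec_a, rec_b):
    """Apply the 3-of-5 (with surname) or 4-of-5 (without surname) rule."""
    surname_ok = similar(rec_a.get("surname"), rec_b.get("surname"))
    age_a, age_b = rec_a.get("age"), rec_b.get("age")
    other_criteria = [
        same_value(rec_a.get("death_month"), rec_b.get("death_month")),
        similar(rec_a.get("first_name"), rec_b.get("first_name")),
        same_value(rec_a.get("island"), rec_b.get("island")),
        age_a is not None and age_b is not None and abs(age_a - age_b) <= 1,
    ]
    matched = int(surname_ok) + sum(other_criteria)
    required = 3 if surname_ok else 4
    return matched >= required
```

Missing values never count as a match, which is one reason under-matching becomes more likely when data quality is poor.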

3. Inappropriate records have been excluded
Stillbirths should be in a different file (not part of live births or deaths). These are important events, but should be analysed as a separate category.

4. Records use variables which are consistent and can therefore be readily aggregated
- Variables should have been entered in a consistent manner, but this is not always the case, especially when using older data.
- In best practice, these should be controlled by your metadata standards.
Common problems:
- Sex: if we are using M/F for sex, then all records should have one of these values in the field, rather than some being recorded as male, Male, or 1, etc.
- Dates
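As an illustration of the sex example, inconsistent entries can be mapped to a single coding with a lookup table. This is only a sketch: the mapping below is hypothetical and must be checked against your own metadata standards. In particular, treating 1 as male follows the example above, while treating 2 as female is an assumption.

```python
# Hypothetical mapping; confirm every code against your own metadata standards.
SEX_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",  # "2" = female is an assumption
}


def standardise_sex(value):
    """Return 'M' or 'F', or None when the entry needs manual review."""
    key = str(value).strip().lower()
    return SEX_CODES.get(key)
```

Values that return `None` are best listed and checked by hand rather than guessed.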

5. Missing values are minimized wherever possible
- Is original data missing?
- Can we obtain this information by looking up another source?
- Can we impute this information from our existing data? For example, if there is no age recorded for the mother in a birth record, but we have her date of birth, an age could be calculated and entered into the record.
- Is the name provided specific to one sex?
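The mother's age example can be worked through as follows. This is a minimal sketch with a hypothetical function name; it returns age in completed years on the date of the birth event.

```python
from datetime import date


def age_in_completed_years(born, on):
    """Age in completed years on a given date (for example, the child's date of birth)."""
    years = on.year - born.year
    # Subtract one year if the birthday has not yet been reached in that year.
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years
```

For a mother born on 15 June 1990 and a birth on 4 April 2013, the function gives 22, because her 23rd birthday has not yet occurred.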

Sort function in Excel
The most important tool when using Excel to clean the data is the sort function, which appears under the Data tab in Excel 2007. By clicking the Sort button, you can sort the selected data by any of the fields in your data set.

How to sort data in Excel: Things to consider
- Ensure that you are not using your original data, as sometimes things go wrong!
- Ensure that there is one record per line and one line per record. If this isn't the case, re-format your data. Similarly, ensure that there is one field per column.
- When selecting data to sort, select ALL data by clicking on the arrow in the upper left corner (between the A and 1).
- Ensure there are no blank columns or rows which may interrupt the sort function.
- Label your fields in the top row and make sure these labels are not repeated later in the data set.

Derived variables
To make our analysis easier:
- Derive a YEAR variable from date data. Tabulations are easier when we have all the years of our data in the same worksheet. The year can be derived from the date using the formula =YEAR( ).
- Classify age by age groups. Analysis will be done by age group rather than individual years of age.

Age groups for collation of deaths
- Neonatal deaths (under 28 days)
- 28 days to <1 year
- 1 to 4 years
- 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74
- 75+
- (and unknown)
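The age groups above can be assigned with a small function. The sketch assumes age at death is stored in separate fields for years, months and days, with empty fields passed as `None`. The function name, the labels, and the extra category for infants whose age in days was not recorded are illustrative choices, not part of the workshop material.

```python
def age_group(years=None, months=None, days=None):
    """Assign a death to one of the collation age groups."""
    if years is None and months is None and days is None:
        return "Unknown"
    if not years:
        # Under one year: rely on the months and days fields.
        if months is None and days is None:
            return "Under 1 year (days not recorded)"  # assumption: keep separate for review
        if not months and (days or 0) < 28:
            return "Neonatal (under 28 days)"
        return "28 days to <1 year"
    if years < 5:
        return "1-4"
    if years >= 75:
        return "75+"
    low = (years // 5) * 5
    return f"{low}-{low + 4}"
```

Applying this to every death record gives a derived variable that can be used directly as a row or column field in tabulations.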

Pivot tables and what they help us to achieve
A pivot table is a special Excel tool that allows you to summarize and explore data interactively. Our worksheets contain a large set of population data. In its current form, this data is hard to understand because there's too much detail. To make sense of the information, we need to summarize it, and a pivot table is the perfect tool.

Example: births by period of birth and sex
Period of birth | Male | Female | Total
2009-2011 | 389 | 349 | 738

How to build a pivot table
Basic steps in building a pivot table:
1. Highlight the data sheet by clicking in the top left corner (between A and 1).
2. On the Insert tab of the ribbon, click the PivotTable button.
3. In the Create PivotTable dialog box, check or select the data and click OK.
4. Specify which variables to use as columns and rows to tabulate your data by moving them into the appropriate place.
5. Use the "Count of" function, and a variable which has no blanks, to populate your table.
6. Once a table is set up the way you want, copy it and paste it into a new worksheet, as pivot tables cannot be locked.
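For readers who want to cross-check a pivot table outside Excel, the same "count of" tabulation can be sketched with Python's standard library. This is an illustration only: the function name `cross_tab` and the field names passed to it are hypothetical, and records are assumed to be dictionaries with no blank values in the chosen fields.

```python
from collections import Counter


def cross_tab(records, row_field, col_field):
    """Count records by two fields, like a pivot table using 'Count of'."""
    counts = Counter((rec[row_field], rec[col_field]) for rec in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    table = {}
    for row in rows:
        # Counter returns 0 for combinations that never occur.
        table[row] = {col: counts[(row, col)] for col in cols}
        table[row]["Total"] = sum(counts[(row, col)] for col in cols)
    return table
```

Calling `cross_tab(births, "year", "sex")` on a list of birth records returns one row per year with a count for each sex and a row total, which can be compared against the pivot table built in Excel.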

Basic tabulations

BIRTHS
- Total number of births by year
- Total births by year, by sex
- Number of births by year, by age of mother (5-year age groups)
- Number of births by geographic sub-region (where relevant), and potentially by sex and age of mother if there is sufficient data
- Number of births by major ethnicity (where relevant)

DEATHS
- Total number of deaths
- Number of deaths by sex and age group
- Number of deaths by sex, major ethnicity (where relevant), and age group
- Number of deaths by geographic region (where relevant), by sex and age group
- Number of neonatal deaths (deaths in infants aged under 28 days)
- Number of deaths by age (for ages <1, 1-4, 5-9, 10-14, ..., 65-69, 70-74, 75+ years), sex, and underlying cause of death (according to the ICD-10 General Mortality List 1, 103 causes)

Data quality lab 1