Practical solutions for dealing with missing data. Rob Woods, Senior Consultant. Copyright 2003-4, SPSS Inc.

Presentation transcript:

Slide 1: Practical solutions for dealing with missing data
Rob Woods, Senior Consultant

Slide 2: Common issues
- Issues
  - Consequences of missing data
  - Is my data really missing?
  - How techniques deal with missing data
- Solutions
  - Different approaches for dealing with missing data

Slide 3: Issues

Slide 4: Consequences of missing data
- Descriptive statistics
  - Missing data can distort descriptive statistics
  - For example, suppose workers are surveyed about their hours of work
    - Shift workers are underrepresented in the survey
    - If shift workers work more hours and their hours are more variable, the overall mean and standard deviation of hours would be underestimated
- Predictive modelling
  - Most modelling techniques require a complete set of independent variables in order to make a prediction
  - Missing data can result in no prediction for a case
  - A procedure may not run if the data set contains a high percentage of missing data
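
The shift-worker example can be made concrete with a small sketch. All figures below are hypothetical and do not come from the presentation; the assumption is simply that shift workers work longer, more variable hours and that only one of the four responds to the survey.

```python
from statistics import mean, stdev

# Hypothetical weekly hours (illustrative values, not from the presentation).
day_workers = [38, 40, 40, 42]
shift_workers = [36, 44, 52, 60]   # longer and more variable hours

# Everyone we would like to describe.
population = day_workers + shift_workers

# Assumption: only one shift worker answers the survey.
respondents = day_workers + shift_workers[1:2]

full_mean, full_sd = mean(population), stdev(population)
resp_mean, resp_sd = mean(respondents), stdev(respondents)

print(f"all workers:  mean={full_mean:.1f}, sd={full_sd:.2f}")
print(f"respondents:  mean={resp_mean:.1f}, sd={resp_sd:.2f}")
```

With these numbers, both the mean and the standard deviation computed from the respondents come out lower than the values for all workers, which is the distortion the slide describes.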

Slide 5: Model estimation: Missing values
- Linear regression
- Decision trees
- Binary logistic regression
- Multinomial logistic regression
- Discriminant analysis
- Also listwise exclusion of missing values
- In order for a case to be scored, a complete set of information on the independent variables is required
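
Listwise exclusion, as mentioned on this slide, means that a case with any missing independent variable is left out entirely. The following is a minimal sketch of that idea; the records, field names, and the `is_complete` helper are hypothetical, and `None` stands in for a missing value.

```python
# Hypothetical records; None marks a missing value.
cases = [
    {"age": 34, "income": 52000, "tenure": 5},
    {"age": None, "income": 48000, "tenure": 2},
    {"age": 45, "income": None, "tenure": 9},
    {"age": 29, "income": 39000, "tenure": 1},
]
predictors = ["age", "income", "tenure"]


def is_complete(case, fields):
    """True only if every listed field has a value."""
    return all(case[f] is not None for f in fields)


# Listwise exclusion: keep only cases with all predictors present.
complete_cases = [c for c in cases if is_complete(c, predictors)]
dropped = len(cases) - len(complete_cases)

print(f"kept {len(complete_cases)} of {len(cases)} cases, dropped {dropped}")
```

Here half of the cases are lost even though each of them is missing only one value, which is why a model built this way cannot score those cases.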

Slide 6: Example of decision tree

Slide 7: Possible imputation modelling techniques
- Missing value continuous
  - Linear regression
  - Decision trees: C&RT
  - Neural networks: MLP
- Missing value categorical
  - Binary logistic regression
  - Multinomial logistic regression
  - Discriminant analysis
  - Ordinal regression
  - Decision trees: CHAID, C5.0, C&RT
  - Neural networks: MLP

Slide 8: Is my data really missing?
- Always understand your data
  - A field may appear to be missing, but further investigation reveals it is a "not applicable" survey response
  - In the commercial world, data is often not collected with analysis in mind
- Is it a calculation you have made?
  - Derived fields can create missing data
    - e.g. Log10(x) when x is 0 equals … undefined
    - Consider using Log10(1+x) instead
  - In SPSS there are two ways to calculate a mean (suppose x2 is missing)
    - (x1+x2+x3)/3 will return a missing value
    - Consider using the MEAN function: MEAN(x1,x2,x3)
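
Both derived-field pitfalls on this slide can be reproduced outside SPSS. The sketch below uses Python rather than SPSS syntax; `safe_log10`, `strict_mean`, and `mean_of_valid` are hypothetical helper names, and `None` plays the role of a missing value. `strict_mean` mimics the arithmetic expression, while `mean_of_valid` mimics the behaviour of averaging only the values that are present.

```python
import math


def safe_log10(x):
    """log10(1 + x): defined at x = 0, unlike log10(x)."""
    return math.log10(1 + x)


def strict_mean(*values):
    """Like (x1+x2+x3)/3: any missing input makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def mean_of_valid(*values):
    """Average only the values that are present."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


print(safe_log10(0))               # defined, whereas math.log10(0) raises ValueError
print(strict_mean(2, None, 4))     # missing
print(mean_of_valid(2, None, 4))   # mean of the two valid values
```

Note that log10(1+x) is a different transformation from log10(x), so it changes the values slightly for small x as well as fixing the zero case.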

Slide 9: Is my data really missing?
- Check original data source
  - Has the data feed failed?
- Check your merge
  - Have you accidentally dropped a field?
  - Have you appended two files together when only one file has the field you are interested in?
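
The append problem on this slide is easy to reproduce. In the sketch below, the two "files" are hypothetical lists of records, and `append_files` and `fields_absent` are illustrative helpers rather than functions from any particular package. Appending the files produces missing values for every record that came from the file without the field.

```python
# Hypothetical inputs: only file_a carries the "region" field.
file_a = [{"id": 1, "region": "Europe"}, {"id": 2, "region": "Africa"}]
file_b = [{"id": 3}, {"id": 4}]


def append_files(*files):
    """Stack records, filling fields a file does not have with None."""
    all_fields = []
    for rows in files:
        for row in rows:
            for field in row:
                if field not in all_fields:
                    all_fields.append(field)
    return [{f: row.get(f) for f in all_fields} for rows in files for row in rows]


def fields_absent(rows, all_fields):
    """Fields that never appear in this file at all."""
    present = set().union(*(row.keys() for row in rows))
    return sorted(set(all_fields) - present)


combined = append_files(file_a, file_b)
print(combined)
print(fields_absent(file_b, ["id", "region"]))
```

Checking each source file for absent fields before appending shows whether the missing values are a property of the data or a by-product of the merge.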

Slide 10: Solutions

Slide 11: Different approaches for dealing with missing data
- Look for fields with a very high percentage of missing values
  - It may be necessary to exclude the field and use an alternative
- Look for records with a high percentage of missing fields
  - Consider excluding the case
  - For example, someone who has started filling in a survey and given up after two questions!
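
The two screening checks on this slide can be sketched as follows. The survey records are hypothetical, and the 50% cut-off is an assumption chosen for illustration; the presentation does not specify a threshold.

```python
# Hypothetical survey responses; None marks an unanswered question.
records = [
    {"q1": 4, "q2": 3, "q3": 5, "q4": 2, "q5": 4},
    {"q1": 5, "q2": 4, "q3": None, "q4": None, "q5": None},  # gave up after two questions
    {"q1": 2, "q2": 1, "q3": 3, "q4": None, "q5": 5},
    {"q1": 3, "q2": 4, "q3": 4, "q4": None, "q5": 2},
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the presentation


def field_missing_rates(rows):
    """Share of records in which each field is missing."""
    n = len(rows)
    return {f: sum(r[f] is None for r in rows) / n for f in rows[0]}


def record_missing_rates(rows):
    """Share of fields missing within each record."""
    return [sum(v is None for v in r.values()) / len(r) for r in rows]


field_rates = field_missing_rates(records)
record_rates = record_missing_rates(records)

sparse_fields = [f for f, rate in field_rates.items() if rate > THRESHOLD]
sparse_records = [i for i, rate in enumerate(record_rates) if rate > THRESHOLD]

print("fields to review:", sparse_fields)
print("records to review:", sparse_records)
```

The sketch only flags candidates; whether to exclude a field or a case remains a judgement about the data.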

Slide 12: Different approaches for dealing with missing data
- SPSS Missing Value module
  - Missing value statistics
  - Shows common patterns in missing data
  - Performs statistical tests to see if the variables are affected by missing data
  - Imputes missing data
    - Regression
    - EM (Expectation Maximisation)
  - Easy to impute missing values for several fields in one step
- Use traditional modelling techniques to impute missing data
  - Classification and Regression Tree (CRT)
  - Chi-Square Automatic Interaction Detector (CHAID)
  - Would impute one variable at a time
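
To illustrate what imputing one variable at a time with a model looks like, here is a minimal sketch using ordinary least squares with a single predictor. It is not the algorithm used by the SPSS Missing Value module, and it is not a tree-based method such as CRT or CHAID; the data, the field names `x` and `y`, and both helper functions are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_by_regression(rows, predictor, target):
    """Fill missing target values with predictions from complete rows."""
    complete = [r for r in rows
                if r[predictor] is not None and r[target] is not None]
    slope, intercept = fit_line([r[predictor] for r in complete],
                                [r[target] for r in complete])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the original rows are left untouched
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        filled.append(r)
    return filled


# Hypothetical data in which y is missing for one record.
rows = [
    {"x": 1, "y": 3},
    {"x": 2, "y": 5},
    {"x": 3, "y": 7},
    {"x": 4, "y": None},
    {"x": 5, "y": 11},
]

filled = impute_by_regression(rows, "x", "y")
print(filled[3])
```

Imputing a second variable this way would require fitting another model, which is the "one variable at a time" cost noted on the slide.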

Slide 13: Demonstration
- Data collected on 109 countries (five regions)
  - Europe
  - East Europe
  - Pacific/Asia
  - Africa
  - Middle East
  - Latin America
- Data collected on key national indicators such as
  - Religion
  - Life expectancy
  - Male and female literacy
  - Daily calorie intake

Slide 14: Summary
- Showed how the Missing Values module is a powerful tool for describing and imputing missing values
- Evaluated possible consequences of ignoring missing data
- Showed different methods for imputing missing data
  - EM (Expectation Maximisation)
  - Regression
  - Decision trees

Slide 15: Any