

Missing Values C5.2 Data Screening

Missing Data
– Use the summary() function to check for missing data in your dataset.
– summary(notypos)
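As a rough illustration of what to look for in that output, here is a minimal sketch on a made-up data frame. The name `toy` and its columns are hypothetical stand-ins for notypos, not part of the original dataset.

```r
# Hypothetical toy data standing in for notypos
toy <- data.frame(
  age   = c(21, NA, 19, 23, NA),
  score = c(3.5, 4.0, NA, 2.5, 3.0),
  group = factor(c("a", "b", NA, "a", "b"))
)

# Columns that contain missing values list an "NA's" count in the summary
summary(toy)

# The same counts as a plain named vector, which is easier to scan
na_per_column <- colSums(is.na(toy))
na_per_column
```

The colSums(is.na()) line is an optional extra; it can be handy when the dataset has too many columns for summary() to be readable.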

Missing Data
Missing data is an important problem. First, ask yourself, “why is this data missing?”
– Because you forgot to enter it?
– Because there’s a typo?
– Because people skipped one question? Or the whole end of the scale?

Missing Data
Two types of missing data:
– MCAR: missing completely at random (you want this)
– MNAR: missing not at random (eek!)
There are ways to test for the type, but usually you can see it:
– Randomly missing data appears all across your dataset.
– If everyone missed question 7, that’s not random.
– To look, click on the dataset or use the View() function.

Missing Data
– MCAR: probably caused by skipping a question or missing a trial.
– MNAR: the question itself may be causing the problem. For instance, what if you surveyed campus about alcohol abuse? What does it mean if everyone skips the same question?

Missing Data
How much can I have?
– Depends on your sample size: in large datasets, <5% is ok.
– Small samples = you may need to collect more data.
Please note: there is a difference between “missing data” and “did not finish the experiment”.

Missing Data
How do I check if it’s going to be a big deal?
– Try running your analysis on the dataset with missing data versus the dataset with the missing data filled in.
– In R that’s easy! You just change out the name of the dataset you are using, since we are saving them separately as we go.
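A minimal sketch of that comparison, assuming a simple regression is the analysis of interest. The data frames `with_missing` and `filled_in` and the filled-in values are made up for illustration.

```r
# Hypothetical data: the same six people, before and after filling in y
with_missing <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, NA, 6.2, 7.9, NA, 12.1)
)
filled_in <- with_missing
filled_in$y[is.na(filled_in$y)] <- c(4.0, 10.0)  # made-up replacement values

# Same analysis, only the dataset name changes
fit_missing <- lm(y ~ x, data = with_missing)  # lm() drops rows with NA by default
fit_filled  <- lm(y ~ x, data = filled_in)

# Put the coefficients side by side to see how much they move
rbind(missing = coef(fit_missing), filled = coef(fit_filled))
```

If the two sets of results tell the same story, the missing data is probably not driving the conclusions.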

Missing Data
Deleting people / variables
– People: you can exclude them “pairwise” or “listwise”.
– Pairwise: only excludes people when they have missing values for that analysis.
– Listwise: excludes them for all analyses.
– Variables: if it’s just an extraneous variable (like GPA), you can just delete the variable.
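One way to see the difference in base R is with correlations, where the `use` argument of cor() switches between the two. This is only a sketch on a made-up data frame (`toy` is hypothetical).

```r
# Hypothetical data: row 2 is missing y, rows 3 and 4 are missing z
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, NA, 6, 8, 10, 12),
  z = c(1, 3, NA, NA, 9, 11)
)

# Listwise: any row with an NA is dropped for every correlation
listwise_rows <- nrow(na.omit(toy))
cor_listwise  <- cor(toy, use = "complete.obs")

# Pairwise: each correlation keeps every row that is complete for that pair
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")
pairs_xy     <- sum(complete.cases(toy[, c("x", "y")]))

listwise_rows  # rows left after listwise deletion
pairs_xy       # rows available for the x-y correlation under pairwise deletion
```

Pairwise deletion keeps more data per analysis, but each analysis may then be based on a different set of people.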

Missing Data
What if you don’t want to delete people (for example, you are using a special population or can’t recruit more participants)?
– There are several estimation methods to “fill in” missing data.

Missing Data
Mean substitution: the old way to fill in missing data
– Conservative: doesn’t change the mean values used to find significant differences.
– Does change the variance, which may cause significance tests to change when a lot of data is missing.
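Both points can be checked on a small made-up vector. This is a sketch for illustration only; the numbers are hypothetical.

```r
# Hypothetical scores with two missing values
x <- c(2, 4, NA, 8, NA, 10, 6)

# Mean substitution: replace each NA with the mean of the observed values
x_filled <- x
x_filled[is.na(x_filled)] <- mean(x, na.rm = TRUE)

mean(x, na.rm = TRUE)  # mean of the observed values
mean(x_filled)         # unchanged after substitution

var(x, na.rm = TRUE)   # variance of the observed values
var(x_filled)          # smaller, because the filled-in points sit exactly on the mean
```

The more values are replaced, the more the variance shrinks, which is why the significance tests can shift.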

Missing Data
Multiple imputation / expectation maximization: now considered the best way to replace missing data
– Creates a set of expected values for each missing point.
– Using matrix algebra, the program estimates the probability of each value and picks the highest one.

Missing Data
DO NOT mean replace categorical variables.
– You can’t be 1.5 gender.
– So, either leave them out OR pairwise eliminate them (aka eliminate only for the analysis they are used in).

Missing Data
Decision tree for the data:
– Categorical / IVs: STOP
– Continuous / DVs:
  – MNAR: STOP
  – MCAR:
    – More than 5% missing: STOP
    – Less than 5% missing: MICE

Missing Data
Figure out what you can replace.
– First, figure out the percent missing by column.
– Then, figure out the percent missing by row.
Let’s write a function!

Missing Data
Make up our own percent missing function.
percentmiss = function(x){ ##save the function; this line says make a new function
  sum(is.na(x)) / length(x) * 100 ##total up the NA values, divide by the length of the values, times 100 gives the percent
} ##close function
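A quick way to convince yourself the function works is to try it on a vector where the answer is obvious. The test vector below is made up for illustration.

```r
# Percent of values in x that are NA
percentmiss <- function(x) {
  sum(is.na(x)) / length(x) * 100
}

# 2 of the 4 values are missing
percentmiss(c(1, NA, 3, NA))

# No missing values at all
percentmiss(1:10)
```

An equivalent shortcut, if you prefer it, is mean(is.na(x)) * 100, since the mean of a TRUE/FALSE vector is the proportion of TRUE values.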

Missing Data
Let’s use apply to get percent missing by columns.
– apply(notypos, 2, percentmiss) ##columns
– We will have to exclude several of these columns.

Missing Data
Now, let’s use apply to get percent missing by rows.
– apply(notypos, 1, percentmiss) ##rows
Too much info! Save the row percentages and tabulate them instead:
missing = apply(notypos, 1, percentmiss)
table(missing)
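The second argument to apply() is what switches between rows (1) and columns (2). A small sketch on a made-up data frame (`toy` and its columns q1 to q4 are hypothetical) shows both directions:

```r
# Percent of values in x that are NA
percentmiss <- function(x) sum(is.na(x)) / length(x) * 100

# Hypothetical data: 4 people, 4 questions
toy <- data.frame(
  q1 = c(1, 2, NA, 4),
  q2 = c(NA, 2, NA, 4),
  q3 = c(1, 2, NA, 4),
  q4 = c(1, 2, 3, 4)
)

by_column <- apply(toy, 2, percentmiss)  # one percent per question
by_row    <- apply(toy, 1, percentmiss)  # one percent per person

by_column
table(by_row)  # how many people fall at each percent missing
```

With real data the by-row vector has one entry per participant, which is why table() is the easier way to read it.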

Missing Data
Install the mice package. Load the mice library.
Select only the data that you want to run mice on:
– Eliminate bad rows.
– Eliminate bad columns.
– Bring them all back together.

Missing Data
##subset out the bad rows
replacepeople = notypos[ missing < 6, ] ##note we are going to fudge a little bit
dontpeople = notypos[ missing >= 6, ]

Missing Data
##figure out the columns to exclude
replacecolumn = replacepeople[, -c(1, 3, 13)]
dontcolumn = replacepeople[, c(1, 3, 13)]
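The row and column splits can be sketched end to end on made-up data. Everything here is hypothetical: `toy` has an id column plus 20 question columns, and only the id column is treated as one we cannot replace.

```r
# Percent of values in x that are NA
percentmiss <- function(x) sum(is.na(x)) / length(x) * 100

# Hypothetical data: 5 people, an id column and 20 question columns (X1 to X20)
toy <- data.frame(id = 1:5, matrix(rep(1:5, 20), nrow = 5, ncol = 20))
toy$X3[2] <- NA                      # person 2 misses 1 of 21 values (under 6%)
toy[4, c("X1", "X2", "X7")] <- NA    # person 4 misses 3 of 21 values (over 6%)

# Rows: keep people under the cutoff for replacement, set the rest aside
missing       <- apply(toy, 1, percentmiss)
replacepeople <- toy[missing < 6, ]
dontpeople    <- toy[missing >= 6, ]

# Columns: drop the id column before imputing, but keep it for later
replacecolumn <- replacepeople[, -1]
dontcolumn    <- replacepeople[, 1, drop = FALSE]  # drop = FALSE keeps a data frame
```

The drop = FALSE only matters when a single column is set aside; with several columns, as in the slide above, R returns a data frame anyway.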

Missing Data
Now run mice! Set a temporary placeholder:
– General form: tempnomiss = mice(DATASET)
– For our data: tempnomiss = mice(replacecolumn)
This function figures out what to replace and how to replace it for you.

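Missing Data
Now, pull the replaced data out into a dataset.
– General form: complete(the object mice gave you, which imputed dataset to use). By default mice creates 5 imputed datasets, so pick a number from 1 to 5.
– nomiss = complete(tempnomiss, 1)
– summary(nomiss)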
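Here is a minimal sketch of the mice() and complete() steps on simulated data. The data frame `toy`, its columns, and the seeds are assumptions for illustration. The requireNamespace() check is only there so the sketch skips quietly when the mice package is not installed.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # Hypothetical continuous data with a few values knocked out
  set.seed(42)
  toy <- data.frame(q1 = rnorm(40), q2 = rnorm(40), q3 = rnorm(40))
  toy$q2 <- toy$q2 + toy$q1        # give mice a relationship to work with
  toy$q1[c(3, 17)] <- NA
  toy$q3[c(8, 25)] <- NA

  # m = 5 imputed datasets is the mice default; printFlag = FALSE hides the log
  tempnomiss <- mice(toy, m = 5, seed = 123, printFlag = FALSE)

  # Pull out the first of the 5 imputed datasets
  nomiss <- complete(tempnomiss, 1)
  summary(nomiss)                  # no NA counts should be listed now
}
```

Setting seed makes the imputed values reproducible; without it, each run of mice() can fill in slightly different numbers.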

Missing Data
Put everything back together.
– We want to take our replaced data.
– And add back in the columns we couldn’t replace (dontcolumn):
filledin_none = cbind(dontcolumn, nomiss)
– And add back in the rows we couldn’t replace (dontpeople):
filledin_missing = rbind(dontpeople, filledin_none)
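The reassembly can be checked on a tiny made-up example. Everything here is hypothetical, and the fill-in step is a plain stand-in for the mice output rather than real imputation, so the sketch runs without the mice package.

```r
# Hypothetical data: person 4 has too much missing to replace
toy <- data.frame(id = 1:4, q1 = c(1, NA, 3, NA), q2 = c(2, 2, NA, NA))

dontpeople    <- toy[4, ]
replacepeople <- toy[1:3, ]
dontcolumn    <- replacepeople[, "id", drop = FALSE]
replacecolumn <- replacepeople[, c("q1", "q2")]

# Stand-in for complete(mice(replacecolumn), 1); this is NOT real imputation
nomiss <- replacecolumn
nomiss[is.na(nomiss)] <- 0

# Columns first, then rows
filledin_none    <- cbind(dontcolumn, nomiss)
filledin_missing <- rbind(dontpeople, filledin_none)

# The set-aside rows now come first; sort by id to restore the original order
filledin_missing <- filledin_missing[order(filledin_missing$id), ]
filledin_missing
```

For data frames, rbind() matches columns by name, so the pieces line up even though cbind() changed the column order. The only NA values left are in the rows that were deliberately set aside.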