Published by Aubrey Welch. Modified over 9 years ago.
1
D/RS 1013: Data Screening/Cleaning/Preparation for Analyses
2
Data entered in computer
• assuming reasonable care was taken
• scanner probably most "error free"
• checking physical forms against file
• verifying any recoding or score calculations
• "list cases" (Mac) or "case summaries" (Windows)
3
Data screening
• descriptives: look for out-of-range values
• check values against original forms
• correct data in file
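The out-of-range check can be sketched in a few lines of Python. This is only an illustration: the records, the variable names (`age`, `q1`), and the valid ranges are all hypothetical, and a statistics package's descriptives output would serve the same purpose.

```python
# Hypothetical survey records; variable names and valid ranges are assumptions.
records = [
    {"id": 1, "age": 34, "q1": 4},
    {"id": 2, "age": 340, "q1": 2},  # probably a keying error
    {"id": 3, "age": 27, "q1": 7},   # q1 is assumed to be a 1-5 item
    {"id": 4, "age": 51, "q1": 5},
]
valid_ranges = {"age": (18, 99), "q1": (1, 5)}


def describe(records, variable):
    """Return n, minimum and maximum of the non-missing values."""
    values = [r[variable] for r in records if r[variable] is not None]
    return {"n": len(values), "min": min(values), "max": max(values)}


def out_of_range(records, valid_ranges):
    """List (id, variable, value) for every value outside its valid range."""
    flagged = []
    for r in records:
        for variable, (low, high) in valid_ranges.items():
            value = r[variable]
            if value is not None and not (low <= value <= high):
                flagged.append((r["id"], variable, value))
    return flagged


for variable in valid_ranges:
    print(variable, describe(records, variable))

flagged = out_of_range(records, valid_ranges)
print(flagged)
```

The flagged case ids are the ones to check against the original forms before correcting the file.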
4
Missing data
• respondents will not answer all questions on a survey
• what to do about items where data is missing?
• several options to consider/ways to address
5
Missing data (cont.)
• single variable: is systematic bias present in the kinds of people who fail to answer an item?
• if the amount of missing data is small, don't really need to worry
• use pairwise deletion
• pairwise can cause problems (each statistic may be based on a different subset of cases)
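A small sketch of why pairwise deletion can cause problems: each pair of variables keeps a different set of cases, so the statistics in one correlation matrix are not based on the same people. The data and variable names (`x`, `y`, `z`) are made up for illustration.

```python
from itertools import combinations

# Hypothetical data; None marks a missing answer.
rows = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 5.0},
    {"x": 3.0, "y": 4.0, "z": None},
    {"x": 4.0, "y": 5.0, "z": 8.0},
    {"x": None, "y": 7.0, "z": 9.0},
]
variables = ["x", "y", "z"]


def listwise(rows, variables):
    """Keep only cases with no missing value on any variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def pairwise_n(rows, a, b):
    """Count cases that have data on both variables of a pair."""
    return sum(1 for r in rows if r[a] is not None and r[b] is not None)


listwise_n = len(listwise(rows, variables))
pair_ns = {(a, b): pairwise_n(rows, a, b) for a, b in combinations(variables, 2)}

print("listwise n:", listwise_n)
for pair, n in pair_ns.items():
    print(pair, "n =", n)
```

Here listwise deletion keeps 2 of 5 cases, while each pair keeps 3, and no two pairs use the same 3 cases.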
6
Missing data (cont.)
• drop subject's data completely
• if data are missing on an unimportant variable, don't analyze that variable
• if a reasonable guess can be made based on other available variables, do it
• numerical variable: use the average
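Replacing a missing numerical value with the average can be sketched as follows. The scores are hypothetical. The sketch also prints the variance before and after, since filling gaps with the mean adds cases that sit exactly at the center and so shrinks the variance.

```python
from statistics import mean, variance

# Hypothetical item scores; None marks a missing answer.
scores = [4.0, None, 6.0, 8.0, None, 10.0]

observed = [s for s in scores if s is not None]
item_mean = mean(observed)

# Mean substitution: every missing value becomes the observed mean.
imputed = [item_mean if s is None else s for s in scores]

print("mean of observed values:", item_mean)
print("imputed scores:", imputed)
print("variance, observed only:", variance(observed))
print("variance, after imputation:", variance(imputed))
```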
7
Missing data (cont.)
• correlation between answered and unanswered questions
  - regression equation to predict values on one variable based on others for which we have data
• new variable that flags whether they answered the question or not
  - analyze for possible differences on some other variable
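Both ideas on this slide can be sketched together. The data, the variable names (`age`, `income`), and the choice of a one-predictor least-squares line are assumptions made for illustration; a real analysis would normally use a statistics package and more than one predictor.

```python
from statistics import mean

# Hypothetical data: income has gaps, age is complete.
people = [
    {"age": 25, "income": 30.0},
    {"age": 35, "income": 40.0},
    {"age": 45, "income": 50.0},
    {"age": 55, "income": None},
    {"age": 65, "income": None},
]

# 1. Flag variable: did the respondent answer the income item?
for p in people:
    p["income_missing"] = p["income"] is None

# Compare answerers and non-answerers on another variable (age).
age_answered = mean(p["age"] for p in people if not p["income_missing"])
age_missing = mean(p["age"] for p in people if p["income_missing"])
print("mean age, answered:", age_answered, "| not answered:", age_missing)

# 2. Regression equation fitted on complete cases: income = a + b * age.
complete = [p for p in people if not p["income_missing"]]
x_bar = mean(p["age"] for p in complete)
y_bar = mean(p["income"] for p in complete)
sxy = sum((p["age"] - x_bar) * (p["income"] - y_bar) for p in complete)
sxx = sum((p["age"] - x_bar) ** 2 for p in complete)
b = sxy / sxx
a = y_bar - b * x_bar

# Predict the missing values from the fitted line.
predicted = {p["age"]: a + b * p["age"] for p in people if p["income_missing"]}
print("slope:", b, "intercept:", a)
print("predicted income by age:", predicted)
```

A large difference between the two age means would hint that the people who skipped the item differ systematically from those who answered it.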
8
Outliers
• exert influence on the mean
• inflate variance of the sample
• identify: look at a graph or run Explore requesting outliers
• rule out some kind of data problem
• can dump and not use
• compromise is to move the outlier
• residual analysis and detecting multivariate outliers (e.g., Mahalanobis distance) come when we move on to multiple regression
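A minimal sketch of spotting a univariate outlier and seeing its effect on the mean and variance. Two things here are assumptions rather than anything stated on the slide: the 1.5 × IQR boxplot fences used to flag the outlier, and reading "move the outlier" as recoding it to the most extreme non-outlying value. The data are hypothetical.

```python
from statistics import mean, quantiles, variance

# Hypothetical scores with one extreme value.
data = [10, 12, 11, 13, 12, 14, 11, 13, 12, 95]

# Boxplot-style fences (assumed rule: 1.5 * IQR beyond the quartiles).
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in data if v < low_fence or v > high_fence]
kept = [v for v in data if low_fence <= v <= high_fence]

# Option "dump": drop the outliers and compare mean and variance.
print("mean with / without:", mean(data), mean(kept))
print("variance with / without:", variance(data), variance(kept))

# Option "move" (assumed reading): pull each outlier in to the nearest
# value that is not itself an outlier.
moved = [min(max(v, min(kept)), max(kept)) for v in data]
print("outliers:", outliers)
print("moved data:", moved)
```

Either option should come only after ruling out a data problem, for example by checking the case against the original form.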
9
Normality
• assessing univariate normality
  - look at graph
  - skew and kurtosis values
  - can test significance
  - divide by standard error
  - result is a z score
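The "divide by standard error" step can be sketched as below. The slide does not say which formulas to use; this sketch assumes the bias-adjusted sample skewness and excess kurtosis, with their usual standard errors, as reported by common statistics packages. The function name `skew_kurtosis_z` is made up for this example.

```python
from math import sqrt


def skew_kurtosis_z(values):
    """Return skewness, excess kurtosis, their standard errors and z scores."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5          # moment-based skewness
    g2 = m4 / m2 ** 2 - 3        # moment-based excess kurtosis

    # Bias-adjusted sample statistics (assumed choice of estimator).
    skew = sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

    # Standard errors depend only on the sample size.
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "skew": skew, "se_skew": se_skew, "z_skew": skew / se_skew,
        "kurt": kurt, "se_kurt": se_kurt, "z_kurt": kurt / se_kurt,
    }


symmetric = skew_kurtosis_z([1, 2, 3, 4, 5])
right_skewed = skew_kurtosis_z([1, 1, 2, 2, 3, 20])
print(symmetric)
print(right_skewed)
```

Each z score is then compared with a critical value from the normal distribution, such as 1.96 for a two-tailed test at the .05 level.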
10
Normality (cont.)
• tells us whether skew/kurtosis is significantly different from 0
• does not necessarily mean it is a problem
• Kline's (1998) recommendations
  - absolute skewness values > 3 and kurtosis > 10 suggest a problem
• if seriously violated, transforming is an option
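A sketch of the transformation option, using a natural-log transform on hypothetical positively skewed data. The log transform is one common choice and is an assumption here, as are the simple moment-based skewness formula and the helper name `exceeds_skew_cutoff`; the cutoff of 3 is the skewness value quoted on the slide.

```python
from math import log


def skewness(values):
    """Moment-based skewness (no small-sample adjustment)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def exceeds_skew_cutoff(skew, cutoff=3.0):
    """True when the absolute skewness is above the cutoff."""
    return abs(skew) > cutoff


# Hypothetical positively skewed scores (all values must be > 0 for log).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [log(v) for v in raw]

raw_skew = skewness(raw)
logged_skew = skewness(logged)
print("skew before:", raw_skew, "flagged:", exceeds_skew_cutoff(raw_skew))
print("skew after log:", logged_skew, "flagged:", exceeds_skew_cutoff(logged_skew))
```

Variables with zeros or negative values would need a constant added before taking logs, and results are then interpreted on the transformed scale.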
11
Linearity of relationship
• relationship between variables reasonably summarized by a straight line
• check scatterplot
• may be curvilinear
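A numerical illustration of why the scatterplot matters: a perfectly curvilinear relationship can produce a Pearson correlation of zero, because the correlation only measures the straight-line part. The data are constructed for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # exactly a straight line
curved = [x ** 2 for x in xs]      # exactly a U-shaped curve

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print("r for the straight line:", r_linear)
print("r for the U-shaped curve:", r_curved)
```

The second relationship is just as strong as the first, but a straight line does not summarize it, which only a plot makes obvious.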
12
Homoscedasticity
• assumption that variation in one variable is constant across the range of another variable
• check scatterplot
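An informal numerical companion to the scatterplot check: fit a straight line, split the cases at the middle of the x range, and compare the residual variance in the two halves. This is a rough sketch with constructed data, not a formal test, and the function name `residual_variance_ratio` is made up for the example.

```python
from statistics import mean, variance


def residual_variance_ratio(xs, ys):
    """Residual variance in the upper half of x divided by the lower half."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residuals from the least-squares line, kept in order of x.
    pairs = sorted(zip(xs, ys))
    residuals = [y - (a + b * x) for x, y in pairs]
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])


xs = list(range(1, 11))
signs = [1 if x % 2 else -1 for x in xs]

# Constant spread around the line y = x.
even_spread = [x + s * 1.0 for x, s in zip(xs, signs)]
# Spread that grows with x (heteroscedastic).
growing_spread = [x + s * 0.3 * x for x, s in zip(xs, signs)]

ratio_even = residual_variance_ratio(xs, even_spread)
ratio_growing = residual_variance_ratio(xs, growing_spread)
print("variance ratio, constant spread:", ratio_even)
print("variance ratio, growing spread:", ratio_growing)
```

A ratio near 1 is consistent with constant spread; a ratio far from 1 matches the fan shape seen in a scatterplot when the assumption is violated.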
13
Homoscedasticity