D/RS 1013 Data Screening/Cleaning/ Preparation for Analyses.

1 D/RS 1013 Data Screening/Cleaning/ Preparation for Analyses

2 Data entered in computer
• assuming reasonable care was taken during entry
• a scanner is probably the most "error-free" entry method
• check the physical forms against the data file
• verify any recoding or score calculations
• print the data for checking with "List Cases" (Mac) or "Case Summaries" (Windows)
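
A minimal sketch of the "verify any recoding or score calculations" step: recompute a reverse-coded item and a scale total from the raw items and compare them with what was entered. The variable names (`q1`, `q2`, `q3`, `q3_r`, `total`), the 1-5 scale, and the reverse-coding rule `6 - q3` are hypothetical assumptions chosen for illustration.

```python
# Hypothetical codebook (assumption, not from the slides):
#   q1, q2, q3 are items on a 1-5 scale, q3_r = 6 - q3 (reverse-coded),
#   total = q1 + q2 + q3_r.


def check_recode_and_total(cases):
    """Return (case id, message) pairs where a stored value disagrees
    with a fresh recalculation from the raw items."""
    problems = []
    for case in cases:
        expected_r = 6 - case["q3"]  # reverse-score a 1-5 item
        if case["q3_r"] != expected_r:
            problems.append((case["id"], f"q3_r should be {expected_r}"))
        expected_total = case["q1"] + case["q2"] + expected_r
        if case["total"] != expected_total:
            problems.append((case["id"], f"total should be {expected_total}"))
    return problems


cases = [
    {"id": 1, "q1": 4, "q2": 5, "q3": 2, "q3_r": 4, "total": 13},
    {"id": 2, "q1": 3, "q2": 3, "q3": 5, "q3_r": 5, "total": 11},  # recode never applied
]
for case_id, message in check_recode_and_total(cases):
    print(case_id, message)
```

Here case 2 is reported twice, because the uncorrected recode also made its total wrong; each reported case goes back to the physical form or the recode step for checking.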

3 Data screening
• run descriptives and look for out-of-range values
• check those values against the original forms
• correct the data in the file
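
A small sketch of screening with descriptives: report N, minimum and maximum for each variable, then list every value that falls outside its valid range so it can be checked against the original form. The valid ranges and the data are hypothetical codebook entries, not taken from the slides.

```python
# Hypothetical valid ranges, as they might appear in a codebook.
VALID_RANGES = {"age": (18, 99), "q1": (1, 5), "gender": (1, 2)}


def describe(values):
    """Minimal 'descriptives': N, minimum and maximum of non-missing values."""
    present = [v for v in values if v is not None]
    return {"n": len(present), "min": min(present), "max": max(present)}


def out_of_range(data, valid_ranges):
    """List (variable, case number, value) for every value outside its valid range."""
    flagged = []
    for var, (low, high) in valid_ranges.items():
        for case_no, value in enumerate(data[var], start=1):
            if value is not None and not low <= value <= high:
                flagged.append((var, case_no, value))
    return flagged


data = {
    "age": [23, 35, 230, 41],  # 230 looks like a keying error
    "q1": [1, 5, 3, 55],
    "gender": [1, 2, 2, None],  # None = missing, not out of range
}
for var, values in data.items():
    print(var, describe(values))
print(out_of_range(data, VALID_RANGES))
```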

4 Missing data
• respondents will not answer every question on a survey
• what should be done about items where data are missing?
• there are several options to consider for addressing it

5 Missing data (cont.)
• single variable: is there systematic bias in the kinds of people who fail to answer an item?
• if only a small amount of data is missing, it is usually not a serious concern
• use pairwise deletion
• pairwise deletion can cause problems, because each statistic may be computed on a different subset (and a different number) of cases
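
A minimal sketch of pairwise deletion: each correlation uses every case observed on both of its variables, so different correlations can rest on different cases. The three variables and their responses are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with no missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_r(a, b):
    """Correlation using every case observed on both variables (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


# Hypothetical responses; None marks an unanswered item.
v1 = [2, 4, None, 5, 3, 1]
v2 = [3, None, 4, 6, 2, 2]
v3 = [1, 3, 5, 4, 2, 1]

for label, a, b in [("v1-v2", v1, v2), ("v1-v3", v1, v3), ("v2-v3", v2, v3)]:
    r, n = pairwise_r(a, b)
    print(f"{label}: r = {r:.2f}, n = {n}")
```

The three correlations rest on 4, 5 and 5 cases drawn from different subsets of respondents, which is the problem the slide warns about; dropping incomplete cases instead would use the same 4 complete cases for all three.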

6 Missing data (cont.)
• drop the subject's data completely (listwise deletion)
• if data are missing on an unimportant variable, don't analyze that variable
• if a reasonable guess can be made from other available variables, make it
• for a numerical variable, substitute the average (mean)
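
A sketch of two of these options on hypothetical data: dropping incomplete cases, and substituting the mean for a numerical variable. It also shows a known side effect of mean substitution: the mean is preserved, but the variance is understated because every filled-in value sits at the center.

```python
from statistics import mean, variance


def listwise(rows):
    """Drop every case (row) that has any missing value."""
    return [row for row in rows if None not in row]


def mean_substitute(values):
    """Replace missing values with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


# Hypothetical two-variable data set; None marks a missing answer.
rows = [(2, 3), (4, None), (None, 4), (5, 6), (3, 2), (1, 2)]
print(len(rows), "cases,", len(listwise(rows)), "left after dropping incomplete cases")

x = [2, 4, None, 5, 3, 1, None]
observed = [v for v in x if v is not None]
filled = mean_substitute(x)
print(filled)
print(mean(observed), mean(filled))          # the mean is unchanged
print(variance(observed), variance(filled))  # but the variance shrinks
```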

7 Missing data (cont.)
• use the correlation between answered and unanswered questions: a regression equation predicts values on one variable from the others for which we have data
• create a new variable that flags whether or not the question was answered, and analyze for possible differences on some other variable
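
A minimal sketch of both ideas, assuming a single fully observed predictor `x` and simple least-squares regression (a multiple-predictor version follows the same pattern). The data, and `age` as the "other variable" compared across the flag groups, are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def regression_impute(x, y):
    """Fill missing y values with predictions from a line fit to complete cases."""
    complete = [(a, b) for a, b in zip(x, y) if b is not None]
    intercept, slope = fit_line([a for a, _ in complete], [b for _, b in complete])
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]


# Hypothetical data: x fully observed, y missing for two respondents.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, None, 8.0, None, 12.1]
print(regression_impute(x, y))

# Flag variable: 1 = answered y, 0 = did not answer.
answered = [0 if v is None else 1 for v in y]
age = [30, 41, 25, 38, 22, 45]  # some other (hypothetical) variable
print(mean(a for a, f in zip(age, answered) if f == 1),
      mean(a for a, f in zip(age, answered) if f == 0))
```

In these made-up numbers, responders average 38.5 years and non-responders 23.5, the kind of difference that would suggest the people who skipped the item differ systematically. Imputed values fall exactly on the regression line, so, like mean substitution, this approach understates variability.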

8 Outliers
• outliers exert influence on the mean
• they inflate the variance of the sample
• identify them by looking at a graph or by running Explore and requesting outliers
• rule out some kind of data problem (e.g., an entry error)
• the case can be dropped and not used
• a compromise is to move the outlier closer to the rest of the distribution
• residual analysis and detection of multivariate outliers (e.g., Mahalanobis distance) come later, when we move on to multiple regression
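
A sketch of identifying and then moving an outlier on hypothetical scores. It flags values more than 1.5 × IQR beyond the quartiles (a common boxplot convention), then applies one common version of "moving" an outlier: give it a score one unit above the largest non-outlying score. Only the high side is handled here, and both rules are illustrative choices rather than the only options.

```python
from statistics import mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Return values beyond k * IQR from the quartiles, plus the (low, high) fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)


def pull_in_high_outliers(values, high):
    """Move each value above `high` to one unit above the largest non-outlying value."""
    largest_ok = max(v for v in values if v <= high)
    return [largest_ok + 1 if v > high else v for v in values]


scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]  # hypothetical data
outliers, (low, high) = iqr_outliers(scores)
print("outliers:", outliers, "fences:", low, high)

moved = pull_in_high_outliers(scores, high)
print("mean:", mean(scores), "->", mean(moved))
print("SD:  ", round(stdev(scores), 2), "->", round(stdev(moved), 2))
```

The single score of 48 pulls the mean to 19.4, above every other score; moving it to 21 brings the mean to 16.7 and shrinks the standard deviation, which shows both effects listed on the slide.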

9 Normality
• assessing univariate normality
  - look at a graph
  - check the skew and kurtosis values
  - can test significance: divide the statistic by its standard error; the result is a z score
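
A sketch of the z-score test: divide skewness and kurtosis by their standard errors. It assumes the adjusted skewness (G1) and excess kurtosis (G2) estimators together with the standard-error formulas commonly reported for SPSS. Treat those exact formulas as an assumption and check them against your own software's documentation. The calculation needs at least 4 cases, and the scores are made up.

```python
import math
from statistics import mean, stdev


def skewness(xs):
    """Adjusted Fisher-Pearson skewness (G1)."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def kurtosis(xs):
    """Sample excess kurtosis (G2); 0 for a normal distribution."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))


def normality_z(xs):
    """z = statistic / standard error, for skewness and for kurtosis."""
    n = len(xs)
    return skewness(xs) / se_skewness(n), kurtosis(xs) / se_kurtosis(n)


# Hypothetical, positively skewed scores.
scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15]
z_skew, z_kurt = normality_z(scores)
print(f"skew = {skewness(scores):.2f}, z = {z_skew:.2f}")
print(f"kurtosis = {kurtosis(scores):.2f}, z = {z_kurt:.2f}")
```

A |z| greater than 1.96 is significant at the .05 level (two-tailed).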

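10 Normality (cont.)
• tells us whether skew/kurtosis is significantly different from 0
• a significant result does not necessarily mean it is a problem (in large samples, even small departures from normality are significant)
• Kline's (1998) recommendations: absolute skewness values > 3 and kurtosis values > 10 suggest a serious departure from normality
• if normality is seriously violated, transforming the variable is an option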
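
A sketch of applying the cut-offs from the slide and trying a transformation. It assumes a log10 transform for a positively skewed variable. That is one common choice (a square root is another), and it requires all values to be positive, so a constant would be added first if zeros or negatives were present. The income values are made up.

```python
import math
from statistics import mean, stdev


def skewness(xs):
    """Adjusted Fisher-Pearson skewness (G1)."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def kline_flags(skew, kurt):
    """Which of Kline's (1998) cut-offs are exceeded: |skew| > 3, kurtosis > 10."""
    flags = []
    if abs(skew) > 3:
        flags.append("skew")
    if kurt > 10:
        flags.append("kurtosis")
    return flags


# Hypothetical positively skewed variable (all values > 0, so log10 is defined;
# with zeros or negatives, a constant would be added first).
income = [12, 15, 18, 20, 22, 25, 30, 40, 60, 150, 400]
logged = [math.log10(v) for v in income]

print(f"skew before: {skewness(income):.2f}, after log10: {skewness(logged):.2f}")
print(kline_flags(3.4, 2.0), kline_flags(-0.5, 12.0))
```

The log transform compresses the long right tail, so skewness drops substantially. The transformed variable is then interpreted on the log scale.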

11 Linearity of relationship
• the relationship between variables is reasonably summarized by a straight line
• check the scatterplot
• the relationship may be curvilinear
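
A small illustration of why the scatterplot matters. The Pearson r summarizes only the straight-line part of a relationship, so for a perfectly curvilinear (U-shaped) relationship, built here from hypothetical data, r can be zero even though y depends completely on x.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: how well a straight line summarizes the relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curvilinear (U-shaped) relationship: y depends perfectly on x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

print(pearson(x, y))                   # 0.0: linear r misses the relationship
print(pearson([v * v for v in x], y))  # 1.0: the squared term captures it
```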

12 Homoscedasticity
• assumption that the variation in one variable is constant across the range of another variable
• check the scatterplot

13 Homoscedasticity
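
A rough numeric companion to the scatterplot, offered as a sketch under stated assumptions rather than a formal test. It fits the regression line, splits the cases at the middle of x, and divides the residual variance in the upper half by that in the lower half. A ratio near 1 is consistent with homoscedasticity, while a ratio far from 1 suggests the spread changes across the range. The half-split and the data are illustrative choices.

```python
from statistics import mean, variance


def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def spread_ratio(xs, ys):
    """Residual variance in the upper half of x divided by that in the lower half."""
    intercept, slope = fit_line(xs, ys)
    resid = sorted((x, y - (intercept + slope * x)) for x, y in zip(xs, ys))
    half = len(resid) // 2
    lower = [r for _, r in resid[:half]]
    upper = [r for _, r in resid[-half:]]
    return variance(upper) / variance(lower)


# Hypothetical data whose scatter around the line widens as x increases.
x = list(range(1, 11))
noise = [0.1, -0.2, 0.3, -0.4, 0.6, -0.9, 1.5, -2.0, 2.8, -3.5]
y = [2 * a + e for a, e in zip(x, noise)]
print(round(spread_ratio(x, y), 1))
```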

