D/RS 1013 Data Screening/Cleaning/ Preparation for Analyses.



Data entered in computer
- assuming reasonable care was taken
- scanner probably most "error free"
- checking physical forms against file
- verifying any recoding or score calculations
- "list cases" (Mac) or "case summaries" (Windows)

Data screening
- descriptives: look for out-of-range values
- check values against original forms
- correct data in file
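The out-of-range check above can be sketched in a few lines of Python. The variable names, valid ranges, and records below are hypothetical and made up for illustration; the slides do not prescribe any particular tool.

```python
# Hypothetical valid ranges: a 1-5 survey item and age in years.
VALID_RANGES = {"item1": (1, 5), "age": (18, 99)}

# Hypothetical records as they might look after data entry.
records = [
    {"id": 1, "item1": 4, "age": 34},
    {"id": 2, "item1": 55, "age": 29},   # possibly a keying error
    {"id": 3, "item1": 2, "age": 210},   # impossible age
]

def out_of_range(rows, valid_ranges):
    """Return (case id, variable, value) for every value outside its valid range."""
    problems = []
    for row in rows:
        for var, (low, high) in valid_ranges.items():
            value = row[var]
            if value is not None and not (low <= value <= high):
                problems.append((row["id"], var, value))
    return problems

flagged = out_of_range(records, VALID_RANGES)
for case_id, var, value in flagged:
    print(f"case {case_id}: {var} = {value} is out of range; check the original form")
```

Each flagged case is then looked up on the original form and corrected in the file, as the slide describes.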

Missing data
- respondents will not answer all questions on a survey
- what to do about items where data is missing?
- several options to consider/ways to address

Missing data (cont.)
- single variable: is systematic bias present in the kinds of people who fail to answer an item?
- if the amount of missing data is small, don't really need to worry
- use pairwise deletion
- pairwise can cause problems
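One reason pairwise deletion can cause problems is that each pair of variables ends up being analyzed on a different subset of cases. The sketch below, using a made-up three-variable data set with `None` for a missing answer, compares the cases kept under listwise (complete-case) deletion with the cases kept for each pair.

```python
from itertools import combinations

# Hypothetical data; None marks a missing answer.
data = {
    "x": [1, 2, None, 4, 5, 6],
    "y": [2, None, 3, 5, 4, 7],
    "z": [1, 3, 2, None, 5, 6],
}

def complete_cases(data, variables):
    """Indices of cases that have a value on every listed variable."""
    n_rows = len(next(iter(data.values())))
    return [i for i in range(n_rows)
            if all(data[v][i] is not None for v in variables)]

# Listwise deletion: one common set of cases for every analysis.
listwise = complete_cases(data, list(data))

# Pairwise deletion: a separate set of cases for each pair of variables.
pairwise = {pair: complete_cases(data, pair) for pair in combinations(data, 2)}

print("listwise cases:", listwise)
for pair, cases in pairwise.items():
    print(f"pairwise cases for {pair}:", cases)
```

Here every pair keeps four cases, but never the same four, so the correlations in one matrix would not describe the same group of people.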

Missing data (cont.)
- drop subject's data completely
- if missing data on unimportant variable, don't analyze
- if a reasonable guess can be made based on other available variables, do it
- numerical variable: use average
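Replacing a missing numerical value with the average can be sketched as follows. The scores are invented; the variance comparison is an addition here, included because mean substitution is known to shrink the variance of the variable.

```python
from statistics import mean, variance

# Hypothetical scores; None marks a missing answer.
scores = [4, 5, None, 3, None, 6]

observed = [s for s in scores if s is not None]
fill = mean(observed)  # average of the answered cases

# Mean substitution: every missing value gets the same average.
imputed = [fill if s is None else s for s in scores]

print("fill value:", fill)
print("variance of observed scores:", variance(observed))
print("variance after mean substitution:", variance(imputed))
```

The filled-in values sit exactly at the mean, so they add cases without adding spread, which is worth keeping in mind when a lot of data is missing.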

Missing data (cont.)
- correlation between answered and unanswered questions: use a regression equation to predict values on one variable based on others for which we have data
- new variable that flags whether they answered the question or not: analyze for possible differences on some other variable
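Both ideas on this slide can be sketched with one predictor. The data are hypothetical, and a single-predictor least-squares line stands in for whatever regression equation would be used in practice.

```python
from statistics import mean

# Hypothetical data: x is complete, y has missing answers (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]

# Flag variable: 1 if the respondent answered y, 0 if not.
answered = [0 if v is None else 1 for v in y]

# Least-squares line fitted on the complete cases only.
pairs = [(a, b) for a, b in zip(x, y) if b is not None]
mx = mean([a for a, _ in pairs])
my = mean([b for _, b in pairs])
slope = (sum((a - mx) * (b - my) for a, b in pairs)
         / sum((a - mx) ** 2 for a, _ in pairs))
intercept = my - slope * mx

# Regression imputation: predict y from x where y is missing.
y_filled = [intercept + slope * a if b is None else b for a, b in zip(x, y)]

# Use the flag to compare answerers and non-answerers on another variable.
mean_x_answered = mean([a for a, f in zip(x, answered) if f == 1])
mean_x_missing = mean([a for a, f in zip(x, answered) if f == 0])

print("filled y:", [round(v, 2) for v in y_filled])
print("mean x, answered vs. missing:", mean_x_answered, mean_x_missing)
```

A noticeable difference between the two group means would suggest the missing answers are not random with respect to `x`.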

Outliers
- exert influence on the mean
- inflate variance of the sample
- identify: look at a graph or run Explore requesting outliers
- rule out some kind of data problem
- can dump and not use
- compromise is to move outlier
- residual analysis and detecting multivariate outliers when we move on to multiple regression (e.g., Mahalanobis distance)
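A rough sketch of identifying a univariate outlier and then "moving" it. Two assumptions are made here that the slides do not specify: outliers are flagged with the common 1.5 x IQR fence rule, and a flagged score is moved to the most extreme non-outlying value. The scores are invented.

```python
from statistics import mean, quantiles, variance

# Hypothetical scores with one extreme value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 58]

# Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers.
q1, _, q3 = quantiles(scores, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [s for s in scores if s < low_fence or s > high_fence]
inliers = [s for s in scores if low_fence <= s <= high_fence]

# Assumed compromise: pull each outlier in to the nearest non-outlying value.
moved = [min(max(s, min(inliers)), max(inliers)) for s in scores]

print("outliers:", outliers)
print("mean before/after:", mean(scores), mean(moved))
print("variance before/after:", variance(scores), variance(moved))
```

The before/after comparison shows the two effects named on the slide: the single extreme score pulls the mean upward and inflates the variance.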

Normality
- assessing univariate normality
  - look at graph
  - skew and kurtosis values
  - can test significance: divide by standard error
  - result is a z score

Normality (cont.)
- tells us whether skew/kurtosis is significantly different from "0"
- does not necessarily mean it is a problem
- Kline's (1998) recommendations: treat skewness values > 3 and kurtosis > 10 as problematic
- if seriously violated, transforming is an option
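The skew and kurtosis test can be sketched as below. This assumes the bias-adjusted sample skewness and excess kurtosis with their usual standard-error formulas, which is what common statistics packages report; the slides themselves give no formulas. The function name and data are hypothetical, and the Kline check is applied to absolute values.

```python
import math

def skew_kurtosis_z(values):
    """Sample skewness, excess kurtosis, their standard errors, and z scores."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5          # moment-based skewness
    g2 = m4 / m2 ** 2 - 3        # moment-based excess kurtosis
    # Bias-adjusted estimates.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {
        "skew": skew, "se_skew": se_skew, "z_skew": skew / se_skew,
        "kurt": kurt, "se_kurt": se_kurt, "z_kurt": kurt / se_kurt,
        # Kline's (1998) thresholds from the slide, on absolute values.
        "kline_flag": abs(skew) > 3 or abs(kurt) > 10,
    }

symmetric = skew_kurtosis_z([1, 2, 3, 4, 5])
skewed = skew_kurtosis_z([1, 1, 2, 2, 3, 3, 4, 5, 9, 20])

for name, result in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, {k: round(v, 3) if isinstance(v, float) else v
                 for k, v in result.items()})
```

In the made-up skewed sample the z score for skew exceeds 1.96, yet neither value crosses Kline's thresholds, which illustrates the point that a significant result does not necessarily mean there is a problem.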

Linearity of relationship
- relationship between variables reasonably summarized by straight line
- check scatterplot
- may be curvilinear
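The scatterplot remains the main check. As a numeric supplement, one possible sketch is to fit a straight line and see whether the residuals still track the squared (centered) predictor, which they should not if a line summarizes the relationship well. This diagnostic, the helper names, and the data are assumptions added here and do not come from the slides.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return num / den

def curvature_index(x, y):
    """Correlation between straight-line residuals and squared centered x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    residuals = [b - (my + slope * (a - mx)) for a, b in zip(x, y)]
    squared = [(a - mx) ** 2 for a in x]
    return pearson(residuals, squared)

# Hypothetical data: one roughly linear relationship, one curvilinear.
x = list(range(11))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2, -0.1]
y_linear = [1 + 2 * a + e for a, e in zip(x, noise)]
y_curved = [(a - 5) ** 2 for a in x]

curv_linear = curvature_index(x, y_linear)
curv_curved = curvature_index(x, y_curved)
print("linear data:", round(curv_linear, 3))
print("curved data:", round(curv_curved, 3))
```

A value near zero is consistent with a straight-line summary; a value near 1 or -1 suggests a curvilinear pattern worth inspecting on the plot.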

Homoscedasticity
- assumption that variation in one variable is constant across range of another variable
- check scatterplot
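Alongside the scatterplot, a crude numeric check is to fit a line, split the cases at the middle of the predictor, and compare the residual variance in the two halves. This split-and-compare approach, the function name, and the data are assumptions for illustration, not a procedure taken from the slides.

```python
from statistics import mean, variance

def residual_variance_ratio(x, y):
    """Residual variance in the upper half of x divided by that in the lower half."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    # Sort cases by x, then compute residuals from the fitted line.
    ordered = sorted(zip(x, y))
    residuals = [b - (my + slope * (a - mx)) for a, b in ordered]
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])

x = list(range(1, 13))

# Hypothetical errors with constant spread.
even_errors = [0.5, -0.5] * 6
# Hypothetical errors whose spread grows with x.
growing_errors = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3,
                  1.0, -1.2, 1.5, -1.6, 2.0, -2.1]

ratio_even = residual_variance_ratio(x, [2 * a + e for a, e in zip(x, even_errors)])
ratio_growing = residual_variance_ratio(x, [2 * a + e for a, e in zip(x, growing_errors)])

print("constant spread, ratio:", round(ratio_even, 2))
print("growing spread, ratio:", round(ratio_growing, 2))
```

A ratio close to 1 is consistent with constant variation; a ratio far from 1 suggests the spread changes across the range of the other variable.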

Homoscedasticity