Psych 524 Andrew Ainsworth Data Screening 1

Data entry check
One of the first steps in proper data screening is to ensure the data were entered correctly. Check each person's entry individually; this makes sense for a small data set or when a proper data-checking procedure is in place. When that is too costly, at least check that every variable's values fall within its valid range.
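The range check described above can be sketched in a few lines of Python. The variable names and allowed ranges here are hypothetical, standing in for whatever the study's codebook specifies.

```python
# Hypothetical codebook: allowed (min, max) for each variable.
VALID_RANGES = {"age": (18, 99), "likert_1": (1, 7)}

def out_of_range(records, valid_ranges):
    """Return (row_index, variable, value) for every value outside its allowed range."""
    problems = []
    for i, row in enumerate(records):
        for var, (low, high) in valid_ranges.items():
            value = row.get(var)
            if value is None:
                continue  # missing values are handled in a separate screening step
            if not low <= value <= high:
                problems.append((i, var, value))
    return problems

records = [
    {"age": 25, "likert_1": 4},
    {"age": 250, "likert_1": 3},  # probably a typing error
    {"age": 31, "likert_1": 9},   # 9 is not on a 1-7 scale
]
flagged = out_of_range(records, VALID_RANGES)
print(flagged)  # [(1, 'age', 250), (2, 'likert_1', 9)]
```

A range check only catches impossible values; a wrong but plausible entry (say, 35 typed as 53) still needs case-by-case checking.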

Assumption Checking

Normality
The analyses we are covering assume that all continuous variables follow a normal curve. Skewness (univariate) represents the asymmetry of the distribution: scores pile up on one side and trail off in a long tail on the other.

Normality
The skewness statistic is output by SPSS. The standard error of skewness is approximately sqrt(6/N), where N is the number of cases.

Normality
Kurtosis (univariate) is how peaked the data are. The kurtosis statistic is output by SPSS. The standard error of kurtosis is approximately sqrt(24/N). For most statistics the skewness assumption is more important than the kurtosis assumption.
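As a rough illustration of how skewness and kurtosis are screened, the sketch below divides each statistic by its standard error to get a z value. It assumes the common large-sample approximations sqrt(6/N) and sqrt(24/N) for the standard errors and uses simple moment estimators, so the numbers will differ slightly from the bias-adjusted values SPSS reports. The sample data are made up.

```python
import math

def skew_kurt_z(values):
    """Moment-based skewness and excess kurtosis with approximate z values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3          # excess kurtosis: 0 for a normal curve
    se_skew = math.sqrt(6 / n)       # large-sample approximation (assumption)
    se_kurt = math.sqrt(24 / n)      # large-sample approximation (assumption)
    return {
        "skew": skew, "se_skew": se_skew, "z_skew": skew / se_skew,
        "kurt": kurt, "se_kurt": se_kurt, "z_kurt": kurt / se_kurt,
    }

# Made-up, positively skewed sample of 24 scores.
sample = [1] * 10 + [2] * 8 + [3] * 4 + [9, 12]
result = skew_kurt_z(sample)
print(result)
```

A large positive z for skewness points to a long right tail; a large positive z for kurtosis points to a distribution that is more peaked than normal.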

Skewness and Kurtosis

Outliers
Technically, an outlier is a data point outside of your distribution. It is potentially detrimental because it may have an undue effect on the distribution.

Outliers
Univariate (brains in arc). Always check that the data are coded correctly. There are two ways of looking at it: a data point represents an outlier if it is disconnected from the rest of the distribution, or if it has a Z-score above 3.3 in absolute value. If there is a concern, run the analysis with and without the case to see if it has any influence on the results.
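The Z-score rule can be sketched as follows, using only the Python standard library. The scores are invented for illustration, and the 3.3 cutoff is applied to the absolute value so that extreme low scores are caught as well.

```python
import statistics

def z_outliers(values, cutoff=3.3):
    """Return (index, value) pairs whose |z| exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / sd > cutoff]

# Invented scores: 30 ordinary cases plus one extreme case at the end.
scores = [48, 49, 50, 51, 52] * 6 + [95]
print(z_outliers(scores))
```

Note that in very small samples no case can reach |z| = 3.3, because the outlier itself inflates the standard deviation; the rule is meant for reasonably large data sets.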

Outliers
Leverage: how far away a case is from the rest of the data.
Discrepancy: the degree to which a data point lines up with the rest of the data.
Influence: the amount of change in the regression equation (Bs) when a case is deleted; calculated as a combination of leverage and discrepancy.
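One way to put numbers on these three ideas, for a regression with a single predictor, is sketched below. The choice of statistics is an assumption, not something the slides specify: hat values stand in for leverage, raw residuals for discrepancy, and Cook's distance for influence. The data are made up.

```python
def simple_regression_diagnostics(x, y):
    """Leverage (hat values), residuals, and Cook's distance for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    n_params = 2  # intercept and slope
    cooks = [
        e ** 2 / (n_params * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]
    return leverage, residuals, cooks

# Made-up data: the last case is far from the others on x and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0, 9.0]
leverage, residuals, cooks = simple_regression_diagnostics(x, y)
print(max(range(len(x)), key=lambda i: cooks[i]))  # index of most influential case
```

Cook's distance multiplies a residual term by a leverage term, which matches the idea that influence combines discrepancy and leverage.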

Outliers

Dealing w/ univariate outliers
Once you find outliers, look into the case to see if there are indicators that the case is not part of your intended sample. If this is true, delete the case. Otherwise, reduce the influence of the outlier: move the value inward toward the rest of the distribution, while still leaving it extreme.
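Moving a value inward while leaving it extreme can be done in several ways. The sketch below uses one common convention, which is an assumption here rather than the slides' stated method: the outlier is recoded to one unit beyond the most extreme non-outlying score on its side of the distribution.

```python
import statistics

def pull_inward(values, outlier_indices, step=1):
    """Recode flagged outliers to one step beyond the most extreme remaining score."""
    kept = [v for i, v in enumerate(values) if i not in outlier_indices]
    low, high = min(kept), max(kept)
    center = statistics.median(kept)
    adjusted = list(values)
    for i in outlier_indices:
        # High outliers go just above the top of the rest, low ones just below the bottom.
        adjusted[i] = high + step if values[i] > center else low - step
    return adjusted

print(pull_inward([48, 50, 52, 95], {3}))  # [48, 50, 52, 53]
```

The recoded case is still the highest score, so its rank is preserved, but it no longer dominates the mean and variance.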

Multivariate Outliers
A subject's score may not be an outlier on any single variable, but on a combination of variables the subject is an outlier. “Being a teenager is normal, making $50,000 a year is normal, but a teenager making $50,000 a year is a multivariate outlier.”

Multivariate Outliers
Mahalanobis distance is a measurement of deviance from the centroid (the center of the multivariate distribution, created by the means of all the variables). The computed Mahalanobis distances are evaluated against a chi-square distribution, χ² (df = # of variables). Look up the critical value (with α = .001); if MD is above the CV, the participant is a multivariate outlier. If multivariate outliers are found, there is not much to do except delete the case.
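A minimal sketch of the Mahalanobis screen for the two-variable case follows, written in plain Python so the arithmetic is visible. With df = 2 the chi-square critical value has a closed form, -2 ln(α), so no table lookup is needed; for more variables a chi-square table or a statistics library would be required. The age and income data are invented.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for px, py in points:
        dx, dy = px - mean_x, py - mean_y
        # d' S^-1 d written out for a 2x2 covariance matrix
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

ALPHA = 0.001
CRITICAL = -2 * math.log(ALPHA)  # chi-square critical value for df = 2 (about 13.82)

# Invented data: income (in $1000s) tracks age closely, except for the last case.
points = [(age, age + (1 if age % 2 == 0 else -1)) for age in range(20, 50)]
points.append((20, 49))  # ordinary age, ordinary income, unusual combination

d2 = mahalanobis_sq_2d(points)
flagged = [i for i, d in enumerate(d2) if d > CRITICAL]
print(flagged)
```

The last case is within the observed range on both variables separately, which is why a univariate Z-score screen would miss it.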

Linearity
The relationships among variables are linear in nature; this is an assumption in most analyses. Example: resptran in arc.

Homoscedasticity (geese in arc)
For grouped data this is the same as homogeneity of variance. For ungrouped data, the variability for one variable is the same at all levels of another variable (no variance interaction).
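For grouped data, a quick first look at homogeneity of variance is the ratio of the largest group variance to the smallest. This is only a descriptive sketch with made-up groups; the slides do not name a specific test, and any cutoff for "too large" is a rule of thumb rather than a fixed standard.

```python
import statistics

def variance_ratio(groups):
    """Return (largest variance / smallest variance, per-group variances)."""
    variances = {name: statistics.variance(vals) for name, vals in groups.items()}
    ratio = max(variances.values()) / min(variances.values())
    return ratio, variances

# Made-up groups with the same mean but very different spread.
groups = {
    "control": [4, 5, 6, 5, 4, 6],
    "treatment": [1, 5, 9, 2, 8, 5],
}
ratio, variances = variance_ratio(groups)
print(ratio, variances)
```

A ratio near 1 suggests similar spread across groups; a large ratio is a signal to look more closely, for example with a formal test or a plot of the groups.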

Multicollinearity/Singularity
If correlations between two variables are excessive (e.g., .95), this represents multicollinearity. If the correlation is 1, you have singularity. Multicollinearity/singularity often occurs because one variable is a near duplicate of another (e.g., the variables used plus a composite of those variables).
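A simple pairwise screen for this problem is sketched below. It computes Pearson correlations between every pair of variables and labels pairs at or above the .95 example cutoff. The variable names and values are invented, and a pairwise check will not catch redundancy that only shows up across three or more variables.

```python
import itertools
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def collinear_pairs(data, cutoff=0.95):
    """Return (name1, name2, r, label) for every pair with |r| >= cutoff."""
    found = []
    for name1, name2 in itertools.combinations(data, 2):
        r = pearson_r(data[name1], data[name2])
        if abs(r) >= cutoff:
            label = "singularity" if abs(r) > 1 - 1e-9 else "multicollinearity"
            found.append((name1, name2, r, label))
    return found

# Invented variables: "near" almost duplicates "x", "exact" is x doubled.
data = {
    "x": [1, 2, 3, 4, 5, 6],
    "near": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "exact": [2, 4, 6, 8, 10, 12],
    "other": [3, 1, 4, 1, 5, 9],
}
pairs = collinear_pairs(data)
for name1, name2, r, label in pairs:
    print(name1, name2, round(r, 3), label)
```

When a pair is flagged, the usual remedy is to drop one of the two variables or combine them, since keeping both adds little information.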