CJT 765: Structural Equation Modeling Class 3: Data Screening: Fixing Distributional Problems, Missing Data, Measurement.

Slides:



Advertisements
Similar presentations
Transformations & Data Cleaning
Advertisements

6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Objectives (BPS chapter 24)
Simple Linear Regression 1. Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable.
LECTURE 3 Introduction to Linear Regression and Correlation Analysis
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 14 Using Multivariate Design and Analysis.
1-1 Regression Models  Population Deterministic Regression Model Y i =  0 +  1 X i u Y i only depends on the value of X i and no other factor can affect.
Psych 524 Andrew Ainsworth Data Screening 1. Data check entry One of the first steps to proper data screening is to ensure the data is correct Check out.
Lecture 25 Multiple Regression Diagnostics (Sections )
Multivariate Data Analysis Chapter 4 – Multiple Regression.
Lecture 6: Multiple Regression
Lecture 24 Multiple Regression (Sections )
Statistical Analysis SC504/HS927 Spring Term 2008 Session 7: Week 23: 7 th March 2008 Complex independent variables and regression diagnostics.
Topic 3: Regression.
Regression Diagnostics Checking Assumptions and Data.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 11 th Edition.
Basic Statistical Concepts Part II Psych 231: Research Methods in Psychology.
Psych 524 Andrew Ainsworth Data Screening 2. Transformation allows for the correction of non-normality caused by skewness, kurtosis, or other problems.
Business Statistics - QBM117 Statistical inference for regression.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Quantitative Business Analysis for Decision Making Multiple Linear RegressionAnalysis.
Screening the Data Tedious but essential!.
The Data Analysis Plan. The Overall Data Analysis Plan Purpose: To tell a story. To construct a coherent narrative that explains findings, argues against.
Regression and Correlation Methods Judy Zhong Ph.D.
Inference for regression - Simple linear regression
Multivariate Statistical Data Analysis with Its Applications
STATISTICS: BASICS Aswath Damodaran 1. 2 The role of statistics Aswath Damodaran 2  When you are given lots of data, and especially when that data is.
Chapter 12 Multiple Regression and Model Building.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
CJT 765 Quantitative Methods in Communication Structural Equation Modeling: Highlights for Quiz 1.
Multiple Regression The Basics. Multiple Regression (MR) Predicting one DV from a set of predictors, the DV should be interval/ratio or at least assumed.
11/12/2012ISC471 / HCI571 Isabelle Bichindaritz 1 Prediction.
Economics 173 Business Statistics Lecture 20 Fall, 2001© Professor J. Petry
Basics of Data Cleaning
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
INDE 6335 ENGINEERING ADMINISTRATION SURVEY DESIGN Dr. Christopher A. Chung Dept. of Industrial Engineering.
Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.
Model Selection and Validation. Model-Building Process 1. Data collection and preparation 2. Reduction of explanatory or predictor variables (for exploratory.
REGRESSION DIAGNOSTICS Fall 2013 Dec 12/13. WHY REGRESSION DIAGNOSTICS? The validity of a regression model is based on a set of assumptions. Violation.
Agresti/Franklin Statistics, 1 of 88  Section 11.4 What Do We Learn from How the Data Vary Around the Regression Line?
» So, I’ve got all this data…what now? » Data screening – important to check for errors, assumptions, and outliers. » What’s the most important? ˃Depends.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Model Building and Model Diagnostics Chapter 15.
Multivariate Data Analysis Chapter 2 – Examining Your Data
Introducing Communication Research 2e © 2014 SAGE Publications Chapter Seven Generalizing From Research Results: Inferential Statistics.
Applied Quantitative Analysis and Practices LECTURE#30 By Dr. Osman Sadiq Paracha.
B AD 6243: Applied Univariate Statistics Multiple Regression Professor Laku Chidambaram Price College of Business University of Oklahoma.
D/RS 1013 Data Screening/Cleaning/ Preparation for Analyses.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Tutorial I: Missing Value Analysis
 Seeks to determine group membership from predictor variables ◦ Given group membership, how many people can we correctly classify?
Assumptions of Multiple Regression 1. Form of Relationship: –linear vs nonlinear –Main effects vs interaction effects 2. All relevant variables present.
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
Data Screening. What is it? Data screening is very important to make sure you’ve met all your assumptions, outliers, and error problems. Each type of.
Correlation & Simple Linear Regression Chung-Yi Li, PhD Dept. of Public Health, College of Med. NCKU 1.
Quantitative Methods Residual Analysis Multiple Linear Regression C.W. Jackson/B. K. Gordor.
Chapter 12 REGRESSION DIAGNOSTICS AND CANONICAL CORRELATION.
Yandell – Econ 216 Chap 15-1 Chapter 15 Multiple Regression Model Building.
CJT 765: Structural Equation Modeling
Regression Analysis Simple Linear Regression
Chapter 12: Regression Diagnostics
Multivariate Analysis Lec 4
Nasty data… When killer data can ruin your analyses
CH2. Cleaning and Transforming Data
Regression Diagnostics
Regression III.
Checking the data and assumptions before the final analysis.
Chapter 13 Additional Topics in Regression Analysis
Clinical prediction models
Presentation transcript:

CJT 765: Structural Equation Modeling Class 3: Data Screening: Fixing Distributional Problems, Missing Data, Measurement

Outline of Class  Finishing up Multiple regression  Data Screening  Fixing Distributional Problems  Statistical Inference and Power of NHSTs  Factors affecting size of r

Possible Relationships among Variables

Multicollinearity  Existence of substantial correlation among a set of independent variables.  Problems of interpretation and unstable partial regression coefficients  Tolerance = 1 – R 2 of X with all other X  VIF = 1/Tolerance  VIF > 8.0 not a bad indicator  How to fix:  Delete one or more variables  Combine several variables

Standardized vs. Unstandardized Regression Coefficients  Standardized coefficients can be compared across variables within a model  Standardized coefficients reflect not only the strength of the relationship but also variances and covariances of variables included in the model as well of variance of variables not included in the model and subsumed under the error term  As a result, standardized coefficients are sample-specific and cannot be used to generalize across settings and populations

Standardized vs. Unstandardized Regression Coefficients (cont.)  Unstandardized coefficients, however, remains fairly stable despite differences in variances and covariances of variables in different settings or populations  A recommendation: Use std. coeff. To compare effects within a given population, but unstd. coeff. To compare effects of given variables across populations.  In practice, when units are not meaningful, behavioral scientists outside of sociology and economics use standardized coefficients in both cases.

Fixing Distributional Problems  Analyses assume normality of individual variables and multivariate normality, linearity, and homoscedasticity of relationships  Normality: similar to normal distribution  Multivariate normality: residuals of prediction are normally and independently distributed  Homoscedasticity: Variances of residuals vary across values of X

Transformations: Ladder of Re-Expressions  Power  Inverses (roots)  Logarithms  Reciprocals

Suggested Transformations Distributional Problem Transformation Mod. Pos. Skew Square root Substantial pos. skew Log (x+c)* Severe pos. skew, L-shaped 1/(x+c)* Mod. Negative skew Square root (k-x) Substantial neg. skew Log (k-x) Severe. Neg. skew, J shapred 1/(k-x)

Dealing with Outliers  Reasons for univariate outliers:  Data entry errors--correct  Failure to specify missing values correctly--correct  Outlier is not a member of the intended population-- delete  Case is from the intended population but distribution has more extreme values than a normal distribution—modify value  3.29 or more SD above or below the mean a reasonable dividing line, but with large sample sizes may need to be less inclusive

Multivariate outliers  Cases with unusual patterns of scores  Discrepant or mismatched cases  Mahalanobis distance: distance in SD units between set of scores for individual case and sample means for all variables

Linearity and Homoscedasticity  Either transforming variable(s) or including polynomial function of variables in regression may correct linearity problems  Correcting for normality of one or more variables, or transforming one or more variables, or collapsing among categories may correct heteroscedasticity. “Not fatal,” but weakens results.

Missing Data  How much is too much?  Depends on sample size  20%?  Why a problem?  Reduce power  May introduce bias in sample and results

Types of Missing Data Patterns  Missing at random (MAR)—missing observations on some variable X differ from observed scores on that variable only by chance. Probabilities of missingness may depend on observed data but not missing data.  Missing completely at random (MCAR)—in addition to MAR, presence vs. absence of data on X is unrelated to other variables. Probabilities of missingness also not dependent on observed ata.  Missing not at random (MNAR)

Methods of Reducing Missing Data  Case Deletion  Substituting Means on Valid Cases  Substituting estimates based on regression  Multiple Imputation  Each missing value is replaced by list of simlulated values. Each of m datasets is analyzed by a complete-data method. Results combined by averaging results with overall estimates and standard errors.  Maximum Likelihood (EM) method:  Fill in the missing data with a best guess under current estimate of unknown parameters, then reestimate from observed and filled-in data

Measures  Validity  Construct  Criterion  Reliability  Internal Consistency  Test-rest

Checklist for Screening Data  Inspect univariate descriptive statistics  Evaluate amount/distribution of missing data  Check pairwise plots for nonlinearity and heteroscedasticity  Identify and deal with nonnormal variables  Identify and deal with multivariate outliers  Evaluate variables for multicollinearity  Assess reliability and validity of measures