Statistical Control Statistical Control is about –underlying causation –under-specification –measurement validity –“Correcting” the bivariate relationship.

Slides:



Advertisements
Similar presentations
Bivariate &/vs. Multivariate
Advertisements

Transformations & Data Cleaning
Survey of Major Correlational Models/Questions Still the first & most important is picking the correct model… Simple correlation questions simple correlation.
Automated Regression Modeling Descriptive vs. Predictive Regression Models Four common automated modeling procedures Forward Modeling Backward Modeling.
Research Hypotheses and Multiple Regression: 2 Comparing model performance across populations Comparing model performance across criteria.
ANCOVA Workings of ANOVA & ANCOVA ANCOVA, Semi-Partial correlations, statistical control Using model plotting to think about ANCOVA & Statistical control.
Regression Bivariate Multivariate. Simple Regression line – line of best fit Ŷ = a + bX The mean point is always on the line Y is the criterion variable.
Copyright © 2009 Pearson Education, Inc. Chapter 29 Multiple Regression.
Research Hypotheses and Multiple Regression Kinds of multiple regression questions Ways of forming reduced models Comparing “nested” models Comparing “non-nested”
Common Factor Analysis “World View” of PC vs. CF Choosing between PC and CF PAF -- most common kind of CF Communality & Communality Estimation Common Factor.
Statistical Control Regression & research designs Statistical Control is about –underlying causation –under-specification –measurement validity –“Correcting”
Regression Models w/ k-group & Quant Variables Sources of data for this model Variations of this model Main effects version of the model –Interpreting.
ANCOVA Workings of ANOVA & ANCOVA ANCOVA, Semi-Partial correlations, statistical control Using model plotting to think about ANCOVA & Statistical control.
Introduction to Path Analysis Reminders about some important regression stuff Boxes & Arrows – multivariate models beyond MR Ways to “think about” path.
Multiple Regression Models Advantages of multiple regression Important preliminary analyses Parts of a multiple regression model & interpretation Differences.
Multiple Regression Models: Some Details & Surprises Review of raw & standardized models Differences between r, b & β Bivariate & Multivariate patterns.
Multiple Regression Models
Survey of Major Correlational Models/Questions Still the first & most important is picking the correct model… Simple correlation questions simple correlation.
Introduction the General Linear Model (GLM) l what “model,” “linear” & “general” mean l bivariate, univariate & multivariate GLModels l kinds of variables.
Simple Regression correlation vs. prediction research prediction and relationship strength interpreting regression formulas –quantitative vs. binary predictor.
Introduction to Path Analysis & Mediation Analyses Review of Multivariate research & an Additional model Structure of Regression Models & Path Models Direct.
Multivariate Analyses & Programmatic Research Re-introduction to Programmatic research Factorial designs  “It Depends” Examples of Factorial Designs Selecting.
Multivariate Analyses & Programmatic Research Re-introduction to Multivariate research Re-introduction to Programmatic research Factorial designs  “It.
Introduction to Multivariate Research & Factorial Designs
Simple Correlation Scatterplots & r Interpreting r Outcomes vs. RH:
Introduction to Path Analysis Ways to “think about” path analysis Path coefficients A bit about direct and indirect effects What path analysis can and.
Multivariate Analyses & Programmatic Research Re-introduction to Multivariate research Re-introduction to Programmatic research Factorial designs  “It.
Bivariate & Multivariate Regression correlation vs. prediction research prediction and relationship strength interpreting regression formulas process of.
Design Conditions & Variables Limitations of 2-group designs “Kinds” of Treatment & Control conditions Kinds of Causal Hypotheses Explicating Design Variables.
Simple Correlation RH: correlation questions/hypotheses and related statistical analyses –simple correlation –differences between correlations in a group.
Today Concepts underlying inferential statistics
An Introduction to Classification Classification vs. Prediction Classification & ANOVA Classification Cutoffs, Errors, etc. Multivariate Classification.
Intro to Parametric Statistics, Assumptions & Degrees of Freedom Some terms we will need Normal Distributions Degrees of freedom Z-values of individual.
Pearson’s r Scatterplots for 2 Quantitative Variables Research and Null Hypotheses for r Casual Interpretation of Correlation Results (& why/why not) Computational.
Week 14 Chapter 16 – Partial Correlation and Multiple Regression and Correlation.
Multiple Regression Dr. Andy Field.
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Inferential statistics Hypothesis testing. Questions statistics can help us answer Is the mean score (or variance) for a given population different from.
ANCOVA Lecture 9 Andrew Ainsworth. What is ANCOVA?
Chapter 14 – Correlation and Simple Regression Math 22 Introductory Statistics.
Understanding Statistics
Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Spring 2015 Room 150 Harvill.
Types of validity we will study for the Next Exam... internal validity -- causal interpretability external validity -- generalizability statistical conclusion.
Statistics for the Social Sciences Psychology 340 Fall 2013 Correlation and Regression.
Extension to Multiple Regression. Simple regression With simple regression, we have a single predictor and outcome, and in general things are straightforward.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
BIOL 582 Lecture Set 11 Bivariate Data Correlation Regression.
Correlation & Regression
Multiple Linear Regression. Purpose To analyze the relationship between a single dependent variable and several independent variables.
Research Methods for Counselors COUN 597 University of Saint Joseph Class # 4 Copyright © 2015 by R. Halstead. All rights reserved.
Regression Models w/ 2 Quant Variables Sources of data for this model Variations of this model Main effects version of the model –Interpreting the regression.
Regression Models w/ 2-group & Quant Variables Sources of data for this model Variations of this model Main effects version of the model –Interpreting.
Education 795 Class Notes P-Values, Partial Correlation, Multi-Collinearity Note set 4.
Review #2.
Simple Correlation Scatterplots & r –scatterplots for continuous - binary relationships –H0: & RH: –Non-linearity Interpreting r Outcomes vs. RH: –Supporting.
CORRELATIONS: PART II. Overview  Interpreting Correlations: p-values  Challenges in Observational Research  Correlations reduced by poor psychometrics.
1 Psych 5510/6510 Chapter 13: ANCOVA: Models with Continuous and Categorical Predictors Part 3: Within a Correlational Design Spring, 2009.
D/RS 1013 Data Screening/Cleaning/ Preparation for Analyses.
Sampling Design and Analysis MTH 494 Lecture-21 Ossam Chohan Assistant Professor CIIT Abbottabad.
SOCW 671 #11 Correlation and Regression. Uses of Correlation To study the strength of a relationship To study the direction of a relationship Scattergrams.
ANCOVA Workings of ANOVA & ANCOVA ANCOVA, partial correlations & multiple regression Using model plotting to think about ANCOVA & Statistical control Homogeneity.
Chapter 8 Relationships Among Variables. Outline What correlational research investigates Understanding the nature of correlation What the coefficient.
General Linear Model What is the General Linear Model?? A bit of history & our goals Kinds of variables & effects It’s all about the model (not the data)!!
Choosing and using your statistic. Steps of hypothesis testing 1. Establish the null hypothesis, H 0. 2.Establish the alternate hypothesis: H 1. 3.Decide.
Quantitative Methods in the Behavioral Sciences PSY 302
Multiple Regression.
Chapter 15: Correlation.
Multiple Regression.
An Introduction to Correlational Research
Presentation transcript:

Statistical Control Statistical Control is about –underlying causation –under-specification –measurement validity –“Correcting” the bivariate relationship Regression & Residual formulas Correlations among y, x, y’, & residuals Control for Design & Measurement problems “Controlling” proxy variables Limitations & advantages of statistical control

Variations of statistical control Control of single variables partial correlation -- correlation between two variables (x & y) controlling both for some 3 rd variable (z) -- r yx.z semi-partial correlation -- correlation between two variables (part correlation) (x & y) controlling one of the variables for some 3 rd variable (z) -- r y (x.z) & r x (y.z)

Control of multiple variables… multiple partial correlation -- like partial, but with “multiple 3 rd variables” -- r yx.zabc multiple semi-partial correlation -- like semi-partial, but with “multiple 3 rd variables” -- r y(x.zabc) vs. r x(y.zabc) ANCOVA -- ANalysis of COVAriance -- a kind of semi-partial or multiple semi-partial corr -- test of the DV mean difference between IV conditions that is independent of some 3 rd variable (called the “covariate”) -- r DV (IV.cov)

Statistical control is about the underlying causal model of the variables … X Y r Y,X Z 3 rd variable problem … relationship between x & y exists because a 3 rd variable is influencing both Z might be a causal or a psychometric influence X Y Z r Y,X βx βz Combining the two… Part of the story may be “indirect effects" X Y r Y,X Z βz βx Misspecification problem … by ignoring Z, the X-Y relationship misses part of the story

“3rd variable problems” Here’s a well-known example... Ice cream sales Violent crimes When plotted by week or by month -- there is a +r between ice cream sales & amount of violent crime. Huh? Does eating ice cream make you violent ? Does being violent make you crave ice cream ? Is there some “3 rd variable” variable that is “producing” the bivariate correlation? What might it be ??? Violent crimes Ice cream sales Here’s the same scatterplot, but with the size of each data point representing the average temperature of that period. We can see that there is no relationship between violent crimes and ice cream sales after controlling both for temperature. Temperature might be that “3 rd variable”.

We found a -r between # therapy sessions and amount of symptomatic improvement! Huh?!? Let’s think through this... The sample is heterogeneous with respect to initial level of depression Initial level of depression is likely to be related to # sessions they attend Initial level of depression is likely to be related to symptomatic improvement So, is the relationship each of these variables has with initial level of depression “producing” the bivariate correlation we found? # sessions symptomatic improvement # sessions dotsize indicates initial depression… Those who are more depressed come to more sessions and show less improvement. Those who are less depressed come to fewer sessions and show more improvement. Another example...

Misspecification problems… When we take a bivariate look at part of a multivariate picture … we’ll underestimate how much we can know about the criterion we’ll likely mis-estimate how that predictor relates to the criterion leaving predictors out usually leads to over-estimation r > β leaving predictors out changes the collinearity structure, and so, might cause us to miss suppressor effects r < β Statistical control in an attempt to improve this with a “better r”… what “would be” the bivariate relationship between these variables, in a population for which the control variable(s) is a constant (and so is not collinear with these variables) ? it is very much like looking at the β for that predictor in a multiple regression, but is in the form of a “corrected” simple correlation r y(x.z) ≈ β x from y’ = β x X + β z Z “What is relationship between y & part of x independent of Z?

Statistical control is about “correcting” the bivariate correlation to take the control variable(s) “into account” What do we get from this? Here’s where opinions differ … 1.A substitute for experimantal control ? 2.A better estimate of the causal relationship of the 2 variables ? 3.A substitute for construct validity ? 4.Solves under-specification problem ? 5.Probably a better description of the relationship of these 2 variables than is the bivariate analysis ? Rejection of 1-4 (probably for good reasons) has led some to reject the 5 th as well …but then what are we to do? Only perform bivariate analyses (known to be flawed) ? Expect to construct the “full story” from convergent research? (which is already the answer for “what’s the correct study”!!!) More complex models are, on average, more likely to be accurate!

Apparent Bivariate Correlation Bivariate Correlation after Correction for 3 rd Variable Some examples of taking the 3 rd variable into account

Of course, it can be uglier than that… Here, what “correction” you get depends upon which variable you control for – remember, larger models are only more accurate “on average” Here, the addition of the “Z” variable will help, but there is also a 3 rd – the interaction of X & Z

Here are two formula that contain “all you need to know” y’ = bx + a residual = y - y’ y = the criterionx = the predictor y’ = the predicted criterionvalue -- that part of the criterion that is related to the predictor ( Note: r yx = r yy’ ) residual = difference between criterion and predicted criterion values -- the part of the criterion not related to the predictor (Note: r x res = 0 or r x(y-y’) = 0 )

Summary of Interrelationships among the key values… XYY’Y - Y’ X--- y’ “is” x Y r yx--- so, y is related to y’ as to x Y’1.0 r yx--- Y - Y’0.0  (1 - r yx 2 ) the residual is the part of the criterion not related to the y or y’ (which “is” x)

Examples of Statistical Control A design problem … We want to assess the correlation between the amount of homework completed and performance on exams. Since we can not RA participants to amount of homework, we are concerned that motivation will be a confound. Specifically, we think that motivation will influence both the number of homeworks each participant completes and how well they prepare for the exam. So, the question we want to ask can be phrased as, “What is the correlation between amount of homework completed and test performance that is independent of (not related to) motivation?” To do this we want to correlate the part of homeworks completed that is not related to motivation, with the part of test scores that is not related to motivation. E H M r H,E ?

Design control using Partial Correlation by residualization Step 1a predict # homeworks completed from motivation #hwks’ = b(motivation) + a Step 1b find the residual (part of #hwks not related to motivation) residual #hwks = #hwks - #hwks’ ( rem: r x (y-y’) = 0 ) Step 2a predict test scores from motivation test’ = b(motivation) + a Step 2b find the residual (part of test scores not related to motivation) residual test = test - test’ ( again: r x (y-y’) = 0 ) Step 3 correlate the residuals (the part of # hwks not related to motivation and the part of test scores not related to motivation) to find the relationship between # hwks and test scores that is independent of motivation

Design control using Semi-Partial Correlation by residualization Consider a version of this question for which we want to control only homework scores for motivation Step 1a predict # homeworks completed from motivation #hwks’ = b(motivation) + a Step 1b find the residual (part of #hwks not related to motivation) residual #hwks = #hwks - #hwks’ ( rem: r x (y-y’) = 0 ) Step 2 correlate the residual of homework (the part of # hwks not related to motivation) and original test scores to find the relationship between test scores and that part of homework scores that is independent of motivation. Notice that the difference between partial and semi-partial (part) correlations is that when doing the semi-partial we residualize only one of the variables we want to correlate.

Examples of Statistical Control A measurement problem … We want to assess the correlation between performance on quizzes and performance on the final exam. But we know that both of these variables include “test taking speed”. So, we are concerned that the correlation between the two types of performance will be “tainted” or “inflated”. To do this we want to correlate that part of quiz scores that is not related to test taking speed, with that part of final exam scores that is not related to test taking speed. So, the question we want to ask can be phrased as, “What is the correlation between quiz performance and final exam performance that is independent of (not related to) test taking speed?” Q E r Q,E S Knowledge at time of Quiz Knowledge at time of Exam ?

Measurement control using Partial Correlation by residualization Step 1a predict quiz scores from test taking speed quiz’ = b(speed) + a Step 1b find the residual (part of quiz scores not related to test taking speed) residual quiz = quiz - quiz’ Step 2a predict final exam scores from test taking speed final’ = b(speed) + a Step 2b find the residual (part of final exam scores not related to test taking speed) residual final = final - final’ Step 3 correlate the residuals (the part of quiz scores not related to speed and the part of final scores not related to speed) to find the relationship between quiz scores and final scores that is independent of test taking speed

Measurement control using Semi-Partial Correlation by residualization Consider a version of this question that involves controlling only quiz scores for test taking speed. Step 1a predict quiz scores from test taking speed quiz’ = b(speed) + a Step 1b find the residual (part of quiz scores not related to test taking speed) residual quiz = quiz - quiz’ ( rem: r x (y-y’) = 0 ) Step 2 correlate the residual of quiz scores (the part of quiz scores not related to speed) and original final scores to find the relationship between final scores that part of quiz scores that is independent of test taking speed Notice that the difference between partial and semi-partial (part) correlations is that when doing the semi-partial we residualize only one of the variables we want to correlate.

“Customized” Statistical Control… It is possible to “control” the X and Y variables for different variables, for example… controlling Final scores for test taking speed while controlling Homework scores for motivation & #practices. final’ = b(speed) + a residual final = final - final’ #hwks’ = b(motivation) b(#pract) + a residual #hwks = #hwks - #hwks’ Correlate residual final & residual #hwks Please note: 1) There is no MReg equivalent, 2) There must be a theoretical reason for doing this! Step 1 Step 2 Step 3

Statsitical control is an obvious help with under-specification, but it can also be help with proxy variables … Say we find r experf, hwperf =.4  we might ask for what is “homework performance” a proxy? Consider other things that are related to hwperf & add those to the model, to see if “the part of hwperf that isn’t them” is still related to experf… lots of possible results  each with a different interp!!! r experf,(hwperf.mot) =.1  part of hwperf is “really” motivation (usual collinearity result) r experf,(hwperf.mot) =.4  none of hwper is motivation (mot and hwperf not collinear) r experf,(hwperf.mot) =.6  part of hwperf not related to mot is more corr with experf than hwperf (suppressor) Should continue with other variables  can’t control everything, so we should focus on most important variables to consider as confounds or measurement confounds.

More about “Problems” with statistical control In analyses such as the last two examples, we have statistically changed the research question, for example. 1st e.g. -- Instead of asking, “What is the relationship between quiz scores and final exam scores that is independent of test taking speed in the population represented by the sample?” We are asking, “What would be the correlation between quiz scores and final exam scores in the population, if all the members of that population had the same test taking speed?” 2nd e.g. -- Instead of asking, “What is the relationship between the amount of homework completed and test performance that is independent of motivation?” We are asking, “What would be the correlation between amount of homework completed and test performance in the population, if all members of that population had the same motivation?” Populations such as these are unlikely to be representative !!

Advantages of Statistical Control 1st one -- Sometimes experimental control is impossible. Sometimes “intrusion free” measures aren’t available 2nd one -- Statistical control is often less expensive (might even be possible with available data, if the control variables have been collected). Doing the analysis with statistical control can help us decide whether or not to commit the resources to perform the experimental control or create better measures 3rd one -- often the only form of experimental control available is post-hoc matching (because you’re studying natural or intact groups). Preliminary analyses that explore the most effective variables for “statistical control” can help you target the most appropriate variables to match on. Can also retain sample size (hoc matching of mis-matched groups can decrease sample size dramatically)