G89.2247 Lecture 2: Regression as paths and covariance structure

Presentation transcript:

G89.2247 Lecture 2: Regression as paths and covariance structure
- Alternative “saturated” path models
- Using matrix notation to write linear models
- Multivariate expectations
- Mediation
G89.2247 Lect 2

Question: Does exposure to childhood foster care (X) lead to adverse outcomes (Y)?
Example of a purported "causal model":
[Path diagram: X → Y with coefficient B1; residual e → Y]
Regression approach: Y = B0 + B1 X + e
- B0 and B1 can be estimated using OLS.
- The estimates can be expressed in terms of the sample variance of X (S_X^2), the sample covariance of X and Y (S_XY), and the means of the two variables (M_Y and M_X).
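The slide's own formulas did not survive in the transcript. As a minimal sketch, the standard OLS moment expressions are B1 = S_XY / S_X^2 and B0 = M_Y - B1 M_X. The function name `ols_from_moments` and the simulated data are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def ols_from_moments(x, y):
    """Simple-regression OLS estimates computed only from sample moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance S_XY
    s2_x = np.var(x, ddof=1)            # sample variance S_X^2
    b1 = s_xy / s2_x                    # slope
    b0 = y.mean() - b1 * x.mean()       # intercept
    return b0, b1

# Hypothetical data: X is a 0/1 exposure indicator, Y a continuous outcome.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=500).astype(float)
y = 0.5 + 0.8 * x + rng.normal(size=500)

b0, b1 = ols_from_moments(x, y)

# Cross-check against a generic least-squares line fit.
slope, intercept = np.polyfit(x, y, 1)
print(b0, b1, intercept, slope)
```

The moment-based estimates and the `np.polyfit` estimates should agree up to floating-point error.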

Question: Does exposure to childhood foster care (X) lead to adverse outcomes (Y)?
Regression approach, continued
- In addition to estimating the structural coefficients, we will be interested in estimating the amount of variation in Y that is not explained by the model, i.e., Var(Y|X) = Var(e).
- The correlation, r_XY = S_XY / (S_X S_Y), can be used to estimate the variance of the residual e, V(e):
  S_e^2 = S_Y^2 (1 - r_XY^2) = S_Y^2 - (S_XY)^2 / S_X^2
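The identity for the residual variance can be checked numerically. The sketch below uses simulated data (an assumption for illustration) and the n - 1 divisor throughout, which is what makes the three expressions agree exactly; a degrees-of-freedom-corrected estimate would divide the residual sum of squares by n - 2 instead.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.6 * x + rng.normal(size=200)

s2_x, s2_y = np.var(x, ddof=1), np.var(y, ddof=1)
s_xy = np.cov(x, y, ddof=1)[0, 1]
r_xy = s_xy / np.sqrt(s2_x * s2_y)

# Fit the line from moments and compute the residuals directly.
b1 = s_xy / s2_x
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

s2_e_direct = np.var(resid, ddof=1)        # variance of the residuals (n - 1 divisor)
s2_e_from_r = s2_y * (1 - r_xy**2)         # S_Y^2 (1 - r_XY^2)
s2_e_from_cov = s2_y - s_xy**2 / s2_x      # S_Y^2 - (S_XY)^2 / S_X^2
print(s2_e_direct, s2_e_from_r, s2_e_from_cov)
```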

A Covariance Structure Approach
- If we have data on Y and X, we can compute a sample covariance matrix.
- This estimates the population covariance structure, in which σ_Y^2 can itself be expressed as B1^2 σ_X^2 + σ_e^2.
- Three statistics in the sample covariance matrix are available to estimate three population parameters.
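The matrices shown on the slide were not preserved. Assuming the simple regression model Y = B0 + B1 X + e with e uncorrelated with X, the sample matrix and the structure it estimates can be written as follows (a reconstruction from the model, not the slide's own typesetting):

```latex
\[
S =
\begin{bmatrix}
S_Y^2 & S_{XY} \\
S_{XY} & S_X^2
\end{bmatrix}
\quad\text{estimates}\quad
\Sigma =
\begin{bmatrix}
B_1^2\sigma_X^2 + \sigma_e^2 & B_1\sigma_X^2 \\
B_1\sigma_X^2 & \sigma_X^2
\end{bmatrix}
\]
```

The three unique elements of S (S_Y^2, S_XY, S_X^2) match the three parameters (B1, σ_X^2, σ_e^2), so the parameters can be solved for exactly.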

Covariance Structure Approach, Continued
- A structural model that has the same number of parameters as unique elements in the covariance matrix is "saturated".
- Saturated models always fit the sample covariance matrix.

Another saturated model: Two explanatory variables
- The first model is unlikely to yield an unbiased estimate of the foster care effect because of selection factors (isolation failure).
- Suppose we have a measure of family disorganization (Z) that is known to have an independent effect on Y and also to be related to who is assigned to foster care (X).
[Path diagram: X → Y with coefficient b1; Z → Y with coefficient b2; X and Z correlated (r_XZ); residual e → Y]

Covariance Structure Expression
- The model: Y = b0 + b1 X + b2 Z + e
- If we assume E(X) = E(Z) = E(Y) = 0 and V(X) = V(Z) = V(Y) = 1, then b0 = 0 and the b's are standardized.
- The parameters can be expressed in terms of the correlations among Y, X, and Z.
- When sample correlations are substituted, these expressions give the OLS estimates of the regression coefficients.
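The slide's expressions are missing from the transcript. The standard results for two standardized predictors are b1 = (r_YX - r_YZ r_XZ) / (1 - r_XZ^2) and b2 = (r_YZ - r_YX r_XZ) / (1 - r_XZ^2). The sketch below, on simulated data with hypothetical names, checks them against a least-squares fit on z-scored variables.

```python
import numpy as np

def standardized_betas(r_yx, r_yz, r_xz):
    """Standardized coefficients for Y on X and Z, from correlations only."""
    denom = 1.0 - r_xz**2
    b1 = (r_yx - r_yz * r_xz) / denom
    b2 = (r_yz - r_yx * r_xz) / denom
    return b1, b2

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# Hypothetical data in which Z affects both X and Y.
rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 0.4 * x + 0.3 * z + rng.normal(size=n)

xs, zs, ys = zscore(x), zscore(z), zscore(y)
r = np.corrcoef(np.vstack([ys, xs, zs]))      # rows/columns ordered Y, X, Z
b1, b2 = standardized_betas(r[0, 1], r[0, 2], r[1, 2])

# OLS on standardized variables (no intercept needed: all means are zero).
design = np.column_stack([xs, zs])
b_ols, *_ = np.linalg.lstsq(design, ys, rcond=None)
print(b1, b2, b_ols)
```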

Covariance Structure: 2 Explanatory Variables
In the standardized case the covariance structure is:
  r_YX = b1 + b2 r_XZ
  r_YZ = b2 + b1 r_XZ
- Each correlation is accounted for by two components, one direct and one indirect.
- There are three regression parameters and three covariances.

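The more general covariance matrix for two-IV multiple regression
If we do not assume variances of unity, the regression model implies a covariance matrix whose elements depend on b1, b2, the variances and covariance of X and Z, and the residual variance.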
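The slide's matrix is not preserved. Under the model Y = b0 + b1 X + b2 Z + e, and assuming e is uncorrelated with X and Z, the implied elements involving Y are as follows (a reconstruction from the model, offered as a sketch):

```latex
\begin{align*}
\operatorname{Var}(Y)    &= b_1^2\sigma_X^2 + b_2^2\sigma_Z^2 + 2 b_1 b_2 \sigma_{XZ} + \sigma_e^2 \\
\operatorname{Cov}(Y, X) &= b_1\sigma_X^2 + b_2\sigma_{XZ} \\
\operatorname{Cov}(Y, Z) &= b_2\sigma_Z^2 + b_1\sigma_{XZ}
\end{align*}
```

Setting the variances to 1 and σ_XZ = r_XZ recovers the standardized case, in which each correlation with Y has one direct and one indirect component.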

More Math Review for SEM
- Matrix notation is useful.

A Matrix Derivation of OLS Regression
- OLS regression estimates make the sum of squared residuals as small as possible.
- If the model is Y = XB + e, then we choose B so that e'e is minimized.
- The minimum will occur when the residual vector is orthogonal to the regression plane.
- In that case, X'e = 0.

When will X'e = 0?
When e is the residual from an OLS fit.
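The algebra behind this slide is not in the transcript. The usual argument is that X'e = X'(Y - XB) = 0 gives the normal equations X'X B = X'Y, so B = (X'X)^{-1} X'Y whenever X'X is invertible. The following sketch checks this on simulated data; the design matrix and coefficients are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
# Design matrix: a column of ones (intercept) plus two predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

# Solve the normal equations X'X B = X'Y (solve() avoids an explicit inverse).
B = np.linalg.solve(X.T @ X, X.T @ Y)

e = Y - X @ B
orthogonality = X.T @ e          # should be a vector of (numerical) zeros

# Cross-check against a general least-squares solver.
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B, orthogonality)
```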

Multivariate Expectations
There are simple multivariate generalizations of the expectation facts:
  E(X + k) = E(X) + k = μ_x + k
  E(k*X) = k*E(X) = k*μ_x
  V(X + k) = V(X) = σ_x^2
  V(k*X) = k^2*V(X) = k^2*σ_x^2
Let X^T = [X1 X2 X3 X4], μ^T = [μ1 μ2 μ3 μ4], and let k be a scalar value. Then:
  E(k*X) = k*E(X) = k*μ
  E(X + k*1) = E(X) + k*1 = μ + k*1, where 1 is a vector of ones

Multivariate Expectations
In the multivariate case, Var(X) is a matrix:
  V(X) = E[(X - μ)(X - μ)^T]

Multivariate Expectations
The multivariate generalizations of
  V(X + k) = V(X) = σ_x^2
  V(k*X) = k^2*V(X) = k^2*σ_x^2
are:
  Var(X + k*1) = Σ
  Var(k*X) = k^2 Σ
Let c^T = [c1 c2 c3 c4]; c^T X is a linear combination of the X's.
  Var(c^T X) = c^T Σ c
This is a scalar value. If it is positive for all nonzero values of c, then Σ is positive definite.
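These variance rules hold for sample covariance matrices as well, which makes them easy to check. The sketch below uses simulated four-variable data; the mixing matrix, the constant k, and the weights c are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
# Correlated data: 500 observations (rows) of X = [X1, X2, X3, X4].
mix = np.eye(4) + 0.5 * np.ones((4, 4))
data = rng.normal(size=(500, 4)) @ mix

S = np.cov(data, rowvar=False)            # sample estimate of Sigma
k = 3.0
c = np.array([1.0, -2.0, 0.5, 4.0])

S_shift = np.cov(data + k, rowvar=False)  # Var(X + k*1) = Sigma
S_scale = np.cov(k * data, rowvar=False)  # Var(k*X) = k^2 Sigma

var_combo = np.var(data @ c, ddof=1)      # variance of the linear combination c'X
quad_form = c @ S @ c                     # c' Sigma c

# A symmetric matrix is positive definite when all its eigenvalues are positive.
is_pos_def = bool(np.all(np.linalg.eigvalsh(S) > 0))
print(var_combo, quad_form, is_pos_def)
```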

Partial Regression Adjustment
- The multiple regression coefficients are estimated taking all variables into account.
- The model assumes that, for fixed X, Z has an effect of magnitude b_Z (the coefficient b2 on Z). Sometimes people say "controlling for X".
- The model explicitly notes that Z has two kinds of association with Y:
  - a direct association through b_Z (X fixed)
  - an indirect association through X (magnitude b_X r_XZ, where b_X is the coefficient b1 on X)
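One way to see what "controlling for X" does is the Frisch-Waugh-Lovell result, which the slides do not mention but which is a standard property of OLS: the coefficient on Z in the multiple regression equals the slope from regressing Y on the part of Z that is not linearly predictable from X. A sketch on simulated data, with hypothetical names:

```python
import numpy as np

def fit(predictors, target):
    """OLS with an intercept; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(target))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = 0.4 * x + 0.3 * z + rng.normal(size=n)

# Multiple regression of Y on X and Z.
_, b_x, b_z = fit([x, z], y)

# Remove from Z everything linearly predictable from X, then regress Y on the remainder.
a0, a1 = fit([x], z)
z_resid = z - (a0 + a1 * x)
_, b_z_partial = fit([z_resid], y)
print(b_z, b_z_partial)
```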

Pondering Model 1: Simple Multiple Regression
[Path diagram: X → Y with coefficient b1; Z → Y with coefficient b2; X and Z correlated (r_XZ); residual e → Y]
- The semi-partial regression coefficients are often different from the bivariate correlations:
  - adjustment effects
  - suppression effects
- Randomization makes r_XZ = 0 in probability.

Mathematically Equivalent Saturated Models
Two variations of the first model suggest that the correlation between X and Z can itself be represented structurally.
[Path diagram, Model 2: X → Z (b3), X → Y (b1), Z → Y (b2); residuals eZ and eY]
[Path diagram, Model 3: Z → X (b3), X → Y (b1), Z → Y (b2); residuals eX and eY]

Representation of Covariance Matrix
- Both models imply the same correlation structure.
- The interpretation, however, is very different.
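The equivalence can be made concrete with the general formula for the covariance implied by a recursive path model, Σ = (I - B)^{-1} Ψ (I - B)^{-T}, where B holds the path coefficients and Ψ the residual variances. This formula is standard SEM algebra rather than something shown in the transcript, and the correlations below are hypothetical values chosen for illustration.

```python
import numpy as np

def implied_cov(B, psi):
    """Covariance implied by v = B v + e with Var(e) = diag(psi)."""
    inv = np.linalg.inv(np.eye(B.shape[0]) - B)
    return inv @ np.diag(psi) @ inv.T

# Hypothetical target correlations; variable order is (X, Z, Y).
r_xz, r_yx, r_yz = 0.4, 0.5, 0.45
R = np.array([[1.0, r_xz, r_yx],
              [r_xz, 1.0, r_yz],
              [r_yx, r_yz, 1.0]])

# Standardized coefficients shared by both models.
b1 = (r_yx - r_yz * r_xz) / (1 - r_xz**2)
b2 = (r_yz - r_yx * r_xz) / (1 - r_xz**2)
b3 = r_xz
var_ey = 1 - (b1 * r_yx + b2 * r_yz)      # residual variance of Y (1 - R^2)

# Model 2: X -> Z (b3), X -> Y (b1), Z -> Y (b2). Rows are outcomes, columns are causes.
B2 = np.array([[0.0, 0.0, 0.0],
               [b3,  0.0, 0.0],
               [b1,  b2,  0.0]])
sigma2 = implied_cov(B2, [1.0, 1 - b3**2, var_ey])

# Model 3: Z -> X (b3), X -> Y (b1), Z -> Y (b2).
B3 = np.array([[0.0, b3,  0.0],
               [0.0, 0.0, 0.0],
               [b1,  b2,  0.0]])
sigma3 = implied_cov(B3, [1 - b3**2, 1.0, var_ey])
print(np.round(sigma2, 4))
print(np.round(sigma3, 4))
```

Both implied matrices reproduce R, so no amount of data on these three variables can distinguish the two causal orderings.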

Model 2: X leads to Z and Y
[Path diagram: X → Z (b3), X → Y (b1), Z → Y (b2); residuals eZ and eY]
- X is assumed to be causally prior to Z.
- The association between X and Z is due to X effects.
- Z partially mediates the overall effect of X on Y:
  - X has a direct effect b1 on Y
  - X has an indirect effect (b3 b2) on Y through Z
- Part of the bivariate association between Z and Y is spurious (due to the common cause X).
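In a sample, the direct and indirect effects add up exactly to the slope from regressing Y on X alone. The following sketch shows this decomposition with OLS on simulated data; the data-generating values are assumptions.

```python
import numpy as np

def fit(predictors, target):
    """OLS with an intercept; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(target))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Hypothetical data generated according to Model 2.
rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)
y = 0.3 * x + 0.5 * z + rng.normal(size=n)

_, b1, b2 = fit([x, z], y)     # direct effects of X and Z on Y
_, b3 = fit([x], z)            # effect of X on Z
_, total = fit([x], y)         # overall (bivariate) effect of X on Y

direct, indirect = b1, b3 * b2
print(total, direct + indirect)
```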

Model 3: Z leads to X and Y
[Path diagram: Z → X (b3), X → Y (b1), Z → Y (b2); residuals eX and eY]
- Z is assumed to be causally prior to X.
- The association between X and Z is due to Z effects.
- X partially mediates the overall effect of Z on Y:
  - Z has a direct effect b2 on Y
  - Z has an indirect effect (b3 b1) on Y through X
- Part of the bivariate association between X and Y is spurious (due to the common cause Z).

Choosing between models
- Authors often claim a model is good because it fits the data (the sample covariance matrix).
- All of these models fit equally well (perfectly!).
- Logic and theory must establish causal order.
- There are other possibilities besides Models 2 and 3:
  - In some instances, X and Z are dynamic variables that simultaneously affect each other.
  - In other instances, both X and Z are outcomes of an additional variable, not shown.

Mediation: A theory approach
Sometimes it is possible to argue on theoretical grounds that:
- Z is prior to X and Y
- X is prior to Y
- The effect of Z on Y is completely accounted for by the indirect path through X.
This is an example of total mediation.
- If b2 is fixed to zero, then Model 3 is no longer saturated.
- The question of fit becomes informative.
- Total mediation requires strong theory.
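With b2 fixed to zero and standardized variables, Model 3 has two free paths (b3 = r_XZ and b1 = r_YX) but three correlations to reproduce, leaving one degree of freedom: the model implies r_YZ = r_XZ r_YX. A minimal sketch of that check follows; the function name and the correlation values are hypothetical.

```python
import numpy as np

def total_mediation_check(r_yx, r_yz, r_xz):
    """Implied r_YZ under total mediation (Z -> X -> Y only) and its misfit."""
    implied_r_yz = r_xz * r_yx            # product of the two paths b3 * b1
    discrepancy = r_yz - implied_r_yz     # zero when total mediation fits exactly
    return implied_r_yz, discrepancy

# Case A: observed r_YZ equals the product of the paths, so the restricted model fits.
implied_a, gap_a = total_mediation_check(r_yx=0.5, r_yz=0.2, r_xz=0.4)

# Case B: Z and Y are more strongly related than the indirect path allows.
implied_b, gap_b = total_mediation_check(r_yx=0.5, r_yz=0.45, r_xz=0.4)
print(gap_a, gap_b)
```

In practice the discrepancy would be judged against sampling error with a formal fit test, which this sketch does not attempt.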

A Flawed Example
- Someone might try to argue for total mediation of the effect of family disorganization on low self-esteem through placement in foster care.
- The Baron and Kenny (1986) criteria might be met:
  - Z is significantly related to Y
  - Z is significantly related to X
  - When Y is regressed on Z and X, b1 is significant but b2 is not.
- However, statistical significance is a function of sample size.
- Logic suggests that children not assigned to foster care who live in a disorganized family may suffer directly.

A More Compelling Example of Complete Mediation
If:
- Z is an experimentally manipulated variable, such as a prime
- X is a measured process variable
- Y is an outcome logically subsequent to X
- It should make sense that X affects Y for all levels of Z
E.g., Chen and Bargh (1997): Are participants who have been subliminally primed with negative stereotype words more likely to have partners who interact with them in a hostile manner?