Path Analysis with Manifest Variables


1 Path Analysis with Manifest Variables
Mysterious Endogeneity
Haiyan Wang, Zach Andersen
11/18/2014

2 Outline of the Presentation
Introduction of path analysis
Review of PLS
Model justification
Review of lavaan package
User example of matrix form
Simulation

3 Problem: OLS Bias (mysterious endogeneity)
Nonrecursive models (causal loops): OLS results are greatly biased.
Recursive models (one direction; recall PLS path models): OLS and path analysis both work fine.
We will focus on the nonrecursive case, since no one else has so far.
In the path diagram shown in a minute, the endogenous variables are those with single-headed arrows pointing to them.

4 Solutions to nonrecursive OLS bias
Methods:
  Instrumental variables
  Implied covariance matrix
Both of the above work equally well.
We will cover the implied covariance matrix, since this is a multivariate course.
Latent variables don't work in OLS.

5 Review of PLS Path Analysis
PLS goals:
  “uncover a common structure among blocks of variables.” [2]
  No covariance structure: does not assume a ground truth (focuses on what the data tell you).
  Does not seek causal relationships, only relationships.
What does PLS do?
  “obtain score values of latent variables for prediction purposes” [2]
[1] From Tim and Jennifer's slides
[2] Gaston, “Partial Least Squares with R”
Reviewing PLS tells us where we've been and transitions us to path analysis.

6 Review of SEM
SEM goal: “Test and estimate the (causal) relationships among observable measures and non-observable theoretical (or latent) variables” [1]
What does SEM do? It seeks to approximate a ground truth by fitting a covariance model to observed covariances.
[1] Jiyoon and Kiran, SEM presentation
[2] Gaston, “Partial Least Squares with R”

7 Path analysis (w/ manifest)
Goal: “determines whether your theoretical model successfully accounts for the actual relationships in the sample data” [1]
  Like SEM, unlike PLS path analysis.
What does path analysis (manifest) do?
  Fits a covariance model: seeks an approximation of a ground truth [2]
  Uses manifest variables only, unlike either SEM or PLS path analysis.
[1] “A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling” by Larry Hatcher

8 What does path analysis (manifest) do
Uses the implied covariance matrix as the link between your data and your model. [3]
The implied covariance matrix relates the model to your data's observed variances and covariances.
The estimated parameters are those that make the model's variances and covariances match the observed ones as closely as possible. [3]
[3] Source:
From here on, we simply refer to path analysis with manifest variables as path analysis.

9 A simple Path Analysis with manifest variables example (nonrecursive)
[Path diagram with variables: Intelligence, Supervisory Support, Work Place Norms, Motivation, Work Performance]
This path diagram is a pictorial representation of simultaneous equations (we'll see them in a minute). The equations can then be converted to matrix form (Haiyan). The diagram contains both exogenous and endogenous variables.
Why might work performance cause motivation? If you have never performed well, you might never become motivated. Performing well lets you see the benefits and makes you more willing to work hard.
[1] “A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling” by Larry Hatcher, Figure 4.3

10 How is path analysis operationalized
Uses only manifest variables (no latent variables).
Allows the user to specify the effects of exogenous variables on endogenous variables (single-headed arrows).
Allows the user to specify covariances between antecedent variables (double-headed arrows).
Allows recursive (one direction) and nonrecursive (more than one direction) models.
Source: “A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling” by Larry Hatcher
Our focus is the nonrecursive case: a possible solution for OLS mysterious endogeneity.

11 Requirements of path analysis
The causal model must have enough equations to solve for the unknown parameters; otherwise there is an infinite number of solutions.
Sufficient observations: an ugly rule of thumb is 5 observations for every parameter to be estimated.
Source: “A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling” by Larry Hatcher
The equations are based on theory and are pulled from the path diagram (one equation for each endogenous variable). They are then converted into an implied covariance matrix.

12 Model Justification--Definition
The model is written as simultaneous equations from the path diagram, one per endogenous variable.
Over-parameterized model: # of parameters > # of equations; no unique solution.
Just-identified model: # of parameters = # of equations; a unique solution.
Under-parameterized model: # of parameters < # of equations; use the weighted least squares or ML method to find a solution that makes the two sides of the equations as close as possible.
The equations contain relationships between random variables and parameters, according to theory. They come from the path diagram (one equation for each endogenous variable).
The number of equations, as Haiyan will show in matrix form, is the number of degrees of freedom of the observed (unrestricted) covariance matrix. [Terminology: observed = unrestricted, fitted = restricted]
Please note: Haiyan will discuss a supply-and-demand model later and will revisit the concept of parameters and equations in matrix form. I introduce it now.

13 Simple Math Example
Model 1: X+Y=2 (2 param., 1 eq.). Over-parameterized model: infinite solutions.
Model 2: X+Y=2; X-Y=10 (2 param., 2 eq.). Just-identified model: one solution.
Model 3: X+Y=2; X-Y=10; 2X+Y=5 (2 param., 3 eq.). Under-parameterized model: can only approximate.
As stated, an equation is a relationship between parameters and random variables, according to theory (as shown in a path diagram). X and Y are the 2 parameters. Because these are simultaneous equations, you need enough equations to estimate the parameters. The number of degrees of freedom is the number of equations we have.
Model 1: infinite solutions ((-1, 3), (-2, 4), etc.)
Model 2: just one solution
Model 3: no exact solution (only approximations)
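The three toy models can be checked numerically. The following is a minimal base-R sketch, not part of the original slides; the names A, b, sol2, sol3 and resid3 are ours. It solves Model 2 exactly and gets a least-squares approximation for Model 3, whose residuals cannot all be zero. Model 1 is left out because a single equation in two unknowns has infinitely many solutions.

```r
# Coefficient rows for the three equations in X and Y, and their right-hand sides.
A <- rbind(c(1,  1),   # X + Y  = 2
           c(1, -1),   # X - Y  = 10
           c(2,  1))   # 2X + Y = 5
b <- c(2, 10, 5)

# Model 2: two equations, two unknowns, exactly one solution.
sol2 <- solve(A[1:2, ], b[1:2])

# Model 3: three equations, two unknowns; qr.solve() returns the least-squares fit.
sol3 <- qr.solve(A, b)

# Residuals of Model 3 at the least-squares fit: the equations cannot all hold.
resid3 <- as.vector(A %*% sol3 - b)

print(sol2)
print(sol3)
print(resid3)
```

Least squares is only one way of making the two sides "close enough"; path analysis uses weighted least squares or ML on covariance equations instead, but the idea of approximating when there are more equations than parameters is the same.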

14 Simple lavaan tutorial
Source: Jiyoon and Kiran, SEM presentation. No latent variables here, unlike in Jiyoon and Kiran's SEM presentation. sem() is one of three lavaan functions.

15 Our dataset
Source: SAS 13.1 User's Guide: CALIS procedure

16 Operational model using Lavaan
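Our model specification
Variables: DispInc, FoodCons, FoodCostRatio, RatioPrecYear, Year
library(lavaan)          # provides sem()
Q = FoodCons             # food consumption
P = FoodCostRatio        # food cost ratio
D = DispInc              # disposable income
F = RatioPrecYear        # ratio for the preceding year (note: masks R's F shorthand for FALSE)
Y = Year
data.k = data.frame(Q,P,D,F,Y)
# One regression per endogenous variable, plus the covariance between their errors
econ.mod = 'Q ~ P + D
            P ~ Q + F + Y
            Q ~~ P'
fit <- sem(econ.mod, data=data.k)
Food consumption is regressed on the food cost ratio and disposable income: direct effects. However, the food cost ratio is in turn regressed on food consumption and other variables, which produces indirect effects.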

17 R code
Summary function
Output discussion:
  Degrees of freedom
  Regression estimates (direct and indirect effects)
  Variances
  R squared
Zach will view and explain each of these in the R code (Haiyan will count parameters later).

18 Outline
Chi-square likelihood ratio test
Supply-and-demand model example (SEM)
Simulation

19 When a researcher has a model in mind, he always asks himself: Is my model good enough? How can I test whether my model is good or not? After Zach's discussion of so many definitions (recursive and nonrecursive models, model justification, path analysis, simultaneous equations), you may have questions such as: do these things really matter for my research, and how can I apply them? I have an economics background. One application of SEM is the supply-and-demand model. In this model, all the variables are manifest (observed) variables, and quantity and price affect each other simultaneously.

20 As Dr. Westfall said in the ISQS 5347 class, a model produces data. A good model should produce data that are close to the real data. This implies that we can test the null hypothesis Σ=Σ(λ). The chi-square likelihood ratio test is one of the methods we can use for this test. (Dr. Westfall, ISQS 6348 class on 10/14/2014)

21 Chi-square Likelihood Ratio Test
Our goal is to test the null hypothesis Σ=Σ(λ), where Σ is the observed covariance matrix (unrestricted model), λ is a vector of the parameters to be estimated, and Σ(λ) is the covariance matrix implied by our model (restricted model).

22 Chi-square Likelihood Ratio Test
The null probability distribution of the test statistic can be approximated by a chi-square distribution with (df1 − df2) degrees of freedom, where df1 and df2 are the degrees of freedom of the unrestricted model Σ and the restricted model Σ(λ), respectively.
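The transcript does not show the test statistic itself. As a hedged reminder, the standard normal-theory ML form is sketched below. The symbols S (sample covariance matrix), N (sample size), k (number of manifest variables) and the hatted estimate are our notation, not the slides'. Some software multiplies by N rather than N − 1.

```latex
\begin{aligned}
F_{ML}(\lambda) &= \ln\lvert\Sigma(\lambda)\rvert
  + \operatorname{tr}\!\bigl(S\,\Sigma(\lambda)^{-1}\bigr)
  - \ln\lvert S\rvert - k \\
T &= (N-1)\,F_{ML}(\hat{\lambda}) \;\overset{\text{approx.}}{\sim}\; \chi^{2}_{\,df_1 - df_2}
  \quad\text{under } H_0:\ \Sigma = \Sigma(\lambda)
\end{aligned}
```

If Σ(λ̂) reproduces S exactly, then F_ML = 0 and T = 0; larger values of T indicate a worse match between the model-implied and observed covariances.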

23 Chi-square Likelihood Ratio Test
In other words, the number of degrees of freedom of the unrestricted model is the number of equations we have, and the number of degrees of freedom of the restricted model is the number of parameters in our model. Go back to Zach's identification example. If you still feel confused, don't worry: I will show you an example and explain this concept.

24 Example: Simultaneous Equations with Mean Structures and Reciprocal Path
The supply-and-demand food example of Kmenta (1971, pp. 565, 582). The variables of a simultaneous equation model may be linked through direct relationships, indirect relationships, reciprocal relationships, feedback loops, and/or correlations between disturbances.

25

26 Path Diagram of the Supply-and-Demand Model
[Path diagram: observed variables Dt, Ft, Yt, Qt, Pt; error terms E1, E2; path labels β1, θ2, θ3, θ4]
Path diagrams: variables shown in rectangles are observed variables, single-headed arrows denote the direction of influence, and double-headed arrows depict a covariance not explained in the model. The path diagram of this example is a nonrecursive model. This is our restricted model, so to use the chi-square likelihood ratio test we want to count how many parameters it has. E1 and E2 are the error terms. Our restricted model has 8 parameters to be estimated.

27 How many equations will we have from the unrestricted model?
Number of equations = p(p+1)/2, where p = the number of manifest variables (from Dr. Westfall's notes)

28 In our supply-and-demand example, we have 5 manifest variables (Q, P, D, F, Y), so the number of equations is 5*6/2=15. But 6 of these equations involve variances and covariances among the exogenous manifest variables, which are not explained by any model. So the total number of usable equations is 9.

29 In our data set we have 5 manifest variables, so we have 15 equations, but 6 of them are redundant (I will show you why). These are the 6 equations not included when counting the total number of equations.

30 Now you know we have 8 parameters and 9 equations, and you have probably already figured out that this example is the under-parameterized case. But you may be curious about what those “equations” look like and what the mystery behind Σ=Σ(λ) is.
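The counting argument can be written as a few lines of base R. This is a minimal sketch; the helper names n_moments and model_df are hypothetical and not from the slides. It takes the number of manifest variables, the number of exogenous variables, and the number of free parameters, and returns the equation count and the degrees of freedom.

```r
# Distinct variances and covariances among k variables: k(k+1)/2.
n_moments <- function(k) k * (k + 1) / 2

# Degrees of freedom = usable equations - free parameters.
# p_total: all manifest variables; q_exog: exogenous ones; n_par: free parameters.
model_df <- function(p_total, q_exog, n_par) {
  # Moments among exogenous variables are not explained by the model, so drop them.
  n_eq <- n_moments(p_total) - n_moments(q_exog)
  c(equations = n_eq, parameters = n_par, df = n_eq - n_par)
}

# Supply-and-demand example: 5 manifest variables (Q, P, D, F, Y),
# 3 of them exogenous (D, F, Y), 8 parameters.
supply_demand <- model_df(p_total = 5, q_exog = 3, n_par = 8)
print(supply_demand)
```

A positive df means more equations than parameters (the under-parameterized case), zero means just identified, and a negative value means the model has too many parameters for the available equations.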

31 Mystery behind Σ=Σ(λ)
Endogenous variables, denoted by Y, are outcome variables, that is, variables determined within the model. The vector of endogenous variables has dimension p×1. Exogenous variables, denoted by the q×1 vector X, are variables that are not explained by the model.

32 Table 1. Notation for Simultaneous Equation Models
Vector/Matrix   Definition                                                              Dimensions
Variables
  Y             Endogenous                                                              p×1
  X             Exogenous                                                               q×1
  E             Disturbance (error) terms                                               p×1
Coefficients
  Γ             Coefficient matrix for exogenous variables; direct effects of X on Y    p×q
  B             Coefficient matrix for endogenous variables; direct effects of Y on Y   p×p
Covariance matrices
  Φ             Covariance matrix of X                                                  q×q
  Ψ             Covariance matrix of E                                                  p×p
The error terms include all variables influencing Y that are omitted from the equation. They are assumed to have an expected value of zero, to be uncorrelated with the exogenous variables, and to be homoscedastic and not autocorrelated. In our model, p is the number of endogenous variables (2) and q is the number of exogenous variables (3).

33

34 Some Fun Math
Reduced form: put all the endogenous variables on the left-hand side of the equal sign and all the exogenous variables on the right-hand side.
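The slide equations are not in the transcript, so here is a hedged reconstruction in the Table 1 notation (Y, X, E, B, Γ). The first block is the general structural form and its reduced form. The second writes the supply-and-demand model out, with the coefficient labels β1, β2, γ1, γ2, γ3 assigned by us to match the lavaan specification Q ~ P + D, P ~ Q + F + Y. Intercepts are omitted, and which label goes on which path is an assumption.

```latex
\begin{aligned}
Y &= BY + \Gamma X + E \\
(I - B)\,Y &= \Gamma X + E \\
Y &= (I - B)^{-1}\Gamma X + (I - B)^{-1}E
\end{aligned}
\qquad
\begin{pmatrix} Q \\ P \end{pmatrix}
=
\begin{pmatrix} 0 & \beta_1 \\ \beta_2 & 0 \end{pmatrix}
\begin{pmatrix} Q \\ P \end{pmatrix}
+
\begin{pmatrix} \gamma_1 & 0 & 0 \\ 0 & \gamma_2 & \gamma_3 \end{pmatrix}
\begin{pmatrix} D \\ F \\ \text{Year} \end{pmatrix}
+
\begin{pmatrix} E_1 \\ E_2 \end{pmatrix}
```

The reduced form exists only when I − B is invertible, which for this two-equation model means β1β2 ≠ 1.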

35 Why is this important? Researchers typically have a set of variables in which they are interested, and they have some models in mind of how these variables fit together. To decide which model to use, the bulk of the information needed for specification and estimation is summarized in the variances and covariances between the observed variables. That is, the total raw association between the variables is captured in the matrix Σ and is known, at least in the data. Creating the implied covariance matrix Σ(λ) allows a researcher to break down exactly how a hypothesized model relates to the known variances and covariances among the observed variables.

36 By the assumption that Cov(X,E)=0, this covariance matrix has 8 parameters: beta1, beta2, gamma1, gamma2, gamma3, the two error variances, and cov(E1,E2). Go back and take a look at the diagram. Let's count how many equations we have.
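For reference, the standard block form of the implied covariance matrix under Cov(X,E)=0 is sketched below in the Table 1 notation. It is a textbook result rather than something copied from the slides, with the variables ordered as (Y, X).

```latex
\Sigma(\lambda) =
\begin{pmatrix}
(I-B)^{-1}\bigl(\Gamma\Phi\Gamma^{\top} + \Psi\bigr)\bigl[(I-B)^{-1}\bigr]^{\top}
  & (I-B)^{-1}\Gamma\Phi \\[4pt]
\Phi\,\Gamma^{\top}\bigl[(I-B)^{-1}\bigr]^{\top}
  & \Phi
\end{pmatrix}
```

The lower-right block is just Φ, the covariance matrix of the exogenous variables. With q = 3 it contributes 3·4/2 = 6 elements that the model does not explain, which is why 6 of the 15 equations are set aside. The remaining 9 equations come from the two upper blocks, which contain the 8 parameters listed above.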

37 Structures Behind the Fitted Covariance Matrix

38 The parameters shown in blue are the ones to be estimated.
If you are interested in the other 8 equations, they are on my scratch paper; I would be happy to share them with you after class.

39

40 Results-R lavaan package

41 Results-R lavaan package
For the endogenous variable P, the reported R-square is obviously an invalid value. In fact, because there are correlated errors (between E1 and E2) and reciprocal paths (to and from Q and P), the model departs from the regular assumptions of multiple regression analysis. As a result, you should not interpret the R-squares for this example. Also note that the degrees of freedom equal 1.

42 Check Σ(λ)=Σ
The fitted covariance matrix is very close to the observed covariance matrix. You can produce the fitted covariance matrix from the estimated coefficients. The small difference could be explained by chance alone.

43 In this supply-and-demand example we don't know the true model, so it is hard to say whether our estimated model is the best model. What is the best way to check? Simulation!!

44 Simulate a simple model to show the mystery behind Σ= Σ(λ)

45 Path Diagram of the True Model
[Path diagram of the true model: observed variables x, y1, y2; error terms e1, e2; path values 1.0, 0.5, 1.0]
In this example we have 3 paths and 2 error-term variances. We assume there is no correlation between the two error terms. In total, we have 5 parameters to be estimated.

46

47 We will use these reduced form equations to simulate our data
Check that the models are correct: all values of check1 and check2 should be 0 within machine roundoff.

48 R code of simulation
## Simulate the data set of x and residuals
e1 = rnorm(10000,0,1)
e2 = rnorm(10000,0,1)
x = rnorm(10000,0,1)
## Use simulated x and residuals to run the reduced form model and get the data set of y1 and y2.
y1 = 2*x + e1 + 2*e2
y2 = 2*x + 2*e1 + 2*e2
Run the simulation R code at this point.
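The check1 and check2 step can be sketched as follows. The structural equations used here are our assumption: the diagram did not survive the transcript, so we take the path values 1.0, 0.5, 1.0 to mean y1 = 1.0*x + 0.5*y2 + error and y2 = 1.0*y1 + error. With the reduced form exactly as written on the slide, e2 then acts as the disturbance of y1 and e1 as that of y2. Both are i.i.d. N(0,1), so this labeling does not change the joint distribution of the data. The seed is ours as well.

```r
set.seed(1)            # seed chosen only to make the sketch reproducible
n  <- 10000
e1 <- rnorm(n, 0, 1)
e2 <- rnorm(n, 0, 1)
x  <- rnorm(n, 0, 1)

# Reduced form as given on the slide.
y1 <- 2 * x + e1 + 2 * e2
y2 <- 2 * x + 2 * e1 + 2 * e2

# Assumed structural equations (hypothetical reading of the path diagram):
#   y1 = 1.0 * x + 0.5 * y2 + disturbance
#   y2 = 1.0 * y1 + disturbance
# Under the slide's reduced form, e2 is the disturbance of y1 and e1 that of y2.
check1 <- y1 - (1.0 * x + 0.5 * y2 + e2)
check2 <- y2 - (1.0 * y1 + e1)

# Both should be 0 up to machine roundoff.
print(max(abs(check1)))
print(max(abs(check2)))
```

If either check were far from zero, the reduced form and the structural equations would not describe the same model, and the simulation would not be testing what we think it tests.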

49 In this example we have exactly 5 parameters and 5 equations; we call this just identified. Notice that the degrees of freedom for this example are 5-5=0.

50

51

52 R output: observed covariance matrix (Σ)
[Output panels: fitted covariance matrix; observed covariance matrix]

53 The null hypothesis is Σ(λ)=Σ. The difference can be explained by chance alone, and increasing the number of simulated observations will make the difference smaller and smaller (by the Law of Large Numbers). However, one important point in this simulation example is that df=0, so we cannot use the chi-square test to test the model. When the model is just identified, even if it is wrong, there is no way to test it. (From Dr. Westfall)
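The population covariance matrix that the sample covariances approach can be computed directly from the parameter matrices. This is a minimal base-R sketch under our assumed reading of the true model (y1 = 1.0*x + 0.5*y2 + error, y2 = 1.0*y1 + error, unit variances, uncorrelated errors). The names B, Gamma, Phi, Psi follow the notation table, and the variable order (y1, y2, x) is our choice.

```r
# Assumed true parameter matrices (hypothetical reading of the path diagram).
B     <- matrix(c(0, 0.5,
                  1, 0), nrow = 2, byrow = TRUE)   # effects of (y1, y2) on (y1, y2)
Gamma <- matrix(c(1, 0), nrow = 2)                 # effect of x on (y1, y2)
Phi   <- matrix(1, 1, 1)                           # Var(x)
Psi   <- diag(2)                                   # error variances, no covariance

# Implied covariance blocks from the reduced form.
A_inv  <- solve(diag(2) - B)
cov_yy <- A_inv %*% (Gamma %*% Phi %*% t(Gamma) + Psi) %*% t(A_inv)
cov_yx <- A_inv %*% Gamma %*% Phi

# Assemble the full implied covariance matrix.
Sigma <- rbind(cbind(cov_yy, cov_yx),
               cbind(t(cov_yx), Phi))
dimnames(Sigma) <- list(c("y1", "y2", "x"), c("y1", "y2", "x"))
print(Sigma)
```

These values agree with the variances and covariances obtained directly from the reduced form used to simulate the data, for example Var(y1) = 4 + 1 + 4 = 9 and Var(y2) = 4 + 4 + 4 = 12.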

54 Compare OLS vs. SEM
Table 2. Comparison of results for OLS, SEM, and the true model (numbers rounded to 3 decimal places)
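Table 2 itself is not in the transcript. As a rough illustration of why OLS can disagree with the true model here, the sketch below computes the large-sample OLS slopes from the population covariance matrix implied by the simulation's reduced form (y1 = 2x + e1 + 2e2, y2 = 2x + 2e1 + 2e2). The structural values they are compared against (1.0 and 0.5 for y1, 1.0 for y2) are our assumed reading of the path diagram, and the names ols_y1 and ols_y2 are ours. These are not the numbers from the presenters' table.

```r
# Population covariance matrix implied by the reduced form, with x, e1, e2 i.i.d. N(0, 1).
Sigma <- matrix(c( 9, 10, 2,
                  10, 12, 2,
                   2,  2, 1), nrow = 3, byrow = TRUE,
                dimnames = list(c("y1", "y2", "x"), c("y1", "y2", "x")))

# Large-sample OLS slopes solve Cov(regressors) %*% b = Cov(regressors, response).
# Regression of y1 on x and y2:
ols_y1 <- solve(Sigma[c("x", "y2"), c("x", "y2")], Sigma[c("x", "y2"), "y1"])
# Regression of y2 on y1:
ols_y2 <- Sigma["y1", "y2"] / Sigma["y1", "y1"]

print(ols_y1)   # compare with the assumed structural values: x = 1.0, y2 = 0.5
print(ols_y2)   # compare with the assumed structural value: y1 = 1.0
```

Under these assumptions OLS does not converge to the structural coefficients no matter how large the sample is, because each regressor that sits in the feedback loop is correlated with the equation's error term.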

55 You may also be curious about what happens if we have more parameters than equations.
Notice that the degrees of freedom are -1, because we add cov(e1,e2) as an extra parameter.

56 Thank you!

