Peter Spirtes, Richard Scheines, Joe Ramsey, Erich Kummerfeld, Renjie Yang

Person 1: (1) Stress, (2) Depression, (3) Religious Coping. Task: learn the causal model. Data from Bongjae Lee, described in Silva et al. 2006.

These variables cannot be measured directly. They are estimated by asking people to answer questions and by constructing a model that relates the measured answers to the unobserved variables. Problems: What is the relationship between the measured variables and the latent variables to be estimated? Some questions might be caused by multiple latent variables, might be caused by answers to previous questions, or might be caused by latent variables that are not being estimated.

(Figure: two-factor model with latents L1–L6 and observed variables X1–X14.) The edge indicated in the figure is not identifiable (unlike the single-factor case, where all of the latent connections are identifiable if the measurement model is simple).

A set of variables V is causally sufficient iff every direct cause (relative to V) of any pair of variables in V is also in V. V is minimal if the set formed by removing any latent variable is not causally sufficient.

The structural graph has all and only the latent variables, and the edges between the latent variables. (Figure: latents L1–L6 and the edges among them.)

The measurement graph has a minimal causally sufficient set of variables, and all of the edges except the latent–latent edges. (Figure: latents L1–L6, observed X1–X14.)

A pure n-factor measurement model for an observed set of variables O is one in which: each observed variable has exactly n latent parents, and no observed variable is an ancestor of any other observed variable or of any latent variable. A set of observed variables O in a pure n-factor measurement model is a pure cluster if each member of the cluster has the same set of n parents.

Strategy: (1) find a subset of variables for which (i) the measurement model is simple, and (ii) it is possible to determine that it is simple without knowing the true structural model; (2) then find the structural model. (Figure: latents L1–L6, observed X1–X14.)

(Figure: sub-model with latents L1–L4 and observed X1–X11.)

Actual Impure Measurement Model. (Figure: latents L1–L4, observed X1–X11.)

Assumed Pure Measurement Model. (Figure: latents L1–L4, observed X1–X11.) If we treat the measurement model as pure, no structural model will fit the data well. But adding an L1 → L3 edge may improve the fit, because it allows for correlations between X1–X6 and X7–X11.

Assumptions:
Causally unconnected variables are independent.
No observed variable is a cause of a latent variable.
No correlations are close to 0 or to 1 (pre-process).
All of the sub-covariance matrices are invertible.
No feedback.
(In practice) there is a one-factor pure measurement submodel.
Each variable is a linear function of its parents in the graph plus a noise term that is uncorrelated with any of the other noise terms (a linear structural equation model).
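
To make these assumptions concrete, here is a minimal simulation sketch in Python (not the authors' code; the coefficients, seed, and cluster size are illustrative): a linear structural equation model with two latents and a pure two-factor cluster of six indicators. Later sketches reuse this Sigma.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Structural model: two latents, L2 a linear function of L1 plus noise.
    L1 = rng.normal(size=n)
    L2 = 0.7 * L1 + rng.normal(size=n)

    # Measurement model: six indicators, each a linear function of BOTH
    # latents (a pure 2-factor cluster) plus an uncorrelated noise term.
    W = rng.uniform(0.5, 1.5, size=(6, 2))     # illustrative loadings
    X = W @ np.vstack([L1, L2]) + rng.normal(size=(6, n))

    Sigma = np.cov(X)                          # 6 x 6 observed covariance
    print(np.round(Sigma, 2))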

Let Σ_{A,B} be the submatrix of the covariance matrix Σ with rows indexed by A and columns indexed by B. For each quartet of variables {i, j, k, l} there are 3 different tetrad constraints:
σ_{ij} σ_{kl} − σ_{ik} σ_{jl} = 0
σ_{ij} σ_{kl} − σ_{il} σ_{jk} = 0
σ_{ik} σ_{jl} − σ_{il} σ_{jk} = 0
Only two of the constraints are independent: any two entail the third.
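
A minimal sketch of these tetrad differences in code (the function name is mine, not from the source). In a single-factor model all three vanish; in the two-factor simulation above they are generically non-zero, which is why the two-factor case needs sextads.

    import numpy as np

    def tetrad_differences(S, i, j, k, l):
        """The three tetrad differences for the quartet {i, j, k, l}; any two
        vanishing entail the third, since t3 = t2 - t1."""
        t1 = S[i, j] * S[k, l] - S[i, k] * S[j, l]
        t2 = S[i, j] * S[k, l] - S[i, l] * S[j, k]
        t3 = S[i, k] * S[j, l] - S[i, l] * S[j, k]
        return t1, t2, t3

    # With the two-factor Sigma above, tetrad_differences(Sigma, 0, 1, 2, 3)
    # is generically far from (0, 0, 0); tetrads vanish in 1-factor models.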

For each sextuple of variables there are 10 different sextad constraints, one for each unordered partition of the six variables into two triples {i, j, k} and {l, m, n}:
det Σ_{{i,j,k},{l,m,n}} = 0
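
A sketch of the corresponding check (the function name is mine; on sample data the exact-zero comparison would be a statistical test): enumerate the 10 unordered triple-splits of a sextuple and return the 3×3 minors. With the two-factor Sigma simulated above, all 10 values come out near zero.

    import itertools
    import numpy as np

    def sextad_differences(S, sextuple):
        """The 10 sextad minors det(S[A, B]) over unordered splits of the
        sextuple into triples A and B (fixing sextuple[0] in A avoids
        counting each split twice)."""
        six = tuple(sextuple)
        out = {}
        for A in itertools.combinations(six, 3):
            if six[0] not in A:
                continue
            B = tuple(sorted(set(six) - set(A)))
            out[(A, B)] = np.linalg.det(S[np.ix_(A, B)])
        return out

    # With the two-factor Sigma above, all 10 values of
    # sextad_differences(Sigma, range(6)) are ~0 (rank-2 cross-submatrices).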

An algebraic constraint is linearly entailed by a DAG if it is true of the implied covariance matrix for every value of the free parameters (the linear coefficients and the variances of the noise terms).

(Figure: latents L1–L6, observed X1–X14.) A trek in G from i to j is an ordered pair of directed paths (P1; P2), where P1 has sink i, P2 has sink j, and both P1 and P2 have the same source k. Examples: (L5,X13; L5,X14), (L6,X13; L6,X14), (X13; X13,X14).

(Figure: latents L1–L6, observed X1–X14.) The two paths of a simple trek intersect only at the source. In the examples (L5,X13; L5,X14), (L6,X13; L6,X14), (X13; X13,X14), the first path is the X13 side and the second is the X14 side.

Example: A = {1, 2, 3}, B = {4, 5, 6}, C_A = {L1}, C_B = {L2}. A is t-separated from B by (C_A; C_B).

(Figure: latents L1–L6, observed X1–X14.) Let A, B, C_A, and C_B be four subsets of V(G), which need not be disjoint. The pair (C_A; C_B) trek-separates (or t-separates) A from B if, for every trek (P1; P2) from a vertex in A to a vertex in B, either P1 contains a vertex in C_A or P2 contains a vertex in C_B.

The submatrix Σ_{A,B} has rank less than or equal to r for all covariance matrices consistent with the graph G if and only if there exist subsets C_A, C_B ⊆ V(G) with #C_A + #C_B ≤ r such that (C_A; C_B) t-separates A from B. Consequently, rk(Σ_{A,B}) ≤ min{ #C_A + #C_B : (C_A; C_B) t-separates A from B }, and equality holds for covariance matrices consistent with G (Lebesgue measure 1 over the parameters). If the rank of a submatrix is n, then every (n+1) × (n+1) minor of it vanishes.
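
A rough numerical illustration of the rank statement (my own sketch; loadings and seed are arbitrary): two disjoint triples of indicators of the same two latents are t-separated by C_A = {L1, L2}, C_B = {}, so their cross-covariance should have rank 2, with the third singular value at sampling-noise level.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    L = rng.normal(size=(2, n))
    L[1] += 0.7 * L[0]                         # correlated latents L1, L2

    W_A = rng.uniform(0.5, 1.5, size=(3, 2))   # illustrative loadings
    W_B = rng.uniform(0.5, 1.5, size=(3, 2))
    A = W_A @ L + rng.normal(size=(3, n))      # one triple of indicators
    B = W_B @ L + rng.normal(size=(3, n))      # a disjoint triple

    cross = A @ B.T / n                        # Sigma_{A,B}; variables are zero-mean
    print(np.round(np.linalg.svd(cross, compute_uv=False), 4))
    # Two singular values well above zero, the third ~ 0: rank 2, matching
    # #C_A + #C_B for C_A = {L1, L2}, C_B = {}, which t-separates A from B.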

Algebraic Constraint Faithfulness Assumption: if an algebraic constraint (a vanishing partial correlation, tetrad, or sextad) holds in the population distribution, then it is linearly entailed to hold by the causal DAG. Strong Faithfulness Assumption (for finite sample sizes): the causal DAG does not have parameters such that non-entailed sextad differences are very close to zero.

Violations of the Algebraic Faithfulness Assumption have Lebesgue measure 0: there is a lower-dimensional surface in the space of parameters on which faithfulness is violated. Violations of the Strong Algebraic Faithfulness Assumption do not have Lebesgue measure 0: the set of parameters on which faithfulness is almost violated is not of lower dimension than the parameter space. As the number of variables grows, the probability of some violation of strong faithfulness becomes large.

Advantages: no need to estimate a model; no iterative algorithm; no local maxima; no problems with identifiability; fast to compute. Disadvantages: does not contain information about inequalities; power and accuracy of the tests are open questions; difficulty in determining the implications among constraints.

Input: data on the observed variables from a linear model. Output: the set of variables that appear in an (almost) pure measurement model, clustered into (almost) pure subsets. We haven't defined "almost pure" (not in the Silva 2006 sense): there is a list of impurities that can't be detected by constraint search, but we don't know whether the list is complete. The basic idea, with trivial modifications (in theory), can be applied to arbitrary numbers of latent parents, using different constraints.


(Figure: latents L1–L6, observed X1–X14.) Some variables do not appear in any entailed sextad; remove one of them. Heuristic: remove the variable that appears in the fewest sextads that hold.

(Figure: X14 has been removed; latents L1–L6, observed X1–X13.) The step repeats: if some variables still do not appear in any entailed sextad, remove one, by the same heuristic.

A subset of 5 variables is a good pentuple iff, when any sixth variable is added to the pentuple, the resulting sextuple is complete.
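
A hedged sketch of this test (my reading of the slides: "complete" is taken to mean that all 10 sextad constraints of the sextuple hold; on finite samples the tolerance check below would be replaced by a statistical test of the vanishing minors):

    import itertools
    import numpy as np

    def sextuple_complete(S, six, tol=1e-6):
        """All 10 sextad minors of the sextuple vanish (up to tol)."""
        six = tuple(six)
        for A in itertools.combinations(six, 3):
            if six[0] not in A:                # each unordered split once
                continue
            B = tuple(sorted(set(six) - set(A)))
            if abs(np.linalg.det(S[np.ix_(A, B)])) > tol:
                return False
        return True

    def good_pentuples(S, variables):
        """5-subsets that stay complete no matter which sixth variable joins."""
        goods = []
        for pent in itertools.combinations(variables, 5):
            rest = [v for v in variables if v not in pent]
            if all(sextuple_complete(S, pent + (v,)) for v in rest):
                goods.append(pent)
        return goods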

(Figure: latents L1–L6, observed X1–X13.) Any 5-variable subset of X1–X6 is a good pentuple.


For a given set of variables, if all of its subsets of size 5 are good pentuples, merge them into a cluster. All subsets of size 5 of X1–X6 are good pentuples, so they are merged. (Figure: latents L1–L6, observed X1–X13.)
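
Continuing the sketch (this reuses sextuple_complete from the earlier block and is my reading of the merge rule, not the authors' code):

    import itertools

    def mergeable(S, varset, all_vars):
        """Merge rule from the slide: every size-5 subset of varset is a good
        pentuple, i.e. adding any outside variable to any 5-subset still
        yields a complete sextuple."""
        varset = tuple(varset)
        rest = [v for v in all_vars if v not in varset]
        return all(
            sextuple_complete(S, pent + (v,))   # from the earlier sketch
            for pent in itertools.combinations(varset, 5)
            for v in rest
        )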


X12 and X13 do not appear in any good pentuples. If X13 is removed, all subsets of size 5 of X7–X12 become good pentuples, so they are merged. (Similarly for X12.) (Figure: latents L1–L6, observed X1–X12.)

We can (conceptually) remove L5 because it is not needed to make a causally sufficient set. However, L6 has to remain, and X7–X12 is not pure by our definition, because X12 has 3 latent parents. (Figure: latents L1–L4 and L6, observed X1–X12.)

(Figure: choke sets, with L7 on the X6 side.)

(Figure: choke sets.)

However, the spider model and the collider model do not receive the same chi-squared score when estimated, so in principle they can be distinguished from a 2-factor model. This is expensive and requires multiple restarts, but only the pure clusters need to be tested. If the data are non-Gaussian, it may be possible to detect additional impurities.

For sextads, the first step is to check 10 × C(n, 6) sextads. However, in a large proportion of social-science contexts there are at most 100 observed variables and 15 or 16 latents; if the data come from questionnaires, one generally can't get people to answer more questions than that. Simulation studies by Kummerfeld indicate that, given the vanishing sextads, the rest of the algorithm is subexponential in the number of clusters but exponential in the size of the clusters.
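
For a sense of the scale of that first step, counting the 10 × C(n, 6) candidate sextads for a few values of n:

    from math import comb

    for n in (20, 50, 100):
        print(f"n = {n:3d}: {10 * comb(n, 6):,} sextads")
    # n =  20: 387,600 sextads
    # n =  50: 158,907,000 sextads
    # n = 100: 11,920,524,000 sextads -- about 1.2e10 at the 100-variable scale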

The tests require (algebraic) independence among the constraints. An additional complication: when some correlations or partial correlations are non-zero, further dependencies among the constraints arise. Some models entail that neither of a pair of sextad constraints vanishes, but that the two are equal to each other.

For single-factor submodels, the algorithm can be applied to more than a hundred measured variables, with accuracy comparable to the Silva 2006 algorithm.

3 latents, 6 measures, 1 cross-construct impurity, 2 direct-edge impurities, 20 trials:
2 clusters found: 15/20; 1 cluster: 5/20; 0 clusters: 2/20.
Average misassigned: 1. Average left out if 2 clusters: 1. Average impurities left in: 0.1.

(Figure: latents L1–L6, observed X1–X14.) Theory: as long as the parts of the graph from the choke sets down to the observed variables are linear with additive noise, the t-separation theorem still holds. Practice: the algorithm can be applied (with the same caveats) even if the structural model is non-linear or has feedback.
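
A quick numerical illustration of this point (my own sketch; the cubic link between the latents and all coefficients are invented): the structural model is non-linear, the measurement side is linear, and the cross-covariance between the two clusters still has rank 1, exactly as t-separation by the single choke-set vertex L1 predicts.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500_000

    # Non-linear structural model: L2 is a cubic function of L1.
    L1 = rng.normal(size=n)
    L2 = L1 ** 3 + rng.normal(size=n)

    # Linear below the choke set: indicators are linear in their latent parent.
    a = np.array([1.0, 0.8, 0.6])
    b = np.array([0.9, 0.7, 0.5])
    A = np.outer(a, L1) + rng.normal(size=(3, n))
    B = np.outer(b, L2 - L2.mean()) + rng.normal(size=(3, n))

    cross = A @ B.T / n                      # cross-covariance Sigma_{A,B}
    print(np.round(np.linalg.svd(cross, compute_uv=False), 3))
    # One singular value well above zero, the others near 0: rank 1, i.e.
    # #C_A + #C_B with C_A = {L1}, C_B = {}, despite the cubic link.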

We described an algorithm that relies on weakened assumptions: the linearity assumption is weakened to linearity below the latents, and the assumption that pure submodels exist is weakened to the existence of n-pure submodels. We conjecture the algorithm is correct if we add the assumptions of no star or collider models and faithfulness of the constraints. Is there reason to believe in faithfulness of the constraints when there are non-linear relationships among the latents?

Open problems: give a complete list of assumptions under which the output of the algorithm is pure; speed up the algorithm; modify the algorithm to deal with almost-unfaithful constraints as far as possible; add a structure-learning component to the output of the algorithm; Silva's approach (a Gaussian-process model among the latents, linearity below the latents); identifiability questions for structural models with pure measurement models.

References:
Silva, R. (2010). Gaussian process structure models with latent variables. Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10).
Silva, R., Scheines, R., Glymour, C., & Spirtes, P. (2006). Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7, 191–246.
Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek separation for Gaussian graphical models. Annals of Statistics, 38(3), 1665–1685.

3 latents, 6 measures, 1 cross-construct impurity, 2 direct-edge impurities, 10 trials:

Cluster 1   Cluster 2   Cluster 3   Impurities
5/6         4/6         4/5         2
3/5         4/6         4/5         1
3/5         4/6         4/5         2
5/6         4/6         4/5         2
6/6         -           -           3
3/6         3/5         -           1
3/          /6          -           3
3/6         -           -           3

(Table: 3 latents, 6 measures, 10 trials; columns Clusters +, Clusters −, Unassigned, Misassigned.)

(Table: main example; columns Clusters +, Clusters −, Unassigned, Misassigned, Impure.)

(Table: 3 latents, 6 measures, 1 cross-construct impurity, 2 direct-edge impurities, 10 trials; columns Unassigned, Misassigned, Impurities Missed.)

Suppose A = {X2, X3}, B = {X4, X5}, C_A = {L1}, C_B =
X1 = 2 L1 + f1(e1)
X2 = 3 X1 + f2(e2, X6)
X3 = 0.8 L1 + f3(e3)
X4 = 0.6 L1 + f4(e4)
X5 = 0.9 L1 + f5(e5)
D(C_A, A) = {X1, X2, X3}; D(C_B, B) =

Theorem: Suppose G is a directed graph containing C_A, A, C_B, and B; (C_A, C_B) t-separates A and B; and A and B are linear below their choke sets C_A and C_B. Then rank(cov(A, B)) ≤ #C_A + #C_B.
Theorem 2: Suppose G is a directed graph containing C_A, A, C_B, and B; A and B are linear below C_A and C_B; but (C_A, C_B) does not t-separate A and B. Then there is a covariance matrix compatible with the graph in which rank(cov(A, B)) > #C_A + #C_B.
Proof: this follows from Sullivant et al. for linear models.
Question: is there a natural sense in which the set of parameters for which rank(cov(A, B)) ≤ #C_A + #C_B is of measure 0 when the bound is not entailed by t-separation, even in the non-linear case?