Peter Spirtes, Richard Scheines, Joe Ramsey, Erich Kummerfeld, Renjie Yang
[Figure: candidate causal relations between Economic stability and Political stability, including a possible latent common cause L]
Country XYZ
1. GNP per capita: _____
2. Energy consumption per capita: _____
3. Labor force in industry: _____
4. Ratings on freedom of press: _____
5. Freedom of political opposition: _____
6. Fairness of elections: _____
7. Effectiveness of legislature: _____
Task: learn causal model
To draw causal conclusions about the unmeasured Economic stability and Political stability variables we are interested in, use hypothesized causal relations between the X's, the Es, and the Ps, together with statistics gathered on the X's (a correlation matrix).
[Figure: pure measurement model — Economic stability and Political stability (relation unknown, "?") as latent parents of indicators X1–X7]
[Figure, repeated across three slides: a two-factor measurement model with latents L1–L4 over indicators X1–X13]
A pure n-factor measurement model for an observed set of variables O is one in which:
Each observed variable has exactly n latent parents.
No observed variable is an ancestor of any other observed variable or of any latent variable.
A set of observed variables O in a pure n-factor measurement model is a pure cluster if each member of the cluster has the same set of n parents.
[Figures: a Bifactor model and a Higher-Order model, each relating latents L1–L3 to indicators X1–X13]
Higher-Order ⊂ Bifactor ⊂ Connected Bifactor ⊂ Connected Two-Factor
[Figure: two-factor model with latents L1–L4 over indicators X1–X13]
1. Estimate and test a pure Higher-Order model.
2. Estimate and test a pure Two-Factor model.
3. Choose whichever one fits best.
If a measurement model is impure but you assume it is pure, this will hinder inference of the correct structural model. If a higher-order model has impurities, the data will fit a more inclusive pure model, such as a pure two-factor model, better than a pure higher-order model.
Generating Model: [Figure: latents L1–L4 over indicators X1–X13]
[Figure: the same model with certain edges drawn in black] The data fit the model with the black edges plus a pure measurement model better than the model without the black edges plus a pure measurement model.
Generating Model: [Figure: latents L1 and L3 over indicators X1–X13]
Worse Fit: [Figure: latents L1 and L3 over indicators X1–X13]
Better Fit: [Figure: latents L1–L4 over indicators X1–X13]
Generating Model: [Figure: latents L1 and L3 over indicators X1–X13]
1. Identify the pure submodel {1,2,3,4,5,8,9,10,11,12,13}.
2. See if it fits a Higher-Order model.
3. If it does, select Higher-Order; otherwise see if it fits a Two-Factor model.
[Figure: latents L1 and L3 (relation "?") over indicators X1–X5, X8–X13] The pure submodel fits the Higher-Order model, so select Higher-Order.
[Figure: latents L1–L4 over indicators X1–X5, X8–X13] The data will also fit the Two-Factor model (slightly lower chi-squared), but after adjusting for degrees of freedom its p-value will be lower.
An algebraic constraint is linearly entailed by a DAG if it is true of the implied covariance matrix for every value of the free parameters (the linear coefficients and the variances of the noise terms).
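A standard concrete example (mine, not from the slides): in a one-factor model with X_i = λ_i L + ε_i and independent noises, every off-diagonal covariance is σ_ij = λ_i λ_j var(L), so the tetrad difference below vanishes for every value of the free parameters and is therefore linearly entailed.

```latex
\sigma_{12}\,\sigma_{34} - \sigma_{13}\,\sigma_{24}
  = \lambda_1\lambda_2\lambda_3\lambda_4\,\mathrm{var}(L)^2
  - \lambda_1\lambda_3\lambda_2\lambda_4\,\mathrm{var}(L)^2
  = 0
```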
A trek between A and B is a pair of directed paths from a common source, one ending in A and one ending in B (a directed path from A to B, or from B to A, is a special case). (C_A : C_B) trek-separates A from B iff every trek between A and B intersects C_A on the A side or C_B on the B side.
[Figure: latents L1–L4 over indicators X1–X13] If (C_A : C_B) trek-separates A from B, and the model is an acyclic linear Gaussian model, then rank(cov(A,B)) ≤ #C_A + #C_B.
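A quick numerical sanity check of the rank bound (my sketch with made-up coefficients, not the authors' code): both indicator sets below load only on L1 and L2, so every trek between them intersects {L1, L2}, and the cross-covariance should have rank at most 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Correlated latent pair (hypothetical structural coefficients).
L0 = rng.normal(size=n)
L1 = 0.8 * L0 + rng.normal(size=n)
L2 = 0.6 * L0 + rng.normal(size=n)

def indicators(loadings):
    # Each indicator loads on both L1 and L2, plus independent noise.
    return np.stack([a * L1 + b * L2 + rng.normal(size=n)
                     for a, b in loadings])

A = indicators([(1.0, 0.5), (0.7, 0.9), (0.4, 1.2)])
B = indicators([(0.6, 1.1), (1.3, 0.2), (0.9, 0.8)])

# Sample cross-covariance of A with B (all variables are mean zero).
C = (A @ B.T) / n
print(np.linalg.svd(C, compute_uv=False))
# First two singular values are O(1); the third is ~0 (sampling noise),
# consistent with rank(cov(A, B)) <= #C_A + #C_B = 2.
```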
If #C_A + #C_B ≤ #C'_A + #C'_B for all (C'_A : C'_B) that trek-separate A from B, then for generic linear acyclic Gaussian models, rank(cov(A,B)) = #C_A + #C_B.
If #C_A + #C_B > r for all (C_A : C_B) that trek-separate A from B in DAG G, then for some linear Gaussian parameterization, rank(cov(A,B)) > r.
[Figure: two-factor model in which L3 = f(L1, ε_L3) and L4 = g(L2, ε_L4)] Here the model is linear acyclic below for {1,2,3} : {10,11,12}.
[Figure: a variant model with L3 = f(L1, ε_L3) and L4 = g(L2, ε_L4)] Here the model is not linear acyclic below for {1,2,3} : {10,11,12}.
If (C_A : C_B) trek-separates A from B, and the model is linear acyclic below (C_A : C_B) for A, B, then rank(cov(A,B)) ≤ #C_A + #C_B.
[Proof-sketch figure: every trek from A to B passes through C_A and C_B, and the mediating linear map is of full rank]
If #C_A + #C_B > r for all (C_A : C_B) that trek-separate A from B in DAG G, then for some parameterization that is linear acyclic below (C_A : C_B) for A, B, rank(cov(A,B)) > r.
If a rank constraint is not entailed by the graphical structure, then generically it does not hold: if constraints do not hold for the whole space of parameters (i.e. they are not entailed), but are the roots of rational equations in the parameters, then the set of parameter values on which they hold has Lebesgue measure 0.
This says nothing about the measure of constraints that are not entailed but "almost" hold (i.e. cannot be reliably distinguished from 0 given the power of the statistical tests). However, the performance of the algorithm depends not on the extent to which individual non-entailed constraints "almost" hold, but on the extent to which sets of non-entailed constraints "almost" hold. This depends upon which sets of constraints affect the performance of the algorithm, and on the joint distribution of the constraints, which we do not know.
Advantages:
No need for estimation of the model.
No iterative algorithm, no local maxima.
No problems with identifiability.
Fast to compute.
Disadvantages:
Does not contain information about inequalities.
Power and accuracy of the tests?
Difficulty in determining implications among constraints.
1. Find a list of pure pentads of variables.
2. Merge pentads on the list that overlap.
3. Select which merged subsets to output.
For each subset of size 5: if it is pure, add it to PureList.
PureList: {1,2,3,4,5}; {9,10,11,12,13}; {8,10,11,12,13}; {8,9,11,12,13}; {8,9,10,12,13}; {8,9,10,11,12}
Some pair trek-separates all partitions of {1,2,3,4,5,x}.
No pair trek-separates all partitions of {1,2,3,4,8,x} — e.g. {1,2,8} : {3,4,9}.
No pair trek-separates all partitions of {1,2,3,4,6,x} — e.g. {1,2,6} : {3,4,7}.
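Putting the last few slides together, a population-level sketch of the purity check (my reading of the walkthrough; the function names are mine, and a real implementation would use the statistical sextad test described later rather than a numerical tolerance):

```python
from itertools import combinations
import numpy as np

def sextads_vanish(cov, sextet, tol=1e-8):
    # Each split of the six variables into two triples gives a 3x3
    # cross-covariance block; trek separation by a pair of latents
    # forces its determinant (the sextad difference) to zero.
    s = sorted(sextet)
    first, rest = s[0], s[1:]
    for pair in combinations(rest, 2):          # the 10 distinct splits
        triple_a = [first, *pair]
        triple_b = [v for v in rest if v not in pair]
        if abs(np.linalg.det(cov[np.ix_(triple_a, triple_b)])) > tol:
            return False
    return True

def is_pure_pentad(cov, pentad, all_vars, tol=1e-8):
    # A pentad is kept when adding any sixth variable x leaves all
    # ten sextads of pentad + {x} vanishing.
    return all(sextads_vanish(cov, list(pentad) + [x], tol)
               for x in all_vars if x not in pentad)
```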
{1,2,3,4,5}; {9,10,11,12,13}; {8,10,11,12,13}; {8,9,11,12,13}; {8,9,10,12,13}; {8,9,10,11,12} → {1,2,3,4,5}; {8,9,10,11,12,13}
{1,2,3,4,5}; {9,10,11,12,13}; {8,10,11,12,13}; {8,9,11,12,13}; {8,9,10,12,13}; {8,9,10,11,12}; {1,2,3,8,9} (false positive)
{9,10,11,12,13}; {8,10,11,12,13} → {8,9,10,11,12,13}: all subsets of size 5 of {8,9,10,11,12,13} are in PureList, so accept the merger and remove both from PureList.
{1,2,3,4,5}; {1,2,3,8,9} → {1,2,3,4,5,8,9}: all subsets of size 5 other than {1,2,3,8,9} and {1,2,3,4,5} are not on PureList, so reject the merger.
Remaining: {1,2,3,4,5}; {8,9,10,11,12,13}; {1,2,3,8,9}
{1,2,3,4,5}; {8,9,10,11,12,13}; {1,2,3,8,9}: output {8,9,10,11,12,13} because it is the largest; output {1,2,3,4,5} because it is the next largest that is disjoint from the previous output.
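A sketch of the merge-and-select phase as I read this walkthrough (the control flow and names are mine, not the published FTFC pseudocode):

```python
from itertools import combinations

def merge_clusters(pure_pentads):
    pure = {frozenset(p) for p in pure_pentads}
    clusters = [set(p) for p in pure]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(clusters, 2):
            union = a | b
            # Accept a merger of overlapping clusters only if every
            # size-5 subset of the union was itself judged pure --
            # this is what rejects the false positive {1,2,3,8,9}.
            if a & b and all(frozenset(s) in pure
                             for s in combinations(union, 5)):
                clusters.remove(a)
                clusters.remove(b)
                clusters.append(union)
                merged = True
                break
    return clusters

def select_output(clusters):
    # Greedy selection: largest cluster first, then the largest
    # remaining cluster disjoint from everything already output.
    out = []
    for c in sorted(clusters, key=len, reverse=True):
        if all(c.isdisjoint(o) for o in out):
            out.append(c)
    return out
```

On the running example, merge_clusters absorbs the five overlapping pentads into {8,9,10,11,12,13}, keeps {1,2,3,4,5} and {1,2,3,8,9} unmerged, and select_output then returns exactly the two clusters the slide outputs.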
If
the causal graph contains as a subgraph a pure 2-factor measurement model with at least six indicators and at least 5 variables in each cluster;
the model is linear acyclic below the latent variables;
whenever there is no trek between two variables, they are independent;
there are no correlations equal to zero or one;
the distribution is LA faithful to the causal graph;
then the population FTFC algorithm outputs a clustering in which any two variables in the same output cluster have the same pair of latent parents.
[Figure: model with latents L1–L6 over indicators X1–X13]
Spider Model (Sullivant, Talaska, Draisma): [Figure: a central latent L connected to latents L1–L6, with indicators X1–X6]
However, the spider model (and the collider model) do not receive the same chi-squared score when estimated, so in principle they can be distinguished from a 2-factor model. Estimation is expensive and requires multiple restarts, but only the pure clusters need to be tested. If the data are non-Gaussian, it may be possible to detect additional impurities.
In the case of linear pure single-factor models (with at least 3 indicators per cluster), all of the latent-latent edges are guaranteed to be identifiable. One can then apply a causal search algorithm using the estimated covariance matrix among the latents as input.
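A toy numerical illustration of why this works (my sketch, with made-up loadings): within a pure cluster, products of covariances identify each squared loading times the factor variance, and one cross-cluster covariance then recovers the latent correlation up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

LA = rng.normal(size=n)
LB = 0.7 * LA + 0.5 * rng.normal(size=n)   # true corr = 0.7 / sqrt(0.74)

# Three indicators per cluster with hypothetical loadings.
XA = [lam * LA + rng.normal(size=n) for lam in (1.0, 0.8, 1.3)]
XB = [lam * LB + rng.normal(size=n) for lam in (0.9, 1.1, 0.6)]

cov = lambda a, b: float(np.mean(a * b))   # all variables are mean zero

# lambda_1^2 * var(L) from within-cluster covariances:
a2 = cov(XA[0], XA[1]) * cov(XA[0], XA[2]) / cov(XA[1], XA[2])
b2 = cov(XB[0], XB[1]) * cov(XB[0], XB[2]) / cov(XB[1], XB[2])

est = cov(XA[0], XB[0]) / np.sqrt(a2 * b2)
print(est, 0.7 / np.sqrt(0.74))   # estimate vs. true latent correlation
```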
For sextads, the first step is to check 10 × (n choose 6) sextads. However, in a large proportion of social science contexts there are at most 100 observed variables and 15 or 16 latents; if the data are based on questionnaires, one generally can't get people to answer more questions than that. Simulation studies by Kummerfeld indicate that, given the vanishing sextads, the rest of the algorithm is subexponential in the number of clusters but exponential in the size of the clusters.
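For a sense of scale (my arithmetic; the factor of 10 is the number of ways to split six variables into two triples):

```python
from math import comb

# number of sextads checked in the first step: 10 * (n choose 6)
for n in (20, 50, 100):
    print(n, 10 * comb(n, 6))
# 20 -> 387,600;  50 -> 158,907,000;  100 -> 11,920,524,000
```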
Σ^{IJ} is the I × J submatrix of the inverse of Σ_{IJ×IJ}, and Σ_{IJ×IJ} is the (I ∪ J) × (I ∪ J) submatrix of Σ. This can be turned into a statistical test by substituting the maximum likelihood estimate of Σ for the population values of Σ.
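On my reading (consistent with the rank results above, though the slide does not spell it out), a single vanishing sextad difference for triples I = {a,b,c} and J = {d,e,f} is the determinant of the corresponding 3×3 block of Σ:

```latex
\det \Sigma_{I \times J} \;=\;
\det\begin{pmatrix}
  \sigma_{ad} & \sigma_{ae} & \sigma_{af}\\
  \sigma_{bd} & \sigma_{be} & \sigma_{bf}\\
  \sigma_{cd} & \sigma_{ce} & \sigma_{cf}
\end{pmatrix} \;=\; 0
```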
τ is a column vector of independent population sextad differences that the model implies vanish;
t is a column vector of the corresponding sample sextad differences;
σ is a column vector of the covariances that appear in one or more of the vanishing sextad differences in t;
Σ_ss is the covariance matrix of the limiting distribution of the sample covariances appearing in t, where σ_efgh is the fourth-order moment.
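These pieces combine in the usual delta-method fashion (my reconstruction of the construction the slide summarizes, parallel to the classical tetrad test):

```latex
\sqrt{n}\,(t - \tau) \;\xrightarrow{d}\; N(0, \Sigma_{tt}),
\qquad
\Sigma_{tt} \;=\; \frac{\partial \tau}{\partial \sigma}\,
  \Sigma_{ss}\,
  \Bigl(\frac{\partial \tau}{\partial \sigma}\Bigr)^{\!\top},
\qquad
(\Sigma_{ss})_{(ef),(gh)} \;=\; \sigma_{efgh} - \sigma_{ef}\,\sigma_{gh}
```

Under the null hypothesis τ = 0, the statistic n·tᵀ Σ̂_tt⁻¹ t is then asymptotically χ² with degrees of freedom equal to the number of independent sextad differences.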
Tests require (algebraic) independence among constraints. An additional complication: when some correlations or partial correlations are non-zero, additional dependencies among constraints arise. Some models entail that neither of a pair of sextad constraints vanishes, but that they are equal to each other.
3 hypothesized latent variables: Stress, Depression, and (religious) Coping. 21 indicators for Stress, 20 each for Depression and Coping. n = 127.
Lee model: p(χ²) = 0
Silva et al. model: p(χ²) = .28
The current version of the FTFC algorithm cannot be applied to all 61 measured indicators in the Lee data set in a feasible amount of time. Instead, we ran it at several different significance levels to look for 2-pure sub-models of the 3 originally given subsets of measured indicators. Using the output of FTFC as a starting point, we then searched for the model with the highest p-value under a chi-squared test. The best model we found contained a cluster of 9 coping variables, 8 stress variables, and 8 depression variables (all latent variables directly connected): p(χ²) = 0.27.
[Figure: generating model with latents L1–L6 over indicators X1–X30] Data were generated from the model and from its pure submodel. 3 sample sizes: n = 100 (alpha = .1), 500 (alpha = .1), 1000 (alpha = .4). Non-linear functions are a convex combination of linear and cubic.
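For concreteness, one way to realize "convex combination of linear and cubic" (my guess at the functional form; the mixing weight a is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_child(parent, a=0.5):
    # a = 0 recovers the linear case; a = 1 is purely cubic.
    return (1 - a) * parent + a * parent**3 + rng.normal(size=parent.shape)

L1 = rng.normal(size=10_000)
L3 = noisy_child(L1)   # non-linear latent-latent connection
```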
P/I – generated from a pure/impure submodel
L/N – generated from linear/non-linear latent-latent functions
L/N – generated from linear/non-linear latent-measured connections
Purity – percentage of an output cluster coming from the same pure subcluster
The average number of clusters output ranged between 2.7 and 3.1 for each kind of model and sample size, except for PNN (pure submodel, non-linear latent-latent and latent-measured functions). For PNN at sample sizes 100, 500, and 1000, the average numbers of clusters were 1.05, 1.38, and 1.54 respectively. This is expected, because non-linear latent-measured connections violate the assumptions under which the algorithm is correct.
The percentage of each pure subcluster that was in the output cluster.
Larger clusters are more stably produced and more likely to be (almost) correct.
Described an algorithm that relies on weakened assumptions:
weakened the linearity assumption to linearity below the latents;
weakened the assumption of the existence of pure submodels to the existence of n-pure submodels.
Conjectured correct if we add the assumptions of no star or collider models and faithfulness of the constraints.
Is there reason to believe in faithfulness of the constraints when there are non-linear relationships among the latents?
Give a complete list of assumptions under which the output of the algorithm is pure.
Speed up the algorithm.
Modify the algorithm to deal with almost-unfaithful constraints as much as possible.
Add a structure-learning component to the output of the algorithm (Silva: Gaussian process models among the latents, linearity below the latents).
Identifiability questions for structural models with pure measurement models.
Drton, M., Massam, H., & Olkin, I. (2008). Moments of minors of Wishart matrices. Annals of Statistics, 36(5), 2261-2283.
Drton, M., Sturmfels, B., & Sullivant, S. (2007). Algebraic factor analysis: tetrads, pentads and beyond. Probability Theory and Related Fields, 138(3-4), 463-493.
Harman, H. (1976). Modern Factor Analysis. University of Chicago Press.
Silva, R. (2010). Gaussian process structure models with latent variables. Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10).
Silva, R., Scheines, R., Glymour, C., & Spirtes, P. (2006). Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7, 191-246.
Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek separation for Gaussian graphical models. Annals of Statistics, 38(3), 1665-1685.