Pag. 1 Causality & MDL
Causal Models as Minimal Descriptions of Multivariate Systems
Jan Lemeire, June 15th 2006
Pag. 2 What can be learnt about the world from observations?
We have to look for regularities and model them.
Pag. 3 MDL approach to learning
Occam’s Razor: “Among equivalent models, choose the simplest one.”
Minimum Description Length (MDL): “Select the model that describes the data with the minimal number of bits.”
Model = shortest program that outputs the data; length of that program = Kolmogorov complexity.
Learning = finding regularities = compression.
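Pag. 4 Randomness vs. regularity
Random string = incompressible = maximal information.
Regularity, such as repetition, allows compression.
Separation by the two-part code: the model part describes the regularities, the remaining part describes the random data given the model.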
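The link between regularity and compression can be illustrated with a general-purpose compressor. The sketch below is only a rough proxy: zlib gives a crude upper bound on description length, not the Kolmogorov complexity, and the "random" bytes are pseudo-random (they come from a seeded generator, so a short program for them does exist). The names `compressed_size`, `regular` and `noisy` are illustrative.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude upper bound on description length)."""
    return len(zlib.compress(data, 9))


n = 10_000
regular = b"ab" * (n // 2)                            # pure repetition
rng = random.Random(0)                                # fixed seed for reproducibility
noisy = bytes(rng.getrandbits(8) for _ in range(n))   # pseudo-random bytes

print("regular:", len(regular), "->", compressed_size(regular))
print("noisy:  ", len(noisy), "->", compressed_size(noisy))
```

The repetitive string shrinks to a small fraction of its length, while the pseudo-random bytes stay at roughly their original size.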
Pag. 5 Model of multivariate systems
Variables and experimental data.
Which probabilistic model of the joint distribution has minimal description length?
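Pag. 6
1 variable: average code length = Shannon entropy of P(x).
Multiple variables: describe a variable with the help of the others, through a conditional probability distribution (CPD) such as P(E | A…D).
Factorization of the joint distribution into CPDs.
Mutual information decreases the entropy of a variable.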
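As a small numeric illustration of these quantities, the sketch below computes H(E), H(E | A) and the mutual information I(A; E) for a made-up joint distribution over two binary variables. The distribution and the helper names `entropy` and `marginal` are assumptions chosen for the example.

```python
from math import log2


def entropy(p):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` of each outcome tuple."""
    out = {}
    for outcome, q in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + q
    return out


# Hypothetical joint distribution P(A, E); outcomes are (a, e) pairs.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

h_e = entropy(marginal(joint, 1))     # H(E): average code length for E alone
h_a = entropy(marginal(joint, 0))     # H(A)
h_e_given_a = entropy(joint) - h_a    # H(E | A) = H(A, E) - H(A)
mi = h_e - h_e_given_a                # I(A; E): bits saved on E by knowing A

print(h_e, h_e_given_a, mi)
```

Knowing A shortens the average code for E by exactly the mutual information between the two.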
Pag. 7 Reduction of factorization complexity
Bayesian network
I. Conditional independencies
[Figure: factorizations under Ordering 1 and Ordering 2]
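How the variable ordering affects the size of the factorization can be sketched by counting free parameters. The example assumes a hypothetical three-variable system in which A and B are marginally independent and both influence C; it is not the example shown in the slide's figure. With binary variables, a CPD with k parents needs 2^k free parameters.

```python
def free_parameters(parents, arity=2):
    """Number of free parameters of a factorization given as {variable: list of parents}."""
    return sum((arity - 1) * arity ** len(pa) for pa in parents.values())


# Ordering A, B, C: P(A) P(B) P(C | A, B); the independence of A and B removes an edge.
ordering_1 = {"A": [], "B": [], "C": ["A", "B"]}

# Ordering C, A, B: P(C) P(A | C) P(B | A, C); no independence can be exploited.
ordering_2 = {"C": [], "A": ["C"], "B": ["A", "C"]}

print(free_parameters(ordering_1))
print(free_parameters(ordering_2))
```

Only the first ordering lets the marginal independence of A and B drop a parent, so it yields the shorter description.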
Pag. 8 II. Faithfulness
Joint distribution: conditional independencies.
Directed acyclic graph: d-separation.
Theorem: if a faithful graph exists, it is the minimal factorization.
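To make d-separation concrete, here is a minimal sketch that tests it with the moralized ancestral graph criterion (a standard equivalent formulation, used here for brevity rather than taken from the slides). The graph A → C ← B, C → D and the function name `d_separated` are assumptions for illustration.

```python
def d_separated(parents, xs, ys, zs):
    """True if node sets xs and ys are d-separated by zs in the DAG {node: set of parents}."""
    # 1. Keep only xs, ys, zs and all of their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    # 2. Moralize: connect each node to its parents, connect co-parents, drop directions.
    adj = {n: set() for n in keep}
    for child in keep:
        pa = list(parents[child])
        for i, p in enumerate(pa):
            adj[child].add(p)
            adj[p].add(child)
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Delete zs and check whether any node of xs can still reach a node of ys.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in seen or node in zs:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return not (seen & ys)


# Hypothetical DAG: A -> C <- B, C -> D.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

print(d_separated(dag, {"A"}, {"B"}, set()))   # marginal independence of A and B
print(d_separated(dag, {"A"}, {"B"}, {"C"}))   # conditioning on the collider C
print(d_separated(dag, {"A"}, {"D"}, {"C"}))   # C blocks the path from A to D
```

In a faithful graph, each d-separation read off this way corresponds to a conditional independence in the joint distribution, and vice versa.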
Pag. 9 III. Causal interpretation
Definition through interventions
Pag. 10 Reductionism
Causality = reductionism.
Canonical representation: unique, minimal, independent.
Building block = P(X_i | parents_i).
The whole theory is based on modularity, like the asymmetry of causality.
Intervention = change of a block.
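The idea that an intervention changes a single block and leaves the others untouched can be sketched on a hypothetical two-variable model X → Y. All probabilities and the names `p_x`, `p_y_given_x`, `joint` and `do_y` are made up for the example.

```python
from itertools import product


def p_x(x):
    """Block for X: P(X)."""
    return 0.3 if x == 1 else 0.7


def p_y_given_x(y, x):
    """Block for Y: P(Y | X)."""
    p1 = 0.9 if x == 1 else 0.2
    return p1 if y == 1 else 1.0 - p1


def joint(block_x, block_y):
    """Joint distribution P(x, y) as the product of the two blocks."""
    return {(x, y): block_x(x) * block_y(y, x) for x, y in product((0, 1), repeat=2)}


def do_y(value):
    """Replacement block for Y under the intervention do(Y = value): Y no longer listens to X."""
    return lambda y, x: 1.0 if y == value else 0.0


observed = joint(p_x, p_y_given_x)
intervened = joint(p_x, do_y(1))    # only the block of Y is swapped

# Observing Y = 1 changes the belief about X; forcing Y = 1 does not.
p_x1_given_y1 = observed[(1, 1)] / (observed[(0, 1)] + observed[(1, 1)])
p_x1_do_y1 = intervened[(1, 1)] / (intervened[(0, 1)] + intervened[(1, 1)])

print(p_x1_given_y1, p_x1_do_y1)
```

The difference between the two numbers reflects the asymmetry of causality: manipulating the effect leaves the block of the cause, P(X), intact.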
Pag. 11 Ultimate motivation for causality
Model = canonical representation, able to explain all regularities, close to reality.
Example taken from Spirtes, Glymour and Scheines 1993, Fig
[Figure panels: Reality, Learnt]
Pag. 12 Causal model is the MDL of the joint distribution if:
Incompressible (random distribution)
Pag. 13 d-separation tells what we can expect from a causal model
A Bayesian network with unrelated, random CPDs is faithful.
E.g. D depends on C, unless there is a dependency among the parameters of P(D | C, E) such that:
$P(d_1 \mid c_0, e_0)\,P(e_0) + P(d_1 \mid c_0, e_1)\,P(e_1) = P(d_1 \mid c_1, e_0)\,P(e_0) + P(d_1 \mid c_1, e_1)\,P(e_1)$
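The equality above can be checked numerically. The sketch below assumes, as the equation does, that E is independent of C, and compares P(d_1 | c_0) with P(d_1 | c_1) for randomly drawn parameters and for a deliberately tuned CPD. The seed, the tuned values and the name `p_d1_given_c` are assumptions for illustration.

```python
import random


def p_d1_given_c(cpd, p_e1, c):
    """P(d1 | c) = sum over e of P(d1 | c, e) * P(e), with E independent of C."""
    return cpd[(c, 0)] * (1.0 - p_e1) + cpd[(c, 1)] * p_e1


# Unrelated, random parameters: the two sides of the equality differ.
rng = random.Random(1)
p_e1 = rng.random()
random_cpd = {(c, e): rng.random() for c in (0, 1) for e in (0, 1)}
gap_random = p_d1_given_c(random_cpd, p_e1, 0) - p_d1_given_c(random_cpd, p_e1, 1)

# Tuned parameters, keyed by (c, e): the influence of C is reversed by E and averages out.
tuned_cpd = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.2}
gap_tuned = p_d1_given_c(tuned_cpd, 0.5, 0) - p_d1_given_c(tuned_cpd, 0.5, 1)

print(gap_random, gap_tuned)
```

With the tuned CPD, D is marginally independent of C although C is a parent of D. That independence exists only because the parameters are related to each other, which is itself a regularity.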
Pag. 14 When do causal models become incorrect?
Other regularities!
Pag. 15 A. Lower-level regularities
Compression of the distributions
Pag. 16 B. Better description form
Pattern in figure: random patterns -> distribution.
Causal model? Other models are better.
Why? Complete symmetry among the variables.
Pag. 17 C. Interference with independencies
X and Y are independent through cancellation of the paths X → U → Y and X → V → Y.
The dependency between both paths = regularity.
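A minimal simulation of such a cancellation, assuming a linear model with unit Gaussian noise (the slide does not specify the functional form). With path coefficients a, b on X → U → Y and c, d on X → V → Y, the covariance of X and Y is proportional to a·b + c·d, so it vanishes when the two products cancel. The function names and coefficient values are illustrative.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate(a, b, c, d, n=20_000, seed=0):
    """Sample X -> U -> Y and X -> V -> Y with linear effects and unit Gaussian noise."""
    rng = random.Random(seed)
    xs, us, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        u = a * x + rng.gauss(0, 1)
        v = c * x + rng.gauss(0, 1)
        y = b * u + d * v + rng.gauss(0, 1)
        xs.append(x)
        us.append(u)
        ys.append(y)
    return xs, us, ys


# Cancelling paths: a*b + c*d = 1*1 + 1*(-1) = 0.
xs, us, ys = simulate(a=1, b=1, c=1, d=-1)
cov_xy_cancel = covariance(xs, ys)
cov_xu = covariance(xs, us)

# Generic paths: a*b + c*d = 2, so X and Y are clearly dependent.
xs2, _, ys2 = simulate(a=1, b=1, c=1, d=1)
cov_xy_generic = covariance(xs2, ys2)

print(cov_xy_cancel, cov_xu, cov_xy_generic)
```

In the cancelling case X still influences U, and U still influences Y, yet X and Y look independent. The graph cannot represent this independence faithfully.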
Pag. 18 Violation of the weak transitivity condition
One of the necessary conditions for faithfulness
Pag. 19 Deterministic relations
Y = f(X_1, X_2)
Y becomes (unexpectedly) independent of Z conditioned on X_1 and X_2 ~ violation of the intersection condition.
Solution: augmented model
- add the regularity to the model
- adapt the inference algorithms
Learning algorithm: variables may contain equivalent information about another variable; choose the simplest relation.
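The effect of a deterministic relation can be sketched with conditional mutual information. The example assumes fair binary X_1 and X_2, takes f to be XOR, and lets Z be a noisy copy of Y; all of these choices, and the helper names, are hypothetical.

```python
from itertools import product
from math import log2


def build_joint(flip=0.1):
    """P(x1, x2, y, z) with fair coins X1, X2, Y = X1 XOR X2 and Z a noisy copy of Y."""
    joint = {}
    for x1, x2, z in product((0, 1), repeat=3):
        y = x1 ^ x2
        joint[(x1, x2, y, z)] = 0.25 * (1 - flip if z == y else flip)
    return joint


def entropy(joint, idx):
    """Entropy in bits of the marginal over the variable positions listed in idx."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, a, b, c=()):
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))


X1, X2, Y, Z = 0, 1, 2, 3          # positions of the variables in each outcome tuple
joint = build_joint()

mi_y_z = cond_mi(joint, [Y], [Z])                         # Y and Z are dependent
mi_y_z_given_x = cond_mi(joint, [Y], [Z], [X1, X2])       # vanishes: X1, X2 determine Y
mi_x_z_given_y = cond_mi(joint, [X1, X2], [Z], [Y])       # vanishes: Y screens off X1, X2

print(mi_y_z, mi_y_z_given_x, mi_x_z_given_y)
```

Both Y and the pair (X_1, X_2) carry the same information about Z, so independence tests alone cannot decide which of them Z is related to. This is where choosing the simplest relation comes in.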
Pag. 20 Conclusions
Interpretation of causality by the regularities.
Canonical, faithful representation: ‘Describe all regularities’.
Causality is just one type of regularity?
Occam’s Razor works: choosing the simplest model gives models close to ‘reality’.
But what is reality? An atomic description of the regularities that we observe?
Papers, references and demos: