Causal Models as Minimal Descriptions of Multivariate Systems
Jan Lemeire, June 15th 2006

Pag. 2 What can be learnt about the world from observations? We have to look for regularities and model them.

Pag. 3 The MDL Approach to Learning
Occam’s Razor: “Among equivalent models, choose the simplest one.”
Minimum Description Length (MDL): “Select the model that describes the data with the minimal number of bits.”
The model is the shortest program that outputs the data; the length of that program is its Kolmogorov complexity.
Learning = finding regularities = compression.
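In two-part code form this reads as follows; a minimal sketch in standard MDL notation (the symbols H, D and the hypothesis class are generic placeholders, not taken from the slides):

```latex
% Two-part MDL: choose the hypothesis H that minimizes the total code length,
%   L(H)        = bits needed to describe the model itself
%   L(D \mid H) = bits needed to encode the data D with the help of H
H_{\mathrm{MDL}} = \arg\min_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr]
```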

Pag. 4 Randomness vs. Regularity
A random string is incompressible and carries maximal information; a regularity such as repetition allows compression.
The two-part code separates the regular part (the model) from the random part (the noise).
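The contrast is easy to see with an off-the-shelf compressor; a minimal Python sketch in which zlib stands in for the ideal (uncomputable) Kolmogorov compressor:

```python
import os
import zlib

# A repetitive string compresses well; a random string does not.
regular = b"ab" * 500        # 1000 bytes with an obvious regularity
random_ = os.urandom(1000)   # 1000 random bytes

print(len(zlib.compress(regular)))  # small: the repetition is exploited
print(len(zlib.compress(random_)))  # near 1000 or above: incompressible
```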

Pag. 5 Model of Multivariate Systems
Given experimental data over a set of variables, which probabilistic model describes their joint distribution with minimal description length?

Pag. 6 One variable: the average code length equals the Shannon entropy of P(x). Multiple variables: each variable can be encoded with the help of the others through a conditional probability distribution (CPD) such as P(E | A…D), which yields a factorization of the joint distribution. Mutual information with the conditioning variables decreases the entropy of a variable.
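In formulas (standard information theory, filling in the equations that appear only as images on the slide):

```latex
% average code length of one variable = Shannon entropy
H(X) = -\sum_{x} P(x) \log_2 P(x)

% chain-rule factorization over the variables A, ..., E
P(A,\dots,E) = P(A)\, P(B \mid A)\, P(C \mid A,B)\, P(D \mid A,B,C)\, P(E \mid A,\dots,D)

% conditioning saves exactly the mutual information
H(E \mid A,\dots,D) = H(E) - I(E;\, A,\dots,D) \;\le\; H(E)
```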

Pag. 7 Reduction of Factorization Complexity: Bayesian Networks
I. Conditional independencies allow each factor to condition on only a subset of the preceding variables. The resulting network and its complexity depend on the chosen variable ordering (the slide contrasts two orderings).
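The resulting form is the Bayesian-network factorization, in which each variable conditions only on its parents:

```latex
P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

For binary variables, a CPD with k parents needs 2^k parameters, so fewer parents means a shorter description.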

Pag. 8 II. Faithfulness
Joint distribution ↔ directed acyclic graph: the conditional independencies of the distribution correspond exactly to the d-separations in the graph.
Theorem: if a faithful graph exists, it is the minimal factorization.
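d-separation can be checked empirically; a Python sketch of the v-structure A → C ← B (an invented example, not from the slides), in which A and B are marginally independent but become dependent once the common effect C is observed:

```python
import random

n = 200_000
joint = {}           # counts over (a, b)
joint_given_c1 = {}  # counts over (a, b), restricted to c == 1
for _ in range(n):
    a, b = random.randint(0, 1), random.randint(0, 1)
    c = 1 if a + b >= 1 else 0   # C is a common effect of A and B
    joint[(a, b)] = joint.get((a, b), 0) + 1
    if c == 1:
        joint_given_c1[(a, b)] = joint_given_c1.get((a, b), 0) + 1

print(joint)           # roughly uniform: A and B are independent
print(joint_given_c1)  # (0, 0) never occurs: dependent given C
```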

Pag. 9 III. Causal Interpretation
The edges receive a causal meaning, defined through interventions: X is a cause of Y if setting X by an external intervention changes the distribution of Y.

Pag. 10 Reductionism
Causality = reductionism. The canonical representation is unique, minimal, and composed of independent parts.
Building block: P(X_i | parents_i).
The whole theory is based on modularity, as is the asymmetry of causality.
Intervention = change of one block, leaving all other blocks untouched.
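Modularity can be made concrete in a few lines of Python; an illustrative sketch (the network, the variable names, and the `do` helper are invented for the example):

```python
# A Bayesian network as a dictionary of independent building blocks:
# variable -> (parents, CPT), where the CPT maps a tuple of parent
# values to P(variable = 1 | parents).
model = {
    "rain":      ((), {(): 0.2}),
    "sprinkler": (("rain",), {(0,): 0.6, (1,): 0.05}),
    "wet":       (("rain", "sprinkler"), {(0, 0): 0.0, (0, 1): 0.9,
                                          (1, 0): 0.8, (1, 1): 0.99}),
}

def do(model, var, value):
    """Intervention = replacing one block; the other blocks are untouched."""
    new_model = dict(model)
    new_model[var] = ((), {(): float(value)})  # cut the parents, fix the value
    return new_model

intervened = do(model, "sprinkler", 1)  # force the sprinkler on
```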

Pag. 11 Ultimate Motivation for Causality
Model = canonical representation: able to explain all regularities, close to reality.
[Figure taken from Spirtes, Glymour and Scheines 1993, contrasting the true structure (“Reality”) with the structure recovered from data (“Learnt”).]

Pag. 12 The causal model is the minimal description (MDL) of the joint distribution if its component distributions are incompressible (random).

Pag. 13 d-separation tells us what to expect from a causal model. A Bayesian network with unrelated, random CPDs is faithful: an unfaithful independence would require an exact coincidence among the parameters. E.g., D depends on C unless the entries of P(D|C,E) happen to satisfy

$$P(d_1 \mid c_0, e_0)\,P(e_0) + P(d_1 \mid c_0, e_1)\,P(e_1) \;=\; P(d_1 \mid c_1, e_0)\,P(e_0) + P(d_1 \mid c_1, e_1)\,P(e_1)$$
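The required coincidence is easy to verify numerically; a Python sketch with made-up CPT values chosen to produce exactly this cancellation:

```python
# P(E) and P(D = 1 | C, E), engineered so that the dependence of D on C
# cancels when E is marginalized out (the numbers are invented).
p_e = {0: 0.5, 1: 0.5}
p_d1 = {(0, 0): 0.3, (0, 1): 0.7,   # P(d1 | c0, e)
        (1, 0): 0.7, (1, 1): 0.3}   # P(d1 | c1, e)

lhs = sum(p_d1[(0, e)] * p_e[e] for e in (0, 1))  # P(d1 | c0)
rhs = sum(p_d1[(1, e)] * p_e[e] for e in (0, 1))  # P(d1 | c1)
print(lhs, rhs)  # 0.5 0.5: D is (unfaithfully) independent of C
```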

Pag. 14 When do causal models become incorrect? When there are other regularities!

Pag. 15 A. Lower-level regularities: the component distributions themselves can be compressed, so the causal model is no longer the minimal description.

Pag. 16 B. A better description form exists. The figure shows random patterns that define a distribution. Is a causal model appropriate? Other models are better here. Why? Because of the complete symmetry among the variables.

Pag. 17 C. Interference with the independencies: X and Y can be independent through cancellation of the paths X → U → Y and X → V → Y. The dependency between the two paths is itself a regularity.
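In a linear model the cancellation is exact when the two path coefficients sum to zero; a Python sketch with invented coefficients:

```python
import random

# Two paths X -> U -> Y and X -> V -> Y that cancel:
# U = 2x, V = -2x, Y = U + V, so Y = (2 - 2)x + noise and cov(X, Y) = 0.
n = 100_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    u = 2.0 * x + random.gauss(0, 0.1)
    v = -2.0 * x + random.gauss(0, 0.1)
    xs.append(x)
    ys.append(u + v + random.gauss(0, 0.1))

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
print(cov)  # approximately 0: X and Y appear independent
```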

Pag. 18 Such a cancellation is a violation of the weak transitivity condition, one of the necessary conditions for faithfulness.

Pag. 19 Deterministic relations: if Y = f(X_1, X_2), then Y becomes (unexpectedly) independent of Z conditioned on X_1 and X_2, a violation of the intersection condition.
Solution: an augmented model that adds the regularity to the model and adapts the inference algorithms.
For the learning algorithm: variables may contain equivalent information about another variable; choose the simplest relation.
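The effect of determinism is easy to demonstrate; a Python sketch with an invented function (XOR) and an invented Z that is strongly related to Y:

```python
import random

# Determinism: Y = f(X1, X2) (here XOR). Given X1 and X2, Y is a
# constant, so Y is independent of any Z once X1 and X2 are observed,
# even a Z that is a noisy copy of Y.
n = 50_000
rows = []
for _ in range(n):
    x1, x2 = random.randint(0, 1), random.randint(0, 1)
    y = x1 ^ x2
    z = y if random.random() < 0.9 else 1 - y
    rows.append((x1, x2, y, z))

# Within each (x1, x2) cell Y takes a single value, so P(Y | x1, x2, z)
# cannot depend on z: conditional independence forced by determinism.
for cell in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(cell, "->", {y for x1, x2, y, z in rows if (x1, x2) == cell})
```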

Pag. 20 Conclusions
Interpretation of causality through regularities: a canonical, faithful representation that “describes all regularities”. Is causality just one type of regularity?
Occam’s Razor works: choosing the simplest model gives models close to ‘reality’. But what is reality? An atomic description of the regularities that we observe?
Papers, references and demos: