Talking Points Joseph Ramsey

LiNGAM Most of the algorithms included in Tetrad (other than KPC) assume causal graphs are to be inferred from conditional independence tests. Usually tests that assume linearity and Gaussianity. LiNGAM uses a different approach. Assumes linearity and non-Gaussianity. Runs Independent Components Analysis (ICA) to estimate the coefficient matrix. Rearranges the coefficient matrix to get a causal order. Prunes weak coefficients by setting them to zero.

ICA Although complicated, the basic idea is very simple. Write the errors as linear combinations of the variables:

a11 X1 + ... + a1n Xn = e1
...
an1 X1 + ... + ann Xn = en

Assume e1, ..., en are i.i.d. Try to maximize the non-Gaussianity of w1 X1 + ... + wn Xn = ? There are n ways to do it, up to symmetry! (Cf. the Central Limit Theorem; Hyvärinen et al., 2002.) You can use the coefficients for e1, or for e2, or for... All other linear combinations of e1, ..., en are more Gaussian.
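The "more Gaussian" claim can be checked numerically. The following is a small, self-contained Java sketch (not Tetrad code; the class name and the Uniform(0,1) choice are just for illustration) that compares the excess kurtosis of a single uniform error with that of a generic linear combination of two such errors; the combination comes out closer to the Gaussian value of zero:

import java.util.Random;

public class NonGaussianityDemo {

    // Excess kurtosis: 0 for a Gaussian, about -1.2 for a Uniform(0,1) variable.
    static double excessKurtosis(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double m2 = 0.0, m4 = 0.0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m4 += d * d * d * d;
        }
        m2 /= n;
        m4 /= n;
        return m4 / (m2 * m2) - 3.0;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 100_000;
        double[] e1 = new double[n], e2 = new double[n], mix = new double[n];
        for (int i = 0; i < n; i++) {
            e1[i] = rng.nextDouble();           // Uniform(0,1): strongly non-Gaussian
            e2[i] = rng.nextDouble();
            mix[i] = 0.7 * e1[i] + 0.7 * e2[i]; // a generic linear combination of errors
        }
        System.out.println("kurtosis(e1)  = " + excessKurtosis(e1));   // about -1.2
        System.out.println("kurtosis(mix) = " + excessKurtosis(mix));  // closer to 0, i.e. more Gaussian
    }
}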

ICA In the ICA literature this equation is usually written Wx = s, where s is the vector of independent components and x is the vector of variables. But here x = Bx + e, where B is the coefficient matrix, so Wx = (I - B)x = e; the errors e play the role of the independent components s. We just showed that under strong conditions we can estimate W. So we can estimate B! (But with unknown row order.) This uses assumptions of linearity and non-Gaussianity (of all but one variable) alone. More sophisticated analyses allow the errors to be non-i.i.d.

LiNGAM LiNGAM runs ICA to estimate the coefficient matrix B. The order of the errors is not fixed by ICA, so some rearranging of the B matrix needs to be done. Rows of the B matrix are swapped so that it is lower triangular. b[i][j] should be non-zero just in case there is an edge Xj -> Xi (Xj is a direct cause of Xi). Typically, a cutoff is used to determine whether a matrix element is zero. The rearranged matrix corresponds to the idea of a causal order.
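Here is a minimal sketch, in plain Java, of that post-ICA bookkeeping. It is not Tetrad's implementation: the matrix W below is a made-up 3-variable example, and the brute-force permutation search is only sensible for small numbers of variables.

import java.util.ArrayList;
import java.util.List;

public class LingamRearrange {

    public static void main(String[] args) {
        // Hypothetical ICA output: an unmixing matrix W whose rows come back in a
        // scrambled order. The model behind it is X0 -> X1 -> X2 with coefficients
        // 0.5 and 0.8, so the "true" W is I - B.
        double[][] W = {
            {  0.0, -0.8, 1.0 },
            {  1.0,  0.0, 0.0 },
            { -0.5,  1.0, 0.0 }
        };
        int n = W.length;

        // 1. Permute the rows so the diagonal has no (near-)zeros: pick the row
        //    permutation maximizing the product of |W[p[i]][i]|.
        int[] rowPerm = bestPermutation(n, p -> {
            double product = 1.0;
            for (int i = 0; i < n; i++) product *= Math.abs(W[p[i]][i]);
            return product;
        });

        // 2. Rescale each permuted row so its diagonal entry is 1, then B = I - W'.
        double[][] B = new double[n][n];
        for (int i = 0; i < n; i++) {
            double diag = W[rowPerm[i]][i];
            for (int j = 0; j < n; j++) {
                B[i][j] = (i == j ? 1.0 : 0.0) - W[rowPerm[i]][j] / diag;
            }
        }

        // 3. Find a causal order: a variable permutation that makes B as close to
        //    strictly lower triangular as possible (minimize the mass on and above
        //    the diagonal).
        int[] order = bestPermutation(n, p -> {
            double upperMass = 0.0;
            for (int i = 0; i < n; i++)
                for (int j = i; j < n; j++)
                    upperMass += B[p[i]][p[j]] * B[p[i]][p[j]];
            return -upperMass;
        });

        // 4. Prune weak coefficients with a cutoff; the survivors encode the edges:
        //    B[i][j] != 0 means Xj -> Xi.
        double cutoff = 0.1;
        StringBuilder sb = new StringBuilder("causal order:");
        for (int v : order) sb.append(" X").append(v);
        System.out.println(sb);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (Math.abs(B[i][j]) >= cutoff)
                    System.out.println("edge X" + j + " -> X" + i + ", coefficient " + B[i][j]);
    }

    interface PermutationScore { double score(int[] perm); }

    // Exhaustively score every permutation of {0, ..., n-1}; return the best one.
    static int[] bestPermutation(int n, PermutationScore scorer) {
        List<int[]> perms = new ArrayList<>();
        generate(new int[n], new boolean[n], 0, perms);
        int[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int[] p : perms) {
            double s = scorer.score(p);
            if (s > bestScore) { bestScore = s; best = p; }
        }
        return best;
    }

    static void generate(int[] current, boolean[] used, int pos, List<int[]> out) {
        if (pos == current.length) { out.add(current.clone()); return; }
        for (int v = 0; v < current.length; v++) {
            if (used[v]) continue;
            used[v] = true;
            current[pos] = v;
            generate(current, used, pos + 1, out);
            used[v] = false;
        }
    }
}

Running it on the made-up W prints the causal order X0 X1 X2 and the two edges X0 -> X1 and X1 -> X2 with their coefficients, which is exactly the "complete DAG" output described on the next slide.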

LiNGAM Once you know which nodes are adjacent in the graph and what the causal order is, you can infer a complete DAG. Review: Use data from a linear non-Gaussian model (all but one variable non-Gaussian) Infer a complete DAG (more than a pattern!)

Generalized SEM To try LiNGAM we first need to simulate some linear non-Gaussian data, for which we will need the Generalized SEM Model. The Generalized SEM is a generalization of the linear SEM model: it allows for arbitrary connection functions, allows for arbitrary distributions, and supports simulation from cyclic models.
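As a concrete picture of what such data looks like, here is a minimal plain-Java sketch (not the Tetrad Generalized SEM itself) that simulates a hypothetical 3-node linear model whose errors are Uniform(0,1) rather than Gaussian:

import java.util.Random;

public class NonGaussianSimulation {

    // Hypothetical linear SEM over the DAG X0 -> X1, X0 -> X2, X1 -> X2,
    // with Uniform(0,1) errors: linear connection functions, non-Gaussian noise.
    public static void main(String[] args) {
        Random rng = new Random(0);
        int n = 1000;
        double[][] data = new double[n][3];
        for (int i = 0; i < n; i++) {
            double e0 = rng.nextDouble(), e1 = rng.nextDouble(), e2 = rng.nextDouble();
            double x0 = e0;
            double x1 = 0.8 * x0 + e1;
            double x2 = 0.5 * x0 - 0.7 * x1 + e2;
            data[i][0] = x0; data[i][1] = x1; data[i][2] = x2;
        }
        // 'data' could now be written out (e.g. tab-delimited) and loaded into Tetrad.
        System.out.println("first row: " + data[0][0] + "\t" + data[0][1] + "\t" + data[0][2]);
    }
}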

Hands On Create a DAG. Parameterize it as a Generalized SEM. Open the Generalized SEM and select Apply Templates from the Tools menu. Apply the default template to the variables, which will make them all linear functions. For the errors, select a non-Gaussian distribution, such as U(0, 1). Save.

Hands On Attach a Generalized SEM IM. Attach a data set and simulate 1000 points. Attach a Search box and run LiNGAM. Attach another search box to the data and run PC. Compare the PC output to the LiNGAM output.

Special Variants of Algorithms PC Pattern PC Pattern enforces the requirement that the output of the algorithm will be a pattern. PCD PCD adds corrective code to PC for the case where some variables stand in deterministic relationships. This results in fewer edges being removed from the graph. For example, if X _||_ Y | Z but Z determines Y, X---Y is not taken out.
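The flavor of that corrective step can be sketched as follows, using hypothetical oracle interfaces for the independence test and the determinism check (nothing here is Tetrad's actual API): an independence judgment is only used to remove an edge when the conditioning set does not determine either endpoint.

import java.util.Set;

public class DeterminismGuard {

    // Hypothetical oracles standing in for an independence test and a
    // determinism check over the data.
    interface IndependenceOracle { boolean independent(int x, int y, Set<Integer> s); }
    interface DeterminismOracle  { boolean determines(Set<Integer> s, int v); }

    // PCD-style guard: X _||_ Y | S only counts as evidence for removing X---Y
    // when S determines neither X nor Y, since determinism makes the
    // independence uninformative.
    static boolean removeEdge(int x, int y, Set<Integer> s,
                              IndependenceOracle ind, DeterminismOracle det) {
        if (det.determines(s, x) || det.determines(s, y)) return false;
        return ind.independent(x, y, s);
    }

    public static void main(String[] args) {
        // Toy example: Z (= variable 2) determines Y (= variable 1), and the test
        // reports X _||_ Y | {Z}; the guard declines to remove X---Y on that basis.
        IndependenceOracle ind = (x, y, s) -> s.contains(2);
        DeterminismOracle det = (s, v) -> s.contains(2) && v == 1;
        System.out.println(removeEdge(0, 1, Set.of(2), ind, det));  // prints false
    }
}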

Special Variants of Algorithms CPC The PC algorithm may jump too quickly to the conclusion that a triple should be oriented as a collider, X->Y<-Z, or as a noncollider, X---Y---Z. The CPC algorithm uses a much more conservative test for colliders and noncolliders, double and triple checking the decision against different conditioning sets drawn from the adjacents of X and of Z. The result is a graph with fewer but more accurate orientations.
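A simplified reading of that conservative rule, as a hedged sketch with a hypothetical independence oracle (not Tetrad's code): collect the candidate conditioning sets that separate X from Z, and classify the triple by whether Y appears in none, all, or only some of them.

import java.util.List;
import java.util.Set;

public class ConservativeColliderCheck {

    // Hypothetical oracle: does the data say X _||_ Z | S ?
    interface IndependenceOracle { boolean independent(int x, int z, Set<Integer> s); }

    enum TripleType { COLLIDER, NONCOLLIDER, AMBIGUOUS }

    // Decision for an unshielded triple X - Y - Z (X and Z not adjacent): look at
    // every candidate conditioning set (drawn from the adjacents of X and of Z)
    // that separates X from Z, and see whether Y is in none of them (collider),
    // all of them (noncollider), or some but not all (ambiguous).
    static TripleType judge(int x, int y, int z,
                            List<Set<Integer>> candidateSepsets,
                            IndependenceOracle oracle) {
        boolean inSome = false, outOfSome = false;
        for (Set<Integer> s : candidateSepsets) {
            if (!oracle.independent(x, z, s)) continue;   // not a separating set
            if (s.contains(y)) inSome = true; else outOfSome = true;
        }
        if (!inSome && outOfSome) return TripleType.COLLIDER;     // Y in no sepset
        if (inSome && !outOfSome) return TripleType.NONCOLLIDER;  // Y in every sepset
        return TripleType.AMBIGUOUS;                              // mixed evidence
    }

    public static void main(String[] args) {
        // Toy example for X0 - X1 - X2 where the truth is X0 -> X1 <- X2:
        // X0 and X2 are independent marginally but dependent given X1.
        IndependenceOracle oracle = (x, z, s) -> !s.contains(1);
        List<Set<Integer>> sepsets = List.of(Set.of(), Set.of(1));
        System.out.println(judge(0, 1, 2, sepsets, oracle));  // prints COLLIDER
    }
}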

Hands On Simulate data from a “complicated” DAG using a SEM IM. Choose the Search from Simulated Data item from the Templates menu. Make a random 20-node, 20-edge DAG. Parameterize it as a linear SEM, accepting the defaults. Run CPC. Attach another search box to the data and run PC. Lay out the PC graph using Fruchterman-Reingold and copy the layout to the CPC graph. Open PC and CPC simultaneously and note the differences.

Special Variants of Algorithms CFCI Same idea as for CPC but for FCI instead. KPC The PC algorithm typically uses independence tests that assume linearity. The KPC algorithm makes two changes: It uses a non-parametric independence test. It adds some steps to orient edges that are unoriented in the PC pattern.
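As one example of what a non-parametric dependence measure looks like, here is a plain-Java sketch of a biased HSIC statistic with Gaussian kernels. KPC's actual kernel-based tests are more involved (and produce p-values, which this omits), so treat this as an illustration of the idea rather than Tetrad's test.

import java.util.Random;

public class HsicStatistic {

    // Biased empirical HSIC with Gaussian kernels: near zero when x and y are
    // independent, clearly positive when they are dependent, even nonlinearly.
    static double hsic(double[] x, double[] y, double bandwidth) {
        int n = x.length;
        double[][] kx = centered(gaussianGram(x, bandwidth));
        double[][] ky = centered(gaussianGram(y, bandwidth));
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += kx[i][j] * ky[i][j];
        return sum / ((double) n * n);
    }

    static double[][] gaussianGram(double[] v, double bw) {
        int n = v.length;
        double[][] g = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double d = v[i] - v[j];
                g[i][j] = Math.exp(-d * d / (2 * bw * bw));
            }
        return g;
    }

    // Double-center a (symmetric) Gram matrix: H G H with H = I - (1/n) 11'.
    static double[][] centered(double[][] g) {
        int n = g.length;
        double[] rowMean = new double[n];
        double grandMean = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) rowMean[i] += g[i][j];
            rowMean[i] /= n;
            grandMean += rowMean[i];
        }
        grandMean /= n;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                c[i][j] = g[i][j] - rowMean[i] - rowMean[j] + grandMean;
        return c;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        int n = 200;
        double[] x = new double[n], yIndep = new double[n], yDep = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = rng.nextGaussian();
            yIndep[i] = rng.nextGaussian();                    // independent of x
            yDep[i] = x[i] * x[i] + 0.1 * rng.nextGaussian();  // dependent on x, yet uncorrelated
        }
        System.out.println("HSIC, independent pair: " + hsic(x, yIndep, 1.0));
        System.out.println("HSIC, dependent pair:   " + hsic(x, yDep, 1.0));
    }
}

The dependent pair (y a function of x squared) has essentially zero linear correlation, which is exactly the kind of dependence a linearity-assuming test misses and a non-parametric test can detect.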

Special Variants of Algorithms PcLiNGAM If some variables (more than one) are Gaussian and the others are non-Gaussian, this algorithm applies. It runs PC, then orients the unoriented edges (where possible) using non-Gaussianity. LiNG Extends LiNGAM to orient cycles using non-Gaussianity.

Special Variants of Algorithms JCPC Uses a Markov blanket style test to add or remove individual edges, with CPC style orientation. This allows individual adjacencies in the graph to be revised from the initial estimate given by the PC adjacency search.

Time Series Simulation (Hands On) Tetrad includes support for doing time series simulations. First, one creates a time series graph. Then one parameterizes the time series graph as a SEM. Then one instantiates the SEM. Then one simulates data from the SEM Instantiated Model.
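For intuition, here is a minimal plain-Java sketch of the kind of data this workflow produces, using a made-up two-variable lag-1 model; it stands in for, but is not, the Tetrad time series graph machinery.

import java.util.Random;

public class TimeSeriesSimulation {

    // Hypothetical lag-1 linear model:
    //   x_t = 0.7 * x_{t-1} + e_x
    //   y_t = 0.5 * x_{t-1} + 0.6 * y_{t-1} + e_y
    // i.e. what you get after parameterizing a time series graph as a SEM,
    // instantiating it, and simulating.
    public static void main(String[] args) {
        Random rng = new Random(7);
        int T = 1000;
        double[] x = new double[T], y = new double[T];
        for (int t = 1; t < T; t++) {
            x[t] = 0.7 * x[t - 1] + rng.nextGaussian();
            y[t] = 0.5 * x[t - 1] + 0.6 * y[t - 1] + rng.nextGaussian();
        }
        System.out.println("last values: x = " + x[T - 1] + ", y = " + y[T - 1]);
    }
}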

Time Series Simulation One can, e.g., calculate a vector auto-regression for it. (One can do this as well from time series data loaded in.) Attach a data manipulation box to the data. Select vector auto-regression. Attach a search and run GES. Should give the graph among concurrent variables. One can create staggered time series data and run GES. Attach a data manipulation box. Select create time series data. Attach a search box and run GES. Should give the time lag graph with some extra edges in the highest lag.
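The "create time series data" (staggered data) step amounts to pairing each time point with its lagged values so that an ordinary search can be run over the lagged columns. Here is a minimal sketch of that transformation, with made-up names and no claim to match Tetrad's exact column layout:

public class StaggerData {

    // Turn a T x p time series into a (T - lags) x (p * (lags + 1)) data set
    // whose columns are [X1_t, ..., Xp_t, X1_{t-1}, ..., Xp_{t-lags}].
    static double[][] stagger(double[][] series, int lags) {
        int T = series.length, p = series[0].length;
        double[][] out = new double[T - lags][p * (lags + 1)];
        for (int t = lags; t < T; t++)
            for (int lag = 0; lag <= lags; lag++)
                for (int j = 0; j < p; j++)
                    out[t - lags][lag * p + j] = series[t - lag][j];
        return out;
    }

    public static void main(String[] args) {
        double[][] series = { {1, 10}, {2, 20}, {3, 30}, {4, 40} };
        double[][] lagged = stagger(series, 1);
        // Row 0 is [X1_t=2, X2_t=20, X1_{t-1}=1, X2_{t-1}=10], and so on.
        System.out.println(java.util.Arrays.deepToString(lagged));
    }
}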

Command Line Tetrad We don’t have an extensive command line interface programmed, but what we do have has proven useful to many people. We have a command line interface for a number of the basic search algorithms in Tetrad. We also have a command line interface for the IMaGES algorithm. Some upcoming version of Tetrad will include a more extensive command line interface.

How to get it Go to the Tetrad downloads directory. Look for files beginning with the prefix “tetradcmd-”. Pick the one with the latest version.

How to run a search at the command line... Example: java -jar tetradcmd-<version>.jar -data munin1.txt -datatype discrete -algorithm fci -depth 3 -significance 0.0

Command line options
-data: Gives the data file
-datatype: continuous or discrete (mixed not supported)
-algorithm: pc, cpc, fci, cfci, ccd, ges
-depth: Default is -1 (unlimited)
-significance: The significance level for the independence tests
There are some other options as well.

IMaGES command line IMaGES (which I’ll talk about) is a more specialized algorithm and uses its own command line interface. Email me if you’d like to use it.

Tetrad Source We regularly get requests for the Tetrad source code. The secret is, it’s online, freely available; you just have to know where to look! Again, look in the Tetrad downloads directory. Look for the latest “dist” (distribution) file and unzip it.

Source All of the code will be in the distribution, except for private project code. This can be useful if you want to modify or extend algorithms, or if you want to set up specific kinds of testing, or if the command line tools provided are insufficient for your needs.

Java The source code is in Java, which can be interfaced with several other platforms (Matlab, R, Mathematica) with a bit of work; it can also be called from the command line or invoked programmatically from various languages. Also, since it’s in Java, it’s cross-platform compatible, so it will probably run on your machine.
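For example, another Java program can launch the command-line jar directly with ProcessBuilder. The jar name and data file below are placeholders taken from the earlier example; substitute the actual downloaded version and your own data set.

import java.io.IOException;

public class RunTetradCmd {

    public static void main(String[] args) throws IOException, InterruptedException {
        // Build the same kind of invocation shown on the command-line slide.
        ProcessBuilder pb = new ProcessBuilder(
                "java", "-jar", "tetradcmd.jar",   // placeholder: use the real versioned jar name
                "-data", "munin1.txt",             // placeholder: your own data file
                "-datatype", "discrete",
                "-algorithm", "fci",
                "-depth", "3");
        pb.inheritIO();                            // forward the tool's output to this console
        int exitCode = pb.start().waitFor();
        System.out.println("tetradcmd exited with code " + exitCode);
    }
}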