Areas of Research … Causal Discovery Application Integration

Presentation transcript:

Areas of Research
Causal Discovery: Integration, Robustness, Application

Sofia Triantafillou
Assistant Professor, Department of Biomedical Informatics, University of Pittsburgh
email: sot16@upitt.edu
phone: +1 773 403 8781

[Figure: graphs over variables A, B, C, D, E]

Integrative Causal Discovery

Same system, different studies
-Different variables
-Different experimental designs

One (true, unknown) causal model
-Marginals/experiments can be modeled with causal graphs

[Figure: causal graph over Breast Cancer, Protein C, Contraceptives, Thrombosis, Protein Z, Protein E]

[Table: example study data; column headers Thrombosis, Contraceptives, Protein C, Cancer, Protein Y, Protein Z; cell values listed per study in the order they appear]
-Study 1 (observational): Yes, No, 10.5, …, 0.01
-Study 2: 0.03, 9.3, 3.4, 22.2
-Study 3 (RCT, Protein C): 0 (Control), 5.0 (Treat.), 8.9
-Study 4 (RCT, contraceptives): No (Ctrl), Yes (Treat)

Integrative Causal Discovery: find the causal graph(s) that simultaneously fit all studies.

Integrative Causal Discovery

How?
-Measure conditional independencies in the data.
-Constrain graph paths after modeling experiments.
-Convert to SAT instance.
-Solutions are graphs that fit all observed statistical constraints.

Why?
-Increase robustness of causal discovery by using all data.
-Make novel inferences by combining different data.
-e.g. predict the association (+ correlation coefficient) between variables never measured together.
-Predictions successfully validated in 30 public data sets.

[Figure: pipeline panels Data, (In)dependencies, Paths, Logic formula, Causal graph(s); example graphs over A, B, C, D, E]

Logic formula (excerpt):
$[E_{A \to D} \lor (E_{A \to B} \land E_{B \to D}) \lor (E_{A \to C} \land E_{C \to D}) \lor \cdots]$
$\vdots$
$[E_{A \to C} \lor (E_{A \to B} \land E_{B \to C}) \lor (E_{A \leftrightarrow C} \land E_{C \to D}) \lor \cdots]$
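The "Convert to SAT instance" step can be illustrated with a toy example. The sketch below is not the method from the slides: it uses only the three path disjuncts visible in the first formula, an assumed set of five edge variables, and brute-force enumeration standing in for a real SAT solver. The names `EDGES`, `path_a_to_d`, and `solutions`, and the second constraint, are hypothetical.

```python
from itertools import product

# Hypothetical edge variables: e["A->D"] is True when edge A -> D is present.
EDGES = ["A->B", "A->C", "A->D", "B->D", "C->D"]


def path_a_to_d(e):
    """A reaches D directly, via B, or via C (the three listed disjuncts)."""
    return e["A->D"] or (e["A->B"] and e["B->D"]) or (e["A->C"] and e["C->D"])


def solutions(constraints):
    """Brute-force stand-in for a SAT solver: keep every edge assignment
    that satisfies all constraints."""
    found = []
    for values in product([False, True], repeat=len(EDGES)):
        e = dict(zip(EDGES, values))
        if all(c(e) for c in constraints):
            found.append(e)
    return found


# Assumed constraints, for illustration only.
constraints = [
    path_a_to_d,              # some directed path from A to D exists
    lambda e: not e["A->D"],  # but there is no direct edge A -> D
]

models = solutions(constraints)
for m in models:
    print(sorted(edge for edge, present in m.items() if present))
```

Each printed edge set is one graph consistent with both constraints. With more variables the enumeration grows exponentially, which is why a dedicated solver would be used in practice.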

Robust Causal Discovery

[Figure: best fitting graph and close to best fitting graphs over Breast Cancer, Protein C, Contraceptives, Thrombosis, Protein Z, Protein E]

What is P(Contraceptives --> Thrombosis | Data)?

How?
-Compute the probability of a graph (not very easy when you have confounders).
-Find the probability of causal features over all graphs.
-Efficiency?

Why?
-Many graphs fit the data (almost) equally well.
-With small sample sizes, it is hard to distinguish between them.
-Be conservative: identify features that are present in most high-probability graphs.
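One common way to read "the probability of causal features over all graphs" is model averaging: P(f | Data) = Σ_G P(G | Data) · 1[f ∈ G]. The sketch below assumes this reading; the slides do not spell out the computation. It further assumes a short list of candidate graphs with log scores proportional to log P(G | Data), and normalizes over that list only, which approximates the sum over all graphs. The function name and the example numbers are hypothetical.

```python
import math


def feature_probability(log_scores, has_feature):
    """Posterior probability of a feature, averaged over candidate graphs.

    log_scores[i]  : log score of graph i, up to an additive constant
                     (assumed proportional to log P(G_i | Data)).
    has_feature[i] : True if graph i contains the feature of interest.
    """
    m = max(log_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return sum(w for w, f in zip(weights, has_feature) if f) / total


# Hypothetical scores for four candidate graphs; the flag marks whether
# each graph contains the edge Contraceptives -> Thrombosis.
log_scores = [-100.0, -100.5, -101.0, -104.0]
has_edge = [True, True, False, True]

p = feature_probability(log_scores, has_edge)
print(f"P(Contraceptives -> Thrombosis | Data) ~ {p:.3f}")
```

Reporting only features whose averaged probability is high matches the "be conservative" advice: a feature that appears in just one of several near-equally scored graphs gets a correspondingly modest probability.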

Applied Causal Discovery

Identify causal protein phosphorylation signaling relationships from mass cytometry data.
-Mass cytometry data (Bendall et al., 2011)
-Local causal discovery
-Predictions: phosphoprotein A -> phosphoprotein B
-Reproducibility in independent data sets