

No definitive "gold standard" causal networks exist, so a novel held-out validation approach is used that emphasizes the causal aspect of the challenge.

All Data (N treatments) are split into:
- Training Data (4 treatments): FGFR1/3i, AKTi, AKTi+MEKi, DMSO
- Test Data (N-4 treatments): Test1, Test2, …, Test(N-4)

Participants infer 32 networks using the training data; the inferred networks are then assessed using the test data.

Assessment: how well do the inferred causal networks agree with the effects observed under inhibition in the test data?

Step 1: Identify a "gold standard" set of effects with a paired t-test comparing DMSO against each test inhibitor, for each phosphoprotein and each cell line/stimulus regime.

[Figure: example time courses for UACC812/Serum under Test1. Phospho1 differs clearly between DMSO and Test1 (p-value = 3.2×10⁻⁵) and enters the "gold standard" list; phospho2 does not (p-value = 0.45).]
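The Step-1 test can be sketched as below. This is a minimal illustration, not the challenge's actual scoring code: the protein names, the one-value-per-time-point data layout, and the `significant_effects` helper are all assumptions, and SciPy is assumed to be available.

```python
from scipy.stats import ttest_rel

def significant_effects(dmso, inhibitor, alpha=0.05):
    """For each phosphoprotein, run a paired t-test of the DMSO time course
    against the test-inhibitor time course; proteins with p < alpha form
    the 'gold standard' list of observed effects.
    `dmso` and `inhibitor` map protein name -> list of paired measurements."""
    gold = {}
    for protein in dmso:
        stat, p = ttest_rel(dmso[protein], inhibitor[protein])
        gold[protein] = bool(p < alpha)
    return gold

# Toy data: phospho1 shifts strongly under inhibition, phospho2 does not.
dmso = {"phospho1": [1.0, 1.1, 0.9, 1.0, 1.05],
        "phospho2": [2.0, 2.1, 1.9, 2.0, 2.05]}
inhib = {"phospho1": [0.2, 0.25, 0.15, 0.2, 0.22],
         "phospho2": [2.02, 2.08, 1.93, 1.99, 2.06]}
gold = significant_effects(dmso, inhib)
```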

Step 2: Score submissions.

- Start from a matrix of predicted edge scores for a single cell line/stimulus regime.
- At a given threshold τ, keep edges scoring above τ and obtain the protein descendants downstream of the test inhibitor's target.
- Compare these descendants with the "gold standard" list of effects observed in the held-out data to count true positives #TP(τ) and false positives #FP(τ).
- Vary the threshold τ to trace out a ROC curve and compute an AUROC score (e.g. AUROC for Test1).
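The threshold sweep above can be sketched with a toy network. The edge representation, the `descendants_of` reachability helper, and the trapezoidal area calculation are illustrative assumptions, not the official scoring implementation.

```python
def descendants_of(edges, target):
    """Proteins reachable from `target` via directed edges (u, v)."""
    seen, stack = set(), [target]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def auroc(edge_scores, target, gold_standard):
    """Sweep threshold tau over predicted edge scores; at each tau, compare
    descendants of the inhibitor target with the gold-standard effect list
    to get #TP(tau) and #FP(tau), then integrate the ROC curve."""
    pos = {p for p, hit in gold_standard.items() if hit}
    neg = set(gold_standard) - pos
    fprs, tprs = [0.0], [0.0]
    for tau in sorted(set(edge_scores.values()), reverse=True):
        kept = {e for e, s in edge_scores.items() if s >= tau}
        desc = descendants_of(kept, target)
        tprs.append(len(desc & pos) / len(pos))
        fprs.append(len(desc & neg) / len(neg))
    fprs.append(1.0)
    tprs.append(1.0)
    # trapezoidal area under the ROC curve
    return sum((fprs[i + 1] - fprs[i]) * (tprs[i + 1] + tprs[i]) / 2
               for i in range(len(fprs) - 1))

# Toy example: inhibitor target A; B and C are true downstream effects.
scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "D"): 0.1}
gold = {"B": True, "C": True, "D": False}
```

In this toy case the two true effects sit on the highest-scoring edges, so the sweep recovers them before any false positive and the AUROC is 1.0.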

74 final submissions; each submission has 32 AUROC scores (one for each cell line/stimulus regime).

[Figure: distribution of AUROC scores across submissions, distinguishing non-significant from significant AUROC values; the best performer is highlighted.]

Scoring procedure:
1. For each submission and each cell line/stimulus pair, compute the AUROC score.
2. Rank the submissions for each cell line/stimulus pair.
3. Calculate each submission's mean rank across the 32 cell line/stimulus pairs.
4. Rank submissions according to mean rank to obtain the final ranking.

[Figure: submissions × 32 cell line/stimulus pairs matrix of AUROC scores → matrix of AUROC ranks → mean rank per submission → final rank.]
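The four steps above amount to a small rank-aggregation routine, sketched here under simplifying assumptions (nested-dictionary score layout; ties broken by input order):

```python
def final_ranking(auroc_scores):
    """auroc_scores[submission][regime] -> AUROC.  Rank submissions within
    each regime (higher AUROC = better = rank 1), average the ranks across
    regimes, then order submissions by mean rank."""
    subs = list(auroc_scores)
    regimes = list(next(iter(auroc_scores.values())))
    mean_rank = {s: 0.0 for s in subs}
    for regime in regimes:
        ordered = sorted(subs, key=lambda s: auroc_scores[s][regime],
                         reverse=True)
        for rank, s in enumerate(ordered, start=1):
            mean_rank[s] += rank / len(regimes)
    return sorted(subs, key=lambda s: mean_rank[s])

# Toy example with 3 submissions and 2 regimes.
scores = {"A": {"r1": 0.9, "r2": 0.9},
          "B": {"r1": 0.7, "r2": 0.8},
          "C": {"r1": 0.5, "r2": 0.6}}
```

Ranking per regime rather than averaging raw AUROCs keeps regimes with very different score ranges from dominating the aggregate.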

Verify that the final ranking is robust.

Procedure:
1. Mask 50% of the phosphoproteins in each AUROC calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.

[Figure: distribution of ranks for the top 10 teams across the 100 maskings.]
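The masking loop can be sketched as follows. `score_fn` stands in for the AUROC computation restricted to the unmasked phosphoproteins; it, the argument layout, and the trial count default are hypothetical.

```python
import random

def masked_rank_distribution(score_fn, submissions, phosphoproteins,
                             trials=100, mask_frac=0.5):
    """Repeatedly mask a fraction of the phosphoproteins, re-score every
    submission on the remainder, and record each submission's rank per
    trial; tight rank distributions indicate a robust final ranking."""
    ranks = {s: [] for s in submissions}
    n_keep = max(1, round(len(phosphoproteins) * (1 - mask_frac)))
    for _ in range(trials):
        kept = random.sample(phosphoproteins, n_keep)
        scores = {s: score_fn(s, kept) for s in submissions}
        ordered = sorted(submissions, key=lambda s: scores[s], reverse=True)
        for rank, s in enumerate(ordered, start=1):
            ranks[s].append(rank)
    return ranks
```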

Gold standard available: the data-generating causal network is known in silico.
- Participants submitted a single set of edge scores.
- Edge scores were compared against the gold standard to give an AUROC score.
- Participants were ranked based on AUROC score.

[Figure: AUROC scores per submission, distinguishing non-significant AUROC (51) from significant AUROC (14); best performer highlighted.]

Robustness analysis:
1. Mask 50% of the edges in the AUROC calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.

[Figure: rank distribution for the top 10 teams.]

59 teams participated in both SC1A and SC1B. To reward consistently good performance across both parts of SC1, the final score is the average of the SC1A rank and the SC1B rank. The top team ranked robustly first.

All Data (N treatments) are split into:
- Training Data (4 treatments): FGFR1/3i, AKTi, AKTi+MEKi, DMSO
- Test Data (N-4 treatments): Test1, Test2, …, Test(N-4)

Participants build dynamical models using the training data and predict phosphoprotein trajectories under inhibitions not in the training data; the predictions are assessed using the test data.

Participants made predictions for all phosphoproteins, for each cell line/stimulus pair, under inhibition with each of 5 test inhibitors.

Assessment: how well do the predicted trajectories agree with the corresponding trajectories in the test data?

Scoring metric: root-mean-squared error (RMSE), calculated for each cell line/phosphoprotein/test inhibitor combination (e.g. UACC812, Phospho1, Test1).
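The RMSE for one cell line/phosphoprotein/test inhibitor combination is straightforward; this sketch assumes a trajectory is one value per time point:

```python
import math

def rmse(predicted, observed):
    """Root-mean-squared error between a predicted and an observed
    phosphoprotein trajectory, one value per time point."""
    assert len(predicted) == len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))
```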

14 final submissions.

Final ranking: analogously to SC1A, submissions were ranked for each regime and the mean rank was calculated.

[Figure: score distribution across submissions, distinguishing non-significant from significant scores; best performer highlighted.]

Verify that the final ranking is robust.

Procedure:
1. Mask 50% of the data points in each RMSE calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.

[Figure: rank distribution for the top 10 teams, highlighting the 2 best performers; one incomplete submission.]

Participants made predictions for all phosphoproteins, for each stimulus regime, under inhibition of each phosphoprotein in turn. The scoring metric is RMSE and the procedure follows that of SC2A.

Robustness analysis:
1. Mask 50% of the data points in each RMSE calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.

[Figure: score distribution (non-significant vs. significant; best performer highlighted) and rank distribution for the top 10 teams; one incomplete submission.]

10 teams participated in both SC2A and SC2B. To reward consistently good performance across both parts of SC2, the final score is the average of the SC2A rank and the SC2B rank. The top team ranked robustly first.

14 submissions. 36 HPN-DREAM participants voted, each assigning ranks 1 to 3. Final score = mean rank, with unranked submissions assigned rank 4.
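The voting score can be sketched as below; the vote layout (per submission, the list of ranks actually assigned to it) is an assumption for illustration.

```python
def visualization_score(votes, n_voters):
    """votes[submission] -> list of ranks (1-3) given by the voters who
    ranked it; every voter who left it unranked contributes the default
    rank 4.  Lower mean rank = better."""
    scores = {}
    for sub, given in votes.items():
        scores[sub] = (sum(given) + 4 * (n_voters - len(given))) / n_voters
    return scores

# Toy example: 4 voters; X is ranked by three of them, Y by one.
votes = {"X": [1, 1, 2], "Y": [3]}
scores = visualization_score(votes, n_voters=4)
```

Assigning the default rank 4 penalizes submissions that few voters ranked, so a submission cannot win on a single favorable vote.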

Submissions were rigorously assessed using held-out test data:
- SC1A: a novel procedure was used to assess network inference performance in a setting with no true "gold standard".
- Many statistically significant predictions were submitted.

For further investigation:
- Explore why some regimes (e.g. cell line/stimulus pairs) are easier to predict than others.
- Determine why different teams performed well in the experimental and in silico challenges.
- Identify the methods/approaches that yield the best predictions.
- Wisdom of crowds: does aggregating submissions improve performance and lead to discovery of biological insights?