IMPROVED RECONSTRUCTION OF IN SILICO GENE REGULATORY NETWORKS BY INTEGRATING KNOCKOUT AND PERTURBATION DATA Yip, K. Y., Alexander, R. P., Yan, K. K., &


IMPROVED RECONSTRUCTION OF IN SILICO GENE REGULATORY NETWORKS BY INTEGRATING KNOCKOUT AND PERTURBATION DATA
Yip, K. Y., Alexander, R. P., Yan, K. K., & Gerstein, M. (2010). PLoS ONE, 5(1), e8121.
Grace Johnson, Tessa Morris, and Trixie Roque
Loyola Marymount University
June 10, 2015

Integration of Models from Deletion and Perturbation Data Resulted in Successfully Modeling GRNs
● Won the DREAM3 Challenge, a competition for reverse engineering GRNs, by combining a noise model from deletion data with a differential equation model from perturbation data
● Created their integrated model from the two types of data and made their predictions in seven batches
● Summary: It is beneficial to use multiple data sources
● Implications: It would be advantageous to look at a noise model in our work

Constructed GRNs for Fifteen Known Regulatory Networks
● Computationally reconstructed GRNs using provided data from yeast and E. coli
  o Each node represents a TF (gene and protein)
  o Edges show regulatory relationships between nodes
● Attempted to model 15 known regulatory networks

Found Simple Regulatory Relationships from Deletion Data and More Complex Ones from Perturbation Data
● Deletion data may not be sufficient for decoding complicated regulation (e.g., a gene that is expressed as long as one of its TFs is active)
● Traditional time course data can be used to detect missing edges (low abundance and impaired expression rate)
● Learned simple regulatory relationships from deletion data using noise models
  o Homozygous vs. heterozygous deletion data
● Learned more complex regulatory relationships from perturbation data using differential equation models
● Integrated the two models to predict the GRNs

Noise Model Determines if the Deviation Between Expression Level in the Deletion Strain and WT is Due to Noise
1. Calculate the probability of regulation for each pair of genes based on the current reference points
   a. The probability that the observed deviation is due to noise alone must be less than 0.05 for the pair to be treated as a potential regulation
2. Use the set P (the gene and deletion-strain pairs not flagged as potential regulations) to re-estimate the variance of the Gaussian noise
3. Re-estimate each gene's wild-type expression level as the mean of its observed expression levels in the strains in which its expression level is unaffected by the deletion
4. After the iterations, compute the probability of regulation using the final estimates of the reference points and of the variance of the Gaussian noise
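
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the median-based starting reference points, the two-sided Gaussian tail test, and defining the probability of regulation as one minus the noise p-value are all choices made for this sketch.

```python
import math
import numpy as np

# Element-wise complementary error function (standard library, vectorized).
_erfc = np.vectorize(math.erfc)


def noise_p_values(X, ref, sigma):
    """Two-sided probability that each deviation from the reference is pure Gaussian noise."""
    z = np.abs(X - ref) / max(sigma, 1e-12)
    return _erfc(z / math.sqrt(2.0))


def fit_noise_model(X, n_iter=5, alpha=0.05):
    """Iteratively estimate reference (wild-type) levels and noise spread.

    X[b, a] is the expression of gene a in the strain where gene b is deleted
    (hypothetical layout chosen for this sketch).
    """
    n = X.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore each deleted gene's own level
    # Assumed starting point: per-gene median across the other deletion strains.
    ref = np.array([np.median(X[off_diag[:, a], a]) for a in range(n)])
    sigma = float(np.sqrt(np.mean(((X - ref)[off_diag]) ** 2)))
    for _ in range(n_iter):
        # Step 1: flag pairs whose deviation is unlikely to be noise.
        p_noise = noise_p_values(X, ref, sigma)
        unaffected = (p_noise >= alpha) & off_diag  # the set P
        if not unaffected.any():
            break
        # Step 2: re-estimate the noise spread from the unaffected pairs.
        sigma = float(np.sqrt(np.mean(((X - ref)[unaffected]) ** 2)))
        # Step 3: re-estimate each gene's wild-type level from unaffected strains.
        for a in range(n):
            rows = unaffected[:, a]
            if rows.any():
                ref[a] = X[rows, a].mean()
    # Step 4: final probability of regulation (assumed here to be 1 - p_noise).
    prob = 1.0 - noise_p_values(X, ref, sigma)
    prob[~off_diag] = 0.0
    return prob, ref, sigma
```

With a synthetic matrix in which one deletion clearly shifts one target gene, `fit_noise_model` should assign that pair a probability close to 1 while leaving most other pairs well below the cutoff.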

Two Differential Equations Were Used to Model Perturbation Data
1. General form
2. Linear model: assumes a linear relationship between the expression levels of the regulators and the resulting expression rate of the target
   ○ Advantage: small number of parameters (|S| + 2)
   ○ Disadvantage: real biological regulatory systems seem to exhibit nonlinear characteristics
3. Sigmoidal model: assumes a sigmoidal relationship between the regulators and the target; requires |S| + 3 parameters
4. Least-squares optimizer: used to determine which regulator set (S) best predicts the observed expression levels
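
As a rough illustration of the two model families and of least-squares regulator selection, the sketch below writes one parameterization of each model that matches the stated parameter counts (|S| + 2 and |S| + 3). The exact functional forms, the parameter names, and the finite-difference fitting shortcut are assumptions made for this example; the slide equations were not preserved in this transcript, and the authors' optimizer may differ.

```python
import numpy as np


def linear_rate(x_target, x_regs, a0, a, b):
    """Assumed linear form: dx/dt = a0 + sum_j a_j * x_j - b * x  (|S| + 2 parameters)."""
    return a0 + np.dot(x_regs, a) - b * x_target


def sigmoidal_rate(x_target, x_regs, a0, a, b1, b2):
    """Assumed sigmoidal form: dx/dt = b1 / (1 + exp(-(a0 + sum_j a_j * x_j))) - b2 * x
    (|S| + 3 parameters)."""
    return b1 / (1.0 + np.exp(-(a0 + np.dot(x_regs, a)))) - b2 * x_target


def fit_linear_model(X, target, regulators, dt):
    """Least-squares fit of the linear model to a time course.

    X[t, g] is the expression of gene g at time step t. The derivative is
    approximated by forward differences (a simplification for this sketch).
    Returns ([a0, a_1..a_|S|, b], sum of squared errors).
    """
    dxdt = np.diff(X[:, target]) / dt
    design = np.column_stack(
        [np.ones(len(dxdt)), X[:-1, regulators], -X[:-1, target]]
    )
    params, *_ = np.linalg.lstsq(design, dxdt, rcond=None)
    sse = float(np.sum((design @ params - dxdt) ** 2))
    return params, sse


def simulate_demo(dt=0.01, steps=1000):
    """Synthetic time course: gene 0 truly regulates gene 2; gene 1 is a decoy."""
    t = np.arange(steps) * dt
    X = np.empty((steps, 3))
    X[:, 0] = 0.5 + 0.4 * np.sin(t)
    X[:, 1] = 0.5 + 0.4 * np.cos(2.0 * t)
    X[0, 2] = 0.2
    for k in range(steps - 1):
        rate = linear_rate(X[k, 2], X[k, [0]], 0.1, np.array([1.0]), 1.0)
        X[k + 1, 2] = X[k, 2] + dt * rate  # forward Euler step
    return X
```

Comparing `fit_linear_model(X, 2, [0], 0.01)` with `fit_linear_model(X, 2, [1], 0.01)` on the synthetic data shows the idea behind regulator-set selection: the candidate set with the smaller squared error explains the target's expression rate better. Larger sets never fit worse, so a real comparison across sets of different sizes would need some penalty or significance test.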

We Use a Similar Differential Equation to Model Gene Expression for Cold Shock
[Equations shown on slide: our sigmoidal model; their sigmoidal model]

Performance of the Integrated Model was Determined by Grouping Predictions in Batches
● Batches were created to rank pairwise predictions according to confidence
  o Batch 1: all predictions from the noise model on homozygous data with a probability of regulation greater than 0.99
  o Batch 2: significant predictions according to both differential equation models (linear and sigmoidal)
  o Batch 3: significant predictions according to both differential equation models, where the regulator sets are guided by predictions made in the previous batches
  o Batch 4: same as Batch 2, except predictions can be made by either the linear OR the sigmoidal model
  o Batch 5: same as Batch 3, except predictions can be made by either the linear OR the sigmoidal model
  o Batch 6: all predictions from both the heterozygous and homozygous noise models with a probability of regulation greater than 0.95 and the same sign prediction
  o Batch 7: all remaining predicted regulation pairs
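
The batch scheme amounts to a priority ordering: a pair is ranked by the first batch that predicts it. A minimal sketch of that bookkeeping follows. The function name, the tuple layout, and the choice to sort by score within a batch are assumptions for illustration only.

```python
def rank_by_batches(batches):
    """Merge batches of predictions into one ranked list.

    batches: list of batches, highest-confidence batch first; each batch is a
    list of (regulator, target, score) tuples (hypothetical layout).
    A pair keeps the rank of the earliest batch that predicts it.
    """
    ranked = []
    seen = set()
    for batch_no, batch in enumerate(batches, start=1):
        # Within a batch, order by descending score (an assumption of this sketch).
        for regulator, target, score in sorted(batch, key=lambda p: -p[2]):
            if (regulator, target) in seen:
                continue  # already ranked by an earlier, more confident batch
            seen.add((regulator, target))
            ranked.append((batch_no, regulator, target, score))
    return ranked
```

For example, a pair predicted in both Batch 1 and Batch 2 appears once, with Batch 1's rank, and all Batch 1 pairs precede every pair first seen in Batch 2.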

Pairwise Predictions were Significantly Better than Random, Regardless of Network Size
● AUROC: area under the receiver operating characteristic (ROC) curve
● pAUROC: the p-value of the AUROC, based on the distribution of AUROC values in 100,000 random network link permutations
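
To make the two metrics concrete, here is a small sketch of an AUROC computation and a permutation-based p-value. It is an illustration only: the function names are hypothetical, label shuffling stands in for the random network link permutations used in the challenge scoring, and the default number of permutations is far below 100,000 to keep the example fast.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC as the fraction of (true edge, non-edge) pairs ranked correctly; ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """Estimate how often randomly assigned labels reach the observed AUROC or better."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = auroc(scores, labels)
    hits = 0
    for _ in range(n_perm):
        if auroc(scores, rng.permutation(labels)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to a ranking that places every true edge above every non-edge; the permutation p-value indicates how unusual the observed AUROC would be by chance.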

Their Model Cannot Distinguish Between Direct and Indirect Regulation
● Their model fails to distinguish between direct and indirect regulation
● Their model: G01 activates G09 and represses G04
● Actual network: G01 represses G04, which represses G09
[Figure panels: actual network; their top 10 predictions]

The Best Predictions are Made by Batch 1 from the Noise Model
● For size 10 networks, overall predictions are 18% accurate; predictions made by Batch 1 are 71% accurate
● For size 50 networks, overall predictions are 4.5% accurate; predictions made by Batch 1 are 48% accurate
● For size 100 networks, overall predictions are 2.7% accurate; predictions made by Batch 1 are 34% accurate

Switching the Order of Batches 1 and 2 Does Not Change the Number of Correct Predictions
● In addition, most predictions made by the noise model were not made by the differential equation models. These are hypothesized to be unique predictions arising from indirect or more complex regulation events

The Qualitative Importance of the Differential Equation Models is Shown by p-values of Batches 2-6
● In half the cases, predictions made in batches 2-6 are significantly better than random at the 0.05 level

Predictions Made from the Two Models are Complementary
(b) Deleting G3 results in a small increase in expression of G7 that is difficult to detect
(c) Expression of G7 increases even though expression of G8 and G10 remains high
(e) Deleting G5 has a negligible effect on G6 because this interaction is masked by G1
(f) Expression of G6 is anti-correlated with G1 (suppressor)

Summary
● Most correct predictions come from Batch 1, which is based on the noise model
  o A more accurate statement is that the noise model is supplemented by the differential equation model
  o It does, to some extent, demonstrate the advantage of combining multiple types of data: correct predictions made by the noise model were not made by the differential equation model, and vice versa, showing the two models are complementary
● Benefit of the noise model: takes much less computational power and time
● Results demonstrate the advantages of combining multiple types of data

Though Their Approach is Slightly Different, We Could Benefit by Adopting Some of Their Methods
● Their differential equation is extremely similar to ours
● Each regulatory relationship is considered independently of the other connections in the network
● Perturbation and deletion data are analyzed separately with different models, then their predictions are combined
  o In our work, we combine perturbation and deletion data in our raw data and analyze them with one model
● Where ours is a differential equation model, theirs is essentially a noise model supplemented by a differential equation model
● It could be beneficial to use their noise model in the initial process of choosing which genes go in our network

Acknowledgments
● Dr. Dahlquist
● Dr. Fitzpatrick
● Dondi