Outline
Who regulates whom and when?
Model
Learning algorithm
Evaluation
Wet lab experiments
Perspective: why does it work?

Presentation transcript:

Gene Regulation: Simple Example
[Figure: an activator and a repressor acting on a regulated gene on the DNA, shown in three configurations labeled State 1, State 2, and State 3. Expression of the regulators and of the regulated gene is measured by microarray.]

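Regulation Tree
Regulation program: a decision tree over regulator expression. The tree asks "Activator?" and then "Repressor?"; its true/false branches end in the leaves State 1, State 2, and State 3.
[Figure: module genes shown against activator expression and repressor expression.]
Genes in the same module share the same regulation program.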
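The regulation tree can be read as a small function from regulator expression to a state. The sketch below is illustrative only: the function name, the threshold of 0.0, and the mapping of branches to State 1/2/3 (no activator, activator alone, activator plus repressor) are assumptions, since the slide figure is not preserved in the transcript.

```python
# Minimal sketch of a two-question regulation tree.
# ASSUMPTIONS: the threshold value and the branch-to-state mapping are
# hypothetical; the transcript does not preserve the figure.

def regulation_state(activator_expr: float,
                     repressor_expr: float,
                     threshold: float = 0.0) -> str:
    """Walk the tree: ask "Activator?" first, then "Repressor?"."""
    activator_on = activator_expr > threshold
    repressor_on = repressor_expr > threshold
    if not activator_on:
        return "State 1"  # assumed: activator absent
    if not repressor_on:
        return "State 2"  # assumed: activator alone
    return "State 3"      # assumed: activator and repressor both present


def module_states(activator, repressor, threshold=0.0):
    """Apply the same program to every experiment; all genes of a module share it."""
    return [regulation_state(a, r, threshold) for a, r in zip(activator, repressor)]
```

Because every gene in a module shares the program, `module_states` needs only the regulator profiles, not the expression of the module genes themselves.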

Module Networks
Goal: discover regulatory modules and their regulators.
Module genes: a set of genes that are similarly controlled.
Regulation program: expression as a function of the regulators.
[Figure: modules with an example regulation tree that splits on HAP4 and CMK1, with true/false branches.]

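Module Network Probabilistic Model
The expression level in each module is a function of the expression of its regulators.
Variables, for each gene and experiment:
Module: what module does gene "g" belong to?
Regulator 1, Regulator 2, Regulator 3: e.g. the expression level of Regulator 1 in the experiment.
Level: the expression level of the gene.
Conditional distribution: P(Level | Module, Regulators).
[Figure: graphical model over genes and experiments, with example regulators BMH1 and GIC2 and a regulation tree on HAP4 and CMK1.]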
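One way to make P(Level | Module, Regulators) concrete is to store a Gaussian at each leaf of the module's regulation tree, as the assignment at the end of the deck does. The sketch below is a hedged illustration: the `MODULE` dictionary, its threshold, and its leaf parameters are made-up values, and a single split on HAP4 stands in for a full tree.

```python
import math

def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

# HYPOTHETICAL module: one split on HAP4, one (mean, std) pair per leaf.
MODULE = {
    "regulator": "HAP4",
    "threshold": 0.5,
    "high": (1.2, 0.4),   # leaf used when HAP4 expression > threshold
    "low": (-0.3, 0.5),   # leaf used otherwise
}

def log_p_level(level: float, regulators: dict, module: dict = MODULE) -> float:
    """log P(Level | Module, Regulators) for one gene in one experiment."""
    side = "high" if regulators[module["regulator"]] > module["threshold"] else "low"
    mean, std = module[side]
    return gaussian_logpdf(level, mean, std)
```

For example, `log_p_level(1.2, {"HAP4": 1.0})` scores an observed level of 1.2 under the "high" leaf, which fits better than the "low" leaf would.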

Learning Problem
Goal: find the gene module assignments and the tree structures that maximize P(M|D).
[Figure: the probabilistic model (Module, Regulator 1, Regulator 2, Regulator 3, Level) with an example regulation tree on HAP4 and CMK1.]
Hard. Genes: [count missing]. Regulators: ~500.

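Learning Algorithm Overview
1. Obtain an initial gene module assignment by clustering.
2. Learn the regulation programs (e.g. a tree on HAP4 and CMK1) for the current modules.
3. Relearn the gene assignments to modules.
4. Iterate steps 2 and 3; the result is a set of regulatory modules.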

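Learning Regulation Programs
[Figure: module genes by experiments, first with experiments in their original order, then with experiments sorted by Hap4 expression.]

Candidate splits are compared by the score of the resulting model. The relation sign, the leaf parameters $\mu$ and $\sigma$, and the arrows marking the two sides of a split are assumed notation; the original symbols are not legible in this transcript.

No split:
$$\log P(M \mid D) \propto \log P(D \mid \mu, \sigma) + \log P(\mu, \sigma)$$

Split on HAP4:
$$\log P(M \mid D) \propto \log P(D_{\mathrm{HAP4}\uparrow} \mid \mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}) + \log P(D_{\mathrm{HAP4}\downarrow} \mid \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow}) + \log P(\mu_{\mathrm{HAP4}\uparrow}, \sigma_{\mathrm{HAP4}\uparrow}, \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow})$$

Split on SIP4:
$$\log P(M \mid D) \propto \log P(D_{\mathrm{SIP4}\uparrow} \mid \mu_{\mathrm{SIP4}\uparrow}, \sigma_{\mathrm{SIP4}\uparrow}) + \log P(D_{\mathrm{SIP4}\downarrow} \mid \mu_{\mathrm{SIP4}\downarrow}, \sigma_{\mathrm{SIP4}\downarrow}) + \log P(\mu_{\mathrm{SIP4}\uparrow}, \sigma_{\mathrm{SIP4}\uparrow}, \mu_{\mathrm{SIP4}\downarrow}, \sigma_{\mathrm{SIP4}\downarrow})$$

Split on HAP4, then on CMK1 (the HAP4 leaf that is not split further is written here as HAP4$\downarrow$, which is an assumption):
$$\log P(M \mid D) \propto \log P(D_{\mathrm{HAP4}\downarrow} \mid \mu_{\mathrm{HAP4}\downarrow}, \sigma_{\mathrm{HAP4}\downarrow}) + \log P(D_{\mathrm{CMK1}\uparrow} \mid \mu_{\mathrm{CMK1}\uparrow}, \sigma_{\mathrm{CMK1}\uparrow}) + \log P(D_{\mathrm{CMK1}\downarrow} \mid \mu_{\mathrm{CMK1}\downarrow}, \sigma_{\mathrm{CMK1}\downarrow}) + \dots$$

[Figure: regulation tree on HAP4 and CMK1; module genes shown against Hap4 expression, with the regulator marked.]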
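The comparison between candidate regulators such as HAP4 and SIP4 can be sketched as choosing the split whose two sides are best explained by their own Gaussians. This is a simplified stand-in, not the deck's scoring function: it uses maximum-likelihood Gaussians and drops the prior term, and the function names, the fixed threshold, and the one-value-per-experiment input are all assumptions.

```python
import math
from statistics import fmean, pstdev

def gaussian_loglik(values, min_std=1e-3):
    """Log-likelihood of the values under their own maximum-likelihood Gaussian."""
    mean = fmean(values)
    std = max(pstdev(values), min_std)  # floor avoids a zero variance
    return sum(-0.5 * math.log(2 * math.pi * std ** 2)
               - (v - mean) ** 2 / (2 * std ** 2) for v in values)

def split_score(module_expr, regulator_expr, threshold=0.0):
    """Score a split of the experiments on 'regulator expression > threshold'.

    module_expr holds one value per experiment (e.g. the module average).
    NOTE: likelihood only; the prior term of the Bayesian score is omitted.
    """
    up = [m for m, r in zip(module_expr, regulator_expr) if r > threshold]
    down = [m for m, r in zip(module_expr, regulator_expr) if r <= threshold]
    if not up or not down:
        return float("-inf")  # not a real split
    return gaussian_loglik(up) + gaussian_loglik(down)

def best_regulator(module_expr, candidates, threshold=0.0):
    """Return the name of the candidate regulator with the highest split score."""
    return max(candidates,
               key=lambda name: split_score(module_expr, candidates[name], threshold))
```

With toy data in which the module follows one candidate and is unrelated to the other, `best_regulator` picks the informative one. A fuller version would also search over thresholds and add the prior term.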

Learning Algorithm Performance
[Figure, two panels: Bayesian score (avg. per gene) vs. algorithm iterations; gene module assignment changes (% of total) vs. algorithm iterations.]
Significant improvements across learning iterations.
Many genes (50%) change module assignment during learning.

Yeast Stress Data
Genes: 2355 selected that showed activity.
Experiments: 173, covering diverse environmental stress conditions (heat shock, nitrogen depletion, ...).

Comparison to Bayesian Networks
Bayesian network (Friedman et al. '00; Hartemink et al. '01): the expression level of each gene is a function of the expression of regulators.
[Figure: fragment of the learned Bayesian network, with nodes Cmk1, Hap4, Mig1, Ste12, Yap1, Gic1.]
Size: 2355 variables (genes), 173 instances (experiments).
Problems: robustness, interpretability.

Comparison to Bayesian Networks
Bayesian network (Friedman et al. '00; Hartemink et al. '01). Problems: robustness, interpretability.
Module network (SPRKF '03, UAI). Solutions: robustness → sharing parameters; interpretability → module-level model.
[Figure: Bayesian network fragment (Cmk1, Hap4, Mig1, Ste12, Yap1, Gic1) next to the module network model (Module, Regulator 1, Regulator 2, Regulator 3, Level).]

Comparison to Bayesian Networks
Problems: robustness, interpretability.
Solutions: robustness → sharing parameters; interpretability → module-level model.
[Figure: test data log-likelihood (gain per instance) vs. number of modules, with Bayesian network performance shown for reference.]
Learn which parameters are shared (by learning which genes are in the same module).

From Model to Regulatory Modules
[Figure: the probabilistic model (Module, Regulator 1, Regulator 2, Regulator 3, Level) and a resulting module with its regulation tree on HAP4 and CMK1.]
Biologically relevant?

Respiration Module
[Figure: regulation program with the predicted regulator, above the module genes.]
Module genes functionally coherent? Energy production (oxid. phos. 26/55, P < [value missing]).
Module genes known targets of predicted regulators? Hap4+Msn4 known to regulate module genes.

Energy, Osmolarity, & cAMP Signaling
Tpk1:
Regulation by non-TFs (Tpk1 is a catalytic subunit of cAMP-dependent protein kinase).
Module contains known Tpk1 targets (e.g. Tps1).
Tpk1-mediated STRE motif (50/64 genes; p < 3x[exponent missing]).

EM: Biological Improvement

[Figure: the inferred module network. Numbered modules are grouped by function: amino acid metabolism; energy and cAMP signaling; DNA and RNA processing; nuclear. Regulators shown include Hap4, Xbp1, Yer184c, Yap6, Gat1, Ime4, Lsg1, Msn4, Gac1, Gis1, Ypl230w, Not3, Sip2, Kin82, Cmk1, Tpk1, Ppt1, Tpk2, Bmh1, and Gcn20. Enriched cis-regulatory motifs shown include STRE, HSF, HAC1, MCM1, GATA, GCN4, GCR1, and MIG1.
Legend: regulator (transcription factor); regulator (signaling molecule); experimentally tested regulator; module (number); inferred regulation; regulation supported in literature; enriched cis-regulatory motif.]

Biological Evaluation Summary
Are the module genes functionally coherent? 46/50
Are some module genes known targets of the predicted regulators? 30/50
Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses).
Known targets = direct biological experiments reported in the literature.
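The enrichment test behind "functionally coherent" is a hypergeometric tail probability: the chance of seeing at least k annotated genes in a module of n genes, when K of the N genes overall carry the annotation. The sketch below uses only the standard library. The slide does not say which multiple-hypothesis correction was used, so the Bonferroni helper is an assumption, as are the function names.

```python
from math import comb

def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X = number of annotated genes among n drawn from N,
    where K of the N genes carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def bonferroni(p_value: float, n_tests: int) -> float:
    """ASSUMED correction: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)
```

A module would count as coherent for an annotation when `bonferroni(hypergeom_pvalue(k, n, K, N), n_tests)` falls below 0.01, with `n_tests` the number of annotation terms tested.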

From Model to Detailed Predictions
Prediction: regulator 'X' regulates process 'Y'.
Experiment: knock out 'X' and repeat the experiment.
[Figure: regulation tree with HAP4 and the candidate regulator Ypl230w, marked "X" and "?".]

Does 'X' Regulate Predicted Genes?
Experiment: knock out Ypl230w (stationary phase).
[Figure: wild-type vs. mutant expression; regulated genes marked at >4x.]
1334 regulated genes (312 expected by chance).
Rank modules by regulated genes and compare with the predicted modules (modules predicted to be regulated by Ypl230w):

Module                            Sig.
Protein folding                   P < [value missing]
Cell differentiation              P<0.02
Glycolysis and folding            P<0.04
Mitochondrial and protein fate    P<0.04

Ypl230w regulates computationally predicted genes.

Does 'X' Regulate Predicted Genes?

Ppt1 knockout (hypo-osmotic stress): 1014 regulated genes (wild-type vs. mutant).

Module                                  Sig.
Ribosomal and phosphate metabolism      P<0.009
Amino acid and purine metabolism        P<0.01
mRNA, rRNA and tRNA processing          P<0.02
Protein folding                         P<0.02
Cell cycle                              P<0.02

Kin82 knockout (heat shock): 1034 regulated genes (wild-type vs. mutant).

Module                                  Sig.
Energy and osmotic stress               P < [value missing]
Energy, osmolarity & cAMP signaling     P<0.006
mRNA, rRNA and tRNA processing          P<0.02

Wet Lab Experiments Summary
3/3 regulators regulate computationally predicted genes.
New yeast biology suggested:
Ypl230w activates protein-folding, cell wall and ATP-binding genes.
Ppt1 represses phosphate metabolism and rRNA processing.
Kin82 activates energy and osmotic stress genes.

Why does it work?
Underlying assumption: regulators are transcriptionally regulated.
Regulators are part of regulatory structures in which they are themselves regulated*.
Statistical methods can detect associations between regulators and their targets.
* [Shen-Orr et al., '02] find many such structures.

Regulator Chain
Respiration module: Phd1 (TF), Hap4 (TF); targets Cox4, Cox6, Atp17.
[Figure: active protein level and mRNA expression level over time for Phd1, Hap4, and the targets.]
Legend: black = regulators that cannot be detected; red = correctly predicted regulator; blue = targets.

Auto Regulation
Snf kinase regulated processes module: Yap6 (TF); targets Vid24, Tor1, Gut2.
Legend: black = regulators that cannot be detected; red = correctly predicted regulator; blue = targets.

Positive Signaling Loop
Sporulation and cAMP pathway module: Sip2 (SM), Msn4 (TF); targets Vid24, Tor1, Gut2.
Legend: black = regulators that cannot be detected; red = correctly predicted regulator; blue = targets.

Negative Signaling Loop
Energy and osmotic stress module: Tpk1 (SM), Msn4 (TF); targets Nth1, Tps1, Glo1.
Legend: black = regulators that cannot be detected; red = correctly predicted regulator; blue = targets.

Why Does it Work?
Feed-forward and feedback loops.
Some transcription factors and signal transduction molecules have a detectable expression signature.
Module Networks infer their regulatory relationships.

Assignment
1. Download the yeast stress expression dataset.
2. Download the list of transcription factor regulators.
3. Randomly partition the dataset in a 5-fold cross-validation scheme.
4. For k=50: create a hard-clustering model (use code from the earlier exercise). At each array, this model has a separate Gaussian distribution for each of the 50 values of the cluster variable.
5. Use the assignment of genes to clusters that you learned in the hard clustering, and for each cluster learn a decision tree with at most: (1) one split, (2) two splits, (3) three splits.
   Note 1: allow only splits with >=5 arrays on each side of the split.
   Note 2: the split question is whether the expression level of the transcription factor is greater than some value.
   Note 3: at each leaf of the resulting model, there is a single Gaussian distribution that is used for all arrays that map to that leaf.
6. Compute the log-likelihood of the test data for each model (hard clustering, and each of the three regulation models).
7. Plot the avg. and std. of the test log-likelihood for each model.
8. For the model with two splits on each cluster, use the Gaussian distribution at each array to sample a new expression dataset with exactly the same number of genes and number of arrays. For each original gene and array, sample from the Gaussian distribution associated with that gene and that array.
9. Learn a model with two splits for each cluster from the sampled dataset.
10. Plot the number of regulation tree splits that are identical between the model that sampled the data and the new model that you learned.
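Two mechanical pieces of the assignment are easy to get subtly wrong: the 5-fold partition of the arrays and the rule that a split must leave at least 5 arrays on each side. The sketch below shows one possible way to handle both with the standard library. The function names, the fixed random seed, and the choice of midpoints between sorted expression values as candidate thresholds are assumptions, not part of the assignment text.

```python
import random

def five_fold_indices(n_arrays: int, n_folds: int = 5, seed: int = 0):
    """Randomly partition array indices into folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    order = list(range(n_arrays))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    return [(sorted(set(order) - set(test)), sorted(test)) for test in folds]

def valid_thresholds(tf_expr, min_arrays: int = 5):
    """Candidate thresholds t for the question 'TF expression > t' that
    leave at least min_arrays arrays on each side (Note 1).

    ASSUMPTION: thresholds are midpoints between consecutive sorted values.
    """
    values = sorted(tf_expr)
    thresholds = []
    for i in range(min_arrays, len(values) - min_arrays + 1):
        lo, hi = values[i - 1], values[i]
        if lo < hi:  # skip ties, which cannot be separated
            thresholds.append((lo + hi) / 2)
    return thresholds
```

For the 173 arrays of the yeast stress data, `five_fold_indices(173)` gives three test folds of 35 arrays and two of 34. The thresholds should be computed from the training arrays of each fold only, so that the test log-likelihood stays unbiased.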