From Sequence to Expression: A Probabilistic Framework Eran Segal (Stanford) Joint work with: Yoseph Barash (Hebrew U.) Itamar Simon (Whitehead Inst.)

From Sequence to Expression: A Probabilistic Framework
Eran Segal (Stanford)
Joint work with: Yoseph Barash (Hebrew U.), Itamar Simon (Whitehead Inst.), Nir Friedman (Hebrew U.), Daphne Koller (Stanford)

Understanding Cellular Processes
- Complex biological processes (e.g., the cell cycle)
  - Coordination of multiple events
  - Each event requires different modules
[Figure: cell-cycle phases G1, S, G2, M]
Can we recover the regulatory circuits that control such processes?

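Gene Structure
[Figure: a gene with its promoter region and coding region (example sequence CTAGTAGATATCGATCAG); the gene is transcribed into mRNA, which is translated into protein]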

Gene Regulation
[Figure, built up over five slides: five genes (Gene 1 to Gene 5) and their mRNA. Some promoters contain sequence motif A (AGACTTCAGA), which is bound by the transcription factor Swi5. When Swi5 is activated, genes carrying A produce more mRNA (higher expression). A second motif B (AGTTGA) is bound by Ndd1. When Ndd1 is activated, genes carrying B also produce more mRNA.]

Goal
[Figure: a set of promoter sequences in which a shared motif (GACTGC) recurs, together with expression data. For each transcription factor t1, t2, the goal is to find its motif and its regulation variable R(t1), R(t2), and to relate regulation to expression in phases G1 and G2.]
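As a rough illustration of the goal, the sketch below counts how many promoters contain each k-mer. It uses five of the sequences shown on the slide, and GACTGC occurs in all five. This naive counting is only an illustration, and the name `kmer_presence_counts` is hypothetical. The model in these slides instead learns motifs probabilistically and uses expression to decide which genes matter.

```python
from collections import Counter

# Five of the promoter sequences shown on the Goal slide.
PROMOTERS = [
    "AGCTAGCTGAGACTGCACACTGATCGAG",
    "CCCCACCATAGCTTCGGACTGCGCTATA",
    "TAGACTGCAGCTAGTAGAGCTCTGCTAG",
    "AGCTCTATGACTGCCGATTGCGGGGCGT",
    "CTGAGCTCTTTGCTCTTGACTGCCGCTTA",
]

def kmer_presence_counts(sequences, k):
    """For each k-mer, count the number of sequences containing it at least once."""
    counts = Counter()
    for seq in sequences:
        # Use a set so a k-mer repeated within one promoter is counted once.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts

counts = kmer_presence_counts(PROMOTERS, k=6)
print(counts["GACTGC"])  # 5
```

Counting presence rather than raw occurrences keeps a single promoter with a repeated k-mer from dominating the tally.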

Model of Gene Regulation
[Figure: classes Gene, Experiment and Expression, linking promoter sequences, regulation by transcription factors, expression measurements, and experiment context (cluster)]
- Probabilistic Relational Models (PRMs): Pfeffer and Koller (1998), Friedman et al. (1999), Segal et al. (2001)

Regulation to Expression Level
[Figure: Gene carries regulation variables R(t1), R(t2). Experiment carries Exp. type and Exp. cluster. Both feed the expression Level.]
- R(t1) = yes: t1 regulates the gene
- R(t1) = no: t1 does not regulate the gene

Regulation to Expression Level
[Table: conditional probability distribution (CPD) of Level, with one row per combination of R(t1), R(t2) and experiment type (0, 0, I; …; II; …). Each row holds a Gaussian P(Level).]

Modeling Context Specificity
[Figure: tree CPD for Level. The root tests Exp. type = G1. The "true" branch tests R(t1) = yes and the "false" branch tests R(t2) = yes. Each leaf holds a Gaussian P(Level).]
- Gaussian decision tree
- t1 only relevant in G1
- t2 only relevant in G2
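The following is a minimal sketch of a Gaussian tree CPD with the context-specific structure described above. It assumes only two experiment types (G1, G2) and uses made-up leaf parameters. `LEAVES`, `level_distribution` and `log_p_level` are hypothetical names.

```python
import math

# Hypothetical leaf Gaussians (mean, std); the slide does not list the actual values.
LEAVES = {
    ("G1", True): (2.0, 0.5),    # G1 experiment, t1 regulates the gene
    ("G1", False): (0.0, 0.5),
    ("G2", True): (1.5, 0.7),    # G2 experiment, t2 regulates the gene
    ("G2", False): (-0.5, 0.7),
}

def level_distribution(exp_type, r_t1, r_t2):
    """Follow the tree: the root tests Exp. type = G1; the G1 branch then tests R(t1),
    the other branch tests R(t2). Returns the (mean, std) of the leaf reached."""
    if exp_type == "G1":
        return LEAVES[("G1", r_t1)]
    if exp_type == "G2":
        return LEAVES[("G2", r_t2)]
    raise ValueError("this toy tree only covers G1 and G2 experiments")

def log_p_level(level, exp_type, r_t1, r_t2):
    """Gaussian log-density of an expression level at the leaf reached."""
    mean, std = level_distribution(exp_type, r_t1, r_t2)
    return -0.5 * math.log(2 * math.pi * std ** 2) - (level - mean) ** 2 / (2 * std ** 2)

# In G1 the value of R(t2) does not matter, and vice versa in G2.
print(level_distribution("G1", True, False) == level_distribution("G1", True, True))  # True
```

A tree needs one leaf per context rather than one row per full parent combination, which is what lets each TF matter only in its own phase.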

Sequence Model
[Figure: the regulation model extended with the promoter Sequence of each Gene, which determines R(t1), R(t2)]
Assumptions:
- Binding site is of length k
- Binding may occur at any k-mer
- TF regulates gene if binding occurs anywhere

From Sequence to Regulation
- Assumptions:
  - Binding site is of length k
  - Binding may occur at any k-mer
  - TF regulates gene if binding occurs anywhere
- PSSM:
  - Background distribution
  - Motif distribution
  - Discriminative training
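A PSSM combines the background and motif distributions into a per-position log-odds score. The sketch below assumes a uniform background and a hand-set k = 3 motif, and all values are hypothetical. In the framework described here, the motif parameters are trained discriminatively rather than fixed by hand.

```python
import math

BASES = "ACGT"
BACKGROUND = {b: 0.25 for b in BASES}  # hypothetical uniform background

# Hypothetical motif distribution for k = 3: one probability vector per position.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

def log_odds(kmer, motif=MOTIF, background=BACKGROUND):
    """Log-likelihood ratio of a k-mer under the motif vs. the background model."""
    return sum(math.log(motif[j][base] / background[base]) for j, base in enumerate(kmer))

def scan(seq, motif=MOTIF):
    """Score every k-mer start position of seq."""
    k = len(motif)
    return [log_odds(seq[i:i + k], motif) for i in range(len(seq) - k + 1)]

scores = scan("TTACGTT")
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 2, the position of ACG
```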

From Sequence to Regulation
- Model for one gene g, with a promoter region of length 5 and k = 2
[Figure: sequence residues S1 to S5. The k-mer binding events m[1].B to m[4].B each depend on two adjacent residues through a logistic function of the motif model. g.R(t), the variable for "t regulates g", depends on the binding events.]
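The k-mer binding structure can be sketched as follows. The sketch assumes that each binding event m[i].B is a logistic function of a linear motif score. It also assumes the events are independent given the sequence, so "binding anywhere" gives P(g.R(t) = true) = 1 - prod_i (1 - P(m[i].B = true)). The weights and bias are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical k = 2 motif weights: WEIGHTS[j][base] scores a base at motif position j.
WEIGHTS = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]
BIAS = -3.0  # hypothetical offset shared by all positions

def binding_probs(seq, weights=WEIGHTS, bias=BIAS):
    """P(m[i].B = true) for every k-mer start i, as a logistic function of its score."""
    k = len(weights)
    probs = []
    for i in range(len(seq) - k + 1):
        score = bias + sum(weights[j][seq[i + j]] for j in range(k))
        probs.append(logistic(score))
    return probs

def prob_regulates(seq):
    """P(g.R(t) = true): t regulates g if binding occurs at any k-mer.
    Assumes binding events are independent given the sequence."""
    p_no_binding = 1.0
    for p in binding_probs(seq):
        p_no_binding *= 1.0 - p
    return 1.0 - p_no_binding

# A promoter of length 5 with k = 2 gives four binding events, as on the slide.
print(len(binding_probs("ACGTA")))  # 4
print(prob_regulates("ACGTA") > prob_regulates("GGGGG"))  # True
```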

Joint Probabilistic Model
[Figure: promoter k-mers (s1 ... sk) determine binding B(t1), B(t2), which determines regulation R(t1), R(t2). Together with Exp. type and Exp. cluster, regulation determines the expression Level.]
- Discriminative model: maximizes P(Expression | Sequence)

Localization Assay (Simon et al., 2001)
- Induce TF protein level (e.g., Swi5)
  - The TF binds to its targets
- Measure TF binding to the promoter of every gene
  - Assign a confidence to each binding
[Figure: Swi5 on the DNA of a bound gene and absent from a gene that is not bound]

Is Regulation Observed?
- Not quite…
- Localization is measured for specific conditions
- Localization is measured for large DNA regions
- Localization is noisy

Incorporating Localization
[Figure: the model extended with observed localization variables L(t1), L(t2), one per regulation variable R(t1), R(t2)]
- The localization p-value is a noisy sensor of actual regulation
  - If regulation occurs, the p-value is likely to be low
  - If there is no regulation, the p-value is likely to be high

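Localization Model
[Figure: for each Gene, the hidden regulation variable R(t1) is the parent of the observed localization L(t1)]
- The localization p-value is a noisy sensor of actual regulation
  - If regulation occurs, the p-value is likely to be low
  - If there is no regulation, the p-value is likely to be high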
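To make "noisy sensor" concrete, here is a sketch of P(R(t) | L(t)) under a sensor model that the slides do not specify. It assumes that p-values of regulated genes follow a Beta(a, 1) distribution with a < 1, which piles mass near 0. It also assumes that p-values of unregulated genes are uniform. The prior and a are illustrative values.

```python
def pvalue_density(p, regulated, a=0.1):
    """Hypothetical sensor model for an observed localization p-value.

    regulated:     p ~ Beta(a, 1), density a * p**(a - 1), concentrated near 0 when a < 1
    not regulated: p ~ Uniform(0, 1), density 1
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return a * p ** (a - 1.0) if regulated else 1.0

def posterior_regulated(p, prior=0.1, a=0.1):
    """P(R(t) = true | L(t) = p) by Bayes' rule; prior and a are illustrative."""
    joint_yes = prior * pvalue_density(p, True, a)
    joint_no = (1.0 - prior) * pvalue_density(p, False, a)
    return joint_yes / (joint_yes + joint_no)

print(round(posterior_regulated(0.001), 2))  # 0.85
print(round(posterior_regulated(0.5), 2))    # 0.02
```

Because regulation stays a hidden variable, a borderline p-value can still be overridden by sequence and expression evidence elsewhere in the model.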

Joint Probabilistic Model
[Figure: the full model with promoter sequence (s1 ... sk), regulation R(t1), R(t2), observed localization L(t1), L(t2), Exp. type, Exp. cluster and expression Level]

Learning the Models
[Figure: promoter sequences (e.g., ACGCCTAACGCCTA), experimental details and localization data feed a LEARNER. The LEARNER outputs the model (Level, R(t1), R(t2), Exp. phase, Exp. cluster, sequence s1 ... sk, binding B(t1), B(t2)). It also outputs a tree CPD splitting on Exp. Phase = IV, R(t1) = Yes and R(t2) = Yes, and CPD tables indexed by R(t1), R(t2) and Exp. phase.]

Learning the Models
[Figure: a LEARNER combining promoter sequences, experimental details and localization data into the model]
- Ndd1 activates Ace2 and Swi5 in G1, which together activate in S
- Mcm1 activates the DNA repair pathway in S

Model Learning
- Structure learning: tree structure
- Missing data: experiment cluster, regulation variables
- Motif model: parameter estimation
Methods:
- Expectation Maximization
- Bayesian score
- Heuristic search
- Discriminative training (conjugate gradient)
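Discriminative training maximizes the conditional log-likelihood of the targets given the inputs, rather than a joint likelihood. The sketch below shows that objective on a one-feature logistic model with made-up data. It uses plain gradient ascent in place of the conjugate gradient method named above, so it is a stand-in rather than the method itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional_log_likelihood(w, b, xs, ys):
    """sum_i log P(y_i | x_i) under a logistic model: the discriminative objective."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def train_discriminative(xs, ys, lr=0.1, steps=200):
    """Plain gradient ascent (a stand-in for conjugate gradient) on the conditional log-likelihood."""
    w, b = 0.0, 0.0
    history = [conditional_log_likelihood(w, b, xs, ys)]
    for _ in range(steps):
        # Gradient of the conditional log-likelihood: sum_i (y_i - p_i) * [x_i, 1].
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += lr * sum(r * x for r, x in zip(residuals, xs))
        b += lr * sum(residuals)
        history.append(conditional_log_likelihood(w, b, xs, ys))
    return w, b, history

# Made-up data: x could be a motif score, y whether the gene is regulated.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b, history = train_discriminative(xs, ys)
print(history[-1] > history[0])  # True
```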

Model Learning
[Figure: the model classes (Gene, Experiment, Expression) with R(t1), R(t2), L(t1), promoter s1 ... sk, Exp. type, Exp. cluster and Level, combined with the data: experimental details, localization data and promoter sequences (e.g., ACGCCTAACGCCTA)]

Resulting Bayesian Network
[Figure: the model unrolled over three genes. Each gene g = 1, 2, 3 has regulation variables R(t1)g, R(t2)g, observed localization L(t1)g, L(t2)g, and promoter residues s1g ... skg. Each gene also has expression levels Level g,1 and Level g,2, which depend on Exp. type and Exp. cluster.]

Model Learning: E-Step
[Figure: the unrolled Bayesian network]
- Loopy belief propagation
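For intuition about the E-step, here is sum-product loopy belief propagation on a toy pairwise Markov network of binary variables forming a 3-cycle. This is a simplification: the network on the slide is a Bayesian network with Gaussian and tree CPDs. The potentials below are made up. Damping is a common heuristic that helps messages converge on loopy graphs.

```python
def loopy_bp(unary, pairwise, iters=50, damping=0.5):
    """Sum-product loopy belief propagation on a pairwise MRF over binary variables.

    unary: node -> [phi(x=0), phi(x=1)]
    pairwise: (i, j) -> 2x2 list psi[x_i][x_j]
    Returns approximate marginals (beliefs) for every node.
    """
    nbrs = {v: [] for v in unary}
    for i, j in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, xi, xj):
        # Look up the edge potential in whichever orientation it was stored.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # msgs[(i, j)] is the message from node i to node j, initialised uniform.
    msgs = {}
    for i, j in pairwise:
        msgs[(i, j)] = [0.5, 0.5]
        msgs[(j, i)] = [0.5, 0.5]

    for _ in range(iters):
        new_msgs = {}
        for (i, j), old in msgs.items():
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = unary[i][xi] * psi(i, j, xi, xj)
                    for k in nbrs[i]:
                        if k != j:
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = sum(out)
            # Normalise, then damp towards the previous message.
            new_msgs[(i, j)] = [damping * o + (1 - damping) * v / z for o, v in zip(old, out)]
        msgs = new_msgs

    beliefs = {}
    for v in unary:
        b = list(unary[v])
        for k in nbrs[v]:
            b = [b[x] * msgs[(k, v)][x] for x in (0, 1)]
        z = sum(b)
        beliefs[v] = [bx / z for bx in b]
    return beliefs

# A 3-cycle (the smallest loop) with attractive couplings and evidence on node 0.
unary = {0: [1.0, 4.0], 1: [1.0, 1.0], 2: [1.0, 1.0]}
attract = [[2.0, 1.0], [1.0, 2.0]]
pairwise = {(0, 1): attract, (1, 2): attract, (2, 0): attract}
beliefs = loopy_bp(unary, pairwise)
print(beliefs[1][1] > 0.5)  # True: the evidence on node 0 propagates around the loop
```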

Model Learning: M-Step
[Figure: the unrolled Bayesian network]
- Standard ML estimation
- Conjugate gradient
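In the M-step, standard ML estimation for a Gaussian leaf can use the E-step posteriors as soft counts. The following is a minimal sketch, and the function name is hypothetical.

```python
def weighted_gaussian_mle(values, weights):
    """ML mean and variance of one Gaussian leaf from soft counts.

    weights[i] is the posterior probability (from the E-step) that measurement i
    is generated by this leaf; with all weights equal to 1 this is ordinary ML.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("the leaf received no probability mass")
    mean = sum(w * x for w, x in zip(weights, values)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total
    return mean, var

print(weighted_gaussian_mle([1.0, 2.0, 3.0], [1.0, 0.0, 1.0]))  # (2.0, 1.0)
```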

Experimental Results: Yeast
- Cell-cycle expression data (Spellman et al.)
- Localization data for 9 TFs (Simon et al.)
- Yeast genome (promoters)

Generalization
[Figure, built up over four slides: gene log-likelihood for increasingly rich models: clustering genes; localization; localization + experiment cluster; + sequence. Each slide also shows the corresponding model structure.]
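Generalization is usually measured as the log-likelihood of genes not used in training. The sketch below computes that score, assuming each model predicts a Gaussian (mean, std) per measurement. The genes, predictions and both models are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def mean_gene_log_likelihood(test_genes, model):
    """Average, over held-out genes, of the summed log-density of each gene's levels.

    test_genes: gene name -> list of expression levels, one per experiment
    model: (gene, experiment index) -> (mean, std) predicted for that measurement
    """
    total = 0.0
    for gene, levels in test_genes.items():
        total += sum(gaussian_logpdf(x, *model(gene, e)) for e, x in enumerate(levels))
    return total / len(test_genes)

# Hypothetical held-out genes and two hypothetical models.
test_genes = {"g1": [2.0, -1.0], "g2": [0.5, 0.5]}
predicted_means = {"g1": [1.8, -0.8], "g2": [0.4, 0.6]}

def flat_model(gene, e):
    return 0.0, 1.0  # ignores everything it could know about the gene

def richer_model(gene, e):
    return predicted_means[gene][e], 1.0

print(mean_gene_log_likelihood(test_genes, richer_model)
      > mean_gene_log_likelihood(test_genes, flat_model))  # True
```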

Generating Hypotheses
- Example: genes regulated by Swi6, but not by Mcm1 or Fkh2, exhibit a unique expression pattern in the G1 phase of the cell cycle
- Gene functions: DNA repair (P-value 3e-09), DNA synthesis (P-value 7e-05)
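The slide does not say how the function P-values were computed. A common choice for this kind of annotation enrichment is the hypergeometric tail probability, sketched here with toy numbers that are not from the study.

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the genome, K: genes with the annotation (e.g. "DNA repair"),
    n: genes in the predicted group, k: annotated genes within that group.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy numbers: 3 of 4 group genes are annotated, versus 5 of 20 overall.
print(enrichment_pvalue(N=20, K=5, n=4, k=3))
```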

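Expression vs Regulation
[Figure: expression over the alpha, cdc15, cdc28 and elu cell-cycle time courses, ordered by phase, for Swi5-regulated genes alongside Swi5's own expression]
Genes predicted to be regulated by Swi5 are probably real Swi5 targets.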

Combinatorial Effects
[Figure: expression over the alpha, cdc15, cdc28 and elu time courses, ordered by phase, for genes regulated by Fkh2 & Swi4 and by Fkh2 & Ndd1]

Combinatorial Effects
[Figure: expression over the alpha, cdc15, cdc28 and elu time courses, ordered by phase, for genes regulated by Mcm1 & Ndd1, Mcm1 & Ace2 and Mcm1 & Swi5]

Localization Assignment Changes

Motifs Found
- Ndd1
[Figure: Ndd1 motif found from the Simon et al. gene set, the expanded set, and the remaining genes]
- The expanded set identified additional genes regulated by Ndd1

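TF | Simon | Expanded | Rest | P-value
Ace | … | … | … | …e-6
Fkh | … | … | … | …e-10
Fkh | … | … | … | …e-11
Mbp | … | … | … | …e-45
Mcm | … | … | … | …e-18
Ndd | … | … | … | …e-24
Swi | … | … | … | …e-26
Swi | … | … | … | …e-15
Swi | … | … | … | …e-48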

Induced Interaction Network
- TF pairs whose regulation predicts the expression of the same gene cluster
[Figure: network over Ace2, Swi5, Ndd1, Fkh2, Fkh1, Swi4, Swi6, Mcm1 and Mbp1, with interactions labeled by cell-cycle phase (G1, S, G2, M, M/G1)]
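One way to derive such a network is sketched below. The sketch assumes we already know which TFs were used to predict each gene cluster, for example from the leaves of the learned tree. The input uses placeholder TF names rather than the slide's results, and `induced_network` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

def induced_network(cluster_regulators):
    """Connect two TFs when their regulation jointly predicts the same gene cluster.

    cluster_regulators: cluster label -> set of TFs used to predict that cluster.
    Returns: (tf_a, tf_b) -> set of cluster labels supporting that edge.
    """
    edges = defaultdict(set)
    for label, tfs in cluster_regulators.items():
        for a, b in combinations(sorted(tfs), 2):
            edges[(a, b)].add(label)
    return dict(edges)

# Hypothetical input with placeholder TF names and phase labels.
network = induced_network({"G1": {"tfA", "tfB", "tfC"}, "M": {"tfB", "tfC"}})
for edge, labels in sorted(network.items()):
    print(edge, sorted(labels))
```

Sorting each TF set before pairing gives every undirected edge a single canonical key, so support from different clusters accumulates on the same entry.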

Conclusions
- Unified probabilistic model explaining gene regulation using sequence, localization and expression data
- Models complex interactions between regulators
- Discriminative model maximizing P(Expr. | Seq.)
- Sequence data helps explain expression patterns

Big Picture
- Goal: a unified probabilistic framework that
  - Models complex biological domains
  - Incorporates heterogeneous data
- The framework explicitly incorporates basic biological building blocks within the model:
  - Genes, TFs, proteins, patients, cells, species, …
- Much closer connection between biology and model
  - Can read biology directly from the model
  - Can incorporate prior knowledge easily
- Can explicitly represent and learn biological models