Rich Probabilistic Models for Gene Expression Eran Segal (Stanford) Ben Taskar (Stanford) Audrey Gasch (Berkeley) Nir Friedman (Hebrew University) Daphne Koller (Stanford)

Our Goals
• Find patterns in gene expression data

Data Organization
[Figure: genes-by-experiments expression matrix, colored from induced to repressed. Entry A_ij is the mRNA level of gene i in experiment j.]

Standard Clustering Organization
[Figure: genes-by-experiments matrix organized by standard clustering.]

Bi-Clustering Organization
[Figure: genes-by-experiments matrix organized by bi-clustering, with a region labeled "Undetected Similarity".]

Desired Organization
• Detect similarities over subsets of genes and experiments
Note: rows and columns no longer correspond to genes and experiments

Incorporate Heterogeneous Data
[Figure: data sources: clinical information, experimental details, annotations (GO, MIPS, YPD), and DNA sequence (ACGCCTA).]
• Find correlations directly
• Focus on novel discoveries

Our Approach
[Figure: clinical information, experimental details, annotations (GO, MIPS, YPD), and DNA sequence (ACGCCTA) feed a LEARNER, which produces hypotheses and a model over Level, Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4, Exp. cluster, and Exp. type.]

Probabilistic Relational Models (Koller & Pfeffer 98; Friedman, Getoor, Koller & Pfeffer 99)
[Figure: schema with classes Gene (Gene Cluster), Experiment (Exp. cluster), and Expression (Level).]

Resulting Bayesian Network
[Figure: the schema (Gene: Gene Cluster; Experiment: Exp. cluster; Expression: Level) plus the data yields a ground network with nodes Gene Cluster 1 to 3, Exp. Cluster 1 to 2, and Level i,j for each gene i and experiment j (Level 1,1 through Level 3,2).]
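The step from schema to ground network on this slide can be sketched in a few lines. This is only an illustration of the unrolling idea, assuming three genes and two experiments as in the figure; the function name `unroll_prm` and the tuple-based node naming are hypothetical and not taken from the authors' software.

```python
from itertools import product

def unroll_prm(genes, experiments):
    """Ground a Gene / Experiment / Expression schema into network nodes and edges.

    Hypothetical helper: each gene gets a GeneCluster node, each experiment an
    ExpCluster node, and each (gene, experiment) pair a Level node whose
    parents are the two matching cluster nodes.
    """
    nodes, edges = [], []
    for g in genes:
        nodes.append(("GeneCluster", g))
    for e in experiments:
        nodes.append(("ExpCluster", e))
    for g, e in product(genes, experiments):
        level = ("Level", g, e)
        nodes.append(level)
        edges.append((("GeneCluster", g), level))
        edges.append((("ExpCluster", e), level))
    return nodes, edges

nodes, edges = unroll_prm(genes=[1, 2, 3], experiments=[1, 2])
print(len(nodes), "nodes,", len(edges), "edges")
```

With three genes and two experiments this gives 3 + 2 cluster nodes and 6 Level nodes, each Level node having exactly two parents.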

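Probabilistic Relational Models
[Figure: same schema (Gene: Gene Cluster; Experiment: Exp. cluster; Expression: Level). The CPD for Level is a table indexed by G Cluster and E Cluster, with the distribution parameters for each combination. Two example plots of P(Level) against Level are shown, one centered at 0.8 and one at -0.7.]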
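A table CPD of this kind can be sketched as a lookup from a (gene cluster, experiment cluster) pair to a distribution over Level. The sketch below assumes a Gaussian for each entry, which the slide does not state explicitly; the means 0.8 and -0.7 are the two values visible on the slide, while their assignment to specific cluster pairs and the standard deviations are made up for illustration.

```python
import math

# Hypothetical table: (gene cluster, experiment cluster) -> (mean, std).
# Only the two means come from the slide; everything else is an assumption.
cpd = {
    (1, 1): (0.8, 0.5),
    (1, 2): (-0.7, 0.5),
}

def level_density(level, gene_cluster, exp_cluster, table=cpd):
    """Density of an expression level given the two parent cluster values."""
    mean, std = table[(gene_cluster, exp_cluster)]
    z = (level - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# A level near 0.8 is far more likely under cluster pair (1, 1) than (1, 2).
print(level_density(0.8, 1, 1) > level_density(0.8, 1, 2))
```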

Adding Heterogeneous Data
• Annotations: Lipid, Endoplasmatic
• Binding sites: HSF, GCN4
• Experimental details: Exp. type
[Figure: schema with Gene (Gene Cluster, Lipid, Endoplasmatic, HSF, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level).]

Resulting Bayesian Network
[Figure: the extended schema plus experimental details, annotations (GO, MIPS, YPD), and DNA sequence (ACGCCTA) yields a ground network with, for each gene i = 1 to 3, nodes Gene Cluster i, Lipid i, HSF i, Endoplasmatic i, and GCN4 i; for each experiment j = 1 to 2, nodes Exp. cluster j and Exp. type j; and a node Level i,j for each pair.]

Problem: Exponential Blowup
[Figure: schema in which Level depends on Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4, Exp. cluster, and Exp. type. The table CPD has columns GC, LP, END, HSF, EC, TYP plus the distribution parameters; first row: 1, No, No, No, No, No, No, …]
• 6 parents: 2^6 cases
• k parents: 2^k cases!

Solution: Context Specificity
Ultra Violet Light → DNA Damage → DNA repair genes transcribed
[Figure: Level (Expression) depends on DNA repair (Gene) and UV Light (Experiment). The CPD is a table with columns UV = No, UV = Yes and rows Repair = Yes, Repair = No.]

Solution: Context Specificity
Ultra Violet Light → DNA Damage → DNA repair genes transcribed
[Figure: Level (Expression) depends on DNA repair (Gene) and UV Light (Experiment). In the table with columns UV = No, UV = Yes, two entries are 0.]

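Solution: Context Specificity
Ultra Violet Light → DNA Damage → DNA repair genes transcribed
[Figure: Level (Expression) depends on DNA repair (Gene) and UV Light (Experiment). The CPD is a tree: the root tests UV = Yes (true/false), and a second node tests Repair = Yes (true/false).]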

Modeling Context Specificity
• Grouping = a leaf in the tree
[Figure: schema in which Level depends on Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4, Exp. cluster, and Exp. type. The CPD for Level is a tree with tests Exp. Cluster = 2, HSF = Yes, Lipid = Yes, and GCN4 = Yes, each with true/false branches. Each leaf holds a P(Level) distribution; the leaves shown are centered at Level -3, 2, 3, and 0.]
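A tree CPD can be sketched as nested tests ending in leaves, where each leaf is one grouping. The code below is a minimal illustration: the class names `Split` and `Leaf` are hypothetical, the arrangement of tests is a guess (the exact tree cannot be recovered from the transcript), and the leaf values reuse the levels shown on the slide. It also contrasts the number of leaves with the 2^6 rows a full table over six binary parents would need.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    mean: float  # center of P(Level) for this grouping

@dataclass
class Split:
    attribute: str
    value: object
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Split]

def lookup(node, context):
    """Follow the tests that apply to this (gene, experiment) context down to a leaf."""
    while isinstance(node, Split):
        matches = context.get(node.attribute) == node.value
        node = node.if_true if matches else node.if_false
    return node

def count_leaves(node):
    if isinstance(node, Leaf):
        return 1
    return count_leaves(node.if_true) + count_leaves(node.if_false)

# Hypothetical arrangement of the tests named on the slide.
tree = Split("exp_cluster", 2,
             if_true=Split("HSF", "Yes",
                           if_true=Split("GCN4", "Yes", Leaf(3.0), Leaf(2.0)),
                           if_false=Leaf(0.0)),
             if_false=Leaf(-3.0))

context = {"exp_cluster": 2, "HSF": "Yes", "GCN4": "No", "Lipid": "No"}
print(lookup(tree, context).mean)
print(count_leaves(tree), "leaves versus", 2 ** 6, "table rows")
```

Attributes that are irrelevant in a given context (Lipid here) are never tested, which is what keeps the number of parameters small.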

How do I learn these models?

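Learning the Models
[Figure: experimental details, annotations (GO, MIPS, YPD), and DNA sequence (ACGCCTA) feed the LEARNER. Its outputs are the dependency structure over Level, Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4, Exp. cluster, and Exp. type; a tree CPD with tests such as Exp. Cluster = 2, HSF = Yes, Lipid = Yes, and GCN4 = Yes; and the table of parameters indexed by gene cluster (GC) and experiment cluster (EC).]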

Learning Algorithm
Automatic Induction
• Structure Learning:
  - Dependency structure
  - Tree structure
• Missing Data:
  - Gene cluster & experiment cluster never observed
• Bayesian score
• Heuristic search
• Expectation Maximization (EM)
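The role of EM here is to handle cluster variables that are never observed. The sketch below shows only that idea on a much simpler problem: a one-dimensional mixture of two Gaussians over expression levels, with made-up data. It is not the authors' learning procedure, which also searches over dependency and tree structure using a Bayesian score; the function name `em_two_clusters` and all values are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_clusters(levels, iterations=50):
    """EM for a two-component 1-D Gaussian mixture with a hidden cluster label."""
    means = [min(levels), max(levels)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each hidden cluster for each value.
        resp = []
        for x in levels:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(levels)
            means[k] = sum(r[k] * x for r, x in zip(resp, levels)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, levels)) / nk
            variances[k] = max(spread, 1e-3)  # floor avoids a degenerate component
    return means, variances, weights

# Made-up expression levels: one repressed group and one induced group.
levels = [-0.9, -0.7, -0.5, -0.8, 0.7, 0.8, 0.9, 1.0]
means, variances, weights = em_two_clusters(levels)
print([round(m, 2) for m in means])
```

In the model on the slides the hidden variables are the gene cluster and experiment cluster of every object, so the E-step would have to reason over the whole ground network rather than over independent values as it does here.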

Learning Process
[Figure: starting schema with Gene (Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4), Experiment (Exp. cluster, Exp. type), and Expression (Level).]

Learning Process
Experiment Similarity
[Figure: the tree CPD for Level now tests Exp. Cluster = 2.]

Learning Process
Gene Similarity
[Figure: the tree CPD for Level tests Exp. Cluster = 2, then Gene Cluster = Yes.]

Learning Process
Separability by binding site
[Figure: the tree CPD for Level tests Exp. Cluster = 2, Gene Cluster = Yes, and HSF = Yes.]

Learning Process
Attribute dependencies: induce cluster changes
[Figure: HSF now takes part in a learned dependency in the schema; the tree CPD for Level tests Exp. Cluster = 2, Gene Cluster = Yes, and HSF = Yes.]

Learning Process
Achieved desired clustering
[Figure: the tree CPD for Level tests Exp. Cluster = 2, Gene Cluster = Yes, HSF = Yes, and GCN4 = Yes.]

Yeast Stress Data (Gasch et al 2001)
• Measured response to stress conditions
• 92 arrays
• We selected ~900 genes
• Added data: TRANSFAC, MIPS
Results:
• 15 significant TFs
• 7 significant function categories
• 793 Groupings

Context Specific Groupings
• Metabolism of amino acids
• Transporter genes
• Down in nitrogen depletion

Context Specific Groupings
• Metabolism of nitrogen
• Transporter genes
• Up in Starvation, Nitrogen depletion & DTT

Example Biological Finding
• Discovered grouping of 17 genes
  - All induced in diauxic shift
  - All have ≥ 2 binding sites for MIG1 transcription factor
  - Many not known to be regulated by MIG1
• Context-sensitive groupings were key to finding cluster

Compendium Data (Hughes et al 2000)
• 300 samples of yeast deletion mutants
[Figure: schema with Gene (GCluster, Lipid, HSF, Endoplasmatic, GCN4), Array/Mutated Gene (ACluster, Lipid (of mutated gene), GCluster (of mutated gene)), and Expression (Level).]

Resulting Bayesian Network
[Figure: ground network over Gene 1 to Gene 4 and the Gene 1 mutant and Gene 3 mutant arrays, with nodes Gene Cluster 1 to 4, HSF 1 to 4, Lipid 1, Lipid 3, Array cluster 1, Array cluster 3, and Level i,j.]

Experimental Setup
• Example: predicting the effect of mutating gene 4
• Available information:
  - Attributes of gene 4 (Lipid 4, HSF 4)
  - Gene Cluster of gene 4 as a gene (Gene Cluster 4)
• Goal: predict the effect of mutating specific genes without performing the experiment (!)
[Figure: the Gene 4 mutant array, with its Array cluster and expression levels marked "?".]

Experimental Setup
[Figure: the ground network (Gene Cluster 1 to 4, HSF 1 to 4, Lipid 1, Lipid 3, Array cluster 1, Array cluster 3, Level i,j, Gene 1 mutant, Gene 3 mutant) extended with the Gene 4 mutant array and Lipid 4; the unknown quantities for the Gene 4 mutant, including its Array cluster, are marked "?".]

Results
Training set: 180 mutants
Test set: 20 mutants
• 44 arrays predicted at 99% confidence and 95% accuracy
• Relational model is key to prediction
[Figure: the schema over Level, Gene Cluster, Lipid, HSF, Endoplasmatic, GCN4, Exp. cluster, and Exp. type, and a chart of Accuracy (%) in which PRMs reach 95% accuracy.]
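The pairing of a confidence level with an accuracy figure suggests an evaluation in which only sufficiently confident predictions are counted. The sketch below shows one way such a figure could be computed; it is an assumption about the evaluation protocol, not a description of it, and the function name `accuracy_at_confidence` and the records are made up.

```python
def accuracy_at_confidence(predictions, threshold=0.99):
    """Count predictions at or above the threshold and the fraction that are correct.

    predictions: list of (predicted_label, confidence, true_label) tuples.
    Returns (number of confident predictions, accuracy or None if there are none).
    """
    confident = [(pred, true) for pred, conf, true in predictions if conf >= threshold]
    if not confident:
        return 0, None
    correct = sum(1 for pred, true in confident if pred == true)
    return len(confident), correct / len(confident)

# Made-up records: (predicted array cluster, confidence, true array cluster).
records = [
    (1, 0.995, 1),
    (3, 0.999, 3),
    (1, 0.990, 3),
    (3, 0.700, 3),  # below the threshold, so not counted
]
print(accuracy_at_confidence(records))
```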

Conclusions
• Presented a unified probabilistic framework:
  - Models complex biological domains
  - Expressive data organization
  - Incorporates heterogeneous data
• Future directions:
  - Incorporate DNA and protein sequence data
  - Discover regulatory networks
• Paper:
• Software (soon):
• Contact:
Thank You!