Published by Ezra Potter. Modified over 9 years ago.
1
Rich Probabilistic Models for Gene Expression
Eran Segal (Stanford), Ben Taskar (Stanford), Audrey Gasch (Berkeley), Nir Friedman (Hebrew University), Daphne Koller (Stanford)
2
Our Goals
• Find patterns in gene expression data
3
Data Organization
[Figure: expression matrix with genes as rows and experiments as columns; each entry is colored as induced or repressed]
$A_{ij}$: mRNA level of gene $i$ in experiment $j$
4
Standard Clustering Organization
[Figure: expression matrix (genes × experiments) with the genes grouped into clusters]
5
Bi-Clustering Organization
[Figure: expression matrix partitioned into gene clusters and experiment clusters; one similarity remains undetected]
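In a bi-clustering, each gene and each experiment receives a cluster label, and similarity is sought within blocks of the matrix. The minimal plain-Python sketch below shows that reordering rows and columns by their labels is enough to make such blocks visible. The function name `biclustered_view` and the toy matrix are hypothetical, and computing the labels is a separate step that is not shown.

```python
from typing import List, Sequence

def biclustered_view(matrix: List[List[float]],
                     gene_clusters: Sequence[int],
                     exp_clusters: Sequence[int]) -> List[List[float]]:
    """Reorder genes (rows) and experiments (columns) so that members of the
    same cluster are adjacent; sorted() is stable, so ties keep input order."""
    row_order = sorted(range(len(matrix)), key=lambda i: gene_clusters[i])
    col_order = sorted(range(len(matrix[0])), key=lambda j: exp_clusters[j])
    return [[matrix[i][j] for j in col_order] for i in row_order]


# Toy 3-gene x 3-experiment matrix; entry [i][j] plays the role of A_ij.
A = [[1.0, -1.0, 1.1],
     [-0.5, 2.0, -0.4],
     [0.9, -1.2, 1.0]]
view = biclustered_view(A, gene_clusters=[0, 1, 0], exp_clusters=[0, 1, 0])
print(view)  # [[1.0, 1.1, -1.0], [0.9, 1.0, -1.2], [-0.5, -0.4, 2.0]]
```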
6
Desired Organization
• Detect similarities over subsets of genes and experiments
[Figure note: rows and columns no longer correspond to genes and experiments]
7
Incorporate Heterogeneous Data
[Figure: expression data alongside clinical information, experimental details, annotations (GO, MIPS, YPD), and DNA sequence (ACGCCTA…)]
• Find correlations directly
• Focus on novel discoveries
8
Our Approach
[Figure: clinical information, experimental details, annotations (GO, MIPS, YPD), and sequence data feed a learner, which builds a model over Level, Gene Cluster, Lipid, HSF, Endoplasmic, GCN4, Exp. cluster, and Exp. type; the model yields hypotheses]
9
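Probabilistic Relational Models (Koller & Pfeffer 98; Friedman, Getoor, Koller & Pfeffer 99)
[Figure: classes Gene (attribute: Gene Cluster), Experiment (attribute: Exp. cluster), and Expression (attribute: Level); Level depends on the gene's Gene Cluster and the experiment's Exp. cluster]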
10
Resulting Bayesian Network
[Figure: the PRM instantiated for 3 genes and 2 experiments: nodes Gene Cluster 1, 2, 3, Exp. Cluster 1, 2, and Level i,j for each gene i and experiment j, with Level i,j depending on Gene Cluster i and Exp. Cluster j]
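A PRM is a template. Given a concrete set of genes and experiments, it unrolls into an ordinary Bayesian network like the one on this slide. The sketch below assumes the basic model, in which Level has exactly two parents and the cluster variables have none. It builds the ground network as a parent map; the node-naming scheme is hypothetical.

```python
from itertools import product
from typing import Dict, List

def unroll(genes: List[str], experiments: List[str]) -> Dict[str, List[str]]:
    """Instantiate the PRM template for concrete objects; returns node -> parents.
    All Level nodes share one CPD (parameter tying), so the number of
    parameters does not grow with the number of genes or experiments."""
    parents: Dict[str, List[str]] = {}
    for g in genes:
        parents[f"GeneCluster[{g}]"] = []   # hidden; no parents in this basic model
    for e in experiments:
        parents[f"ExpCluster[{e}]"] = []    # hidden; no parents in this basic model
    for g, e in product(genes, experiments):
        parents[f"Level[{g},{e}]"] = [f"GeneCluster[{g}]", f"ExpCluster[{e}]"]
    return parents


net = unroll(["1", "2", "3"], ["1", "2"])
print(len(net))            # 11 nodes: 3 + 2 + 3*2
print(net["Level[2,1]"])   # ['GeneCluster[2]', 'ExpCluster[1]']
```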
11
Probabilistic Relational Models
CPD for Level:
G Cluster | E Cluster | μ | σ
1 | 1 | 0.8 | 1.2
1 | 2 | -0.7 | 0.6
… | … | … | …
[Figure: P(Level) is a Gaussian for each row, centered at 0.8 for row (1, 1) and at -0.7 for row (1, 2)]
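Each row of this CPD is a Gaussian over Level. Below is a minimal sketch of how the table scores observed expression values. It uses the two rows shown on the slide, and reading the second number as a standard deviation σ is an assumption. The names `CPD`, `log_p_level`, and `log_likelihood` are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

# (gene cluster, experiment cluster) -> (mu, sigma). The two rows are the ones
# on the slide; treating the second number as a standard deviation is an assumption.
CPD: Dict[Tuple[int, int], Tuple[float, float]] = {
    (1, 1): (0.8, 1.2),
    (1, 2): (-0.7, 0.6),
}

def log_p_level(level: float, gene_cluster: int, exp_cluster: int) -> float:
    """log P(Level = level | Gene Cluster, Exp. cluster) under a Gaussian CPD."""
    mu, sigma = CPD[(gene_cluster, exp_cluster)]
    z = (level - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))

def log_likelihood(cells: Iterable[Tuple[float, int, int]]) -> float:
    """Sum over observed (level, gene cluster, experiment cluster) triples."""
    return sum(log_p_level(x, g, e) for x, g, e in cells)


print(log_likelihood([(0.5, 1, 1), (-1.0, 1, 2)]))
```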
12
Adding Heterogeneous Data
• Annotations (Lipid, Endoplasmic)
• Binding sites (HSF, GCN4)
• Experimental details (Exp. type)
[Figure: the PRM extended with gene attributes Lipid, Endoplasmic, HSF, and GCN4 and the experiment attribute Exp. type]
13
Resulting Bayesian Network
[Figure: ground network for 3 genes and 2 experiments; each gene i has Gene Cluster i, Lipid i, HSF i, Endoplasmic i, and GCN4 i; each experiment j has Exp. cluster j and Exp. type j; there is a Level i,j node for each (gene, experiment) pair. Data sources: experimental details, annotations (GO, MIPS, YPD), and sequence]
14
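Problem: Exponential Blowup
[Figure: the PRM with Level, Gene Cluster, Lipid, HSF, Endoplasmic, GCN4, Exp. cluster, and Exp. type]
GC = Gene Cluster, LP = Lipid, END = Endoplasmic, EC = Exp. cluster, TYP = Exp. type
GC | LP | END | HSF | EC | TYP | μ | σ
1 | No | No | No | 1 | 1 | 0.8 | 1.2
1 | No | No | No | 1 | 2 | 0.7 | 0.6
… | … | … | … | … | … | … | …
• 6 parents: $2^6$ cases
• k parents: $2^k$ cases!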
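The blowup is simple arithmetic: a full CPD table needs one row per joint assignment of the parents. The $2^k$ count assumes binary parents. The sketch below also counts Gaussian parameters (a mean and a spread per row). The non-binary cluster sizes in the last line are hypothetical, chosen only to show that larger parent domains make the table grow faster still.

```python
from math import prod
from typing import Sequence

def full_table_rows(parent_cardinalities: Sequence[int]) -> int:
    """One CPD row per joint assignment of the parents."""
    return prod(parent_cardinalities)

def gaussian_table_params(parent_cardinalities: Sequence[int]) -> int:
    """Each row stores a mean and a standard deviation."""
    return 2 * full_table_rows(parent_cardinalities)


print(full_table_rows([2] * 6))    # 64 = 2^6
print(full_table_rows([2] * 20))   # 1048576 = 2^20
# Hypothetical sizes: 10 gene clusters, 5 experiment clusters, 4 binary attributes.
print(full_table_rows([10, 5, 2, 2, 2, 2]))   # 800
```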
15
Solution: Context Specificity
[Figure: Level (Expression) depends on the gene attribute DNA repair and the experiment attribute UV Light; a full table has one entry for each combination of UV = No / Yes and Repair = Yes / No]
Ultraviolet light → DNA damage → DNA repair genes transcribed
16
Solution: Context Specificity
[Figure: the same four table entries, regrouped by context: UV = No versus UV = Yes]
Ultraviolet light → DNA damage → DNA repair genes transcribed
17
Solution: Context Specificity
[Figure: tree CPD for Level: test UV = Yes; if false, a single leaf; if true, test Repair = Yes, with one leaf for true and one for false]
Ultraviolet light → DNA damage → DNA repair genes transcribed
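The tree on this slide replaces a four-entry table with three leaves: without UV light, DNA repair status does not matter. The sketch below is a minimal tree CPD that reproduces that structure. The classes `Leaf` and `Split` and the leaf labels are hypothetical. In the model, each leaf would hold its own distribution P(Level) instead of a label.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    label: str          # stands in for the leaf's own P(Level)

@dataclass
class Split:
    attribute: str      # binary test, e.g. "UV" means "UV = Yes"
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Split]

def lookup(node: Node, context: Dict[str, bool]) -> Leaf:
    """Follow the tests from the root until a leaf is reached."""
    while isinstance(node, Split):
        node = node.if_true if context[node.attribute] else node.if_false
    return node

def count_leaves(node: Node) -> int:
    if isinstance(node, Leaf):
        return 1
    return count_leaves(node.if_true) + count_leaves(node.if_false)


# The slide's tree: Repair is tested only in the UV = Yes context.
tree = Split("UV",
             if_true=Split("Repair", Leaf("UV, repair gene"), Leaf("UV, other gene")),
             if_false=Leaf("no UV"))

print(lookup(tree, {"UV": False, "Repair": True}).label)   # no UV
print(count_leaves(tree))   # 3 leaves instead of 4 table rows
```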
18
Modeling Context Specificity
[Figure: the PRM over Gene Cluster, Lipid, HSF, Endoplasmic, GCN4, Exp. cluster, Exp. type, and Level; Level's CPD is a tree whose internal nodes test Exp. Cluster = 2, HSF = Yes, Lipid = Yes, and GCN4 = Yes; each leaf holds its own P(Level), with leaf values including -3, 2, 3, and 0]
• Grouping = a leaf in the tree
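Because a grouping is a leaf, the groupings can be read off by routing every (gene, experiment) cell through the tree and collecting the cells that reach each leaf. The sketch below does this under stated assumptions. The tree shape, attribute names, and leaf ids are hypothetical, and the cluster test "Exp. Cluster = 2" is encoded as a boolean attribute.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

# A node is a leaf id (str) or a test: (attribute, subtree_if_true, subtree_if_false).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

def leaf_of(tree: Tree, context: Dict[str, bool]) -> str:
    while not isinstance(tree, str):
        attribute, if_true, if_false = tree
        tree = if_true if context[attribute] else if_false
    return tree

def groupings(tree: Tree,
              genes: Dict[str, Dict[str, bool]],
              experiments: Dict[str, Dict[str, bool]]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect the (gene, experiment) cells that fall into each leaf."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for g, g_attrs in genes.items():
        for e, e_attrs in experiments.items():
            groups[leaf_of(tree, {**g_attrs, **e_attrs})].append((g, e))
    return dict(groups)


# Hypothetical tree: test "Exp. Cluster = 2" first, then "HSF = Yes".
tree = ("ExpCluster2", ("HSF", "leaf A", "leaf B"), "leaf C")
genes = {"g1": {"HSF": True}, "g2": {"HSF": False}}
experiments = {"e1": {"ExpCluster2": True}, "e2": {"ExpCluster2": False}}
print(groupings(tree, genes, experiments))
# {'leaf A': [('g1', 'e1')], 'leaf C': [('g1', 'e2'), ('g2', 'e2')], 'leaf B': [('g2', 'e1')]}
```

Each grouping spans a subset of genes and a subset of experiments, which is the kind of block that plain clustering or bi-clustering cannot express.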
19
How do I learn these models?
20
Learning the Models
[Figure: experimental details, annotations (GO, MIPS, YPD), and sequence data are input to the learner, which outputs the PRM (Level, Gene Cluster, Lipid, HSF, Endoplasmic, GCN4, Exp. cluster, Exp. type), a tree CPD with tests such as Exp. Cluster = 2, HSF = Yes, Lipid = Yes, and GCN4 = Yes, and a CPD table over G Cluster and E Cluster]
21
Automatic Induction
• Structure learning: dependency structure; tree structure
• Missing data: gene cluster and experiment cluster are never observed
Learning Algorithm
• Bayesian score
• Heuristic search
• Expectation Maximization (EM)
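Because the cluster variables are never observed, EM alternates between guessing them and refitting parameters. The deliberately simplified sketch below is not the authors' algorithm. It implements a hard-assignment ("classification") EM variant that reassigns each gene, then each experiment, to its most likely cluster and refits one Gaussian per (gene cluster, experiment cluster) block. It omits the Bayesian score, structure search, tree CPDs, and attributes. The function names, the variance floor, and the fallback for empty blocks are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Params = List[List[Tuple[float, float]]]  # [gene cluster][exp cluster] -> (mean, variance)

def log_normal(x: float, mu: float, var: float) -> float:
    return -0.5 * (x - mu) ** 2 / var - 0.5 * math.log(2 * math.pi * var)

def fit_blocks(A: Matrix, gc: Sequence[int], ec: Sequence[int],
               kg: int, ke: int, var_floor: float = 1e-3) -> Params:
    """M-step: maximum-likelihood Gaussian for each (gene cluster, exp cluster) block."""
    n = [[0] * ke for _ in range(kg)]
    s = [[0.0] * ke for _ in range(kg)]
    ss = [[0.0] * ke for _ in range(kg)]
    for i, row in enumerate(A):
        for j, x in enumerate(row):
            c, d = gc[i], ec[j]
            n[c][d] += 1
            s[c][d] += x
            ss[c][d] += x * x
    params: Params = [[(0.0, 1.0)] * ke for _ in range(kg)]  # fallback for empty blocks
    for c in range(kg):
        for d in range(ke):
            if n[c][d]:
                mu = s[c][d] / n[c][d]
                params[c][d] = (mu, max(ss[c][d] / n[c][d] - mu * mu, var_floor))
    return params

def hard_em_bicluster(A: Matrix, kg: int, ke: int, iters: int = 50,
                      gc: Optional[Sequence[int]] = None,
                      ec: Optional[Sequence[int]] = None,
                      seed: int = 0) -> Tuple[List[int], List[int]]:
    rng = random.Random(seed)
    rows, cols = len(A), len(A[0])
    gc = list(gc) if gc is not None else [rng.randrange(kg) for _ in range(rows)]
    ec = list(ec) if ec is not None else [rng.randrange(ke) for _ in range(cols)]
    for _ in range(iters):
        p = fit_blocks(A, gc, ec, kg, ke)
        # Hard E-step for genes: most likely gene cluster given experiment clusters.
        new_gc = [max(range(kg), key=lambda c: sum(
                      log_normal(A[i][j], *p[c][ec[j]]) for j in range(cols)))
                  for i in range(rows)]
        p = fit_blocks(A, new_gc, ec, kg, ke)
        # Hard E-step for experiments, using the updated gene clusters.
        new_ec = [max(range(ke), key=lambda d: sum(
                      log_normal(A[i][j], *p[new_gc[i]][d]) for i in range(rows)))
                  for j in range(cols)]
        if new_gc == gc and new_ec == ec:
            break  # assignments are stable
        gc, ec = new_gc, new_ec
    return gc, ec


A = [[2.0, 2.1, -2.0, -1.9],
     [1.9, 2.0, -2.1, -2.0],
     [2.1, 1.9, -1.9, -2.1],
     [-2.0, -2.1, 2.0, 1.9],
     [-1.9, -2.0, 2.1, 2.0],
     [-2.1, -1.9, 1.9, 2.1]]
# Start with gene 3 in the wrong cluster; EM moves it.
print(hard_em_bicluster(A, 2, 2, gc=[0, 0, 0, 0, 1, 1], ec=[0, 0, 1, 1]))
# ([0, 0, 0, 1, 1, 1], [0, 0, 1, 1])
```

Hard assignments keep the sketch short. Like any EM variant, the result depends on initialization, so random restarts are a common addition.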
22
Learning Process
[Figure: starting model over Level, Gene, Experiment, Gene Cluster, Exp. cluster, Lipid, HSF, Endoplasmic, GCN4, and Exp. type]
23
Learning Process
• Experiment similarity (tree split on Exp. Cluster = 2)
24
Learning Process
• Gene similarity (tree splits on Exp. Cluster = 2 and Gene Cluster = Yes)
25
Learning Process
• Separability by binding site (tree splits on Exp. Cluster = 2, HSF = Yes, …, Gene Cluster = Yes)
26
Learning Process
• Attribute dependencies (HSF) induce cluster changes (tree splits on Exp. Cluster = 2, HSF = Yes, …, Gene Cluster = Yes)
27
Learning Process
• Achieved the desired clustering (tree splits on Exp. Cluster = 2, HSF = Yes, GCN4 = Yes, …, Gene Cluster = Yes, …)
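These slides grow the tree one split at a time. The minimal sketch below covers one greedy step. It assumes each cell's parents are binary attributes, and it treats cluster variables as observed (for example, after an E-step). As a simpler, swapped-in stand-in for the Bayesian score, it uses a BIC-penalized Gaussian log-likelihood. The names `best_split` and `gaussian_loglik` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[Dict[str, bool], float]  # (binary parent values, expression level)

def gaussian_loglik(xs: Sequence[float], var_floor: float = 1e-3) -> float:
    """Log-likelihood of xs under a single Gaussian with MLE mean and variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, var_floor)
    return sum(-0.5 * (x - mu) ** 2 / var for x in xs) - 0.5 * n * math.log(2 * math.pi * var)

def best_split(cells: List[Cell], attributes: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return the split with the largest positive score gain, or (None, 0.0)."""
    base = gaussian_loglik([y for _, y in cells])
    # Splitting one leaf into two adds a mean and a variance; BIC charges
    # 0.5 * log(n) per extra parameter, i.e. log(n) in total.
    penalty = math.log(len(cells))
    best_attr, best_gain = None, 0.0
    for a in attributes:
        yes = [y for ctx, y in cells if ctx[a]]
        no = [y for ctx, y in cells if not ctx[a]]
        if not yes or not no:
            continue  # the split would leave one side empty
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - base - penalty
        if gain > best_gain:
            best_attr, best_gain = a, gain
    return best_attr, best_gain


cells = [({"HSF": True, "GCN4": True}, 2.0), ({"HSF": True, "GCN4": False}, 2.2),
         ({"HSF": True, "GCN4": True}, 1.8), ({"HSF": False, "GCN4": False}, -1.0),
         ({"HSF": False, "GCN4": True}, -1.2), ({"HSF": False, "GCN4": False}, -0.8)]
print(best_split(cells, ["GCN4", "HSF"])[0])   # HSF
```

Repeating this step on each new leaf, until no split improves the score, grows the whole tree.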
28
Yeast Stress Data (Gasch et al. 2001)
• Measured response to stress conditions
• 92 arrays
• We selected ~900 genes
• Added data: TRANSFAC, MIPS
Results:
• 15 significant TFs
• 7 significant function categories
• 793 groupings
29
Context-Specific Groupings
• Metabolism of amino acids
• Transporter genes
• Down in nitrogen depletion
30
Context-Specific Groupings
• Metabolism of nitrogen
• Transporter genes
• Up in starvation, nitrogen depletion, and DTT
31
Example Biological Finding
• Discovered a grouping of 17 genes:
  - all induced in the diauxic shift
  - all have two binding sites for the MIG1 transcription factor
  - many were not known to be regulated by MIG1
• Context-sensitive groupings were key to finding this cluster
32
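Compendium Data (Hughes et al. 2000)
• 300 samples of yeast deletion mutants
[Figure: PRM with classes Gene and Array/Mutated Gene; Expression Level depends on the gene's GCluster and the array's ACluster; ACluster is linked to the Lipid attribute and the GCluster of the mutated gene; gene attributes include Lipid, HSF, Endoplasmic, and GCN4]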
33
Resulting Bayesian Network
[Figure: ground network for genes 1 to 4 (each with Gene Cluster i and attributes such as HSF i and Lipid i) and arrays for the Gene 1 and Gene 3 mutants (Array cluster 1 and Array cluster 3), with a Level node for each (gene, array) pair]
34
Experimental Setup
• Example: predicting the effect of mutating gene 4 (the Gene 4 mutant's array cluster and expression levels are unknown)
• Available information:
  - attributes of gene 4 (Lipid 4, HSF 4)
  - the gene cluster of gene 4 as a gene (Gene Cluster 4)
• Goal: predict the effect of mutating specific genes without performing the experiment (!)
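One way to read this setup as inference: with no expression measured for the new array, the distribution over its array cluster follows from its parents. Each gene's expected level then averages the cluster-specific means over that distribution. The sketch below assumes, as the compendium figure suggests, that the array cluster's parents are the mutated gene's Lipid attribute and gene cluster, and it treats that gene cluster as already inferred. All parameter values and names are hypothetical toy choices.

```python
from typing import Dict, Tuple

# Hypothetical toy parameters, for illustration only.
# P(Array cluster | Lipid of mutated gene, Gene Cluster of mutated gene):
P_ARRAY_CLUSTER: Dict[Tuple[bool, int], Dict[int, float]] = {
    (True, 1): {1: 0.9, 2: 0.1},
    (True, 2): {1: 0.3, 2: 0.7},
    (False, 1): {1: 0.5, 2: 0.5},
    (False, 2): {1: 0.2, 2: 0.8},
}
# Mean expression level for (gene cluster, array cluster):
MEAN_LEVEL: Dict[Tuple[int, int], float] = {
    (1, 1): 0.8, (1, 2): -0.7, (2, 1): -0.5, (2, 2): 1.0,
}

def predict_mutant(lipid: bool, mutant_gene_cluster: int,
                   gene_clusters: Dict[str, int]) -> Tuple[int, float, Dict[str, float]]:
    """Predict an unseen mutant array: its most likely cluster, that cluster's
    probability (used as a confidence), and each gene's expected level."""
    # No levels are observed for the new array, so the distribution over its
    # cluster is just the CPD row selected by the observed parents.
    posterior = P_ARRAY_CLUSTER[(lipid, mutant_gene_cluster)]
    best = max(posterior, key=posterior.__getitem__)
    expected = {
        gene: sum(p * MEAN_LEVEL[(gc, ac)] for ac, p in posterior.items())
        for gene, gc in gene_clusters.items()
    }
    return best, posterior[best], expected


best, conf, levels = predict_mutant(True, 1, {"gA": 1, "gB": 2})
print(best, conf)   # 1 0.9
```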
35
Experimental Setup
[Figure: the ground network extended with an array for the Gene 4 mutant; its Array cluster and Level values (?) are unknown, while Lipid 4, HSF 4, and Gene Cluster 4 are part of the network]
36
Results
• Training set: 180 mutants; test set: 20 mutants
• 44 arrays predicted at 99% confidence and 95% accuracy
• Relational model is key to prediction
[Figure: bar chart of accuracy (%) for PRMs, at 95%; model: Level, Gene Cluster, Lipid, HSF, Endoplasmic, GCN4, Exp. cluster, Exp. type]
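"Predicted at 99% confidence and 95% accuracy" can be read as selective prediction: keep only the predictions whose confidence reaches the threshold, then measure accuracy on those. The minimal sketch below follows that plausible reading. The toy predictions are invented for illustration and are not data from the study.

```python
import math
from typing import List, Tuple

Prediction = Tuple[int, float, int]  # (predicted label, confidence, true label)

def selective_accuracy(preds: List[Prediction], threshold: float) -> Tuple[int, float]:
    """Keep predictions whose confidence reaches the threshold and report
    how many were kept and the fraction of those that are correct."""
    kept = [(p, t) for p, conf, t in preds if conf >= threshold]
    if not kept:
        return 0, math.nan
    correct = sum(p == t for p, t in kept)
    return len(kept), correct / len(kept)


toy = [(1, 0.995, 1), (2, 0.999, 2), (1, 0.60, 2), (2, 0.991, 1)]
print(selective_accuracy(toy, 0.99))   # (3, 0.6666666666666666)
```

Raising the threshold trades coverage (how many arrays get a prediction) for accuracy on the ones that do.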
37
Conclusions
• Presented a unified probabilistic framework that:
  - models complex biological domains
  - provides expressive data organization
  - incorporates heterogeneous data
• Future directions:
  - incorporate DNA and protein sequence data
  - discover regulatory networks
• Paper: http://www.cs.stanford.edu/~eran
• Software (soon): http://dags.stanford.edu/bio
• Contact: eran@cs.stanford.edu
Thank You!