Published by Antonia Lucas. Modified over 9 years ago.
1
Discovering Regulatory Networks from Gene Expression and Promoter Sequence. Eran Segal, Stanford University
2
From Parts to Systems: parts, modules, interactions, activity
3
Gene Regulation: DNA (Gene 1, Gene 2) → RNA → Protein. DNA → RNA is a tightly regulated process.
4
Gene Regulation: each gene (Gene 1, Gene 2) has a control region and a coding region, transcribed from DNA to RNA. A regulator (transcription factor), e.g. Swi5, binds a regulator motif (illustrated as ACGTGC) in the control region.
5
Genome-wide Available Data. DNA sequence: the control and coding regions of every gene (……ACTAGCGGCTATAATGACTGGACCTACGTACCGATATAATGTCAGCTAGCA……). Gene expression: mRNA level of all genes, measured by DNA microarray in different conditions.
6
Gene Regulation: a regulator (Swi5) binds a motif (ACGTGC) in the control regions of genes. Many diagnostic, prognostic and therapeutic implications. How are genes regulated? Who regulates whom? Under which conditions? Which genes are co-regulated?
7
Example: Finding Motifs. Procedural approach: apply a different method to each type of data and use the output of one method as input to the next. Step 1: cluster gene expression profiles (genes × experiments). Step 2: search for motifs in the control regions of clustered genes (Genes I–VI: AGCTAGCTGAGACTGCACAC, TTCGGACTGCGCTATATAGA, GACTGCAGCTAGTAGAGCTC, CTAGAGCTCTATGACTGCCG, ATTGCGGGGCGTCTGAGCTC, TTTGCTCTTGACTGCCGCTT). Motif found: GACTGC.
8
Our Approach: Model Based. What is a model? A description of the biological process that could have generated the observed data (stochastic, probabilistic).
9
Our Approach: Model Based. A statistical modeling language for biological domains, based on Bayesian networks, with classes of objects (Gene, Experiment, Expression); properties, either observed (gene sequence, experiment conditions such as condition and tumor) or hidden (gene module); and interactions (expression level as a function of gene and experiment properties). STGFK ’01 (ISMB)
10
Probabilistic Model: defines a joint distribution. Unrolled as a Bayesian network for two genes (Module 1, Module 2) and two experiments (Condition 1, Tumor 1; Condition 2, Tumor 2), with one expression level per gene-experiment pair (Level 1,1 to Level 2,2). Each level has a local distribution such as P(Level 2,1 | Module 2, Condition 2, Tumor 2).
11
Probabilistic Model: defines a joint distribution and is learned automatically from data (parameterization, structure, assignment to hidden variables): find the model M that maximizes P(M | D). Challenges: learning the parameterization and structure of distributions; learning the network structure (thousands of variables; the space of possible networks is super-exponential); probabilistic inference in the Bayesian network (millions of hidden variables that are highly dependent; NP-hard). Tools: convex optimization, graph theoretic algorithms, dynamic programming, heuristic search, and problem-specific structure (modularity in biological systems). STGFK ’01 (ISMB)
12
Scheme: start from a biological problem and data. Model design: classes of objects, properties, interactions. Learn model: automatically from data (structure, parameterization). Analyze results: visualization, literature, statistics; derive biological insights from the model. STGFK ’01 (ISMB)
13
Outline: Who regulates whom and when? (Model, Learning algorithm, Evaluation, Wet lab experiments). How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation.
14
Ongoing Biological Debate: Can we discover actual regulators from gene expression data alone?
15
Gene Regulation: Simple Example. A regulated gene is controlled by an activator and a repressor, and the expression of the regulators and of the regulated gene is measured by DNA microarray. The regulators define three states: State 1 (activator not expressed), State 2 (activator expressed, repressor not), State 3 (activator and repressor both expressed).
16
Regulation Tree: the regulation program tests activator expression (true/false) and then repressor expression (true/false), leading to States 1, 2 and 3; it maps activator and repressor expression to the expression of module genes. Genes in the same module share the same regulation program. SSRPBKF ’03 (Nature Genetics)
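A regulation program like the tree on this slide can be represented as a small decision tree over regulator expression levels. The sketch below is illustrative only: the class names, the threshold of 0.0 and the leaf means are hypothetical, and each leaf simply stores a Gaussian summary of module-gene expression in that context.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """One regulatory context: module genes are ~ N(mean, std) here."""
    mean: float
    std: float


@dataclass
class Split:
    """Internal node: is this regulator's expression above the threshold?"""
    regulator: str
    threshold: float
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def route(node: Node, regulator_levels: Dict[str, float]) -> Leaf:
    """Follow the tree for one experiment and return the context it lands in."""
    while isinstance(node, Split):
        expressed = regulator_levels[node.regulator] > node.threshold
        node = node.if_true if expressed else node.if_false
    return node


# Hypothetical program mirroring the slide: Activator? -> Repressor? -> States 1-3.
program = Split("activator", 0.0,
                if_true=Split("repressor", 0.0,
                              if_true=Leaf(mean=-1.0, std=0.5),    # State 3
                              if_false=Leaf(mean=2.0, std=0.5)),   # State 2
                if_false=Leaf(mean=-1.5, std=0.5))                 # State 1

print(route(program, {"activator": 1.2, "repressor": -0.3}).mean)  # 2.0
```

Every experiment reaches exactly one leaf, so all genes of the module share the same context, and hence the same expression distribution, in that experiment.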
17
Module Networks. Goal: discover regulatory modules and their regulators. Module genes: a set of genes that are similarly controlled. Regulation program: expression as a function of regulators (e.g. a tree testing HAP4 and CMK1). SSRPBKF ’03 (Nature Genetics)
18
Module Network Probabilistic Model: the expression level in each module is a function of the expression of regulators. Gene.Module: what module does gene “g” belong to? Regulators 1–3: expression level of each regulator (e.g. BMH1, GIC2) in the experiment. The level depends on both, P(Level | Module, Regulators), represented per module as a regulation tree (e.g. over HAP4 and CMK1). SSRPBKF ’03 (Nature Genetics)
19
Outline: Who regulates whom and when? (Model, Learning algorithm, Evaluation, Wet lab experiments). How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation.
20
Learning Problem. Goal: find the gene-module assignments and regulation-tree structures that maximize P(M | D). This is hard: 5000–10000 genes and ~500 candidate regulators. SSRPBKF ’03 (Nature Genetics)
21
Learning Algorithm Overview: initialize the gene-module assignment by clustering, then iterate between learning the regulation program of each module (e.g. a HAP4/CMK1 tree) and relearning the gene assignments to modules, producing regulatory modules. SSRPBKF ’03 (Nature Genetics)
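The “relearn gene assignments” step can be sketched as follows, under simplifying assumptions: each module's predicted expression in each experiment is given directly (in a module network it would be the mean of the regulation-tree leaf that the experiment falls into), noise is Gaussian with a shared, hypothetical sigma, and each gene simply moves to its highest-likelihood module rather than being rescored with the full Bayesian score.

```python
import math
from typing import List, Sequence


def gaussian_loglik(profile: Sequence[float], predicted: Sequence[float],
                    sigma: float) -> float:
    """Log-likelihood of a gene's profile under independent N(predicted, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2)
               for x, mu in zip(profile, predicted))


def reassign_genes(profiles: List[List[float]],
                   module_predictions: List[List[float]],
                   sigma: float = 1.0) -> List[int]:
    """Return, for every gene, the index of the module that explains it best."""
    assignment = []
    for profile in profiles:
        scores = [gaussian_loglik(profile, pred, sigma) for pred in module_predictions]
        assignment.append(scores.index(max(scores)))
    return assignment


genes = [[2.1, 1.9, -1.0], [-0.9, -1.1, 2.2], [1.8, 2.2, -0.7]]  # hypothetical data
modules = [[-1.0, -1.0, 2.0], [2.0, 2.0, -1.0]]  # predicted profile per module
print(reassign_genes(genes, modules))  # [1, 0, 1]
```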
22
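Learning Regulation Programs: module genes × experiments, shown with experiments in original order and then sorted by the expression of a candidate regulator (Hap4). Score of a single context: $\log P(M \mid D) \approx \log P(D \mid \mu, \sigma) + \log P(\mu, \sigma)$. Split on HAP4 into two contexts: $\log P(M \mid D) \approx \log P(D_{\mathrm{HAP4}}^{(1)} \mid \mu_{\mathrm{HAP4}}^{(1)}, \sigma_{\mathrm{HAP4}}^{(1)}) + \log P(D_{\mathrm{HAP4}}^{(2)} \mid \mu_{\mathrm{HAP4}}^{(2)}, \sigma_{\mathrm{HAP4}}^{(2)}) + \log P(\mu_{\mathrm{HAP4}}^{(1)}, \sigma_{\mathrm{HAP4}}^{(1)}, \mu_{\mathrm{HAP4}}^{(2)}, \sigma_{\mathrm{HAP4}}^{(2)})$. The same score is computed for other candidate regulators (e.g. SIP4). The chosen split is then refined, e.g. by splitting one HAP4 context on CMK1: $\log P(M \mid D) \approx \log P(D_{\mathrm{HAP4}}^{(1)} \mid \mu_{\mathrm{HAP4}}^{(1)}, \sigma_{\mathrm{HAP4}}^{(1)}) + \log P(D_{\mathrm{CMK1}}^{(1)} \mid \mu_{\mathrm{CMK1}}^{(1)}, \sigma_{\mathrm{CMK1}}^{(1)}) + \log P(D_{\mathrm{CMK1}}^{(2)} \mid \mu_{\mathrm{CMK1}}^{(2)}, \sigma_{\mathrm{CMK1}}^{(2)}) + \dots$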
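The split comparison on this slide can be sketched in code. As a stand-in for the Bayesian score (whose priors are not shown here), this sketch scores each context with a maximum-likelihood Gaussian log-likelihood minus a BIC-style penalty of log(n) for its two parameters; the regulator names and data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def context_score(values: List[float]) -> float:
    """ML Gaussian log-likelihood of pooled values minus a BIC penalty for (mean, var)."""
    n = len(values)
    mu = sum(values) / n
    var = max(sum((v - mu) ** 2 for v in values) / n, 1e-6)  # floor avoids log(0)
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return loglik - math.log(n)  # 0.5 * (2 parameters) * log(n)


def best_split(module_data: List[List[float]],
               regulators: Dict[str, Sequence[float]]
               ) -> Optional[Tuple[str, float, float]]:
    """Best single split of the experiments on one regulator's expression.

    module_data[g][e] is the expression of module gene g in experiment e.
    Returns (regulator, threshold, score gain), or None if no split helps.
    """
    n_exp = len(module_data[0])

    def pooled(exps):  # all module-gene values in the chosen experiments
        return [row[e] for row in module_data for e in exps]

    base = context_score(pooled(range(n_exp)))
    best = None
    for name, levels in regulators.items():
        cuts = sorted(set(levels))
        for lo, hi in zip(cuts, cuts[1:]):  # midpoints between distinct levels
            thr = (lo + hi) / 2
            up = [e for e in range(n_exp) if levels[e] > thr]
            down = [e for e in range(n_exp) if levels[e] <= thr]
            gain = context_score(pooled(up)) + context_score(pooled(down)) - base
            if best is None or gain > best[2]:
                best = (name, thr, gain)
    return best if best is not None and best[2] > 0 else None


module = [[0.1, -0.2, 0.0, 2.1, 1.9, 2.0],
          [0.0, 0.1, -0.1, 2.0, 2.2, 1.8],
          [-0.1, 0.0, 0.2, 1.9, 2.0, 2.1]]
regs = {"HAP4": [-1.0, -0.8, -1.2, 1.0, 1.1, 0.9],
        "SIP4": [0.5, -0.5, 0.4, -0.4, 0.6, -0.6]}
print(best_split(module, regs)[0])  # HAP4
```

The penalty matters: without it, any split would raise the maximum-likelihood score, so a flat module would always be split.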
23
Learning Algorithm Performance. [Figure: Bayesian score (avg. per gene) vs. algorithm iterations; gene module assignment changes (% of total) vs. algorithm iterations.] Significant improvements across learning iterations; many genes (50%) change module assignment during learning. SPRKF ’03 (UAI)
24
Outline: Who regulates whom and when? (Model, Learning algorithm, Evaluation, Wet lab experiments). How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation.
25
Yeast Stress Data. Genes: selected 2355 that showed activity. Experiments: 173, covering diverse environmental stress conditions (heat shock, nitrogen depletion, …).
26
Comparison to Bayesian Networks (Friedman et al. ’00; Hartemink et al. ’01). In a Bayesian network, the expression level of each gene is a function of the expression of its regulators. Fragment of a learned Bayesian network: Cmk1, Hap4, Mig1, Ste12, Yap1, Gic1; 2355 variables (genes), 173 instances (experiments). Problems: robustness, interpretability.
27
Comparison to Bayesian Networks. Bayesian network (Friedman et al. ’00; Hartemink et al. ’01) vs. module network (SPRKF ’03 (UAI)). Problems: robustness, interpretability. Solutions: robustness through sharing parameters; interpretability through a module-level model (Regulators 1–3 → Module → Level).
28
Comparison to Bayesian Networks. Problems: robustness, interpretability. Solutions: robustness through sharing parameters; interpretability through a module-level model. Learn which parameters are shared by learning which genes are in the same module. [Figure: test data log-likelihood (gain per instance) vs. number of modules, relative to Bayesian network performance.] SPRKF ’03 (UAI)
29
From Model to Regulatory Modules: each module comes with a regulation program over its regulators (e.g. a HAP4/CMK1 tree). Are these modules biologically relevant? SSRPBKF ’03 (Nature Genetics)
30
Respiration Module: regulation program (predicted regulators) and module genes. Module genes functionally coherent? Energy production (oxidative phosphorylation, 26/55 genes, $P < 10^{-30}$). Module genes known targets of predicted regulators? Hap4 + Msn4 are known to regulate module genes. SSRPBKF ’03 (Nature Genetics)
31
Energy, Osmolarity, & cAMP Signaling: regulation program and module genes. Regulation by non-TFs (Tpk1, a cAMP-dependent protein kinase). Module genes known targets of predicted regulators?
32
Biological Evaluation Summary. Are the module genes functionally coherent? 46/50. Are some module genes known targets of the predicted regulators? 30/50. Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses). Known targets = direct biological experiments reported in the literature. SSRPBKF ’03 (Nature Genetics)
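The functional-coherence criterion above can be made concrete with a one-sided hypergeometric test. This is a minimal sketch: the slide does not say which multiple-hypothesis correction was used, so Bonferroni is assumed here, and the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): chance that a module of n genes, drawn from N genes of which
    K carry a GO annotation, contains k or more annotated genes."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def is_enriched(k: int, N: int, K: int, n: int, n_tests: int,
                alpha: float = 0.01) -> bool:
    """Bonferroni-corrected enrichment call (an assumed choice of correction)."""
    return min(1.0, hypergeom_pvalue(k, N, K, n) * n_tests) < alpha


# Hypothetical module: 20 genes, 12 annotated, from 2000 genes with 100 annotated.
print(is_enriched(12, 2000, 100, 20, n_tests=500))  # True
```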
33
Outline: Who regulates whom and when? (Model, Learning algorithm, Evaluation, Wet lab experiments). How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation.
34
From Model to Detailed Predictions. Prediction: regulator ‘X’ (e.g. Ypl230w) regulates process ‘Y’. Experiment: knock out ‘X’ and repeat the experiment. SSRPBKF ’03 (Nature Genetics)
35
Does ‘X’ Regulate Predicted Genes? Experiment: knock out Ypl230w (stationary phase) and compare wild type with mutant. 1334 genes are regulated (>4x change), versus 312 expected by chance. Modules predicted to be regulated by Ypl230w, ranked by regulated genes: protein folding (P < 0.0001), cell differentiation (P < 0.02), glycolysis and folding (P < 0.04), mitochondrial and protein fate (P < 0.04). Ypl230w regulates computationally predicted genes. SSRPBKF ’03 (Nature Genetics)
36
Does ‘X’ Regulate Predicted Genes? Ppt1 knockout (hypo-osmotic stress), wild type vs. mutant: 1014 regulated genes; enriched predicted modules: ribosomal and phosphate metabolism (P < 0.009), amino acid and purine metabolism (P < 0.01), mRNA, rRNA and tRNA processing (P < 0.02), protein folding (P < 0.02), cell cycle (P < 0.02). Kin82 knockout (heat shock), wild type vs. mutant: 1034 regulated genes; enriched predicted modules: energy and osmotic stress (P < 0.0001), energy, osmolarity & cAMP signaling (P < 0.006), mRNA, rRNA and tRNA processing (P < 0.02). SSRPBKF ’03 (Nature Genetics)
37
Wet Lab Experiments Summary: 3/3 regulators regulate computationally predicted genes. New yeast biology suggested: Ypl230w activates protein-folding, cell wall and ATP-binding genes; Ppt1 represses phosphate metabolism and rRNA processing; Kin82 activates energy and osmotic stress genes. SSRPBKF ’03 (Nature Genetics)
38
Ongoing Biological Debate: Can we discover actual regulators from gene expression data alone? Many regulatory relationships can be inferred from gene expression data. SSRPBKF ’03 (Nature Genetics)
39
Why Does it Work? Assumption: regulators are transcriptionally regulated. Through feedforward and auto-regulatory “motifs” (Shen-Orr et al. 2002), TFs and SMs have a detectable expression signature, so statistical methods can infer their regulatory relationships from gene expression data. Examples (detected regulators, undetected regulators, detected targets): regulator chain (respiration): Phd1 (TF), Hap4 (TF), targets Cox4, Cox6, Atp17; auto-regulation (Snf kinase regulated processes): Yap6 (TF), targets Vid24, Tor1, Gut2; positive signaling loop (sporulation & cAMP): Sip2 (SM), Msn4 (TF), targets Vid24, Tor1, Gut2. SSRPBKF ’03 (Nature Genetics)
40
Outline: Who regulates whom and when? How are genes regulated? (Model, Evaluation). Regulation of multi-functional genes. Evolution of gene regulation.
41
From Sequence to Expression. An activator and a repressor bind motifs (ACGTGC, GATAG) in the DNA control sequences of Genes 1–3, which carry ACGTGC, GATAG, ACGTGC + GATAG, or no motifs. How do these combinations determine the expression measured by the DNA microarray?
42
From Sequence to Expression: motif combinations (ACGTGC, GATAG, ACGTGC + GATAG, no motifs) → expression. Goal: explain how expression arises from sequence. Construct a mechanistic model of gene regulation, and learn the model from sequence and expression data.
43
Two Phase Approach (I). Procedural approach: apply a different method to each type of data and use the output of one method as input to the next. Step 1: cluster gene expression profiles (genes × experiments). Step 2: search for motifs in the control regions of clustered genes (Genes I–VI: AGCTAGCTGAGACTGCACAC, TTCGGACTGCGCTATATAGA, GACTGCAGCTAGTAGAGCTC, CTAGAGCTCTATGACTGCCG, ATTGCGGGGCGTCTGAGCTC, TTTGCTCTTGACTGCCGCTT). Motif found: GACTGC.
44
Two Phase Approach: Problems. Expression clustering is not perfect: two clusterings (A and B) can split the same genes differently into Cluster I and Cluster II, separating genes that share a motif.
45
Two Phase Approach (II): iterate over all sequences of length k (k-mers); find all genes that have each k-mer in their promoter; keep the k-mers whose genes are coherent in expression. Example k-mers: GATACC, ACGACT, AAATGC, TCGACT, CGCTGA, ACGAGA, TTCGCA, CGATGG, AAATTA; kept, e.g.: TCGACT, GATACC.
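The k-mer step of this two-phase approach can be sketched as below. Assumptions not on the slide: “coherent in expression” is measured as the average pairwise Pearson correlation of the genes containing the k-mer, the thresholds min_genes and min_corr are hypothetical, and reverse complements are ignored. Indexing k-mers in one pass over the promoters visits the same candidates as iterating over all 4^k sequences, minus those that occur in no promoter.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def coherent_kmers(promoters: Dict[str, str], expression: Dict[str, List[float]],
                   k: int, min_genes: int = 3, min_corr: float = 0.7
                   ) -> Dict[str, List[str]]:
    """Map each expression-coherent k-mer to the sorted genes containing it."""
    genes_with = defaultdict(set)
    for gene, seq in promoters.items():          # index every k-mer occurrence
        for i in range(len(seq) - k + 1):
            genes_with[seq[i:i + k]].add(gene)
    kept = {}
    for kmer, genes in genes_with.items():
        if len(genes) < min_genes:
            continue
        pairs = list(combinations(sorted(genes), 2))
        avg = sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)
        if avg >= min_corr:
            kept[kmer] = sorted(genes)
    return kept


promoters = {"g1": "AAGACTGCTT", "g2": "GACTGCAAAA",   # hypothetical data
             "g3": "TTTGACTGCC", "g4": "CCCCCCCCCC"}
expression = {"g1": [1, 2, 3, 4], "g2": [1.1, 2.1, 2.9, 4.2],
              "g3": [0.9, 2.2, 3.1, 3.9], "g4": [4, 3, 2, 1]}
print(coherent_kmers(promoters, expression, k=6))  # {'GACTGC': ['g1', 'g2', 'g3']}
```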
46
Two Phase Approach: Problems. Single motifs may not have coherent expression: with an activator motif (TCGACTGC) and a repressor motif (GATAC), genes with TCGACTGC alone and genes with TCGACTGC + GATAC show different expression.
47
Two Phase Approach: Problems. Are we missing motifs? Is expression explained by TCGACTGC + CCAAT, or by TCGACTGC OR CCAAT?
48
Two Phase Approach: Problems. A single motif cannot explain variation in expression: the activator motif (TCGACTGC) and the repressor motif (GATAC) act in combination (TCGACTGC + GATAC).
49
Unified Model of Gene Regulation: for each gene, sequence → motifs → motif profiles → expression profiles. [Figure: example promoter sequences annotated with motifs.] Motifs: TCGACTGC, GATAC, CCAAT, GCAGTT. Motif profiles (combinations of motifs), e.g. TCGACTGC + GATAC, CCAAT + GCAGTT, CCAAT. SYK ’03 (ISMB)
50
Unified Model of Gene Regulation: sequence → motifs → motif profiles (e.g. TCGACTGC + GATAC, CCAAT + GCAGTT, CCAAT) → expression profiles; together these form cis-regulatory modules.
51
Regulatory Module: a motif profile (TCGACTGC + GATAC) shared by the DNA control sequences of module genes, together with the expression of module genes across experiments. SYK ’03 (ISMB)
52
Our Approach: a unified model of gene regulation using sequence (input I) and expression (input II) data. The model is trained as a whole: motif profiles are predictive of expression, expression clusters share motif profiles, and motifs are added to make profiles predictive. The model is learned without prior knowledge. SYK ’03 (ISMB)
53
Problems and Solutions: expression clustering is not perfect → a unified model for expression and motifs; a single motif cannot explain variation in expression → combinatorial motif profiles; are we missing motifs? → dynamically add motifs to explain expression. SYK ’03 (ISMB)
54
Probabilistic Model: each gene's sequence (S1–S4) determines its motif variables (R1–R3): is motif i “active” in gene g? $P(R_2 \mid S)$ is defined using a position specific scoring matrix (PSSM). SYK ’03 (ISMB)
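The slide's exact formula for P(R_2 | S) did not survive extraction. A common way to connect a PSSM to a binary “motif active” variable, assumed here purely for illustration, is to score every window of the sequence with log-odds weights and pass the best window score through a logistic function; the weight and bias parameters are hypothetical.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def pssm_from_counts(counts: List[Dict[str, int]], pseudo: float = 1.0,
                     background: float = 0.25) -> List[Dict[str, float]]:
    """Log-odds PSSM from per-position base counts, with pseudocounts."""
    pssm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudo
        pssm.append({b: math.log((column.get(b, 0) + pseudo) / total / background)
                     for b in BASES})
    return pssm


def best_window_score(pssm: List[Dict[str, float]], seq: str) -> float:
    """Highest PSSM score over all windows (requires len(seq) >= motif width)."""
    w = len(pssm)
    return max(sum(pssm[j][seq[i + j]] for j in range(w))
               for i in range(len(seq) - w + 1))


def p_motif_active(pssm, seq, weight: float = 1.0, bias: float = 0.0) -> float:
    """Assumed link: P(R = true | S) = logistic(weight * best score + bias)."""
    return 1.0 / (1.0 + math.exp(-(weight * best_window_score(pssm, seq) + bias)))


gata = pssm_from_counts([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])  # toy motif
print(round(p_motif_active(gata, "TTGATACC"), 2))  # 0.99
print(round(p_motif_active(gata, "CCCCCCCC"), 3))  # 0.007
```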
55
Probabilistic Model: the motif variables R1–R3 (computed from the sequence S1–S4) determine the gene's module, with $P(\mathrm{Module} \mid R)$ given by a softmax over modules; e.g. motif profile 1 consists of R1 and R2. SYK ’03 (ISMB)
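The softmax on this slide maps a gene's binary motif indicators R1, R2, R3 to a distribution over modules. A minimal sketch, assuming each module k has one weight per motif and a bias (all values below are hypothetical):

```python
import math
from typing import List


def module_distribution(active: List[int], weights: List[List[float]],
                        biases: List[float]) -> List[float]:
    """P(Module = k | R) = exp(s_k) / sum_j exp(s_j), with s_k = b_k + sum_i w_ki * R_i."""
    scores = [b + sum(w * r for w, r in zip(w_k, active))
              for w_k, b in zip(weights, biases)]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Module 1's motif profile is {R1, R2}; module 2's is {R3}; module 3 is a catch-all.
weights = [[3.0, 3.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]]
probs = module_distribution([1, 1, 0], weights, biases=[0.0, 0.0, 0.0])
print([round(p, 3) for p in probs])  # [0.995, 0.002, 0.002]
```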
56
Probabilistic Model: every module has a unique expression profile, $P(\mathrm{Level} \mid \mathrm{Module}, \mathrm{ID})$, where ID identifies the experiment. SYK ’03 (ISMB)
57
Probabilistic Model: Regulatory Modules. The full model links sequence, motifs, motif profiles and expression profiles; each regulatory module is a set of genes with a shared motif profile and expression profile. SYK ’03 (ISMB)
58
Learning Problem. Genes: 5000–10000. Variables per gene: sequence 1000; expression 200–500; motifs 50–100 (hidden); module 1 (hidden). Learn the module assignments, the “active” motifs per gene, and the motif profiles that maximize P(M | D). Hard. SYK ’03 (ISMB)
59
Learning Algorithm Overview: clustering initializes the gene partition and motif search initializes the motif set; EM (E-step, M-step) then refines both into regulatory modules, adding and deleting motifs as it goes.
60
Learning the Set of Active Motifs. Adding all sequences of length k as motifs (ACGTAGT, TGATGCA, ACGTGC, GCTGGT, TTTTAC, …) leads to overfitting. Instead, use the expression data to guide the search for new motifs.
61
Dynamically Adding Motifs: examine all regulatory modules; compare the genes matching the motif profile with the module genes; add a motif initialized to the common motif in the missed genes. Regulatory Module 1: all genes match the motif profile. Regulatory Module 2: many genes do not match the motif profile, so add motif CCAAT.
62
Outline: Who regulates whom and when? How are genes regulated? (Model, Evaluation). Regulation of multi-functional genes. Evolution of gene regulation.
63
Application of Method to Data. Yeast: 4 expression datasets, 500bp upstream sequence; 77 motif profiles, 65 motifs, 25 known (out of 37); the method found many known motifs in yeast. Human: 4 expression datasets, 1000bp upstream sequence; 62 motif profiles, 80 motifs, 10 known (TRANSFAC). SYK ’03 (ISMB)
64
Comparison to Standard Approach (recovery of known motifs). Yeast: 25 (our method) vs. 12 (standard approach); human: 10 vs. 4. Our method found many more known motifs from the literature. SYK ’03 (ISMB)
65
Cell Division Module in Human. Module genes: Caspase 3, Cyclin A2, Cyclin F, CDC2, Centromere A, Centromere E, kinesin family, karyopherin alpha 2, polo-like kinase, RGS3, Serine kinase 6, topoisomerase II, TTK protein kinase, aurora kinase B, Kinase family 23, extra spindle pole 1, ARHGAP11A, HEC, Ubiquitin-conjugating, CDC8, DKFZp762E1312, NALP2, C20orf129, DDA3, UBF-fl. The DNA control sequences of module genes contain an NFAT motif and a novel motif. Module genes functionally coherent? Module genes involved in mitosis (10/25, $P < 10^{-9}$). Module genes known to be regulated by predicted motifs? NFAT regulates cytokine (cell division) genes. SYK ’03 (ISMB)
66
Biological Evaluation Summary. Are the module genes functionally coherent? Yeast and human: 40/62 and 65/77. Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses). SYK ’03 (ISMB)
67
Evaluating Human Motifs: signal or overfitting? Hide the sequence of gene i, learn the motif model for the module, and assign gene i to the module if it is in the module with probability ≥ 0.5; repeat for all genes. A module gene assigned with P ≥ 0.5 is a true positive; a non-module gene assigned with P ≥ 0.5 is a false positive. Classification margin = true positives (%) − false positives (%). Example DNA control sequences: module genes 1–4 (TTGACTGCACTCGGCAATTACTATACT, AGCACTGCACTGCACTCGACTATACTA, TTTTACTATCTCACGATGCACTCGGCC, ACACTTACTATACCCTTGCACTCGTAG); non-module genes 5–8 (TAGGCCAACCCGGTGGCTTACTATACT, ACAAACGTGAGTTTTCATCGAGTTCTT, ACGTGCACTCGAATATAGTCTTGATTT, CTGATCGTAGCGGGTAGCTCGCGAGG); motifs TGCACTCG and TTACTAT. SS ’04 (RECOMB)
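The classification margin can be computed once the held-out probabilities are available. The sketch below assumes those probabilities have already been produced by the leave-one-gene-out loop (the motif learning itself is not shown) and returns the margin as a fraction; multiply by 100 for percentage points.

```python
from typing import Sequence


def classification_margin(probs: Sequence[float], in_module: Sequence[bool],
                          threshold: float = 0.5) -> float:
    """True-positive rate minus false-positive rate for held-out genes.

    probs[i] is P(gene i in module) computed with gene i's sequence hidden;
    in_module[i] says whether gene i actually belongs to the module.
    """
    pos = [p for p, y in zip(probs, in_module) if y]
    neg = [p for p, y in zip(probs, in_module) if not y]
    tp_rate = sum(p >= threshold for p in pos) / len(pos)
    fp_rate = sum(p >= threshold for p in neg) / len(neg)
    return tp_rate - fp_rate


margin = classification_margin([0.9, 0.8, 0.3, 0.6, 0.1, 0.2],   # hypothetical
                               [True, True, True, False, False, False])
print(round(margin, 3))  # 0.333
```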
68
Evaluating Human Motifs. [Figure: classification margin (0 to 0.7) of each module, labeled by its enriched GO annotation and learned motifs, compared with the best classification margin from 100 random modules.] Examples: motif HSF in a protein folding module (HSF is known to regulate protein folding); motif GATA in a mitochondrial module (GATA is known to activate mitochondrial membrane genes). SS ’04 (RECOMB)
69
Biological Evaluation Summary: a compendium of human cis-regulatory modules. Module genes are functionally coherent; module genes are similarly expressed in external datasets; learned motifs characterize module genes.
70
Incorporating Protein-DNA Binding. A protein-DNA binding assay identifies all the genes that are bound by a regulator (at a gene's control region), but it is a noisy assay.
71
Incorporating Protein-DNA Binding: the protein-DNA data for regulator i (P1–P3) is a noisy sensor for regulation by motif i (R1–R3). Does regulator 3 bind to gene g (P3)? Is the motif recognized by regulator 3 “active” in gene g (R3)? SBSFK ’02 (RECOMB)
72
Outline: Who regulates whom and when? How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation.
73
Model Assumption: every gene belongs to exactly one module.
74
Multi-Functional Genes Model: every gene can belong to multiple modules (e.g. Gene 2 belongs to both Module 1 and Module 2). The expression of a gene is the sum of its expression in each module it participates in.
75
Multi-Functional Genes Model: $g.M_i$ indicates whether gene g is part of module i; $e.A_i$ is the activity level of module i in experiment e. Expression is a sum of the activity levels of all modules: $\mathrm{Level}_{g,e} \sim \mathcal{N}\big(\sum_i g.M_i \, e.A_i,\ \sigma\big)$. SBK ’03 (PSB)
76
Connection to SVD. Singular value decomposition (Golub et al. ’96; Alter et al. ’00) factors the genes × experiments matrix through modules, $E = M A^{T}$, i.e. $\mathrm{Level}_{g,e} = \sum_i g.M_i \, e.A_i$. Our model: $\mathrm{Level}_{g,e} \sim \mathcal{N}\big(\sum_i g.M_i \, e.A_i,\ \sigma\big)$. Learning problem: module assignments and module activity levels. Difference to our model: discrete module assignments, which make learning hard. SBK ’03 (PSB)
77
Learning Assignments and Activities. Bayesian network for 3 modules, 2 genes and 2 experiments: hidden assignments M (per gene and module), hidden activities A11–A13 and A21–A23 (per experiment and module), observed Level11, Level12, Level21, Level22. Every pair of hidden variables is dependent, so learning is hard. Standard approximations: loopy belief propagation, variational methods. At realistic scale (genes: 5000–10000; experiments: ~200; modules: 50–100) there are ~1,000,000 dependent hidden variables, and these methods reach, at best, a local maximum of an approximate energy function. SBK ’03 (PSB)
78
Learning Assignments and Activities (3 modules, 2 genes, 2 experiments): initialize the assignments, then alternate between optimizing activities given assignments (assignments observed: easy) and optimizing assignments given activities (activities observed: easy). Standard approximations converge (at best) to a local maximum of an approximate energy function; our algorithm converges to a strong local maximum. SBK ’03 (PSB)
79
Learning Module Activity Levels (assignments observed, activities hidden): the $A_{ij}$ variables are continuous, so this is a standard least squares problem. Optimization problem: $\min_{A} \sum_{g,e} \big(\mathrm{Level}_{g,e} - \sum_i g.M_i \, e.A_i\big)^2$. SBK ’03 (PSB)
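Because the activities enter linearly once the assignments are fixed, each experiment's activity vector can be fit independently by ordinary least squares. A dependency-free sketch via the normal equations (M^T M) a = M^T y, assuming every module has at least one gene; the tiny ridge term is an added assumption that only guards against numerical singularity.

```python
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= factor * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x


def fit_activities(assign: List[List[int]], levels: List[float],
                   ridge: float = 1e-9) -> List[float]:
    """Activities for one experiment: argmin_a sum_g (levels[g] - sum_i assign[g][i] * a[i])^2."""
    m = len(assign[0])
    mtm = [[sum(row[i] * row[j] for row in assign) + (ridge if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    mty = [sum(row[i] * y for row, y in zip(assign, levels)) for i in range(m)]
    return solve(mtm, mty)


assign = [[1, 0], [1, 1], [0, 1], [1, 0]]   # hypothetical: 4 genes, 2 modules
levels = [2.0, 1.0, -1.0, 2.0]              # one experiment
print([round(a, 6) for a in fit_activities(assign, levels)])  # [2.0, -1.0]
```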
80
Learning Module Assignments (activities observed, assignments hidden): the $M_{ij}$ variables are discrete, so for each gene this is a combinatorial search in time $2^m$ over the m modules. Optimization problem, for each gene g: $\min_{g.M \in \{0,1\}^m} \sum_e \big(\mathrm{Level}_{g,e} - \sum_i g.M_i \, e.A_i\big)^2$.
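For one gene, with the activities held fixed, the exhaustive 2^m search can be written directly; under the Gaussian model with a fixed sigma, maximizing the likelihood is the same as minimizing squared error. The activity values below are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple


def best_assignment_exhaustive(levels: Sequence[float],
                               activities: Sequence[Sequence[float]]
                               ) -> Tuple[int, ...]:
    """Try all 2^m membership vectors for one gene; keep the lowest squared error.

    levels[e] is the gene's expression in experiment e;
    activities[e][i] is the activity of module i in experiment e.
    """
    m = len(activities[0])

    def squared_error(mask: Tuple[int, ...]) -> float:
        return sum((y - sum(mi * a for mi, a in zip(mask, act))) ** 2
                   for y, act in zip(levels, activities))

    return min(product((0, 1), repeat=m), key=squared_error)


activities = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]  # 3 experiments x 3 modules
print(best_assignment_exhaustive([1.0, 1.0, 3.0], activities))  # (1, 1, 0)
```

The cost doubles with every module, which is why the exact search is only practical for small m.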
81
Learning Module Assignments, faster: first optimize the same least-squares objective with continuous $M_{ij}$; then, for each gene i, select the k largest variables from $\{M_{i1}, \dots, M_{im}\}$ and run the combinatorial search over them only, in time $2^k$.
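The relaxation trades exactness for speed: only the k most promising modules are searched. A sketch, assuming the continuous memberships (relaxed) have already been computed, for example by an unconstrained least-squares fit, and that modules outside the top k are fixed to 0; both are illustrative assumptions.

```python
from itertools import product
from typing import Sequence, Tuple


def best_assignment_top_k(levels: Sequence[float],
                          activities: Sequence[Sequence[float]],
                          relaxed: Sequence[float], k: int) -> Tuple[int, ...]:
    """Search only the k modules with the largest relaxed memberships (2^k subsets);
    every other module gets membership 0 (an assumption of this sketch)."""
    m = len(relaxed)
    top = sorted(range(m), key=lambda i: relaxed[i], reverse=True)[:k]

    def squared_error(mask: Sequence[int]) -> float:
        return sum((y - sum(mi * a for mi, a in zip(mask, act))) ** 2
                   for y, act in zip(levels, activities))

    best_err, best_mask = float("inf"), tuple([0] * m)
    for bits in product((0, 1), repeat=len(top)):
        mask = [0] * m
        for i, bit in zip(top, bits):
            mask[i] = bit
        err = squared_error(mask)
        if err < best_err:
            best_err, best_mask = err, tuple(mask)
    return best_mask


activities = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]  # hypothetical
levels = [1.0, 1.0, 3.0]
print(best_assignment_top_k(levels, activities, relaxed=[0.9, 0.8, 0.1], k=2))  # (1, 1, 0)
```

With relaxed = [0.1, 0.8, 0.9] and k = 2, the same gene is restricted to modules 1 and 2 and the search returns (0, 1, 0), which shows how a poor relaxation can miss the exact optimum.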
82
Comparison to Plaid (Lazzeroni and Owen ’02): compare the P-value of enrichment for functional annotations (GO), where the P-value of annotation enrichment is the best hypergeometric p-value in any module. [Figure: -log(P-value) per annotation, Plaid vs. our method.] 122 of 137 annotations are more significant in our model. SBK ’03 (PSB)
83
Comparison to Standard Clustering: compare the P-value of enrichment for functional annotations (GO), where the P-value of annotation enrichment is the best hypergeometric p-value in any module. [Figure: -log(P-value) per annotation, hierarchical clustering vs. our method.] 120 of 137 annotations are more significant in our model. SBK ’03 (PSB)
84
Adding the Regulation Model: in the multi-functional genes model (M1–M3, A1–A3, Level), the activity level of module i in each array (A_i) is made a function of the expression of regulators (Regulators 1–3, e.g. a HAP4/CMK1 tree). BSK ’04 (RECOMB)
85
Outline: Who regulates whom and when? How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation (Robust prediction of gene function, Identifying conserved modules).
86
Single Species Gene Expression: co-expression is not always functionally relevant, because of noise in DNA microarray technology and biological sloppiness. Use evolution as a filter.
87
Multiple Species Gene Expression. Different organisms share many of their genes (orthologs): ~30% of yeast genes are conserved in human. Can we learn something from observing the expression of the same gene in multiple species? Irrelevant co-expression is uncorrelated in different species, while relevant co-expression confers a selective advantage. Combining expression from multiple species can improve gene function and regulatory module discovery.
88
Conserved Co-Expression Network: connect genes that are co-expressed in at least two organisms. Yeast (643), worm (949), fly (155), human (1202). [Figure: 3D visualization of the network.] SSKK ’03 (Science)
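The edge rule on this slide, connecting genes co-expressed in at least two organisms, can be sketched as follows. Assumptions: each node stands for a group of orthologous genes with one expression profile per organism, and “co-expressed” means a Pearson correlation at or above a hypothetical cutoff; the published analysis may have used a different co-expression measure.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def conserved_edges(expr_by_species: Dict[str, Dict[str, List[float]]],
                    min_corr: float = 0.8, min_species: int = 2
                    ) -> Set[Tuple[str, str]]:
    """Edges between ortholog groups co-expressed in at least min_species organisms.

    expr_by_species[species][group] is that species' expression profile for the group.
    """
    support: Dict[Tuple[str, str], int] = {}
    for profiles in expr_by_species.values():
        for a, b in combinations(sorted(profiles), 2):
            if pearson(profiles[a], profiles[b]) >= min_corr:
                support[(a, b)] = support.get((a, b), 0) + 1
    return {pair for pair, count in support.items() if count >= min_species}


data = {  # hypothetical toy profiles for three ortholog groups A, B, C
    "yeast": {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5], "C": [4, 3, 2, 1]},
    "worm": {"A": [2, 4, 6, 8], "B": [1, 3, 5, 7], "C": [1, 4, 2, 3]},
    "fly": {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 2, 3, 4]},
}
print(conserved_edges(data))  # {('A', 'B')}
```

A and C are correlated in fly only, so that single-species link is filtered out, which is the point of using evolution as a filter.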
89
Conserved Co-Expression Network: components include ribosome biogenesis, energy generation, cell cycle, secretion, neuronal, proteasome, general transcription, ribosomal subunits, signaling, translation initiation and elongation, lipid metabolism, and unknown. SSKK ’03 (Science)
90
Predicting Gene Function: predict function using a guilt-by-association scheme. [Figure: classification accuracy (%) for each Gene Ontology gene annotation, e.g. protein modification.] 40 annotations at 50% accuracy; 70 annotations at 30% accuracy. SSKK ’03 (Science)
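Guilt by association predicts a gene's function from the annotations of its network neighbors. A minimal majority-vote sketch follows; the scoring rule (fraction of annotated neighbors carrying the top term) is an assumption, not necessarily the scheme behind the accuracy figures on this slide, and the gene names and annotations are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def predict_function(gene: str, edges: Iterable[Tuple[str, str]],
                     annotations: Dict[str, Set[str]]
                     ) -> Optional[Tuple[str, float]]:
    """Most common annotation among the gene's neighbors, with the fraction of
    annotated neighbors that carry it; None if no neighbor is annotated."""
    neighbors = set()
    for a, b in edges:
        if a == gene:
            neighbors.add(b)
        elif b == gene:
            neighbors.add(a)
    annotated = [n for n in neighbors if annotations.get(n)]
    if not annotated:
        return None
    votes = Counter(term for n in annotated for term in annotations[n])
    term, count = votes.most_common(1)[0]
    return term, count / len(annotated)


edges = [("g1", "g2"), ("g1", "g3"), ("g4", "g1"), ("g5", "g6")]
annotations = {"g2": {"protein modification"}, "g3": {"protein modification"},
               "g4": {"cell cycle"}}
print(predict_function("g1", edges, annotations))  # ('protein modification', 0.6666666666666666)
```

Ranking all genes by this fraction and keeping the most confident predictions corresponds to the kind of accuracy-vs-confidence evaluation shown on the next slide.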
91
Predicting Protein Modification. [Figure: classification accuracy (%) of the 50 most confident predictions.] Predictions using single species: worm 12%, fly 18%, human 15%, yeast 13%; multiple species prediction: 76%. Significant improvements over any single species network. SSKK ’03 (Science)
92
Biological Experiment. Prediction: ZK652.1 plays a role in cell proliferation. Experiment: knock out ZK652.1 and test the mutant. Result: excess nuclei in the mutant, consistent with the cell proliferation prediction. SSKK ’03 (Science)
93
Outline: Who regulates whom and when? How are genes regulated? Regulation of multi-functional genes. Evolution of gene regulation (Robust prediction of gene function, Identifying conserved modules).
94
Conserved Gene Regulation Model: one module network per organism (Organism 1, Organism 2), linked by a compatibility potential (Module, Module). Orthologs are more likely to be in the same module, and regulation programs for the same module are more likely to share regulators.
95
Conserved Regulation. Goal: discover regulators in brain that are shared between human and mouse. Human (138): normal brain (4); brain tumors: gliomas (57), medulloblastoma (60); miscellaneous (17). Mouse (42): brain development (39); brain tumors: medulloblastoma (3).
96
Comparison to Single Species. [Figure: test data log-likelihood (gain per gene) for human and for mouse, single species vs. multiple species.] By combining expression data from mouse, we can learn a better model of gene regulation in human.
97
Neuron Differentiation Module (mouse, human); predicted regulator: NeuroD1. Module genes functionally coherent? Brain expressed genes (18/34, $P < 10^{-12}$). Module genes known targets of predicted regulators? NeuroD is known to regulate module genes.
98
Summary: Probabilistic Framework. Rich modeling language for biological processes: finding regulators (SSRPBKF ’03 (Nature Gen.), SPRKF ’03 (UAI), BSK ’04 (RECOMB)); finding motifs (SS ’04 (RECOMB), SBSFK ’02 (RECOMB), SYK ’03 (ISMB)); finding conserved regulators (SSKK ’03 (Science)).
99
Summary: Probabilistic Framework. Rich modeling language for biological processes: gene regulation, two-sided clustering, learning abstraction hierarchies, discovering molecular pathways, learning with clinical data. SOK ’01 (NIPS); SK ’02 (RECOMB); STGFK ’01 (ISMB); SWK ’03 (ISMB); SSKK ’03 (Science); SSRPBKF ’03 (Nature Gen.); SPRKF ’03 (UAI); SS ’04 (RECOMB); SBSFK ’02 (RECOMB); SYK ’03 (ISMB); SBK ’03 (PSB); BSK ’04 (RECOMB).
100
Summary: Probabilistic Framework. Rich modeling language for biological processes. Unified approach for heterogeneous data: gene expression, DNA sequence, protein-DNA binding data, multiple species data, protein-protein interaction data. SBSFK ’02 (RECOMB); SWK ’03 (ISMB); SSKK ’03 (Science); SSRPBKF ’03 (Nature Gen.); SYK ’03 (ISMB); SS ’04 (RECOMB); SBK ’03 (PSB).
101
Summary: Probabilistic Framework. Rich modeling language for biological processes. Unified approach for heterogeneous data. Model automatically learned from data (data → model design → learn model → analyze results): convex optimization, graph theoretic algorithms, dynamic programming, heuristic search; exploit modularity in biological systems; exploit problem-specific structure.
102
Summary: Probabilistic Framework. Rich modeling language for biological processes. Unified approach for heterogeneous data. Model automatically learned from data. Model evaluation methods: comparison to existing methods, cross validation, enrichment for known biological function, relative to current knowledge in the literature.
103
Summary: Probabilistic Framework. Rich modeling language for biological processes. Unified approach for heterogeneous data. Model automatically learned from data. Model evaluation methods. Testable biological hypotheses: generate novel hypotheses from the model; wet-lab validation of predictions (SSKK ’03 (Science), SSRPBKF ’03 (Nature Gen.)).
104
Summary: Probabilistic Framework. Rich modeling language for biological processes. Unified approach for heterogeneous data. Model automatically learned from data. Model evaluation methods. Testable biological hypotheses. Visualization software.
105
The Challenge Ahead: from organisms, data types (protein expression, tissue specific expression, interaction data, location data, …) and conditions (developmental, physiological, environmental, clinical, metabolic, experimental) to biological information?