Published by Franklin Wade. Modified over 8 years ago.
1
Evaluation of a phylogenetic footprinting method designed to discover regulatory elements in prokaryotes
Rekin’s Janky
EMBRACE RSMD Workshop, November 10th, 2006
Service de Conformation des Macromolécules Biologiques et de Bioinformatique (S.C.M.B.B.), Université Libre de Bruxelles (U.L.B.), Belgium
2
Objectives
– Can we predict transcription factor binding sites from genome sequences?
– Can we analyse their evolution across the taxonomy?
3
Transcriptional regulation
Robison K, McGuire AM, and Church GM (JMB, 1998)
[Figure; image source: www.rtc.riken.co.jp]
4
Bacterial taxonomic tree
[Figure: tree with the labels Bacteria; Proteobacteria, Actinobacteria, Firmicutes, Cyano +; Alpha, Beta, Gamma-proteobacteria, Delta; Enterobacteriales; Escherichia]
5
Strategy
Apply dyad-analysis* at each level of the taxonomic tree of fully sequenced prokaryotes.
[Figure: taxonomic tree with the labels Prokaryotes; Bacteria, Archaea; Proteobacteria, Firmicutes, Actinobacteria, Cyanobacteria; Alpha, Beta, Gamma]
* J. van Helden et al., Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. NAR, 2000.
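To make the notion of a spaced dyad concrete, here is a minimal sketch that counts pairs of trinucleotides separated by a fixed-width spacer in a set of sequences. It is not the RSAT dyad-analysis program: the function name `count_dyads`, the spacing range of 0 to 20, and the single-strand counting are assumptions made for illustration only.

```python
from collections import Counter

def count_dyads(seqs, monad_len=3, max_spacing=20):
    """Count spaced dyads (left monad, spacing, right monad) on one strand.

    Illustrative only: parameter defaults are assumptions, and no
    background model or significance score is computed here.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for spacing in range(max_spacing + 1):
            span = 2 * monad_len + spacing
            for i in range(len(seq) - span + 1):
                left = seq[i:i + monad_len]
                right = seq[i + monad_len + spacing:i + span]
                # Skip dyads containing masked or ambiguous positions.
                if set(left + right) <= set("ACGT"):
                    counts[(left, spacing, right)] += 1
    return counts

# Toy promoter: CTG, a 10-bp spacer, then CAG.
toy = "CTG" + "A" * 10 + "CAG"
dyads = count_dyads([toy, toy])
```

In this toy input, `dyads[("CTG", 10, "CAG")]` counts one occurrence per sequence. Applying the same count to each taxon-specific group of orthologous promoters is the idea behind running the analysis at every level of the tree.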
6
LexA study case: dyad-analysis for orthologous sequences of lexA within Gammaproteobacteria
7
LexA study case: feature-map for orthologous sequences of lexA within Gammaproteobacteria
8
Pattern-assembly for orthologous sequences of lexA within Gamma
9
LexA study case: a Gammaproteobacterial motif for lexA
[Figure: taxonomic tree annotated with scores. Labels: Bacteria; Proteobacteria, Firmicutes, Cyanobacteria, Actinobacteria; Alpha, Beta, Gamma; Xanthomonadales, Pseudomonadales, Vibrionales, Alteromonadales, Enterobacteriales, Pasteurellales; Shigella, Escherichia, Yersinia. Scores shown: 5.89, 0.17, 8.31, 9.16]
Score = occurrence significance
CTGTNNNNNNNNACAG lexA (Gamma annotated)*
CTGNNNNNNNNNNCAG lexA (most significant dyad)
* G.C. Walker, Mutagenesis and inducible response to DNA damage in E. coli. Microbiol Rev, 1984.
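The two patterns above are written with the IUPAC code N for "any base". A minimal sketch of how such a degenerate pattern can be located in a sequence follows. The helper name `find_sites` and the toy sequence are hypothetical, and only one strand is scanned (both patterns shown here are reverse-palindromic, so the other strand would give the same hits).

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(pattern, seq):
    """Return (position, matched text) for every occurrence, overlaps included."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    regex = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    return [(m.start(), m.group(1)) for m in regex.finditer(seq.upper())]

# Toy sequence built for the example, not taken from a genome.
toy = "GG" + "CTGT" + "ATATATAT" + "ACAG" + "TT"
annotated_hits = find_sites("CTGTNNNNNNNNACAG", toy)
dyad_hits = find_sites("CTGNNNNNNNNNNCAG", toy)
```

Both patterns hit the same position in the toy sequence, which illustrates why the most significant dyad (CTG, 10-bp spacer, CAG) is compatible with the annotated motif: the dyad simply leaves the fourth and thirteenth positions unconstrained.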
10
Significance map (Max.Sig >= 8)
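For readers unfamiliar with the score behind this threshold, the sketch below follows the usual definition of occurrence significance from the dyad-analysis paper cited in the Strategy slide: a binomial P-value for observing at least that many occurrences, multiplied by the number of tested patterns to give an E-value, with sig = -log10(E-value). The function names and arguments are hypothetical, and the expected frequency (`prior`) is assumed to come from a background model that is not computed here.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def occurrence_significance(observed, positions, prior, n_patterns):
    """sig = -log10(E-value), where E-value = P-value * number of tested patterns."""
    p_value = binomial_tail(observed, positions, prior)
    e_value = p_value * n_patterns
    return -math.log10(e_value)

# Hypothetical numbers: 12 occurrences among 5000 positions, prior 1e-4,
# 100000 tested dyads.
sig = occurrence_significance(12, 5000, 1e-4, 100000)
passes_threshold = sig >= 8
```

Under this definition a significance of 8 corresponds to an E-value of 1e-8, that is, a pattern expected to arise by chance about once in 10^8 analyses of comparable random sequence sets.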
11
LexA study case: LexA binding motif annotated in different taxonomic groups
12
Significance map (Max.Sig >= 8)
Motif labels shown: Gamma motif, Beta motif, Xanthomonadales motif, Alpha motif, Actinobacteria motif, Firmicutes motif
Sequences shown:
AATCGAACACATGTTCGAACAT
ACTGTATATATATACAGTT
TTGCGGAACACATANAGAACAGA
GTTAGTAATACTACTAAC
CACTGTATATATATACAGTG
TACGAACATATGTTCGTAT
13
Summary from LexA study case
This approach makes it possible to identify:
– the optimal taxonomic level for a given motif
– multiple motifs
– the evolution of a motif across the taxonomy
The LexA study case may not be representative of the performance of this approach on all bacterial genes!
14
Validation
RegulonDB: 368 E. coli genes, 1012 sites
Salgado H et al. RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Research, 2004.
15
Evaluation with RegulonDB: significance of the discovered patterns
16
Comparing predicted patterns vs annotated sites Correctness of predicted patterns (LexA study case / Gammaproteobacteria)
17
Correctness of predicted patterns: matching statistics

                 | Annotated sites                          | Not annotated
Predicted dyads  | Matching dyads (MD) / Matched sites (MS) | Non-matching dyads (ND)
Not predicted    | Non-matched sites (NS)                   | True negative (TN) ?

Positive Predictive Value (PPV): fraction of discovered patterns matching at least one annotated site.
PPV = MD / (MD + ND)
Sensitivity (Sn): fraction of annotated binding sites matched by at least one discovered pattern.
Sn = MS / (MS + NS)
Geometric Accuracy (Acc.g): geometric average of the sensitivity and the PPV.
Acc.g = sqrt(Sn × PPV)
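The three statistics defined above can be computed directly from the four counts. The sketch below is a straightforward transcription of the formulas; the function name `matching_stats` and the example counts are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something stated in the slides.

```python
import math

def matching_stats(md, nd, ms, ns):
    """Compute PPV, Sn and Acc.g from matching counts.

    md: matching dyads, nd: non-matching dyads,
    ms: matched sites,  ns: non-matched sites.
    """
    ppv = md / (md + nd) if (md + nd) else 0.0
    sn = ms / (ms + ns) if (ms + ns) else 0.0
    acc_g = math.sqrt(sn * ppv)
    return {"PPV": ppv, "Sn": sn, "Acc.g": acc_g}

# Hypothetical counts for one gene at one taxonomic level.
stats = matching_stats(md=1, nd=1, ms=1, ns=1)
```

Note that PPV is computed over predicted dyads while Sn is computed over annotated sites, so the two ratios do not share a numerator. The geometric mean drops to zero as soon as either one is zero, which penalises predictions that are all noise as much as predictions that miss every site.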
18
Validation Heat-map (color scale) Evaluation by gene and by taxon (RegulonDB)
19
Evaluation by gene and by taxon (RegulonDB) - 1
[Heat-map columns: Bac., Pro., Gam., Ent., Esc., Ref.]
20
Evaluation by gene and by taxon (RegulonDB) - 2
[Heat-map columns: Bac., Pro., Gam., Ent., Esc., Ref.]
21
Evaluation by gene and by taxon (RegulonDB) - 3
[Heat-map columns: Bac., Pro., Gam., Ent., Esc., Ref.]
22
Evaluation by taxon (RegulonDB): Sensitivity (Sn), PPV and Geometric Accuracy (Acc.g)
[Figure panels: Sn, PPV, Acc.g; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
23
Conclusions
We have evaluated a new phylogenetic footprinting approach:
– The significance score is a reliable criterion to distinguish biologically relevant motifs from spurious motifs
– Evaluated on prokaryotic motifs (RegulonDB) at the level of genes and taxa
– Optimal taxonomic levels are intermediate (Gammaproteobacteria)
Limits of the approach:
– Sites located in coding sequences are missed
– Sites located in redundant fragments (purge) are missed
Perspectives:
– Purge calibration
– Conservation of genes
– Phylogenetic tree
– Regulon prediction
24
Publications
– Janky R & van Helden J. Discovery of conserved motifs in promoters of orthologous genes in prokaryotes. Methods Mol Biol, in press.
– Janky R & van Helden J. Deciphering the evolution of cis-regulatory elements by taxonomy-traversing pattern discovery. Submitted.
25
Acknowledgements SCMBB Lab Jacques van Helden Sylvain Brohée Olivier Sand Jean Valery Turatsinze Karoline Faust Ariane Toussaint Raphaël Leplae Gipsi Lima Mendez Marc Lesink Benoit Dessailly Raul Mendez RSAT http://rsat.scmbb.ulb.ac.be/rsat/ RegulonDB http://www.cifn.unam.mx/Computational_Genomics/regulondb/ PhD Funding F.R.I.A. (FNRS)
26
Correctness of predicted patterns: matching statistics

                 | Annotated sites                                    | Not annotated
Predicted dyads  | Matching dyads (MD) / Matched sites by factor (MF) | Non-matching dyads (ND)
Not predicted    | Non-matched sites by factor (NF)                   | True negative (TN) ?

Transcription Factor-wise Sensitivity (Sn.tf): fraction of transcription factors for which at least one of the binding sites has been matched by at least one of the discovered dyads in one of the target genes.
Sn.tf = MF / (MF + NF)
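The difference between site-wise and factor-wise sensitivity is easiest to see on a small example. The sketch below assumes a hypothetical input format: a dictionary mapping each transcription factor to one boolean per annotated site (True when the site is matched by some discovered dyad). The factor names other than LexA are placeholders.

```python
def tf_sensitivity(sites_by_tf):
    """Sn.tf = MF / (MF + NF): fraction of factors with at least one matched site."""
    if not sites_by_tf:
        return 0.0
    mf = sum(1 for flags in sites_by_tf.values() if any(flags))
    nf = len(sites_by_tf) - mf
    return mf / (mf + nf)

def site_sensitivity(sites_by_tf):
    """Site-wise Sn = MS / (MS + NS), computed over the same input for comparison."""
    flags = [f for site_flags in sites_by_tf.values() for f in site_flags]
    return sum(flags) / len(flags) if flags else 0.0

# Hypothetical annotation: three factors, five annotated sites in total.
example = {
    "LexA": [True, False],
    "TF_b": [False, False],
    "TF_c": [True],
}
sn_tf = tf_sensitivity(example)
sn_site = site_sensitivity(example)
```

Here two of three factors have a matched site (Sn.tf = 2/3) although only two of five sites are matched (Sn = 0.4). The factor-wise statistic is therefore the more lenient of the two: a factor counts as detected as soon as one of its sites is found.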
27
Transcription Factor-wise Sensitivity Evaluation by taxon (RegulonDB) Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Escherichia
28
Purge

Input sequences:
>YP_108439.1|Burkholderia_pseudomallei_K96243|lexA
TGATGCGATGAGACGGCGCGCGCCGCCTGCCAGCCCCGTGTTGCGCTTGACCATCCGTTG
TTCTCCAAATGCGTGGGTCGTTCGTTGCGGTTCGTTCGTTGCGGTGATGCGGTTTGTTCG
CAATGTCGGGCCAGTCTAACGAACAGGTTTCATCATTAAAAATAATCGTTCCTCATTTTT
TAATACTCAAAAGTGGTAAAGAGCCCCCGACCGACGACTATCGCGTCCGAAGCGTCGCTT
CGACGCAGCCGAATCGCGGCGGAATCGCCGCGATAGCGCCGAACTTCAAGTAACGCTTGA
ATTTCCCGCGATACTGTATAAAAATACAGCTCACTGTCTATCCATACAGTCATGCC
>YP_333407.1|Burkholderia_pseudomallei_1710b|lexA
CCGTTGTTCTCCAAATGCGTGGGTCGTTCGTTGCGGTTCGTTCGTTGCGGTGATGCGGTT
TGTTCGCAATGTCGGGCCAGTCTAACGAACAGGTTTCATCATTAAAAATAATCGTTCCTC
ATTTTTTAATACTCAAAAGTGGTAAAGAGCCCCCGACCGACGACTATCGCGTCCGAAGCG
TCGCTTCGACGCAGCCGAATCGCGGCGGAATCGCCGCGATAGCGCCGAACTTCAAGTAAC
GC
>YP_443002.1|Burkholderia_thailandensis_E264|lexA
CCGTTGTTCTCCAAATGCGTGGGTCGTTCGTTACGGTTTCTTGCGATTTGTTTGCGATGT
CGGGCCAGTCTAACGAACAGGTTTCATCATTAAAAATAATCGTTCCTCATTTTTTAATAC
TCAAAAGTGGTAAGGCGCGCCCGGATCTCGGCTATCGCGCCCGAAGCGCCGCTTCGACGC
GGCCGGATCGCGACGGAATCGCCGCGATAGCGCGGAACCTCAAGTAACGC

After purge (redundant fragments masked with n):
>YP_108439.1|Burkholderia_pseudomallei_K96243|lexA
TGATGCGATGAGACGGCGCGCGCCGCCTGCCAGCCCCGTGTTGCGCTTGACCATCCGTTG
TTCTCCAAATGCGTGGGTCGTTCGTTGCGGTTCGTTCGTTGCGGTGATGCGGTTTGTTCG
CAATGTCGGGCCAGTCTAACGAACAGGTTTCATCATTAAAAATAATCGTTCCTCATTTTT
TAATACTCAAAAGTGGTAAAGAGCCCCCGACCGACGACTATCGCGTCCGAAGCGTCGCTT
CGACGCAGCCGAATCGCGGCGGAATCGCCGCGATAGCGCCGAACTTCAAGTAACGCTTGA
ATTTCCCGCGATACTGTATAAAAATACAGCTCACTGTCTATCCATACAGTCATGCC
>YP_333407.1|Burkholderia_pseudomallei_1710b|lexA
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nn
>YP_443002.1|Burkholderia_thailandensis_E264|lexA
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnACGGTTTCTTGCGATTTGTTTGCGnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnGGCGCGCCCGGATCTCGGCTATCGCGCCCGAAGCGCCGCTTCGACGC
GGCCGGATCGCGACGGAATCGCCGCGATAGCGCGGAACCTCAAGTAACGC
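The effect shown above, where fragments already present in an earlier sequence are replaced by n, can be approximated with a simple k-mer lookup. This is only a sketch of the idea and not the purge program used in the slides: the function name `purge_sequences` is hypothetical, only exact repeats are detected (no mismatches), the reverse strand is ignored, and the first sequence is always kept intact.

```python
def purge_sequences(seqs, min_len=30):
    """Mask with 'n' every stretch of >= min_len bases already seen in an earlier sequence."""
    seen = set()          # all min_len-mers from sequences processed so far
    purged = []
    for seq in seqs:
        seq = seq.upper()
        masked = [False] * len(seq)
        for i in range(len(seq) - min_len + 1):
            if seq[i:i + min_len] in seen:
                for j in range(i, i + min_len):
                    masked[j] = True
        # Register this sequence's k-mers only after masking, so a sequence
        # is never masked against itself.
        for i in range(len(seq) - min_len + 1):
            seen.add(seq[i:i + min_len])
        purged.append("".join("n" if m else c for c, m in zip(seq, masked)))
    return purged

# Toy input: the second sequence embeds 10 bases copied from the first.
first = "ACGTTGCAAGGCTTAC"
second = "CC" + first[4:14] + "GA"
result = purge_sequences([first, second], min_len=8)
```

The second toy sequence keeps only its two flanking dinucleotides. Masking of this kind prevents closely related genomes from contributing the same promoter several times, which would otherwise inflate dyad counts and significance scores. The flip side, noted among the limits of the approach, is that sites falling inside masked fragments can no longer be counted.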
29
Evaluating parameters
30
Evaluating parameters
Parameters of the analysis:
– Background model (MONAD versus TAXFREQ)
– Promoter location (UPSTREAM versus LEADER)
– Purge sequences stringency
Parameters of the evaluation:
– Matching weight (PERFECT MATCH versus MISMATCH)
– Annotated sites (FLANKED versus NOT FLANKED)
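The matching-weight parameter listed above decides whether a predicted pattern must align perfectly with an annotated site or may differ at one position. A minimal sketch of such a test follows. The function name `pattern_matches_site`, the treatment of N as a wildcard, and the single-strand scan are assumptions made for illustration, not a description of the comparison program used in the evaluation.

```python
def pattern_matches_site(pattern, site, max_mismatches=0):
    """True if the pattern aligns somewhere inside the site with few enough mismatches.

    'N' in the pattern matches any base and never counts as a mismatch.
    """
    pattern, site = pattern.upper(), site.upper()
    width = len(pattern)
    for start in range(len(site) - width + 1):
        window = site[start:start + width]
        mismatches = sum(1 for p, s in zip(pattern, window)
                         if p != "N" and p != s)
        if mismatches <= max_mismatches:
            return True
    return False

# Toy annotated site with one flanking base on each side.
site = "A" + "CTGT" + "ATATATAT" + "ACAG" + "T"
perfect = pattern_matches_site("CTGTNNNNNNNNACAG", site)
strict = pattern_matches_site("CTGTNNNNNNNNACAC", site, max_mismatches=0)
relaxed = pattern_matches_site("CTGTNNNNNNNNACAC", site, max_mismatches=1)
```

The second pattern differs from the site at its last position, so it is rejected under the perfect-match setting and accepted when one mismatch is tolerated. Relaxing the criterion can only increase the number of matched sites and matching dyads, so it is expected to raise Sn and PPV together. Extending annotated sites with flanking bases (FLANKED) has a similar effect by giving each pattern more positions to align against.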
31
Evaluation by taxon (RegulonDB). Testing parameters: background model vs promoter location (Sensitivity)
[Figure panels: LEADER and UPSTREAM, each with MONAD and TAXFREQ; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
32
Evaluation by taxon (RegulonDB). Testing parameters: background model vs promoter location (PPV)
[Figure panels: LEADER and UPSTREAM, each with MONAD and TAXFREQ; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
33
Evaluation by taxon (RegulonDB). Testing parameters: background model vs promoter location (Acc.g)
[Figure panels: LEADER and UPSTREAM, each with MONAD and TAXFREQ; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
34
Evaluation by taxon (RegulonDB). Testing parameters: purge sequences
[Figure panels: PPV, Acc.g, Sn; settings compared: ml=30bp, mis=0 and ml=40bp, mis=3; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
35
Evaluation by taxon (RegulonDB). Testing parameters: matching weight (perfect match vs one mismatch)
[Figure panels: PPV, Acc.g, Sn; settings compared: PERFECT and MISMATCH; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
36
Evaluation by taxon (RegulonDB). Testing parameters: flanked vs not flanked annotated sites
[Figure panels: PPV, Acc.g, Sn; settings compared: FLANKED and NOT FLANKED; series: Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, Escherichia]
37
Summary
We have evaluated a new phylogenetic footprinting approach:
– The significance score is a reliable criterion to distinguish biologically relevant motifs from spurious motifs
– Evaluated on prokaryotic motifs (RegulonDB) at the level of genes and taxa
– Optimal taxonomic levels are intermediate (Gammaproteobacteria)
– Opens the way to regulon prediction
Limits of the approach:
– Sites located in coding sequences are missed
– Sites located in redundant fragments (purge) are missed
38
Evaluation by gene and by taxon (RegulonDB): validation heat-map (PPV vs Sensitivity)
40
Evaluation by gene and by taxon (Regulon LexA): validation heat-map (PPV vs Sensitivity)
41
Regulation of the SOS response in E. coli
Voet, Donald and Voet, Judith G., Biochemistry, John Wiley & Sons, Inc., © 2004, 2e Edition, p. 1180
43
LexA study case: Firmicutes (Gram+) motif for LexA
CGAACRNRYGTTYC lexA (Gram+ annotated)*
AACNNNNGTT lexA (most significant dyad)
AACNNNNNTTC lexA (significant dyad)
CGAACA lexA (significant dyad)
TCGAACATATGTTCGA lexA Bacteria
CGAACATATGTTCK lexA Firmicutes
GAACANNNGTTC lexA Bacillales
* K.W. Winterling (J Bacteriol 1998); E.O. Davis (J Bacteriol 2002)
44
LexA study case: Actinobacterial (Gram+) motif for LexA
Feature-map for orthologous sequences of lexA within Actinobacteria
CGAACRNRYGTTYC lexA (Gram+ annotated)*
TCGAAC lexA (most significant dyad)
CGAACA lexA
TCGAACATATGTTCGA lexA Bacteria
TCGAACA lexA Actinobacteria
45
Evaluation with RegulonDB: significance of the discovered patterns
46
Study case: LexA regulon in E. coli
[Figure: genes shown are umuC, umuD, ssb, sulA, recX, recA, uvrD, uvrA, dinF, lexA, dnaG, rpsU, rpoD, uvrB, phr]
Source: EcoCyc, http://www.ecocyc.org
47
Methods: analysis (RSAT)
[Workflow diagram]
Data: prokaryote genomes (NCBI), taxonomy, taxonomic tree, orthologs of E. coli K12 genes, taxon-specific groups of orthologs, FASTA sequences, taxon frequencies
Programs: make-tree, get-orthologs, retrieve-seq-multigenome, calc-taxfreq, dyad-analysis, pattern-assembly, compare-scores, dna-pattern, feature-map, footprint-analysis, footprint-report.pl
Outputs: scores, consensus, feature-map, HTML report
48
Methods: analysis and evaluation (RSAT, RegulonDB)
[Workflow diagram]
Data: prokaryote genomes (NCBI), taxonomy, taxonomic tree, orthologs of E. coli K12 genes, taxon-specific groups of orthologs, FASTA sequences, taxon frequencies
Programs: make-tree, get-orthologs, retrieve-seq-multigenome, calc-taxfreq, dyad-analysis, pattern-assembly, compare-scores, dna-pattern, feature-map, footprint-analysis, footprint-report.pl
Outputs (predicted patterns): scores, consensus, feature-map, HTML report
49
Methods: evaluation
[Workflow diagram]
Data: orthologs of E. coli K12 genes, taxonomic tree, taxon-specific groups of orthologs, predicted patterns, annotated sites (RegulonDB), MySQL
Programs and scripts: dyad-analysis, compare-patterns, footprint-valid.pl, RegulonDB_validation.mk, Accuracy.R, heatmap4valid.R, Validmap2html.pl
Outputs: matching with annotated regulatory motifs; evaluation stats by gene and by taxon; evaluation stats by taxon and by sig threshold; validation heatmap (.html); graphs param=f(sig.threshold)
50
Sensitivity Evaluation by taxon (RegulonDB) Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Escherichia
51
Positive Predictive Value Evaluation by taxon (RegulonDB) Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Escherichia
52
Geometric Accuracy Evaluation by taxon (RegulonDB) Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Escherichia