Computational genomic strategies for natural product discovery. Dr. Marnix H. Medema, Bioinformatics Group, Wageningen University, The Netherlands. EBI Course Exploiting Metagenomics, Thursday, December 3rd, 2015, 11:00h
Microbial biosynthetic pathways: a great source of valuable molecules
Specialized metabolites play key roles in microbiomes Donia et al. (2014) Cell 158: 1402-1414.
Diverse and complex enzymology produces chemical diversity: RiPPs. Huge assembly lines, but not the only mechanism. RiPPs: Ribosomally synthesized and Post-translationally modified Peptides. Ortega et al. (2015), Nature 517: 509-512.
Diverse and complex enzymology produces chemical diversity: nonribosomal peptides. Key enzyme class: Nonribosomal Peptide Synthetase (NRPS). NRPSs can introduce nonproteinogenic amino acids into peptides!
Diverse and complex enzymology produces chemical diversity: nonribosomal peptides. Schmartz et al. (2014), Nat. Prod. Rep. 12: 5574-5577.
Diverse and complex enzymology produces chemical diversity: polyketides. Key enzyme class: Polyketide synthase (PKS). Menzella et al. (2005), Nat. Biotechnol. 23: 1171-1176.
Diverse and complex enzymology produces chemical diversity: polyketides. Not all polyketide synthases are modular, some are iterative! Examples: fungal type I, type II, type III, etc. Shen et al. (2003), Curr. Opin. Chem. Biol. 7: 285-295.
Diverse and complex enzymology produces chemical diversity: terpenes. Key enzyme classes: terpene synthases / cyclases. These turn isoprene precursors into mature terpenoids. Gao et al. (2012), Nat. Prod. Rep. 29: 1153-1175.
Diverse and complex enzymology produces chemical diversity: saccharides. Key enzyme class: glycosyltransferase. McCranie & Bachmann et al. (2014), Nat. Prod. Rep. 31: 1026-1042.
Biosynthetic gene clusters: the genetic basis of molecular diversity So if we can find new gene clusters, we can find new chemicals! Now how to find new gene clusters?
Modularity of biosynthetic gene clusters Second strategy Cacho et al. (2015) Front. Microbiol 5: 774.
Modularity of biosynthetic gene clusters Second strategy Medema, Cimermancic et al. (2015) PLoS Comp. Biol. 10: e1004016
antiSMASH: a web server for the detection and analysis of biosynthetic gene clusters. Medema et al. (2011) Nucl. Acids Res. 39: W339-W346. Blin, Medema et al. (2013) Nucl. Acids Res. 41: W204-212. http://antismash.secondarymetabolites.org
Core structure prediction for polyketide synthase and nonribosomal peptide synthetase gene clusters. Medema et al. (2011) Nucl. Acids Res. 39: W339-W346. Blin, Medema et al. (2013) Nucl. Acids Res. 41: W204-212. http://antismash.secondarymetabolites.org
Comparative analysis and subcluster detection. Medema et al. (2011) Nucl. Acids Res. 39: W339-W346. Blin, Medema et al. (2013) Nucl. Acids Res. 41: W204-212. http://antismash.secondarymetabolites.org
Another method to detect biosynthetic gene clusters in prokaryotic genomes
Training set consisted of 732 biosynthetic gene clusters of known compounds:
136 type I polyketides
100 nonribosomal peptides
76 type II polyketides
82 polyketide-peptide hybrids
93 oligo- and polysaccharides
38 aminoglycosides
36 terpenoids
27 ribosomal peptides
23 lantibiotics
13 indolocarbazoles
11 type III polyketides
9 fatty acids
9 siderophores
8 nucleosides
6 beta-lactams
4 aminocoumarins
61 others
Cimermancic, Medema, Claesen et al. (2014) Cell 158: 412-421
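As a quick sanity check on the training set composition, the per-class counts from the slide can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary and the function name `class_fractions` are hypothetical and are not part of the published method.

```python
# Class counts copied from the slide (Cimermancic, Medema, Claesen et al. 2014).
training_set = {
    "type I polyketides": 136,
    "nonribosomal peptides": 100,
    "type II polyketides": 76,
    "polyketide-peptide hybrids": 82,
    "oligo- and polysaccharides": 93,
    "aminoglycosides": 38,
    "terpenoids": 36,
    "ribosomal peptides": 27,
    "lantibiotics": 23,
    "indolocarbazoles": 13,
    "type III polyketides": 11,
    "fatty acids": 9,
    "siderophores": 9,
    "nucleosides": 8,
    "beta-lactams": 6,
    "aminocoumarins": 4,
    "others": 61,
}

# The slide states a total of 732 gene clusters.
total = sum(training_set.values())


def class_fractions(counts):
    """Return the share of each class in the training set (hypothetical helper)."""
    n = sum(counts.values())
    return {name: count / n for name, count in counts.items()}


if __name__ == "__main__":
    print(f"total clusters: {total}")
    ranked = sorted(class_fractions(training_set).items(), key=lambda kv: -kv[1])
    for name, fraction in ranked:
        print(f"{name:28s} {fraction:6.1%}")
```

The tally confirms that the listed classes add up to the stated 732 clusters, and it makes the imbalance visible: polyketides and nonribosomal peptides dominate, while several classes are represented by fewer than ten clusters.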
Large metagenomic datasets may contain very large numbers of biosynthetic gene clusters. There are of course both rare and frequently occurring classes of gene clusters and compounds. What we had not expected was to find large groups within this network that contain no known gene clusters. We chose one of these regions, which contained two related families of hundreds of gene clusters encoding, among others, very unusual ketosynthases and CoA-ligases. Cimermancic, Medema, Claesen et al. (2014) Cell 158: 412-421
Data on BGCs is scattered and not systematically stored
The Minimum Information about a Biosynthetic Gene cluster (MIBiG). Medema et al. (2015) Nature Chem. Biol., under review.
A rich set of annotations and metadata on biosynthetic gene clusters
General MIBiG parameters: Biosynthetic class; MIxS environmental / taxonomic information; Number of loci; Complete / partial cluster; Nucleotide sequence accession; 16S accession / sequence; Custom gene names; Functional sub-clusters; Biosynthetic genes; Transport-related genes; Regulatory genes; Resistance/immunity genes; Operon architecture; Knockout mutant phenotypes; Compound name; Synonyms for compound name; Exact molecular mass; Molecular formulae of the compound(s); Compound structure; Chemical moieties; Compound activity; Compound molecular target; Publications on activity/toxicity/target; Tailoring reactions; Evidence for compound-cluster connection
Polyketide-specific: Polyketide synthase type; Polyketide subclass; Linear / cyclic; PKS genes; Number of PKS modules; Ketide unit sequence; Starter unit; Reductive domains; KR stereochemistries; AT domain substrate specificities; Non-reductive modifying PKS domains; Module skipping / iteration; Number of iterations (if iterative); Iterative PKS subtype (if iterative); Trans-acyltransferase genes; Inactive / atypical domains; TE domain type; Cyclization / termination type
Nonribosomal peptide-specific: NRP subclass; Linear / cyclic; NRPS genes; Number of NRPS modules; NRP amino acid sequence; A domain substrate specificities; Variable A domain specificities; Condensation domain subtypes; Modifying domains (Me/Ox/Red/Epi); Module skipping / iteration; TE domain type; Cyclization / termination type
RiPP-specific: RiPP subclass; Linear / cyclic; Precursor-encoding gene(s); Precursor peptide length; Leader peptide length; Follower peptide length; Core peptide length; Core peptide sequence; Cleavage recognition site; Number of crosslinks; Crosslink positions; Type of crosslinks/cyclizations; Recognition motif in leader peptide
Terpenoid-specific: Terpene subclass; Precursor carbon chain length; Final isoprenoid precursor; Terpene synthases / cyclases; Prenyltransferases
Saccharide-specific: Saccharide subclass; Glycosyltransferase (GT) genes; GT substrate specificities
Alkaloid-specific: Alkaloid subclass
Specific for other classes: Biosynthetic class specification
Again, MIBiG has an important role to play here, as standardized data submission and storage will allow us to build up a parts registry that can function as a trustworthy repository for designing new pathways. Medema et al. (2015) Nature Chem. Biol., under review.
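To make the parameter list more concrete, here is a minimal sketch of how a single entry could be held in code, with general parameters as fixed fields and class-specific parameters in a separate mapping. All class names, field names, and values are hypothetical, chosen only to mirror the parameter names listed above. This is not the official MIBiG schema or submission format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Compound:
    # A few compound-level parameters from the general list (names are illustrative).
    name: str
    synonyms: List[str] = field(default_factory=list)
    molecular_formula: Optional[str] = None
    exact_mass: Optional[float] = None


@dataclass
class ClusterEntry:
    # General parameters shared by every biosynthetic class.
    biosynthetic_classes: List[str]
    nucleotide_accession: str
    complete: bool
    number_of_loci: int = 1
    compounds: List[Compound] = field(default_factory=list)
    # Class-specific parameters, e.g. PKS type or number of NRPS modules.
    class_specific: Dict[str, object] = field(default_factory=dict)

    def missing_general_fields(self) -> List[str]:
        """List the general fields that are still empty (hypothetical check)."""
        missing = []
        if not self.biosynthetic_classes:
            missing.append("biosynthetic_classes")
        if not self.nucleotide_accession:
            missing.append("nucleotide_accession")
        if not self.compounds:
            missing.append("compounds")
        return missing


# Placeholder values only; this is not a real database entry.
entry = ClusterEntry(
    biosynthetic_classes=["Polyketide"],
    nucleotide_accession="PLACEHOLDER_ACCESSION",
    complete=True,
    class_specific={"pks_type": "modular type I", "number_of_pks_modules": 6},
)

if __name__ == "__main__":
    print(entry.missing_general_fields())  # prints ['compounds']
```

Keeping the class-specific parameters in their own mapping reflects the layout of the list above: a small general core that applies to every cluster, plus optional blocks that only apply to polyketides, nonribosomal peptides, RiPPs, and so on.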
Community annotation of biosynthetic gene clusters using MIBiG. >75 research groups worldwide participated. Result: detailed annotation of approximately 400 BGCs, essential annotations for approximately another 900 BGCs. We currently have a draft version of MIBiG, on which 60-70 PIs in the field have already commented through an online survey. Later this week, I will organize a discussion session, to which I would like to invite you all. A standard has to be carried by the community.
An online repository for MIBiG information. http://mibig.secondarymetabolites.org
Integration with antiSMASH: KnownClusterBlast
Finding more variants of known enzymatic parts using MultiGeneBlast. Medema et al. (2013) Mol. Biol. Evol. 30: 1218-1223. http://multigeneblast.sf.net
Finally: some suggestions for analyzing metagenomes using antiSMASH
Assemble first!
Only run contigs > 2 kb; use other tools for very fragmented assemblies, e.g. http://napdos.ucsd.edu/
Sort contigs by size; if >1000 contigs: run locally or contact us to run it on the public server.
Local installations: Docker container available, see http://phdops.kblin.org/2015-running-antismash-standalone-from-docker.html
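The contig pre-filtering suggested above can be sketched in plain Python before anything is submitted. This is a minimal illustration under stated assumptions: the function names and the constants `MIN_CONTIG_LENGTH` and `MAX_CONTIGS_PUBLIC` are hypothetical, the thresholds are simply the 2 kb and 1000-contig figures from the slide, and none of this is part of antiSMASH itself.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_CONTIG_LENGTH = 2000   # "> 2 kb" from the slide, in base pairs
MAX_CONTIGS_PUBLIC = 1000  # above this, run locally or contact the maintainers

Record = Tuple[str, str]   # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_contigs(records: Iterable[Record],
                   min_length: int = MIN_CONTIG_LENGTH) -> List[Record]:
    """Keep contigs longer than min_length, sorted from longest to shortest."""
    kept = [(h, s) for h, s in records if len(s) > min_length]
    kept.sort(key=lambda rec: len(rec[1]), reverse=True)
    return kept


def needs_local_run(contigs: List[Record],
                    limit: int = MAX_CONTIGS_PUBLIC) -> bool:
    """True if the contig count exceeds the suggested public-server limit."""
    return len(contigs) > limit


# Toy input: contig "b" is too short, and "c" is split over two sequence lines.
demo_lines = [">a", "A" * 3000, ">b", "C" * 500, ">c", "G" * 2500, "G" * 2500]
selected = select_contigs(parse_fasta(demo_lines))

if __name__ == "__main__":
    for name, seq in selected:
        print(name, len(seq))
    print("run locally:", needs_local_run(selected))
```

For a real assembly, `parse_fasta` can be fed an open file handle instead of the toy list, and the selected records written back out as FASTA before being passed to antiSMASH.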