Lecture 1: Introduction to computer workshops. Microbial Metabolites: Signals to Drugs. Dubrovnik, Croatia, 21 August - 29 August 2010. Greg Challis, Department of Chemistry, University of Warwick, UK. Govind Chandra & Mervyn Bibb, Department of Molecular Microbiology, John Innes Centre, UK.
Overview: Cloning and sequencing of secondary metabolite gene clusters; Analysis of raw sequence data; Cryptic (orphan) gene clusters in microbial genomes; Nonribosomal peptide biosynthesis
Cloning of secondary metabolite biosynthetic gene clusters: (1) Identify a putative biosynthetic gene in the genome of the producer, e.g. by PCR using degenerate primers. (2) Construct a large-insert genomic library, e.g. a fosmid or cosmid library. (3) Screen the library for clones containing the putative biosynthetic gene, e.g. using PCR or colony hybridisation. (4) Sequence and assemble the inserts of the isolated clones, e.g. by sending them to a company for shotgun sequencing.
TCTAGATCGATCCCGCGAAATTAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCTCGA GAAGGAGATATACATATGGCAGATCTGAGCAAACTCTCCGATTCTCGCACCGCCCAGCCGGGCCGCATCG TCCGCCCATGGCCGCTGTCTGGCTGCAATGAATCCGCATTGCGTGCTCGCGCCCGGCAGCTTCGGGCACA CCTGGACCGTTTTCCGGACGCGGGCGTGGAGGGCGTGGGTGCGGCATTGGCCCACGACGAGCAGGCGGAC GCAGGTCCGCATCGTGCGGTGGTTGTTGCTTCATCGACCTCAGAATTACTGGATGGTCTGGCCGCGGTGG CCGATGGTCGCCCGCATGCGAGCGTCGTACGCGGGGTTGCGCGTCCTTCTGCCCCGGTAGTGTTTGTGTT TCCTGGGCAGGGGGCACAGTGGGCAGGTATGGCGGGCGAGCTGCTTGGCGAGTCGCGCGTGTTCGCTGCC GCCATGGACGCCTGTGCTCGCGCGTTCGAACCTGTGACAGACTGGACGCTTGCACAGGTCCTGGATAGCC CTGAACAAAGCCGCCGCGTTGAAGTGGTCCAGCCAGCGTTATTCGCCGTGCAAACTTCGCTAGCGGCGCT CTGGCGTTCCTTTGGCGTGACCCCAGATGCTGTGGTTGGCCATTCAATTGGTGAATTAGCAGCGGCGCAT GTTTGCGGTGCCGCAGGTGCGGCGGATGCAGCGCGCGCAGCGGCACTGTGGAGTCGCGAGATGATTCCGT TGGTGGGCAACGGCGACATGGCCGCTGTCGCTCTGTCGGCAGATGAAATTGAACCACGTATCGCGCGCTG GGACGATGACGTAGTGCTGGCGGGCGTCAACGGTCCGCGGTCCGTCCTGTTGACAGGGTCACCTGAACCC GTAGCTCGTCGTGTGCAGGAACTGAGCGCCGAGGGCGTACGCGCCCAGGTAATCAATGTTAGCATGGCTG CGCATAGCGCTCAGGTTGATGACATCGCTGAGGGTATGCGTAGTGCCCTGGCGTGGTTTGCCCCAGGCGG CTCCGAAGTTCCGTTCTACGCCTCACTGACCGGCGGTGCGGTTGATACCCGTGAGTTAGTAGCCGATTAC TGGCGTCGTTCTTTTCGGCTACCGGTACGGTTTGATGAAGCGATCCGCAGTGCCTTGGAAGTAGGCCCGG GTACGTTTGTCGAAGCGAGCCCGCATCCTGTGTTGGCGGCGGCGCTGCAACAGACCCTGGATGCCGAAGG TTCAAGCGCGGCTGTTGTACCTACACTGCAGCGTGGTCAAGGGGGCATGCGTCGCTTCCTGTTGGCCGCG GCCCAGGCTTTCACTGGCGGCGTCGCGGTTGACTGGACGGCCGCTTACGATGATGTTGGTGCCGAACCAG GTTCGCTGCCTGAGTTCGCTCCGGCCGAAGAAGAGGACGAGCCGGCAGAGTCCGGGGTTGATTGGAACGC ACCGCCACACGTGCTCCGCGAACGTCTGCTGGCTGTGGTGAACGGGGAGACCGCAGCTCTTGCAGGCCGC GAAGCTGACGCAGAGGCGACCTTTCGCGAATTAGGTCTCGATTCTGTGTTAGCAGCCCAGCTGCGCGCGA AAGTCAGCGCGGCCATTGGCCGTGAAGTGAATATTGCGCTGTTATATGACCATCCAACCCCGCGTGCACT TGCGGAGGCACTGTCTAGTGGGACGGAAGTAGCGCAACGCGAGACTCGCGCCCGTACAAACGAAGCTGCA CCTGGCGAACCAATTGCGGTAGTAGCGATGGCATGTCGTTTACCGGGCGGTGTATCGACCCCTGAAGAGT
Artemis
Sequencing: Shift from long reads at low coverage to short reads at high coverage. 454 read lengths are approaching those of capillary and gel-based methods. Illumina can now give read lengths of 100 nucleotides. Coupled with strategies such as paired-end sequencing, the short reads coming out of the machines can be assembled into long, high-quality contigs.
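As a rough illustration of the coverage side of this trade-off, the sketch below computes average depth of coverage as (number of reads × read length) / genome size. The function names and the 8.7 Mb genome size are assumptions chosen for illustration; only the 100-nucleotide read length comes from the slide.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that gives at least the target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 8_700_000  # hypothetical genome size in bp (assumption)
    read_length = 100        # Illumina read length mentioned on the slide
    n = reads_needed(50, read_length, genome_size)
    print(n, expected_coverage(n, read_length, genome_size))
```

This is an average only: real coverage is uneven along the genome, so some regions will sit well below the average depth.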
Assembly: Affected by both the quality and the length of the reads. High GC (or AT) content presents another hurdle. High coverage helps, but only to a limited extent; very high coverage can even harm assembly. Best left to people who do this for a living, but you do need to understand the process well enough to do some independent quality checks.
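One simple independent check is to look at the GC content of the contigs you get back. The following is a minimal sketch, assuming the sequence is available as a plain string; the function names, window size and step are illustrative choices, not part of any particular assembler or toolkit.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int = 100, step: int = 50):
    """GC fraction in sliding windows, returned as (start, gc_fraction) pairs.

    Windows that would run past the end of the sequence are not reported.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # First bases of the raw sequence shown earlier in the lecture.
    contig = "TCTAGATCGATCCCGCGAAATTAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCTCGA"
    print(round(gc_fraction(contig), 3))
```

A contig whose GC content differs sharply from the rest of the assembly may be worth a closer look, for example as possible contamination or vector sequence.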
No more “finishing”: Finishing meant cycling through assembly, primer design and sequencing until all gaps were closed and all ambiguities resolved. Because we just want a sequence we can mine with some degree of confidence, there is no need for the sequence to be finished to a single contig.
Beware: Multiple contigs. Uncertainty about the correctness of contigs. It is better to have a few more contigs than to have wrongly assembled ones.
[Figure: three arrangements of sequence segments: A B C, A C B, A C B]
Mining: Contigs can be searched for clusters. Clusters may be scattered over several contigs due to mis-assembly. blastp: fast, but will not find any proteins that have not been called in the contigs. tblastn: slower; searches a nucleotide database with a protein query, and also helps by indicating potentially adjacent contigs and wrongly assembled ones. Use both. Make a cosmid library and sequence the positive cosmids.
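tblastn can find proteins that were never called because it compares the protein query against the nucleotide database translated in all six reading frames. The sketch below shows only that translation step, not the search itself. It assumes the standard genetic code, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unrecognised codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Translations of all six reading frames, keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCATAA").items():
        print(frame, protein)
```

Because every frame is translated regardless of whether a gene was predicted there, a hit can turn up in a region that ORF calling missed, which is why the slide recommends using both searches.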
Annotation: ORF calling; rRNAs; tRNAs; Rfam; RAST (http://rast.nmpdr.org/)
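To give a feel for what ORF calling involves, here is a deliberately naive sketch that scans both strands for stretches running from a start codon to the next in-frame stop codon. It is not how RAST or any other annotation service works. The function names and the minimum-length cutoff are assumptions, and only ATG is treated as a start codon, so genes using alternative starts such as GTG would be missed or truncated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: alternative start codons are ignored
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, start, end) tuples for simple start-to-stop ORFs.

    Coordinates are 0-based, end-exclusive, and measured on the strand that
    was scanned (so '-' coordinates refer to the reverse complement).
    min_codons counts codons from the start codon up to, but not including,
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon since the last stop
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCATAACC", min_codons=2))
```

ORFs that run off the end of a contig without a stop codon are not reported by this sketch, which matters when an assembly is split into many contigs.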
Sequence gazing: n contigs, submitted to RAST, give n GenBank files. These are plain text files. Do not open them in a word processor. Use Notepad or download a decent text editor from the web. Sequence visualisation: Artemis. Sequence comparison: ACT. The Artemis ACT workshop manual takes over from here.
‘Cryptic’ (orphan) biosynthetic gene clusters: Present in many of the 739 sequenced microbial genomes, e.g. Streptomyces avermitilis, Streptomyces coelicolor, Bacillus subtilis, Pseudomonas fluorescens, Pseudomonas syringae, Nostoc punctiforme, Aspergillus nidulans. Polyketide synthases; nonribosomal peptide synthetases; terpene synthases. May prove a valuable new source of bioactive metabolites.
Genome sequence of the model antibiotic-producer Streptomyces coelicolor M145
Gene clusters directing complex metabolite biosynthesis in the S. coelicolor genome
NRPSs are metabolic assembly lines – penicillin biosynthesis as an example
Prediction of NRPS module substrate specificity
Alignment:
GrsA     DASVWEMFMALLTGASLYIILKDTINDFVKFEQYINQKEITVITLPPTYVVHL-----DPERILSIQTLITAGSATSPSLVNKWKEK--VTYINAYGPTETTI
Ncs1-M1  DIAVWELLAAFVGGARLVIAEHRLRGVVPHLPELMTDHRVTVAHFVPSVLEELLGWMADGGRVG-LRLVVCGGEAVPPSQRDRLLALSGARMVHAYGPTETTI
Extracted residues:
GrsA     D A W T I A A I
Ncs1-M1  D I W H V G A I
Challis, Ravel and Townsend, Chem. Biol. (2000) 7, 211-224; Stachelhaus, Mootz and Marahiel, Chem. Biol. (1999) 6, 493-505
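One way to work with the residues extracted for each domain is to compare them position by position. The sketch below does this for the two eight-residue strings shown on the slide; the function name and the idea of reporting a simple match count are illustrative assumptions, not the scoring used in the cited papers.

```python
def signature_identity(code_a: str, code_b: str):
    """Compare two equal-length residue signatures position by position.

    Returns (number of matching positions, list of 1-based mismatch positions).
    """
    if len(code_a) != len(code_b):
        raise ValueError("signatures must have the same length")
    mismatches = [
        pos
        for pos, (a, b) in enumerate(zip(code_a.upper(), code_b.upper()), start=1)
        if a != b
    ]
    return len(code_a) - len(mismatches), mismatches


if __name__ == "__main__":
    grsa = "DAWTIAAI"     # residues listed for GrsA on the slide
    ncs1_m1 = "DIWHVGAI"  # residues listed for Ncs1-M1 on the slide
    matches, mismatches = signature_identity(grsa, ncs1_m1)
    print(f"{matches}/{len(grsa)} positions match; mismatches at {mismatches}")
```

In practice a query signature would be compared against signatures from domains of known specificity, with the closest matches suggesting a likely substrate.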
NRPS-PKS
Questions? About things in the talk; about the manuals; about computing in relation to sequence analysis in general.
BLASTP
Artemis Comparison Tool (ACT)
ORF Finder