Greg Challis Department of Chemistry, University of Warwick, UK

Presentation transcript:

Lecture 1: Introduction to computer workshops. Microbial Metabolites: Signals to Drugs, Dubrovnik, Croatia, 21 August - 29 August 2010. Greg Challis, Department of Chemistry, University of Warwick, UK. Govind Chandra & Mervyn Bibb, Department of Molecular Microbiology, John Innes Centre, UK.

Overview: cloning and sequencing of secondary metabolite gene clusters; analysis of raw sequence data; cryptic (orphan) gene clusters in microbial genomes; nonribosomal peptide biosynthesis.

Cloning of secondary metabolic biosynthesis gene clusters
1. Identify a putative biosynthetic gene in the genome of the producer, e.g. by PCR using degenerate primers.
2. Construct a large-insert genomic library, e.g. a fosmid or cosmid library.
3. Screen the library for clones containing the putative biosynthetic gene, e.g. using PCR or colony hybridisation.
4. Sequence and assemble the insert of isolated clones, e.g. by sending them to a company for shotgun sequencing.

TCTAGATCGATCCCGCGAAATTAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCTCGA GAAGGAGATATACATATGGCAGATCTGAGCAAACTCTCCGATTCTCGCACCGCCCAGCCGGGCCGCATCG TCCGCCCATGGCCGCTGTCTGGCTGCAATGAATCCGCATTGCGTGCTCGCGCCCGGCAGCTTCGGGCACA CCTGGACCGTTTTCCGGACGCGGGCGTGGAGGGCGTGGGTGCGGCATTGGCCCACGACGAGCAGGCGGAC GCAGGTCCGCATCGTGCGGTGGTTGTTGCTTCATCGACCTCAGAATTACTGGATGGTCTGGCCGCGGTGG CCGATGGTCGCCCGCATGCGAGCGTCGTACGCGGGGTTGCGCGTCCTTCTGCCCCGGTAGTGTTTGTGTT TCCTGGGCAGGGGGCACAGTGGGCAGGTATGGCGGGCGAGCTGCTTGGCGAGTCGCGCGTGTTCGCTGCC GCCATGGACGCCTGTGCTCGCGCGTTCGAACCTGTGACAGACTGGACGCTTGCACAGGTCCTGGATAGCC CTGAACAAAGCCGCCGCGTTGAAGTGGTCCAGCCAGCGTTATTCGCCGTGCAAACTTCGCTAGCGGCGCT CTGGCGTTCCTTTGGCGTGACCCCAGATGCTGTGGTTGGCCATTCAATTGGTGAATTAGCAGCGGCGCAT GTTTGCGGTGCCGCAGGTGCGGCGGATGCAGCGCGCGCAGCGGCACTGTGGAGTCGCGAGATGATTCCGT TGGTGGGCAACGGCGACATGGCCGCTGTCGCTCTGTCGGCAGATGAAATTGAACCACGTATCGCGCGCTG GGACGATGACGTAGTGCTGGCGGGCGTCAACGGTCCGCGGTCCGTCCTGTTGACAGGGTCACCTGAACCC GTAGCTCGTCGTGTGCAGGAACTGAGCGCCGAGGGCGTACGCGCCCAGGTAATCAATGTTAGCATGGCTG CGCATAGCGCTCAGGTTGATGACATCGCTGAGGGTATGCGTAGTGCCCTGGCGTGGTTTGCCCCAGGCGG CTCCGAAGTTCCGTTCTACGCCTCACTGACCGGCGGTGCGGTTGATACCCGTGAGTTAGTAGCCGATTAC TGGCGTCGTTCTTTTCGGCTACCGGTACGGTTTGATGAAGCGATCCGCAGTGCCTTGGAAGTAGGCCCGG GTACGTTTGTCGAAGCGAGCCCGCATCCTGTGTTGGCGGCGGCGCTGCAACAGACCCTGGATGCCGAAGG TTCAAGCGCGGCTGTTGTACCTACACTGCAGCGTGGTCAAGGGGGCATGCGTCGCTTCCTGTTGGCCGCG GCCCAGGCTTTCACTGGCGGCGTCGCGGTTGACTGGACGGCCGCTTACGATGATGTTGGTGCCGAACCAG GTTCGCTGCCTGAGTTCGCTCCGGCCGAAGAAGAGGACGAGCCGGCAGAGTCCGGGGTTGATTGGAACGC ACCGCCACACGTGCTCCGCGAACGTCTGCTGGCTGTGGTGAACGGGGAGACCGCAGCTCTTGCAGGCCGC GAAGCTGACGCAGAGGCGACCTTTCGCGAATTAGGTCTCGATTCTGTGTTAGCAGCCCAGCTGCGCGCGA AAGTCAGCGCGGCCATTGGCCGTGAAGTGAATATTGCGCTGTTATATGACCATCCAACCCCGCGTGCACT TGCGGAGGCACTGTCTAGTGGGACGGAAGTAGCGCAACGCGAGACTCGCGCCCGTACAAACGAAGCTGCA CCTGGCGAACCAATTGCGGTAGTAGCGATGGCATGTCGTTTACCGGGCGGTGTATCGACCCCTGAAGAGT

Artemis

Sequencing: there has been a shift from long reads at low coverage to short reads at high coverage. The read lengths of 454 are approaching those of capillary and gel methods, and Illumina can now give read lengths of 100 nucleotides. Coupled with clever strategies such as paired-end sequencing, the short reads coming out of the machines can be assembled into long, high-quality contigs.
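To make "coverage" concrete, here is a minimal sketch in Python, assuming the usual definition of average coverage as total sequenced bases divided by genome size. The function names and the example figures (an 8.7 Mb genome, 50-fold coverage) are illustrative assumptions, not values from the talk.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative numbers only: an 8.7 Mb genome and 100 nt reads.
    n = reads_needed(50, 100, 8_700_000)
    print(n, expected_coverage(n, 100, 8_700_000))
```

The same genome sequenced to the same depth needs proportionally more reads as the reads get shorter, which is the trade-off described above.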

Assembly: affected by both the quality and the length of the reads. High GC (or AT) content presents another hurdle. High coverage helps, but only to a limited extent, and assembly can actually suffer when coverage is very high. Assembly is best left to people who do it for a living, but you do need to understand the process well enough to do some independent quality checks.
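One simple independent check is to summarise the contig lengths and base composition of an assembly. The following is a minimal sketch, not a check prescribed in the talk: N50 is a commonly used contiguity statistic that the slides do not name, and the function names are assumptions for illustration.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    print(n50([100, 80, 50, 30, 20]))
    print(gc_fraction("GGCCAT"))
```

A contig whose GC fraction differs sharply from the rest of the assembly, or an N50 that drops when coverage is increased, are the kind of signals worth raising with whoever did the assembly.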

No more “finishing”. Finishing meant cycling through assembly, primer design and sequencing until all gaps were closed and all ambiguities resolved. Because we just want a sequence we can mine with some degree of confidence, there is no need for the sequence to be finished to a single contig.

Beware: multiple contigs, and uncertainty about the correctness of contigs. It is better to have a few more contigs than to have wrongly assembled ones.

[Figure: segment orderings A B C, A C B, A C B]

Mining: contigs can be searched for clusters. Clusters may be scattered over several contigs due to mis-assembly. blastp: fast, but will not find any proteins which have not been called in the contigs. tblastn: slower; it searches a nucleotide database with a protein query, and also helps by indicating potentially adjacent contigs and wrongly assembled ones. Use both. Make a cosmid library and sequence the positive cosmids.
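The point about tblastn indicating potentially adjacent contigs can be sketched in a few lines of Python. This is a minimal illustration, assuming the search was run with the standard 12-column tabular output of BLAST+ (`-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The function names, the E-value cut-off and the query and contig names in the example are hypothetical.

```python
from collections import defaultdict


def contigs_per_query(tabular_text, max_evalue=1e-5):
    """Map each query protein to the sorted list of contigs it hits."""
    hits = defaultdict(set)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, contig, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits[query].add(contig)
    return {query: sorted(contigs) for query, contigs in hits.items()}


def split_queries(tabular_text, max_evalue=1e-5):
    """Queries whose hits fall on more than one contig."""
    per_query = contigs_per_query(tabular_text, max_evalue)
    return {q: c for q, c in per_query.items() if len(c) > 1}


if __name__ == "__main__":
    # Hypothetical hits: one protein query matching two different contigs.
    rows = [
        ["nrpsA", "contig_3", "62.1", "900", "300", "5", "1", "900", "2700", "1", "2e-50", "410"],
        ["nrpsA", "contig_7", "60.4", "700", "250", "4", "901", "1600", "1", "2100", "1e-40", "350"],
    ]
    text = "\n".join("\t".join(row) for row in rows)
    print(split_queries(text))
```

A query split across two contigs is only a hint: the contigs may be adjacent, one of them may be wrongly assembled, or the query may simply have two homologues in the genome.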

Annotation: ORF calling; rRNAs; tRNAs; Rfam; RAST (http://rast.nmpdr.org/)
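To show what ORF calling involves at its simplest, here is a deliberately naive six-frame sketch in Python. It is an assumption-laden illustration and not how RAST or any other annotation service calls genes: it accepts ATG, GTG and TTG as start codons, takes the first start in each frame, and reports every start-to-stop stretch above a minimum length. All names and the default threshold are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT symbols are left as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) 0-based, end-exclusive ORFs in the three forward frames."""
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start codon since the previous stop
            elif start is not None and codon in STOPS:
                # Length is counted in codons, including the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=100):
    """ORFs on both strands as (start, end, strand) in forward-strand coordinates."""
    n = len(seq)
    found = [(s, e, "+") for s, e in orfs_in_strand(seq, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_codons):
        found.append((n - e, n - s, "-"))
    return sorted(found)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

Real gene callers also weigh codon usage, ribosome binding sites and overlaps between candidate ORFs, which is why their calls should still be inspected by eye.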

Sequence gazing: n contigs, submitted to RAST, give n GenBank files. These are plain text files. Do not open them in a word processor; use Notepad or download a decent text editor from the web. Sequence visualisation: Artemis. Sequence comparison: ACT. The Artemis ACT workshop manual takes over from here.

‘Cryptic’ (orphan) biosynthetic gene clusters: present in many of the 739 sequenced microbial genomes, e.g. Streptomyces avermitilis, Streptomyces coelicolor, Bacillus subtilis, Pseudomonas fluorescens, Pseudomonas syringae, Nostoc punctiforme and Aspergillus nidulans. Enzyme classes: polyketide synthases, nonribosomal peptide synthetases, terpene synthases. May prove a valuable new source of bioactive metabolites.

Genome sequence of the model antibiotic-producer Streptomyces coelicolor M145

Gene clusters directing complex metabolite biosynthesis in the S. coelicolor genome

NRPSs are metabolic assembly lines – penicillin biosynthesis as an example

Prediction of NRPS module substrate specificity

GrsA     DASVWEMFMALLTGASLYIILKDTINDFVKFEQYINQKEITVITLPPTYVVHL-----DPERILSIQTLITAGSATSPSLVNKWKEK--VTYINAYGPTETTI
Ncs1-M1  DIAVWELLAAFVGGARLVIAEHRLRGVVPHLPELMTDHRVTVAHFVPSVLEELLGWMADGGRVG-LRLVVCGGEAVPPSQRDRLLALSGARMVHAYGPTETTI

GrsA     D A W T I A A I
Ncs1-M1  D I W H V G A I

Challis, Ravel and Townsend, Chem. Biol. (2000) 7, 211-224
Stachelhaus, Mootz and Marahiel, Chem. Biol. (1999) 6, 493-505
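The eight-residue rows on this slide can be read off the alignment mechanically. The following Python sketch rests on two assumptions that are not stated on the slide: that the aligned block begins at GrsA residue 235, and that the residues shown are the first eight specificity-conferring positions in GrsA numbering (235, 236, 239, 278, 299, 301, 322, 330) described by Stachelhaus et al. (1999). The function and constant names are hypothetical.

```python
GRSA = ("DASVWEMFMALLTGASLYIILKDTINDFVKFEQYINQKEITVITLPPTYVVHL-----"
        "DPERILSIQTLITAGSATSPSLVNKWKEK--VTYINAYGPTETTI")
NCS1_M1 = ("DIAVWELLAAFVGGARLVIAEHRLRGVVPHLPELMTDHRVTVAHFVPSVLEELLGWMADGGRVG-"
           "LRLVVCGGEAVPPSQRDRLLALSGARMVHAYGPTETTI")

# Assumed GrsA residue numbers of the specificity-conferring positions.
CODE_POSITIONS = (235, 236, 239, 278, 299, 301, 322, 330)


def extract_code(ref_aln, query_aln, ref_start, positions):
    """Return (reference, query) residues at the given reference positions.

    ref_aln and query_aln are rows of one pairwise alignment ('-' marks a gap);
    ref_start is the residue number of the first reference residue in the block.
    """
    wanted = set(positions)
    residue_number = ref_start - 1
    ref_code, query_code = [], []
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char == "-":
            continue  # gap in the reference: no residue number is consumed
        residue_number += 1
        if residue_number in wanted:
            ref_code.append(ref_char)
            query_code.append(query_char)
    return "".join(ref_code), "".join(query_code)


if __name__ == "__main__":
    print(extract_code(GRSA, NCS1_M1, 235, CODE_POSITIONS))
```

Under those assumptions the sketch reproduces both rows shown on the slide, which suggests the numbering offset is right; comparing the extracted residues with those of adenylation domains of known specificity is then the prediction step proper.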

NRPS-PKS

Questions? About things in the talk; about the manuals; about computing in relation to sequence analysis in general.

BLASTP

Artemis Comparison Tool (ACT)

ORF Finder