SPECIES AT THE GENOMIC LEVEL. DDH has been the gold standard, playing the role that “sex” (interbreeding) plays for higher eukaryotes. Stackebrandt et al., 2002, Int J Syst Evol Microbiol. 52:846-849.


SPECIES AT THE GENOMIC LEVEL

DDH has been the gold standard, playing the role that “sex” (interbreeding) plays for higher eukaryotes. Stackebrandt et al., 2002, Int J Syst Evol Microbiol. 52:; Rosselló-Mora & Amann 2001, FEMS Microbiol. Rev. 25:39-67; Gevers et al., 2005, Nature Rev. Microbiol. 3:

DDH (DNA-DNA hybridization):
► 70% similarity threshold (50-70%)
► used since the 1960s
► strong influence
► non-cumulative: results cannot be built into a database
► needs to be replaced

MLSA (multilocus sequence analysis):
► 5-10 full or partial gene sequences
► housekeeping genes
► primer design difficulties
► biases in the selection of genes
► time-consuming
► low number of genes for a stable topology

MLSA workflow: amplify and sequence 5-10 housekeeping genes for each strain, concatenate the gene sequences, and reconstruct the phylogeny.
[Figure: genes genA-genF concatenated for strains Str. 1-Str. 4]
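The concatenation step of the MLSA workflow can be sketched in Python. This is a minimal illustration, assuming each housekeeping gene has already been aligned across strains (for example with an external multiple aligner) and that a strain missing a gene is padded with gaps; the function name `concatenate_mlsa` and the padding rule are hypothetical choices, not part of the slides. Tree reconstruction from the concatenated sequences is not shown.

```python
from typing import Dict, List


def concatenate_mlsa(alignments: Dict[str, Dict[str, str]],
                     gene_order: List[str]) -> Dict[str, str]:
    """Concatenate aligned housekeeping-gene sequences into one sequence per strain.

    alignments maps gene name -> {strain name -> aligned sequence}.
    Sequences of the same gene must already be aligned (equal length).
    """
    # Every strain that appears in at least one gene alignment.
    strains = sorted({s for gene in gene_order for s in alignments[gene]})
    parts: Dict[str, List[str]] = {s: [] for s in strains}
    for gene in gene_order:
        per_strain = alignments[gene]
        lengths = {len(seq) for seq in per_strain.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are missing or not aligned")
        gene_len = lengths.pop()
        for s in strains:
            # Assumption: a strain lacking this gene gets a block of gaps.
            parts[s].append(per_strain.get(s, "-" * gene_len))
    return {s: "".join(blocks) for s, blocks in parts.items()}


aln = {
    "genA": {"Str1": "ACGT", "Str2": "ACGA"},
    "genB": {"Str1": "TTG", "Str2": "TTC", "Str3": "TAC"},
}
print(concatenate_mlsa(aln, ["genA", "genB"]))
# {'Str1': 'ACGTTTG', 'Str2': 'ACGATTC', 'Str3': '----TAC'}
```

Keeping the genes in a fixed `gene_order` makes every strain's supermatrix row line up column by column, which is what the tree-building program expects.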

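Alternative approaches: ANI (average nucleotide identity)
► Konstantinidis and Tiedje, 2005, PNAS. 102: genome a is compared with genome b by BLASTN; annotated ORFs are searched to sieve the common orthologous genes, and ANI is calculated over them.
► Goris et al., 2007, IJSEM 57:81-91: genome a is cut into fragments of 1020 nt, which are searched by BLASTN against genome b. Fragments with < 30% identity or < 70% of the sequence aligned are discarded; ANI is the mean identity of the fragments with > 30% identity and > 70% aligned sequence.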
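The fragment-based ANI calculation (Goris et al., 2007) can be sketched as follows. The sketch does not run BLASTN itself: it assumes the best BLASTN hit of each 1020-nt fragment has already been parsed into a hypothetical `FragmentHit` record (percent identity and alignment length), with `None` for fragments without a hit. It applies the > 30% identity and > 70% aligned-length filters from the slide and averages the identities of the surviving fragments.

```python
from dataclasses import dataclass
from typing import List, Optional

FRAGMENT_LEN = 1020  # fragment size given on the slide (Goris et al., 2007)


def fragment_genome(seq: str, frag_len: int = FRAGMENT_LEN) -> List[str]:
    """Cut a genome into consecutive fragments; the last one may be shorter."""
    return [seq[i:i + frag_len] for i in range(0, len(seq), frag_len)]


@dataclass
class FragmentHit:
    """Best BLASTN hit of one query fragment against the other genome (hypothetical record)."""
    identity: float    # percent identity of the alignment, 0-100
    aligned_len: int   # aligned length of the fragment, in nt
    fragment_len: int  # full length of the query fragment, in nt


def anib(hits: List[Optional[FragmentHit]],
         min_identity: float = 30.0,
         min_aligned_frac: float = 0.70) -> float:
    """Mean identity of fragments passing the identity and aligned-length filters."""
    kept = [
        h.identity
        for h in hits
        if h is not None  # skip fragments without any hit
        and h.identity > min_identity
        and h.aligned_len / h.fragment_len > min_aligned_frac
    ]
    if not kept:
        raise ValueError("no fragment passed the filters")
    return sum(kept) / len(kept)


hits = [FragmentHit(98.0, 1000, 1020), FragmentHit(96.0, 1020, 1020),
        FragmentHit(90.0, 500, 1020), None]
print(anib(hits))  # 97.0: the third fragment is < 70% aligned, the fourth has no hit
```

ANI computed this way is directional (genome a fragments against genome b), so in practice it is worth computing both directions and comparing them.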

JSpecies:
► biologist-oriented
► user-friendly and usable with multi-FASTA data

ANI is the way to circumscribe species genomically in the future.
ANIm vs DDH:
► 85 genospecies evaluated
► 94-96% ANI is a plausible borderline
► inconsistent results were most probably due to wrong DDH values
► ANI thresholds of 94-96% delimit genomospecies
► 20% of random sequence (i.e., 250 nt) from two genomes is enough
► with a complete catalogue of type strain genomes, only 4% of random genome sequence is enough
Richter & Rosselló-Móra 2009, PNAS 106:
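Reading a pairwise ANI value against the 94-96% boundary can be written as a small helper. This is a hypothetical illustration: treating values inside the 94-96% band as "borderline" rather than forcing a decision, and the inclusive/exclusive handling of the two limits, are assumptions, and the function name is invented.

```python
def classify_ani(ani: float, lower: float = 94.0, upper: float = 96.0) -> str:
    """Interpret a pairwise ANI value (in percent) against the 94-96% boundary."""
    if not 0.0 <= ani <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani >= upper:
        return "same species"
    if ani < lower:
        return "different species"
    return f"borderline ({lower:g}-{upper:g}%)"


for value in (98.2, 95.0, 88.5):
    print(value, classify_ani(value))
# 98.2 same species
# 95.0 borderline (94-96%)
# 88.5 different species
```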

The best scenario: all species genomes sequenced
[Figure: genomes a-x in four groups]
Complete type strain genomes + < 20% random sequence genome coverage; perhaps 1000 reads would be enough (200 €)
STABLE ANI:
► 1% of the genome will be enough for IDENTIFICATION purposes
► an effort is needed to fully sequence the species collection (GEBA; Wu et al., 2009, Nature)
► in the future it will be necessary to fully sequence every new type strain
► 94%-96% ANI boundary
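Estimating ANI from a small random fraction of a genome (20%, 4% or 1% in the slides) amounts to subsampling fragments before the BLAST step. A minimal sketch, assuming the genome is cut into 1020-nt fragments and fragments are drawn without replacement until the requested fraction of the genome is covered; `random_subsample` is a hypothetical name and the stopping rule is an assumption.

```python
import random
from typing import List


def random_subsample(genome: str, fraction: float,
                     frag_len: int = 1020, seed: int = 0) -> List[str]:
    """Draw random fragments (without replacement) until `fraction` of the genome is covered."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    fragments = [genome[i:i + frag_len] for i in range(0, len(genome), frag_len)]
    order = list(range(len(fragments)))
    random.Random(seed).shuffle(order)  # reproducible random order
    target = fraction * len(genome)
    picked: List[str] = []
    covered = 0
    for i in order:
        if covered >= target:
            break
        picked.append(fragments[i])
        covered += len(fragments[i])
    return picked


genome = "ACGT" * 2550  # toy 10,200-nt "genome" = ten 1020-nt fragments
print(len(random_subsample(genome, 0.5)))   # 5 fragments
print(len(random_subsample(genome, 0.25)))  # 3 fragments (2 would cover only 2040 nt)
```

Fixing the `seed` makes a subsample reproducible, so ANI from different coverage fractions can be compared on the same draw.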

Genome database & type strains (Richter & Rosselló-Móra 2009, PNAS 106:)
► Data analysis in summer 2009 => 938 genomes
► 10% of the entries tagged with the culture collection number (the rest with the original strain number)
► 255 species names represented by their type strain
► 256 species names NOT represented by their type strain
► 50 species names NEVER validly published
► it is possible to circumscribe uncultured species (e.g. Buchnera & Wolbachia)

Tetranucleotide variation: 4^4 = 256 tetranucleotides
TETRA:
► genomes have a characteristic oligonucleotide usage (not yet understood; related to codon usage)
► similar genomes might have similar usage
► ALIGNMENT-FREE PARAMETER
► may be useful in deciding whether a group of strains deserves species status
► same species: correlation coefficient r > 0.999
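The TETRA correlation can be sketched in Python. This follows, as an assumption, the z-score formulation commonly used for TETRA: tetranucleotide counts from both strands are compared with the counts expected from the tri- and dinucleotide composition, and the 256 z-scores of two genomes are compared with Pearson's correlation coefficient r. The function names are hypothetical and this is not the JSpecies implementation.

```python
import math
import random
from itertools import product
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 4**4 = 256


def _kmer_counts(seqs: List[str], k: int) -> Dict[str, int]:
    """Count overlapping k-mers made only of A, C, G, T."""
    counts: Dict[str, int] = {}
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def tetra_zscores(seq: str) -> List[float]:
    """z-score of each tetranucleotide, counted on both strands."""
    seq = seq.upper()
    strands = [seq, seq.translate(_COMPLEMENT)[::-1]]  # sequence + reverse complement
    n2, n3, n4 = (_kmer_counts(strands, k) for k in (2, 3, 4))
    z = []
    for t in _TETRAMERS:
        obs = n4.get(t, 0)
        n123, n234, n23 = n3.get(t[:3], 0), n3.get(t[1:], 0), n2.get(t[1:3], 0)
        if n23 == 0:
            z.append(0.0)
            continue
        # Expected count and variance from the tri- and dinucleotide composition.
        expected = n123 * n234 / n23
        variance = expected * (n23 - n123) * (n23 - n234) / n23 ** 2
        z.append((obs - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return z


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant z-score vector")
    return sxy / math.sqrt(sxx * syy)


def tetra_correlation(seq_a: str, seq_b: str) -> float:
    return pearson(tetra_zscores(seq_a), tetra_zscores(seq_b))


rng = random.Random(42)
genome_a = "".join(rng.choice("ACGT") for _ in range(20000))
genome_b = "".join(rng.choice("ACGT") for _ in range(20000))
rev_comp_a = genome_a.translate(_COMPLEMENT)[::-1]
print(round(tetra_correlation(genome_a, rev_comp_a), 6))  # 1.0
print(tetra_correlation(genome_a, genome_b))  # not fixed; far below 0.999 for unrelated sequences
```

Because both strands are counted, a sequence and its reverse complement give identical z-scores, so r = 1; for real genome pairs, r is the value to compare with the > 0.999 same-species indication on the slide.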

► The case of the synthetic genome of M. mycoides strain GM12 transplanted into M. capricolum (Science (2010) 329: 52)
► 88.5 (66% aligned)
► 94.5 (78% aligned)
► 87.8 (76% aligned)

► Only one of the several transplantations worked!
► Different ways of reading the genome?
[Table: Genome transplantation experiments of Venter. Columns: organism, target, ANI, TETRA (r), same species, worked. Organism / target pairs: M. hyopneumoniae 7448 / M. hyopneumoniae J; M. mycoides LC / M. capricolum; M. genitalium / M. capricolum; M. genitalium / M. pneumoniae; M. genitalium / M. gallisepticum; M. alligatoris (crocodyli) / M. capricolum.]

► The phylogenetic (evolutionary) distance plays an important role in the recognition of how the genetic information is coded
► M. genitalium → M. pneumoniae: strange! Wrongly identified strain?

OTHER PARAMETERS
► Average Amino Acid Identity (AAI): Konstantinidis & Tiedje, 2005, J. Bacteriol. 187:
► Maximal Unique and Exact Matches (MUM): Deloger et al., 2009, J. Bacteriol. 191:
► High-Scoring Segment Pairs (HSP): Auch et al., Stand. Genomic Sci. 2:
► and more to come
► all need full genome sequences
► the easiest is the best
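AAI can be sketched in the same spirit as ANI, but over proteins. The sketch assumes BLASTP best-hit tables in both directions have already been parsed into dictionaries (query → subject, percent identity, aligned fraction of the query); it keeps only reciprocal best hits and, as an assumed set of cut-offs, those with at least 30% identity over at least 70% of the query, then averages their identities. Names and thresholds are illustrative, not taken from the slides.

```python
from typing import Dict, Tuple

# Best hit of each query protein: (subject protein, percent identity, aligned fraction of query)
Hit = Tuple[str, float, float]


def aai(best_ab: Dict[str, Hit], best_ba: Dict[str, Hit],
        min_identity: float = 30.0, min_aligned: float = 0.70) -> float:
    """Mean amino acid identity over reciprocal best BLASTP hits passing both filters."""
    identities = []
    for query, (subject, identity, aligned) in best_ab.items():
        reverse = best_ba.get(subject)
        if reverse is None or reverse[0] != query:
            continue  # not a reciprocal best hit, so not counted as a shared gene
        if identity >= min_identity and aligned >= min_aligned:
            identities.append(identity)
    if not identities:
        raise ValueError("no reciprocal best hits passed the filters")
    return sum(identities) / len(identities)


best_ab = {"a1": ("b1", 80.0, 0.95), "a2": ("b2", 60.0, 0.90),
           "a3": ("b3", 25.0, 0.99), "a4": ("b9", 70.0, 0.80)}
best_ba = {"b1": ("a1", 80.5, 0.96), "b2": ("a2", 60.0, 0.91),
           "b3": ("a3", 25.0, 0.98), "b9": ("a7", 70.0, 0.80)}
print(aai(best_ab, best_ba))  # 70.0: a3 fails the identity filter, a4 is not reciprocal
```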