ESPRIT

Taxonomy ● Works very well and gives accurate results ● Requires a prior BLAST search that may take a long time to complete ● When in doubt, goes one level up in the hierarchy ● Assignment is as accurate as possible ● Species detail is lost ● Not good enough to measure genetic diversity and species richness
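The "when in doubt, goes one level up" rule can be sketched as taking the deepest rank on which all BLAST hits for a read agree. This is a minimal illustration, not the tool's actual code: the function name `consensus_lineage` and the example lineages are hypothetical, and it assumes each hit has already been mapped to a lineage listed from the highest rank down.

```python
from typing import List, Sequence


def consensus_lineage(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the rank-by-rank prefix shared by every lineage."""
    shared: List[str] = []
    # zip stops at the shortest lineage, so missing ranks are never compared.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # hits disagree here: stay one level up
        shared.append(ranks[0])
    return shared


# Hypothetical hits for one read; they agree down to genus only.
hits = [
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
     "Streptococcaceae", "Streptococcus", "Streptococcus mitis"],
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
     "Streptococcaceae", "Streptococcus", "Streptococcus oralis"],
]
print(consensus_lineage(hits)[-1])
```

The read is assigned to the genus because the two hits disagree at species level, which is how species detail gets lost.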

MOTU / OTU ● Molecular Operational Taxonomic Unit ● Operational Taxonomic Unit ● 3% distance: species ● 5%: genus ● 10%: phylum ● Controversial ● Practical ● As long as you remember there is no real association between cutoffs and taxonomic ranks

Computing OTUs ● Measure the distance between two sequences ● If < cutoff ● They belong to the same group (OTU) ● If > cutoff ● They belong in different groups ● Each sequence must be checked against all others ● Requires a distance matrix ● Distances are calculated by sequence comparison
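A minimal sketch of the cutoff rule, under loud assumptions: the sequences are already aligned to equal length, the distance is the plain fraction of mismatched positions, and groups are joined whenever any pair falls under the cutoff (nearest-neighbor linkage). The names `mismatch_distance` and `group_by_cutoff` are illustrative, not taken from any tool.

```python
from itertools import combinations
from typing import List, Sequence


def mismatch_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_by_cutoff(seqs: Sequence[str], cutoff: float = 0.03) -> List[List[int]]:
    """Group sequence indices; any pair closer than cutoff shares a group."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every sequence is checked against all others.
    for i, j in combinations(range(len(seqs)), 2):
        if mismatch_distance(seqs[i], seqs[j]) < cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data; the cutoff is raised to 0.2 only because the reads are short.
toy = ["ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
print(group_by_cutoff(toy, cutoff=0.2))
```

Furthest-neighbor or average-neighbor linkage would give tighter groups; nearest-neighbor is used here only because it is the shortest to write.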

Multiple Sequence Alignment ● Slow ● New developments: MAFFT, MUSCLE, CLUSTAL-OMEGA ● Slow for hundreds of thousands of sequences ● MSA leads to inflated estimates ● Arguable results for 16S hypervariable regions ● Some regions may not have enough conservation (e.g., V6, V3) ● Distance tables can become huge

Better than MSA: NW ● Needleman-Wunsch aligns two sequences globally ● Pairwise distances can be computed simultaneously ● Does not require reading a huge distance matrix ● Gives more accurate results

Pairwise alignments ● Are a combinatorial problem: ● N · (N - 1) / 2 pairs for N sequences ● Needleman-Wunsch is expensive for long sequences ● Can take forever if not reduced to the minimum needed ● Combined with a suitable clustering method, can avoid computing the distance matrix
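To make the cost concrete, here is a small sketch of a Needleman-Wunsch global alignment together with the pair count. The scoring values (match +1, mismatch -1, linear gap -1) and the distance definition (non-matching columns divided by alignment length) are assumptions chosen for illustration, not the parameters of any particular tool.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str, int]:
    """Globally align a and b; return both aligned strings and the score."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the table: time and memory grow with len(a) * len(b).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def pairwise_distance(a: str, b: str) -> float:
    """Non-matching alignment columns (gaps included) over alignment length."""
    top, bottom, _ = needleman_wunsch(a, b)
    if not top:
        return 0.0
    return sum(x != y for x, y in zip(top, bottom)) / len(top)


def number_of_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


print(needleman_wunsch("ACGT", "ACT"))
print(pairwise_distance("ACGT", "ACT"))
print(number_of_pairs(100_000))
```

With 100,000 reads the pair count is already close to five billion alignments, which is why the problem has to be reduced before aligning.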

Reducing problem size ● Remove low-quality and low-information reads ● Remove reads containing ambiguous nucleotides (N) ● Eliminate reads with atypical sequence lengths ● If two sequences are identical or one is a subset of the other, they are combined and the frequency count is incremented ● Estimate distances among pairs with <0.10 distance ● Use k-mer distance of 0.5 for initial filtering.
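The reduction steps above can be sketched as follows. Everything here is illustrative: the function names are hypothetical, "subset" is read as "substring", and the k-mer distance uses one common definition (one minus the shared k-mer count divided by the number of k-mers in the shorter read), which may differ in detail from what a given tool implements. The default k = 6 is likewise an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List


def clean_reads(reads: Iterable[str], min_len: int, max_len: int) -> List[str]:
    """Drop reads with ambiguous bases (N) or atypical lengths."""
    return [r for r in reads
            if "N" not in r.upper() and min_len <= len(r) <= max_len]


def collapse_reads(reads: Iterable[str]) -> Dict[str, int]:
    """Merge identical reads, and reads contained in a longer read,
    keeping a frequency count per representative."""
    counts = Counter(reads)
    merged: Dict[str, int] = {}
    # Longest first, so shorter reads can be absorbed by longer ones.
    for seq in sorted(counts, key=lambda s: (-len(s), s)):
        for rep in merged:
            if seq in rep:
                merged[rep] += counts[seq]
                break
        else:
            merged[seq] = counts[seq]
    return merged


def kmer_distance(a: str, b: str, k: int = 6) -> float:
    """Cheap alignment-free distance used to skip clearly distant pairs."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k long")
    ka = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((ka & kb).values())  # multiset intersection
    return 1 - shared / denom


raw = ["ACGTACGT", "ACGTACGT", "ACGTAC", "ACGNACGT", "AC", "TTTTTTTT"]
kept = clean_reads(raw, min_len=6, max_len=8)
print(collapse_reads(kept))
print(kmer_distance("ACGTACGT", "TTTTTTTT", k=3))
```

Only pairs whose k-mer distance passes the initial filter would then go on to the expensive Needleman-Wunsch step.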

Hierarchical clustering ● First sort pairwise distances in ascending order ● Process distances on the fly ● Classify clusters into active or inactive ● Active: not enough information to merge with other cluster ● Inactive: cluster with no information or already merged ● Gives same results as mothur clustering method

Calculations ● Observed species ● Rarefaction analysis ● CHAO1 ● ACE
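As a rough illustration of two of these calculations, the sketch below computes the classic Chao1 estimate and a point on the rarefaction curve from a list of per-OTU read counts. It assumes the standard textbook formulas (Chao1 with the bias-corrected fallback when there are no doubletons, and the hypergeometric expectation for rarefaction); ACE is omitted because it needs more bookkeeping. Function names and the example counts are hypothetical.

```python
from math import comb
from typing import Sequence


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness: S_obs + F1^2 / (2 * F2), where F1 and F2 are the
    numbers of OTUs seen exactly once and exactly twice."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 == 0:
        # Bias-corrected form avoids dividing by zero.
        return observed + f1 * (f1 - 1) / 2
    return observed + f1 * f1 / (2 * f2)


def rarefied_richness(counts: Sequence[int], depth: int) -> float:
    """Expected number of OTUs in a random subsample of `depth` reads."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = comb(total, depth)
    # For each OTU: 1 - P(no read from this OTU lands in the subsample).
    return sum(1 - comb(total - c, depth) / all_draws
               for c in counts if c > 0)


otu_counts = [10, 5, 2, 2, 1, 1, 1]  # made-up reads per OTU
print(chao1(otu_counts))
print([round(rarefied_richness(otu_counts, d), 2) for d in (1, 5, 10, 22)])
```

Plotting `rarefied_richness` against depth gives the rarefaction curve; a curve that is still climbing at full depth suggests the sample was not sequenced deeply enough.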

OTUPIPE

About Otupipe ● Bash script ● Requires USEARCH and UCHIME ● Calculate OTUs from single-region experiments ● Designed for 454 sequencing ● Can be adapted for Illumina reads ● Appears to show higher error rates for 16S gene ● No effective denoising/error-correction solution has been published ● Increase MINSIZE

Basic usage ● otupipe.bash input.file.fas outdir ● Creates outdir ● Writes chimeras.fa, otus.fa and readmap.uc ● readmap.uc – One line per read – Hit (chimera or OTU) – No match (new species or more likely an error) ● User-settable parameters as environment variables – MINSIZE, PCTID_ERR, PCTID_OTU, PCTID_BIN

Practical usage ● Windows: use Cygwin ● Embed in shell scripts ● Process results programmatically
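Processing readmap.uc programmatically might look like the sketch below. It assumes the file follows the tab-separated USEARCH UC layout, with the record type in the first column ("H" for a hit, "N" for no match), the read label in the ninth column and the target label in the tenth. The labels in the example (read1, OTU_1, Chimera_7) are invented for illustration and may not match what the script actually writes.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_reads_per_target(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Tally reads per matched target and count reads with no match."""
    hits: Counter = Counter()
    unmatched = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # not a complete record
        record_type, target = fields[0], fields[9]
        if record_type == "H":
            hits[target] += 1
        elif record_type == "N":
            unmatched += 1
    return hits, unmatched


# Invented records standing in for the contents of outdir/readmap.uc.
example = [
    "H\t0\t250\t99.2\t+\t0\t0\t250M\tread1\tOTU_1",
    "H\t0\t248\t98.4\t+\t0\t0\t248M\tread2\tOTU_1",
    "H\t3\t251\t97.6\t+\t0\t0\t251M\tread3\tChimera_7",
    "N\t*\t249\t*\t*\t*\t*\t*\tread4\t*",
]
per_target, no_match = count_reads_per_target(example)
print(per_target.most_common(), no_match)
```

For a real run, pass an open file handle instead of the list, for example `with open("outdir/readmap.uc") as fh: count_reads_per_target(fh)`.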

What it does ● Remove duplicates ● Sort sequences by decreasing length ● Detect chimeras (UCHIME) ● Abundance ● Gold database ● Set chimeras aside ● Cluster chimeras ● Cluster remaining reads ● Generate readmap.uc

MOTHUR

A general tool ● Can do most common tasks ● In several ways ● Evolves rapidly ● Join the forum ● Trace changes ● Well documented ● function(help) ● Good tutorials

Denoising ● sffinfo (get information on sff file) ● shhh.flows (PyroNoise) ● trim.seqs (select by properties such as size or ambiguity; remove barcodes, primers...) ● unique.seqs (select unique sequences) ● screen.seqs (remove sequences aligning outside a desired range) ● filter.seqs (remove common gaps, trump, etc...) ● pre.cluster (merge sequences below threshold) ● chimera.uchime (remove chimeras using uchime) ● classify.seqs/remove.lineage (remove contaminants)

Multiple sequence alignments ● Use an external alignment in fasta format ● Use a reference guided alignment ● Kmer, blastn, suffix tree ● Pairwise alignment between candidate and de-gapped sequences (Needleman-Wunsch, Gotoh, blastn) ● Reinsert gaps (NAST) ● References: Greengenes, SILVA, user-provided

Cluster ● pre.cluster (collate reads with less than X changes) ● cluster.seqs (cluster reads by furthest, average or nearest neighbor) ● hcluster (hierarchical clustering, very slow for average neighbor, good for furthest and nearest) ● cluster.split (fastest, new, works by taxon level and should give same output as cluster.seqs)

Measures ● Large array of options ● OTUs and rarefactions ● Estimators (ACE, CHAO1, Shannon) ● Phylogeny ● Alpha and beta diversity (one or many groups) ● Venn diagrams ● Unifrac ● PCoA (Principal Coordinates Analysis) ● NMDS (non-metric multidimensional scaling), etc...
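To show what an alpha and a beta diversity measure compute, here is a small stand-alone sketch. The Shannon index is named in the list above; Bray-Curtis dissimilarity is an assumption, picked as one widely used beta diversity measure. This is plain Python for illustration, not mothur's implementation, and the sample counts are made up.

```python
from math import log
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Alpha diversity of one sample: H = -sum(p_i * ln(p_i))."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Beta diversity between two samples with OTUs in the same order:
    1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("samples must list the same OTUs in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / total


sample_a = [10, 5, 0, 1]  # made-up reads per OTU
sample_b = [2, 5, 8, 1]
print(shannon_index(sample_a), shannon_index(sample_b))
print(bray_curtis(sample_a, sample_b))
```

A matrix of such pairwise dissimilarities across all samples is the usual input to ordination methods such as PCoA or NMDS.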

Usage ● Command line ● Batch (mothur file) ● Parallel (processors=x) ● Distributed (MPI) ● See the SOP on the mothur web site ● Monitor the web site ● Most versatile