Published by Ira Freeman. Modified over 8 years ago.
1
ESPRIT
2
Taxonomy ● Works very well and gives accurate results ● Requires a prior BLAST search, which may take a long time to complete ● When in doubt, assignment moves one level up in the hierarchy ● Assignment is as accurate as possible ● Species-level detail is lost ● Not good enough to measure genetic diversity and species richness
3
MOTU / OTU ● MOTU: Molecular Operational Taxonomic Unit ● OTU: Operational Taxonomic Unit ● Common distance cutoffs: 3% ≈ species ● 5% ≈ genus ● 10% ≈ phylum ● Controversial ● Practical ● As long as you remember there is no real association between OTUs and taxonomic ranks
4
Computing OTUs ● Measure the distance between two sequences ● If distance < cutoff ● They belong to the same group (OTU) ● If distance > cutoff ● They belong to different groups ● Each sequence must be checked against all others ● Requires a distance matrix ● Distances are calculated by sequence comparison
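The cutoff rule on the slide can be sketched in a few lines. This is an illustration, not ESPRIT's code: it assumes the sequences are already aligned to the same length, treats columns where both sequences have a gap as uninformative, and uses hypothetical helper names (`aligned_distance`, `distance_matrix`, `same_otu`).

```python
from itertools import combinations


def aligned_distance(a: str, b: str) -> float:
    """Fraction of differing columns between two aligned sequences.

    Columns where both sequences have a gap ('-') are skipped (assumed
    convention); a gap facing a base counts as a difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # common gap column carries no information
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(seqs: dict) -> dict:
    """Compare every sequence against all others: N*(N-1)/2 entries."""
    return {(i, j): aligned_distance(seqs[i], seqs[j])
            for i, j in combinations(seqs, 2)}


def same_otu(distance: float, cutoff: float = 0.03) -> bool:
    """Same group if the distance is below the cutoff (0.03 ~ 'species')."""
    return distance < cutoff
```

Every entry of the matrix needs its own comparison, which is why the matrix itself becomes the bottleneck as the number of reads grows.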
5
Multiple Sequence Alignment ● Slow ● New developments: MAFFT, MUSCLE, Clustal Omega ● Still slow for hundreds of thousands of sequences ● MSA leads to inflated OTU estimates ● Questionable results for 16S hypervariable regions ● Some regions may not be conserved enough to align well (e.g., V6, V3) ● Distance tables can become huge
6
Better than MSA: NW ● Needleman-Wunsch aligns two sequences globally ● Pairwise distances can be computed simultaneously (each pair is independent of the others) ● Does not require reading a huge distance matrix ● Gives more accurate results
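A minimal Needleman-Wunsch sketch shows how a global pairwise alignment yields a distance directly, with no multiple alignment involved. The scoring scheme (match +1, mismatch -1, linear gap -2) and the distance definition (differing columns over alignment length) are assumptions for illustration; ESPRIT's own scoring parameters may differ. `needleman_wunsch` and `nw_distance` are hypothetical names.

```python
def _pair_score(x: str, y: str, match: int, mismatch: int) -> int:
    return match if x == y else mismatch


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple:
    """Global alignment with a linear gap penalty; returns (aln_a, aln_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(
                a[i - 1], b[j - 1], match, mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def nw_distance(a: str, b: str) -> float:
    """Differing columns (mismatches and gaps) divided by alignment length."""
    aln_a, aln_b, _ = needleman_wunsch(a, b)
    if not aln_a:
        return 0.0
    return sum(1 for x, y in zip(aln_a, aln_b) if x != y) / len(aln_a)
```

Because each call only needs the two sequences, pairs can be farmed out to separate processes and the distance written out as soon as it is computed.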
7
Pairwise alignments ● Are a combinatorial problem: ● $N(N-1)/2$ pairs for $N$ sequences ● Needleman-Wunsch cost grows with sequence length (proportional to the product of the two lengths) ● Can take forever if the problem is not reduced to the minimum needed ● Combined with a suitable clustering method, it can avoid computing the full distance matrix
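A quick back-of-the-envelope calculation makes the $N(N-1)/2$ growth concrete. Storing each distance as a 4-byte float is an assumption; real tools may store more (labels, text) per entry. The function names are hypothetical.

```python
def num_pairs(n: int) -> int:
    """Number of distinct pairs among n sequences: n*(n-1)/2."""
    return n * (n - 1) // 2


def matrix_bytes(n: int, bytes_per_distance: int = 4) -> int:
    """Memory for a full pairwise distance table (assumes 4-byte floats)."""
    return num_pairs(n) * bytes_per_distance


def nw_cells(n: int, read_len: int) -> int:
    """Dynamic-programming cells filled if every pair is aligned with NW."""
    return num_pairs(n) * (read_len + 1) ** 2


for n in (1_000, 100_000):
    print(n, num_pairs(n), matrix_bytes(n) / 2**30, "GiB")
```

With 100,000 reads there are 4,999,950,000 pairs, roughly 18.6 GiB of raw distances under these assumptions, which is why the next step is to shrink the input before aligning anything.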
8
Reducing problem size ● Remove low-quality and low-information reads ● Remove reads containing ambiguous nucleotides (N) ● Eliminate reads with atypical sequence lengths ● If two sequences are identical or one is contained in the other, combine them and increment the frequency count ● Compute exact distances only for pairs expected to be within 0.10 distance ● Use a k-mer distance cutoff of 0.5 as an initial filter
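The reduction steps can be sketched as three small functions: a read filter, dereplication that also absorbs reads contained in longer ones, and a k-mer distance used to skip pairs before any alignment. The k-mer distance shown (1 minus shared k-mers over the k-mer count of the shorter read) is one common definition, assumed here; the word size k=6 and all function names are hypothetical choices, not ESPRIT's documented defaults.

```python
from collections import Counter


def keep_read(seq: str, min_len: int, max_len: int) -> bool:
    """Drop reads with ambiguous bases (N) or atypical lengths (user-chosen bounds)."""
    return "N" not in seq.upper() and min_len <= len(seq) <= max_len


def dereplicate(seqs: list) -> dict:
    """Collapse identical reads and reads contained in a longer read.

    Returns {representative: count}. Quadratic scan: illustration only.
    """
    counts = Counter(seqs)
    reps = {}
    # Longest first, so a longer read can absorb the reads it contains
    for s in sorted(counts, key=len, reverse=True):
        parent = next((r for r in reps if s in r), None)
        if parent is None:
            reps[s] = counts[s]
        else:
            reps[parent] += counts[s]
    return reps


def kmer_distance(a: str, b: str, k: int = 6) -> float:
    """1 - (shared k-mers) / (k-mers in the shorter sequence)."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((ca & cb).values())  # '&' keeps the minimum count per k-mer
    denom = min(len(a), len(b)) - k + 1
    return 1.0 - shared / denom if denom > 0 else 1.0


def candidate_pairs(reps: dict, k: int = 6, kmer_cutoff: float = 0.5) -> list:
    """Only pairs passing the cheap k-mer filter go on to exact alignment."""
    names = list(reps)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if kmer_distance(a, b, k) < kmer_cutoff]
```

The k-mer comparison is far cheaper than a full alignment, so using it as a gate removes most pairs that could never fall under the final cutoff.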
9
Hierarchical clustering ● First sort pairwise distances in ascending order ● Process distances on the fly ● Classify clusters as active or inactive ● Active: not yet enough information to merge with another cluster ● Inactive: cluster with no pending information, or already merged ● Gives the same results as mothur's clustering method
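The "sort, then process on the fly" idea is easiest to see with single linkage, where the first pair below the cutoff that connects two clusters is enough to merge them. The sketch below uses a union-find structure for that case only; it deliberately does not implement the active/inactive bookkeeping that other linkages (e.g., average) need, so treat it as a simplified illustration. `single_linkage_stream` is a hypothetical name.

```python
def single_linkage_stream(n_items: int, sorted_edges, cutoff: float) -> list:
    """Cluster items from (i, j, distance) edges given in ascending distance order."""
    parent = list(range(n_items))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, d in sorted_edges:
        if d >= cutoff:
            break  # edges are sorted, so no later edge can be below the cutoff
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two clusters immediately
    clusters = {}
    for x in range(n_items):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


edges = sorted([(0, 1, 0.01), (1, 2, 0.02), (3, 4, 0.2), (2, 3, 0.5)],
               key=lambda e: e[2])
print(single_linkage_stream(5, edges, cutoff=0.03))
```

Only the edge list is streamed; no N×N matrix is ever held in memory, which is the property the slide is after.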
10
Calculations ● Observed species ● Rarefaction analysis ● Chao1 ● ACE
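The four calculations work on a list of OTU abundances (reads per OTU). The sketch below uses the bias-corrected Chao1 form, the Chao and Lee ACE with a rare-OTU threshold of 10 reads, and the classic expected-richness rarefaction formula. These are standard textbook definitions assumed for illustration; they are not claimed to match ESPRIT's or mothur's exact implementation, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def observed(counts: list) -> int:
    """Observed species: OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: list) -> float:
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def ace(counts: list, rare_threshold: int = 10) -> float:
    """Abundance-based Coverage Estimator (rare OTUs: <= rare_threshold reads)."""
    rare = [c for c in counts if 0 < c <= rare_threshold]
    s_abund = sum(1 for c in counts if c > rare_threshold)
    s_rare, n_rare = len(rare), sum(rare)
    if n_rare == 0:
        return float(s_abund)
    f1 = sum(1 for c in rare if c == 1)
    if f1 == n_rare:
        raise ValueError("all rare OTUs are singletons; ACE is undefined")
    c_ace = 1 - f1 / n_rare  # sample coverage estimate
    freq = Counter(rare)     # F_i: number of OTUs with exactly i reads
    gamma2 = max(s_rare / c_ace * sum(i * (i - 1) * f for i, f in freq.items())
                 / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + f1 / c_ace * gamma2


def rarefaction(counts: list, n: int) -> float:
    """Expected number of OTUs in a random subsample of n reads."""
    total = sum(counts)
    # comb() returns 0 when n exceeds total - c, i.e. the OTU is surely sampled
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)


counts = [1, 1, 2, 3, 5]
print(observed(counts), chao1(counts), ace(counts))
print([round(rarefaction(counts, n), 2) for n in (1, 4, 8, 12)])
```

Evaluating `rarefaction` over increasing subsample sizes gives the rarefaction curve; a curve that is still rising at full depth suggests more sequencing would reveal more OTUs.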
11
OTUPIPE
12
About Otupipe ● Bash script ● Requires USEARCH and UCHIME ● Calculates OTUs from single-region experiments ● Designed for 454 sequencing ● Can be adapted for Illumina reads ● Illumina reads appear to show higher error rates for the 16S gene ● No effective denoising/error-correction solution for them has been published ● Workaround: increase MINSIZE
13
Basic usage ● otupipe.bash input.file.fas outdir ● Creates outdir ● Writes chimeras.fa, otus.fa and readmap.uc ● readmap.uc – One line per read – Hit (chimera or OTU) – No match (new species or, more likely, an error) ● User-settable parameters as environment variables – MINSIZE, PCTID_ERR, PCTID_OTU, PCTID_BIN
14
Practical usage ● Windows: use Cygwin ● Embed in shell scripts ● Process results programmatically
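As an example of processing results programmatically, the sketch below tallies reads per target from readmap.uc. It assumes the standard tab-separated, 10-column UC layout (record type in column 1, query label in column 9, target label in column 10) with `H` for hits and `N` for no match; check this against the USEARCH version you run. `summarize_readmap` is a hypothetical name, and the sample lines are made up.

```python
from collections import Counter


def summarize_readmap(lines) -> tuple:
    """Count reads per target label (H records) and reads with no match (N)."""
    hits = Counter()
    no_match = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if fields[0] == "H":
            hits[fields[9]] += 1  # column 10: target (OTU or chimera) label
        elif fields[0] == "N":
            no_match += 1
    return hits, no_match


sample = [
    "H\t0\t250\t99.2\t+\t0\t0\t250M\tread1\tOTU_1\n",
    "N\t*\t250\t*\t*\t*\t*\t*\tread2\t*\n",
]
print(summarize_readmap(sample))
```

In practice the function would be called on an open file handle, e.g. `summarize_readmap(open("outdir/readmap.uc"))`, and the counts turned into an OTU table.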
15
What it does ● Remove duplicates ● Sort sequences by decreasing length ● Detect chimeras with UCHIME ● De novo (abundance-based) ● Reference-based (Gold database) ● Set chimeras aside ● Cluster chimeras ● Cluster remaining reads ● Generate readmap.uc
16
MOTHUR
17
A general tool ● Can do most common tasks ● In several ways ● Evolves rapidly ● Join the forum ● Trace changes ● Well documented ● function(help) ● Good tutorials
18
Denoising ● sffinfo (extract information from an SFF file) ● shhh.flows (PyroNoise) ● trim.seqs (select by properties such as length and ambiguous bases; remove barcodes, primers, etc.) ● unique.seqs (keep unique sequences) ● screen.seqs (remove sequences aligning outside a desired range) ● filter.seqs (remove columns that are gaps in all sequences, apply trump, etc.) ● pre.cluster (merge sequences below a difference threshold) ● chimera.uchime (remove chimeras using UCHIME) ● classify.seqs/remove.lineage (remove contaminants)
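The idea behind pre.cluster can be sketched as an abundance-ordered merge: walk from the most to the least abundant sequence and fold in rarer sequences within a few differences, on the assumption that they are sequencing errors. This is a simplified illustration of that idea, not mothur's code; it assumes equal-length aligned sequences and counts every differing column (gaps included), and `pre_cluster` and `hamming` are hypothetical names.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing columns between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pre_cluster(abundances: dict, diffs: int = 1) -> dict:
    """Merge rarer sequences into more abundant ones within `diffs` differences."""
    order = sorted(abundances, key=lambda s: abundances[s], reverse=True)
    merged, absorbed = {}, set()
    for i, center in enumerate(order):
        if center in absorbed:
            continue  # already folded into a more abundant sequence
        total = abundances[center]
        for other in order[i + 1:]:
            if other not in absorbed and hamming(center, other) <= diffs:
                total += abundances[other]
                absorbed.add(other)
        merged[center] = total
    return merged


print(pre_cluster({"AAAA": 10, "AAAT": 2, "TTTT": 5, "TTTA": 1}, diffs=1))
```

Processing in abundance order means a rare variant is always attributed to the most abundant nearby sequence, which is the most likely true template.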
19
Multiple sequence alignments ● Use an external alignment in FASTA format ● Use a reference-guided alignment ● Template search: k-mer, blastn, suffix tree ● Pairwise alignment between the candidate and the de-gapped template sequence (Needleman-Wunsch, Gotoh, blastn) ● Reinsert gaps (NAST) ● References: Greengenes, SILVA, user-provided
20
Cluster ● pre.cluster (collate reads with fewer than X differences) ● cluster.seqs (cluster reads by furthest, average or nearest neighbor) ● hcluster (hierarchical clustering; very slow for average neighbor, good for furthest and nearest) ● cluster.split (fastest and newest; splits sequences by taxonomic level before clustering and should give the same output as cluster.seqs)
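The three linkage options differ only in how the distance between two clusters is derived from the pairwise distances: nearest neighbor takes the minimum, furthest neighbor the maximum, average neighbor the mean. The naive agglomerative sketch below makes that difference visible on a small matrix; it is an illustration, far slower than mothur's implementations, and `agglomerate` is a hypothetical name. Merging only while the distance is strictly below the cutoff is an assumed convention.

```python
def agglomerate(dist: list, cutoff: float, linkage: str = "average") -> list:
    """Naive hierarchical clustering on a symmetric distance matrix (list of lists)."""
    clusters = [[i] for i in range(len(dist))]

    def between(a: list, b: list) -> float:
        ds = [dist[i][j] for i in a for j in b]
        if linkage == "nearest":
            return min(ds)
        if linkage == "furthest":
            return max(ds)
        if linkage == "average":
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = between(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d >= cutoff:
            break  # no remaining pair of clusters is close enough
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


dist = [[0.0, 0.25, 0.5], [0.25, 0.0, 0.25], [0.5, 0.25, 0.0]]
for method in ("nearest", "average", "furthest"):
    print(method, agglomerate(dist, cutoff=0.3, linkage=method))
```

Nearest neighbor chains all three sequences together, while average and furthest neighbor keep sequence 2 apart, which is why the choice of method changes OTU counts at the same cutoff.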
21
Measures ● Large array of options ● OTUs and rarefaction ● Estimators (ACE, Chao1, Shannon) ● Phylogeny ● Alpha and beta diversity (one or many groups) ● Venn diagrams ● UniFrac ● PCoA (Principal Coordinates Analysis) ● NMDS (non-metric multidimensional scaling), etc.
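Two of the simpler measures can be sketched directly: the Shannon index as an alpha-diversity measure for one sample, and a Jaccard distance on shared OTUs as a basic beta-diversity measure between two samples (the same shared/unique counts underlie a Venn diagram). These are standard definitions assumed for illustration, using the natural logarithm; the function names are hypothetical.

```python
from math import log


def shannon(counts: list) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with at least one read."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def jaccard_distance(otus_a: set, otus_b: set) -> float:
    """1 - |shared OTUs| / |OTUs in either sample|."""
    union = otus_a | otus_b
    if not union:
        return 0.0
    return 1 - len(otus_a & otus_b) / len(union)


print(shannon([5, 5]), shannon([9, 1]))
print(jaccard_distance({"OTU_1", "OTU_2"}, {"OTU_2", "OTU_3"}))
```

An evenly split sample scores higher on Shannon than a skewed one with the same number of OTUs, which is what distinguishes it from plain observed richness.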
22
Usage ● Command line ● Batch (mothur file) ● Parallel (processors=x) ● Distributed (MPI) ● See the SOP on the mothur website ● Monitor the website for changes ● Most versatile