From Metagenomic Sample to Useful Visual Anna Shcherbina 01/10/
Anna Shcherbina
Bioinformatics Challenge Day 02/02/2013
From Metagenomic Sample to Useful Visual
This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA C
Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
Distribution Statement A: Approved for public release; distribution is unlimited.

The Opportunity
NGS instruments have recently given us the ability to characterize the microbiomes that we live in and that live in us.
We can get a step closer to this goal by creating a visualization program that facilitates manual data curation by a human.

Your Mission
Invent novel visualization approaches to represent metagenomic data.
Subgoals:
– Pick out anomalies within a given dataset.
– Generate a time series representation of multiple datasets.
– Compress data efficiently to allow visualization of huge datasets.

The Data (I)
Metagenomic datasets (FASTQ format) from clinical and environmental samples.
Metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities.
– “oral_healthy” and “oral_diseased” datasets
– Roche 454
Nose/throat swab from Nicaraguan child with acute respiratory illness
– “nicaragua” dataset
– Illumina

The Data (II)
Skin surface from the palm of a human hand
– “palm” dataset
– Roche 454
Human abscess sample of unknown etiology
– “abscess” dataset
– Illumina
Cultivated corn soil metagenome
– “soil” dataset
– Illumina

Our Processing Pipeline
Raw FASTA reads
BLAST against virus, bacteria, and archaea databases (from GenBank)
Data Processing
– Parsed CSV summary of BLAST hits
– BLAST hits sorted by species, FASTA format
– Other BLAST parsers
Data is available from each stage of the processing pipeline.
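The "sorted by species" stage can be approximated from the parsed CSV summary with a few lines of Python. This is only a sketch: the column names (`query_name`, `target_organism`, `percent_identity`) and the three example rows are invented for illustration and may not match the real file layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a parsed CSV summary. Column names and rows are
# assumptions made for this sketch, not the layout of the provided files.
EXAMPLE_CSV = """query_name,target_organism,percent_identity
read_1,Neisseria subflava,100
read_2,Neisseria subflava,98.5
read_3,Streptococcus mitis,97
"""


def group_hits_by_species(handle):
    """Return {target organism: [query names]} from a parsed-hit CSV."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["target_organism"]].append(row["query_name"])
    return dict(groups)


hits_by_species = group_hits_by_species(io.StringIO(EXAMPLE_CSV))
for species, reads in sorted(hits_by_species.items()):
    print(species, len(reads))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper and adjust the column names to whatever the CSV header actually contains.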

Parsed BLAST File Example for a Single Hit
Query Name: S _
Query Strand: +
Query Start: 1
Query End: 232
Query Organism: Neisseria meningitidis
Query Taxonomy: Bacteria; Proteobacteria; Betaproteobacteria;
Identities: 232
Percent: 100
Number Gaps: 0
Number Characters: 0
Target Name: GU
Target Strand: -
Target Start: 47
Target End: 278
Target Organism: Neisseria subflava
Target Taxonomy: Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.
Query Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Target Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Analysis Program: BLASTN
Database: bacteria.gdna
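One way to hold a record like this in memory is a small data class. The sketch below keeps only a subset of the fields; the attribute names are assumptions chosen for readability. It also shows how the coordinates relate to the identity figures in the example: both spans cover 232 positions, which matches 232 identities at 100 percent.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Subset of the fields in the example record (attribute names assumed)."""
    query_start: int
    query_end: int
    target_start: int
    target_end: int
    identities: int
    gaps: int
    target_organism: str

    def query_span(self) -> int:
        # Coordinates are inclusive, so a hit from 1 to 232 covers 232 bases.
        return abs(self.query_end - self.query_start) + 1

    def target_span(self) -> int:
        return abs(self.target_end - self.target_start) + 1

    def percent_identity(self) -> float:
        # Uses the query span as the alignment length, which is only exact
        # for an ungapped hit such as the example (Number Gaps = 0).
        return 100.0 * self.identities / self.query_span()


hit = BlastHit(
    query_start=1, query_end=232,
    target_start=47, target_end=278,
    identities=232, gaps=0,
    target_organism="Neisseria subflava",
)
print(hit.query_span(), hit.target_span(), hit.percent_identity())
```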

Your Open-Source Toolkit
– MEGAN4
– IMG/M
– KRONA (included with PhymmBL)
– MG-RAST
– METAREP
– Mothur
Feel free to use any additional tools you think are useful.

MEGAN4 – MEtaGenome ANalyzer
A simple lowest common ancestor algorithm assigns reads to taxa.
Taxonomic level reflects the degree of conservation of a sequence.
Dissects large datasets without assembly or the targeting of specific phylogenetic markers.
Graphical and statistical output for comparing different datasets.
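The lowest common ancestor idea can be illustrated in a few lines: a read is assigned to the deepest taxon shared by the lineages of all its hits. This is a minimal sketch of the general technique, not MEGAN's implementation, which applies additional score-based filtering that is not shown here. The helper names and the third (Gammaproteobacteria) lineage are illustrative assumptions; the first two lineages reuse the Neisseria taxonomy string from the parsed BLAST example.

```python
def parse_lineage(text):
    """Split a 'Bacteria;Proteobacteria;...' string into a list of ranks."""
    return [part.strip() for part in text.strip(".").split(";") if part.strip()]


def lowest_common_ancestor(lineages):
    """Return the longest prefix shared by all root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


NEISSERIA = "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria."
hit_a = parse_lineage(NEISSERIA) + ["Neisseria meningitidis"]
hit_b = parse_lineage(NEISSERIA) + ["Neisseria subflava"]
# Illustrative third hit from a different class (an assumption for this sketch).
hit_c = parse_lineage("Bacteria;Proteobacteria;Gammaproteobacteria")

print(lowest_common_ancestor([hit_a, hit_b])[-1])         # Neisseria
print(lowest_common_ancestor([hit_a, hit_b, hit_c])[-1])  # Proteobacteria
```

A read whose hits agree only at a high level ends up at a high taxonomic rank, which is why the assigned level reflects how conserved the sequence is.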

MEGAN4 – MEtaGenome ANalyzer
[Figure panels: Oral Diseased Bacteria; Oral Healthy Bacteria; Oral Diseased Virus; Oral Healthy Virus]

MEGAN4 – MEtaGenome ANalyzer
[Figure panels: Oral Healthy vs. Oral Diseased Bacteria; Oral Healthy vs. Oral Diseased Virus]

IMG/M – Integrated Microbial Genomes with Microbiome Samples
Web interface

IMG/M
[Figure: Phylogenetic Distribution of Genes Based on Distribution of BLAST Hits]

IMG/M
[Figure: Abundance Profile Overview]

KRONA
KRONA allows hierarchical data to be explored with zoomable pie charts.
– Excel template or KRONA tools.
– Support for several bioinformatics tools and raw data formats.
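Hierarchical counts can be flattened into a simple tab-separated table before charting. The sketch below assumes the layout used by the KRONA tools text importer (a quantity in the first column, followed by one column per level of the lineage); check the KRONA documentation before relying on it. The function name and the read counts are made up for illustration.

```python
def to_krona_text(counts):
    """Format {lineage tuple: read count} as tab-separated lines.

    Each line holds the count followed by one column per taxonomic level.
    """
    lines = []
    for lineage, count in sorted(counts.items()):
        lines.append("\t".join([str(count), *lineage]))
    return "\n".join(lines) + "\n"


# Made-up read counts, keyed by lineage from root to leaf.
example_counts = {
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"): 120,
    ("Bacteria", "Firmicutes"): 45,
    ("Viruses",): 3,
}

krona_text = to_krona_text(example_counts)
print(krona_text, end="")
```

Lineages of different depths are allowed, since each row simply stops at the deepest level known for that group of reads.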

MG-RAST
[Figure: Oral Diseased]

MG-RAST
[Figure: Oral Healthy]

MG-RAST
[Figure panels: Oral Diseased; Oral Healthy]

JCVI Metagenomics Reports (METAREP)
A Web 2.0 application to analyze and compare annotated metagenomic datasets.
Compare absolute and relative counts of multiple datasets at various functional and taxonomic levels.
Statistical tests, multidimensional scaling, heatmap and hierarchical clustering plots.
[Figure panels: Heatmap Plot; Hierarchical Clustering Plot; METASTAT Results]
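The difference between absolute and relative counts matters whenever datasets differ in sequencing depth. The following sketch, with invented dataset contents and placeholder taxon names, puts both views side by side; it illustrates the idea only and does not reproduce METAREP's own calculations.

```python
def relative_abundance(counts):
    """Convert {taxon: read count} into {taxon: fraction of all reads}."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def compare(a, b):
    """Return rows of (taxon, count_a, count_b, fraction_a, fraction_b)."""
    rel_a, rel_b = relative_abundance(a), relative_abundance(b)
    return [
        (t, a.get(t, 0), b.get(t, 0), rel_a.get(t, 0.0), rel_b.get(t, 0.0))
        for t in sorted(set(a) | set(b))
    ]


# Invented counts: dataset_b was sequenced ten times less deeply.
dataset_a = {"Taxon A": 600, "Taxon B": 300, "Taxon C": 100}
dataset_b = {"Taxon A": 60, "Taxon B": 30, "Taxon D": 10}

rows = compare(dataset_a, dataset_b)
for taxon, n_a, n_b, f_a, f_b in rows:
    print(f"{taxon}\t{n_a}\t{n_b}\t{f_a:.2f}\t{f_b:.2f}")
```

Here "Taxon A" differs tenfold in absolute reads but makes up the same share of both datasets, which is the kind of distinction a comparative view needs to expose.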

Mothur: 16S rRNA Sequence Analysis
A single platform for sequence alignment, pairwise distance calculation, and distance matrix analysis.
Venn diagrams, community trees, heat maps, sample-based rarefaction curves.
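To make "pairwise distance calculation" concrete, the sketch below computes an uncorrected distance (fraction of mismatched positions) between sequences that are already aligned, and collects the results into a matrix. It is a generic illustration, not Mothur's code: the function names, the choice to skip gapped columns, and the three short sequences are all assumptions.

```python
def p_distance(a, b):
    """Fraction of mismatches over ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (one of several possible gap policies)
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else 0.0


def distance_matrix(seqs):
    """Return {(name_1, name_2): distance} for every ordered pair of sequences."""
    names = list(seqs)
    return {(p, q): p_distance(seqs[p], seqs[q]) for p in names for q in names}


# Made-up aligned sequences for illustration.
aligned = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "AC-TTCGA",
}

matrix = distance_matrix(aligned)
print(matrix[("seq1", "seq2")])
```

A matrix like this is the usual starting point for clustering sequences and for drawing the trees and heat maps listed above.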