Meta’omic functional profiling with ShortBRED Galeb Abu-Ali Curtis Huttenhower 08-14-15 Harvard T.H. Chan School of Public Health Department of Biostatistics.

Presentation transcript:

Meta’omic functional profiling with ShortBRED Galeb Abu-Ali Curtis Huttenhower Harvard T.H. Chan School of Public Health Department of Biostatistics

2 The two big questions… Who is there? (taxonomic profiling) What are they doing? (functional profiling)

3 (What we mean by “function”)

4 HUMAnN: HMP Unified Metabolic Analysis Network
Short reads + protein families
Nucleotide pan-genome search
Translated BLAST search
Weight hits by %ID + coverage
Sum over seqs. within family
Adjust for sequence length
Repeat for each metagenomic or metatranscriptomic sample
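
The weighting steps on this slide can be sketched in a few lines of Python. This is only an illustration, not HUMAnN's actual code: the function name `family_abundances`, the tuple layout of `hits`, and the choice to combine %ID and coverage by multiplying them are all assumptions made for the example.

```python
from collections import defaultdict

def family_abundances(hits, family_lengths):
    """Toy version of the slide's three steps (all names are hypothetical).

    hits: iterable of (family, percent_identity, coverage), where
          percent_identity is 0-100 and coverage is 0-1.
    family_lengths: dict mapping family -> sequence length.
    """
    totals = defaultdict(float)
    for family, pct_id, coverage in hits:
        # Weight each hit by %ID and coverage (product is an assumption).
        totals[family] += (pct_id / 100.0) * coverage
    # Sum is already accumulated per family; now adjust for sequence length.
    return {fam: weight / family_lengths[fam] for fam, weight in totals.items()}

# Made-up toy input, for illustration only.
hits = [("famA", 100.0, 1.0), ("famA", 90.0, 0.5), ("famB", 80.0, 1.0)]
lengths = {"famA": 300, "famB": 100}
abund = family_abundances(hits, lengths)
```

Running this once per sample corresponds to the "repeat for each sample" step.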

5 HUMAnN: HMP Unified Metabolic Analysis Network
Many millions of hits are collapsed into a few million gene families (UniRefs) (still a large number)
Map genes to MetaCyc pathways
Use MinPath (Ye 2009) to find simplest pathway explanation for observed genes
Remove pathways unlikely to be present due to low organismal abundance
Smooth/fill gaps
Collapsing UniRef abundance into MetaCyc pathway abundance (or presence/absence) yields a smaller, more tractable feature set

6 What’s there: ShortBRED
ShortBRED is a tool for quantifying protein families in metagenomes or metatranscriptomes
– Short Better REad Dataset
Inputs:
– FASTA file of proteins of interest
– Large reference database of protein sequences (FASTA or blastdb)
– Metagenomes (FASTA/FASTQ nucleotide files)
Outputs:
– Short, unique markers for protein families of interest (FASTA)
– Relative abundances of protein families of interest in each metagenome (text file, RPKM)
Compared to BLAST (or HUMAnN), this is:
– Faster
– More specific
(Jim Kaminski)

7 What’s there: ShortBRED algorithm
Cluster proteins of interest into families
– Record consensus sequences
Identify common areas among proteins
– Compared against each other
– Compared against reference database
– Remove all of these
Remaining subseqs. uniquely ID a family
– Record these as markers for that family
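
The "remove common areas, keep what remains" step can be pictured as interval subtraction on a family's consensus sequence. The sketch below is not ShortBRED's implementation. It assumes the common areas have already been found and are given as half-open `(start, end)` intervals; the function name `unmasked_regions` and the `min_len` cutoff of 8 residues are hypothetical.

```python
def unmasked_regions(seq_len, masked, min_len=8):
    """Return half-open (start, end) stretches of a consensus sequence that
    are not covered by any masked (common) interval and are at least
    min_len residues long. min_len is an illustrative value."""
    regions = []
    pos = 0  # first position not yet known to be masked
    for start, end in sorted(masked):
        if start - pos >= min_len:
            regions.append((pos, start))
        pos = max(pos, end)  # handles overlapping masked intervals
    if seq_len - pos >= min_len:
        regions.append((pos, seq_len))
    return regions

# Toy consensus of 100 residues with three regions shared with other proteins.
candidate_markers = unmasked_regions(100, [(10, 30), (25, 60), (95, 100)])
```

In this toy case the stretches that survive are the candidate marker regions for the family.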

What’s there: ShortBRED marker identification 8 Prots of interest Reference database True Marker Junction Marker Quasi Marker Cluster into families Identify short, common regions

What’s there: ShortBRED family quantification 9 Metagenome reads ShortBRED markers Translated search for high ID hits Normalize relative abundances
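
As a reminder of what the normalization step produces, here is a minimal RPKM sketch (reads per kilobase of marker per million reads). It is an illustration under stated assumptions, not ShortBRED's code: the function name `rpkm` is hypothetical, and converting the marker length from amino acids to nucleotides by multiplying by 3 is an assumption about how the length is counted.

```python
def rpkm(hits, marker_length_aa, total_reads):
    """Reads per kilobase of marker per million reads in the sample.

    hits: number of reads assigned to the family's markers.
    marker_length_aa: total marker length in amino acids
                      (converted to nucleotides here; an assumption).
    total_reads: number of reads in the metagenome.
    """
    marker_kb = (marker_length_aa * 3) / 1000.0
    reads_millions = total_reads / 1_000_000.0
    return hits / (marker_kb * reads_millions)

# Toy numbers: 30 hits, 100 AA of markers, 2 million reads.
example_rpkm = rpkm(hits=30, marker_length_aa=100, total_reads=2_000_000)
```

Dividing by both marker length and sequencing depth is what makes values comparable across families and across samples.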

10 What’s there: ShortBRED’s fast
Six synthetic metagenomes from GemSim, spiked with known proteins of interest:
ARDB = Antibiotic Resistance
VFDB = Virulence Factors

11 What’s there: ShortBRED’s accurate
Six synthetic metagenomes from GemSim, spiked with known proteins of interest:
ARDB = Antibiotic Resistance
VFDB = Virulence Factors

Setup notes reminder Slides with green titles or text include instructions not needed today, but useful for your own analyses Keep an eye out for red warnings of particular importance Command lines and program/file names appear in a monospaced font. Commands you should specifically copy/paste are in monospaced bold blue. 12

13 What’s there: ShortBRED
ShortBRED is available at http://huttenhower.sph.harvard.edu/shortbred
You could download ShortBRED by clicking here

14 From the command line...
But don’t!
– Instead, we’ve already installed ShortBRED for you
You can create your own virtual copy by running:
    ln -s /class/stamps-software/biobakery/shortbred/
To see what you can do, run:
    module unload stamps
    module load bioware
    ./shortbred/shortbred_identify.py -h | less -S
    ./shortbred/shortbred_quantify.py -h | less -S

Getting some annotated protein sequences Go to 15 You could download the ARDB protein sequences here

16 From the command line...
But don’t!
– Instead, we’ve downloaded the important file for you
Take a look by running:
    less /class/stamps-shared/biobakery/data/resisGenes.pfasta

Getting some reference protein sequences Go to 17 You could download the MetaRef protein sequences here

18 Running ShortBRED-Identify
But don’t!
– We’ll use an example mini reference database for speed
Let’s make some antibiotic resistance markers by running:
    ./shortbred/shortbred_identify.py \
        --goi /class/stamps-shared/biobakery/data/resisGenes.pfasta \
        --ref ./shortbred/example/ref_prots.faa \
        --markers ardb_markers.faa
    less ardb_markers.faa
– This should take ~5 minutes
– It will produce lots of status output as it runs
If you get bored waiting, kill it and copy:
    /class/stamps-shared/biobakery/results/shortbred/ardb_markers.faa

ShortBRED markers 19 True Markers at the top

ShortBRED markers 20 Junction/Quasi Markers at the bottom
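
To get a quick tally of how many markers of each kind ended up in a marker file, something like the following could be used. It assumes that marker headers carry a type tag such as `_TM_`, `_JM_` or `_QM85_` before the marker number; that naming pattern, the helper name `count_marker_types`, and the toy headers below are assumptions, so check them against your own `ardb_markers.faa` before relying on the counts.

```python
import re
from collections import Counter

# Assumed header pattern: ..._TM_#01, ..._JM_#01, ..._QM85_#01
MARKER_TAG = re.compile(r"_([TJQ]M)\d*_")

def count_marker_types(fasta_lines):
    """Count FASTA headers by marker type tag (TM, JM, QM or 'unknown')."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            match = MARKER_TAG.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts

# Made-up headers and sequences, for illustration only.
toy_fasta = [
    ">famA_TM_#01", "MKVLA",
    ">famA_TM_#02", "LLAGT",
    ">famB_JM_#01", "GGHWK",
    ">famC_QM85_#01", "PPQRS",
]
marker_counts = count_marker_types(toy_fasta)
```

For a real file, pass an open file handle instead of the toy list, for example `count_marker_types(open("ardb_markers.faa"))`.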

21 Running ShortBRED-Quantify
Using your existing HMP data subset, you can search for antibiotic resistance proteins in the oral cavity by running:
    ./shortbred/shortbred_quantify.py \
        --markers ardb_markers.faa \
        --wgs SRS Buccal_mucosa.fasta \
        --results SRS Buccal_mucosa-ARDB.txt
    less SRS Buccal_mucosa-ARDB.txt
– This should take just a few seconds
– It will again produce lots of status output as it runs

22 ShortBRED marker quantification
RPKMs and raw hit count
Other columns are family name and total AAs among all family markers
Sort table:
    (head -n 1; sort -k 2,2 -n -r) < \
        SRS Buccal_mucosa-ARDB.txt | less
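
The same "keep the header, sort the rest by the second column, largest first" operation can be done in Python if you want to continue working with the table afterwards. This is a sketch: the helper name `sort_by_column` is hypothetical, and the column names and values in the toy table are made up for illustration rather than taken from real output.

```python
import csv
import io

def sort_by_column(lines, col=1):
    """Keep the header row and sort the remaining rows by a numeric column
    (0-based index), in descending order."""
    rows = list(csv.reader(lines, delimiter="\t"))
    header, body = rows[0], rows[1:]
    body.sort(key=lambda row: float(row[col]), reverse=True)
    return [header] + body

# Toy tab-delimited table; column names are assumed, values are invented.
toy_table = io.StringIO(
    "Family\tCount\tHits\tTotMarkerLength\n"
    "famA\t0.0\t0\t120\n"
    "famB\t12.5\t7\t85\n"
    "famC\t3.2\t2\t60\n"
)
sorted_rows = sort_by_column(toy_table)
```

For a real results file, pass an open file handle in place of `toy_table`.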

23 AR proteins in the human gut
That’s boring! Let’s get some real data
scp the file to your own computer:
    /class/stamps-shared/biobakery/data/shortbred_ardb_hmp_t2d.tsv
This is the result of running:
– ShortBRED-Identify on the real ARDB + reference
– ShortBRED-Quantify on the real HMP + T2D data (Qin Nature 2014)
– Summing each sample’s RPKMs for families in each ARDB resistance class
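
The last step, summing a sample's RPKMs over the families in each resistance class, might look like the sketch below. The function name `sum_by_class`, the family-to-class mapping, and all numbers are hypothetical. A real mapping would come from the ARDB annotations.

```python
from collections import defaultdict

def sum_by_class(family_rpkm, family_to_class):
    """Collapse per-family RPKMs for one sample into per-class totals.
    Families missing from the mapping are grouped as 'unclassified'."""
    totals = defaultdict(float)
    for family, value in family_rpkm.items():
        totals[family_to_class.get(family, "unclassified")] += value
    return dict(totals)

# Made-up values for a single sample.
sample_rpkm = {"famA": 1.5, "famB": 2.5, "famC": 4.0, "famD": 0.5}
classes = {
    "famA": "tetracycline_efflux",
    "famB": "tetracycline_efflux",
    "famC": "aminoglycoside_acetyltransferase",
}
class_totals = sum_by_class(sample_rpkm, classes)
```

Repeating this for every sample gives a table with one row per resistance class and one column per sample.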

AR proteins in the human gut 24

What it means: LEfSe Visit LEfSe at: 25 First click here

What it means: LEfSe
Then upload your formatted table
– After you upload, wait for the progress meter to turn green!
1. Click here, browse to shortbred_ardb_hmp_t2d.tsv
2. Then here
3. Then watch here

What it means: LEfSe
Then tell LEfSe about your metadata:
1. Click here
2. Then select Dataset
3. Then Gender
4. Then SampleID
5. Then click here

What it means: LEfSe
Leave all parameters on defaults, and run LEfSe!
– You can try playing around with these parameters if desired
1. Click here
2. Then GO!

What it means: LEfSe
You can plot the results as a bar plot
– Again, lots of graphical parameters to modify if desired
1. Click here
2. Then here

What it means: LEfSe In Galaxy, view a result by clicking on its “eye” 30 Click here

What it means: LEfSe 31

What it means: LEfSe
There’s not really any reason to plot a cladogram
– Although it will work!
But you can see the raw data for individual biomarkers
– These are generated as a zip file of individual plots
1. Click here
2. Then select your formatted data here
3. Then here

What it means: LEfSe In Galaxy, download a result by clicking on its “disk” 33 Click here Then here

What it means: LEfSe 34 Tet. Ribosomal Blockers Aminoglycoside Acetyltransferases Tetracycline Efflux Pumps

35 Summary
HUMAnN2 (up next!)
– Quality-controlled metagenomic reads in
– Tab-delimited gene, module, and pathway relative abundances out
ShortBRED
– Raw metagenomic reads, proteins of interest, and protein reference database in
– Tab-delimited gene family rel. abundances out

Alex Kostic Levi Waldron Human Microbiome Project 2 Lita Procter Jon Braun Dermot McGovern Subra Kugathasan Ted Denson Janet Jansson Ramnik Xavier Dirk Gevers Jane Peterson Sarah Highlander Barbara Methe Joseph Moon George Weingart Tim Tickle Xochitl Morgan Daniela Boernigen Emma Schwager Jim Kaminski Afrah Shafquat Eric Franzosa Boyu Ren Regina Joice Koji Yasuda Bruce Birren Chad Nusbaum Clary Clish Joe Petrosino Thad Stappenbeck Tiffany Hsu Kevin Oh Thanks! Randall Schwager Chengwei Luo Keith Bayer Moran Yassour Human Microbiome Project Karen Nelson George Weinstock Owen White Alexandra Sirota Galeb Abu-Ali Ali Rahnavard Soumya Banerjee Interested? We’re recruiting postdoctoral fellows! Rob Knight Greg Caporaso Jesse Zaneveld Rob Beiko Morgan Langille

39 HUMAnN accuracy Validated against synthetic metagenome samples (similar to MetaPhlAn validation) Gene family abundance and pathway presence/absence calls beat naïve best-BLAST-hit strategy

40 HUMAnN in action Franzosa et al. PNAS 11:E (2014)