Genboree Microbiome Workbench 16S Workshop Part I, March 11th, 2014. Julia Cope, Emily Hollister, Kevin Riehle.

Genboree 16S Workshop
Learning Objectives
– Students should be able to take .sff files and user-supplied information and produce:
  – Metadata File
  – PCoA
  – Classification Distribution
Expectations
– Apply topics learned today before the next meeting
– Be able to discuss where issues arise
– Be able to move knowledgeably through the whole Genboree Workflow

Genboree 16S Workshop Part II
Learning Outcomes
– Newer database version of RDP: how to take advantage of it?
– Students should take user .sff files and a user-created metadata file and produce (I can provide files if needed):
  – PCoA (QIIME)
  – Classification Distribution (RDP)
Expectations
– Apply topics learned in the tutorial
– Be able to discuss where in the process issues arose
– Have a hypothesis about your data issues if they happen

Workshop Outline
– 16S
– Metadata File
– Genboree Workbench Workflow
  – Account
  – Group
  – Database
  – Project
  – Loading your files/samples/sequences (and linking)
  – QIIME
  – RDP
  – How to get help
– Wrap Up and Preparation for 2nd Installment

Resources
– Genboree Home Screen
– Tutorials are located in the Genboree Commons
– You must be signed in to open the following link:
– Tutorial 1 Data Set: ce_file.sff.gz
– Tutorial 2 Data Set:
Projects are accessed through the Genboree Workbench.

16S
– What is it?
– What part is being sequenced?
  – Here?
  – Elsewhere?
– How is this accomplished?
  – DNA to bead to light
  – Intro to flow data and .sff file content
  – OUTPUT is an .sff file
– Aside on zipping methods and large file transfers

16S
– What is it? The 16S rRNA of the small subunit of the ribosome (S = Svedberg, a unit of sedimentation rate)
– What part is being sequenced?
  – Here? TCMC sequences the V5-V3 region by 454
  – Elsewhere? V3-V5, V1-V3, V9, V7-V9… many more. Know your variable regions.
(Figures: Allmetrics.net Sales Material; Tortoli E, Clin. Microbiol. Rev. 2003;16:)

16S
How is this accomplished?
– DNA to bead to light
(Figure: Life Sciences Sales Materials)


16S
How is this accomplished?
– DNA to bead to light
– Intro to flow data and .sff file content
– OUTPUT is an .sff file (Standard Flowgram Format)
  – All reads are structured as linker-tag-primer
  – Provides both identity and quality information
(Figure: Allmetrics.net Sales Material)
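The linker-tag-primer layout can be illustrated with a short Python sketch that checks whether a read begins with a given tag (barcode) followed by a given primer, treating IUPAC degenerate bases such as M, R, and N as sets of allowed bases. This is an illustration only, assuming the linker has already been trimmed and allowing no mismatches; `primer_regex` and `split_read` are hypothetical helper names, not Genboree functions. The barcode and primer in the usage line come from the workshop's example metadata; the trailing insert is made up.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> str:
    """Translate a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def split_read(read: str, barcode: str, primer: str):
    """Return the sequence after barcode + primer, or None if the read
    does not start with exactly this barcode followed by this primer."""
    pattern = re.compile(re.escape(barcode.upper()) + primer_regex(primer))
    match = pattern.match(read.upper())  # anchored at the start of the read
    return read[match.end():] if match else None


if __name__ == "__main__":
    # Barcode and primer from the example metadata; the trailing insert is made up.
    read = "ACCGGCGTTC" + "CCGTCAATTCATTTGAGT" + "ACGTACGT"
    print(split_read(read, "ACCGGCGTTC", "CCGTCAATTCMTTTRAGT"))  # prints ACGTACGT
```

A real demultiplexer usually tolerates some sequencing errors in the barcode and primer; exact matching keeps the sketch easy to follow.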

Genboree Workflow
Take one step back from the Genboree Workflow and talk about input files. What do you do with your files?
(Figure from Genboree.org help files: metadata file and .sff file)

Genboree Workflow
What do you do with many files? Genboree takes .zip, .gzip, .txt, and .sff files.
– Compressed files are easier and faster to move
– Multiple files are easier to move when compressed together in an archive
.sff file(s) should be archived and compressed. Metadata files are very small and do not need compression.
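Bundling several .sff files into one compressed .zip (one of the formats Genboree accepts) can also be scripted. A minimal sketch, assuming all .sff files sit in one local folder; `archive_sff` and the paths are hypothetical. The metadata file is deliberately left out, since it is small and is uploaded separately.

```python
import zipfile
from pathlib import Path


def archive_sff(folder: str, archive_path: str) -> list[str]:
    """Bundle every .sff file in `folder` into one compressed .zip archive.

    Returns the names of the files that were added.
    """
    sff_files = sorted(Path(folder).glob("*.sff"))
    # ZIP_DEFLATED actually compresses; the default ZIP_STORED only archives.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sff_files:
            zf.write(path, arcname=path.name)  # store without the folder prefix
    return [path.name for path in sff_files]
```

For example, `archive_sff("sff_runs", "sff_runs.zip")` would produce one upload-ready archive from a hypothetical `sff_runs` folder.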

Metadata Files
– What data must you have?
– How should it be formatted for Genboree?
– What can you include?
– How to make it tab-delimited?
– Include variable region or primer?
– Directional awareness on primers

Metadata Files
– What data must you have?
  – name
  – barcode
  – region, or proximal & distal primers
  – The first column title must begin with #
  – #No_spaces_are_allowed_in_column_names_
– How should it be formatted for Genboree? Tab-delimited
– What can you include?
– How to make it tab-delimited?
– Include variable region or primer?
– Directional awareness on primers

Metadata Files
– How to determine which to include: variable region or primers
– Directional awareness on primers
– Demo of making and saving as tab-delimited

Metadata Files - Demo
#name	barcode	proximal	distal	region	body_site
S_	CCGTTCCTC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Stool
S_	ACCGGCGTTC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Stool
S_	ACGAATTAAC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Stool
S_	AACCGGATAC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Stool
S_	AACGGAACGC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Stool
T_	AATAACCGTC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Throat
T_	TTAATGGAAC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Throat
T_	CGGACCGGAAC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Throat
T_	CCGAACGAC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Throat
T_	TTCGTTCTTC	CCGTCAATTCMTTTRAGT	CTGCTGCCTCCCGTAGG	V3V5	Throat
Select the data above and copy it. Paste it into Excel or an open-source spreadsheet program. Be sure all entries are free of spaces and special characters and that all samples have the same number of columns. Avoid the column titles "state" and "type". Use Save As and select tab-delimited. Name your file in a clear and consistent manner.
This example includes both the primer columns and the region column; Genboree needs either the region column or the proximal and distal primer columns.
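The formatting rules from these slides can be checked before upload with a small Python script. This is a sketch built only from the rules listed here (a first column title starting with #, no spaces, name and barcode columns, region or proximal and distal, no "state" or "type" titles, equal column counts); Genboree's own checks may differ, and `check_metadata` is a hypothetical helper.

```python
import csv
import io

REQUIRED = {"name", "barcode"}   # columns every metadata file needs
FORBIDDEN = {"state", "type"}    # column titles the slides say to avoid


def check_metadata(text: str) -> list[str]:
    """Return problems found in tab-delimited metadata (empty list = none found)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    names = {col.lstrip("#") for col in header}
    problems = []
    if not header[0].startswith("#"):
        problems.append("first column name must begin with '#'")
    problems += [f"column name contains a space: {col!r}" for col in header if " " in col]
    missing = REQUIRED - names
    if missing:
        problems.append(f"missing required column(s): {sorted(missing)}")
    if "region" not in names and not {"proximal", "distal"} <= names:
        problems.append("need a 'region' column or both 'proximal' and 'distal'")
    problems += [f"avoid the column title {bad!r}" for bad in sorted(FORBIDDEN & names)]
    for n, row in enumerate(body, start=1):
        if len(row) != len(header):
            problems.append(f"data row {n}: {len(row)} columns, expected {len(header)}")
        if any(" " in cell for cell in row):
            problems.append(f"data row {n}: an entry contains a space")
    return problems
```

Running it on a file saved from the spreadsheet, e.g. `check_metadata(open("metadata.txt").read())`, lists each rule the file breaks.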

Metadata Files
– How to determine variable region vs. primer inclusion
– Directional awareness of primers
– If you aren't sure, ask!
– What are these files often called? Mapping, metadata, oligos, or linker-primer file (many other names are possible)
(Figure: Allmetrics.net Sales Material)
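Directional awareness often comes down to reverse complements: a primer written 5'→3' on one strand reads as its reverse complement on the other. A minimal Python sketch that keeps IUPAC degenerate codes intact (for example, R = A/G complements to Y = C/T); `reverse_complement` is a hypothetical helper. Applied to the V3V5 example metadata, it shows that the distal primer CTGCTGCCTCCCGTAGG is the reverse complement of CCTACGGGAGGCAGCAG.

```python
# Complement of each IUPAC code; degenerate codes map to their complementary
# set (R <-> Y, K <-> M, B <-> V, D <-> H; S, W, and N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, keeping IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("CCTACGGGAGGCAGCAG"))  # prints CTGCTGCCTCCCGTAGG
```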

Metadata Files
Another example: Tutorial Set 2 Metadata
What possible issues may arise with this metadata file?
sampleName	tag	proximal	distal	region	sample_period	type
Ferm_5AGCTTCGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V35Fermentation
Ferm_2GCCATACATTGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V32Fermentation
Ferm_3GCCAGCAAGTGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V33Fermentation
Ferm_4CGTTAAGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V34Fermentation
Ferm_1CTAACAGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V31Fermentation
Soil_1ACGCAAAAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V31Soil
Soil_2CTAACTAAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V32Soil
Soil_3GCGACCTAGTGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V33Soil
Soil_4AAGAATCAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V34Soil
Soil_5AGCGCAGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V35Soil

Metadata Files
Another example: What possible issues may arise with this metadata file?
– Change name => #name (or any first entry beginning with #)
– Change tag => barcode
– Change type => sample_type (do not name columns 'type' or 'state')
Demo: making and saving as tab-delimited
#name	barcode	proximal	distal	region	sample_period	sample_type
Ferm_5AGCTTCGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V35Fermentation
Ferm_2GCCATACATTGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V32Fermentation
Ferm_3GCCAGCAAGTGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V33Fermentation
Ferm_4CGTTAAGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V34Fermentation
Ferm_1CTAACAGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V31Fermentation
Soil_1ACGCAAAAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V31Soil
Soil_2CTAACTAAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V32Soil
Soil_3GCGACCTAGTGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V33Soil
Soil_4AAGAATCAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V34Soil
Soil_5AGCGCAGAGAGTTTGATCNTGGCTCAGCAGCMGCCGCNGTAANACV1V35Soil
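The header fixes on this slide can also be scripted. A minimal sketch, assuming the file is tab-delimited with the column titles on the first line; `RENAMES`, `fix_header`, and `fix_file` are hypothetical names, and the mapping simply encodes the changes listed on the slide (plus the tutorial file's sampleName column).

```python
# Renames suggested on the slide. "sampleName" is the first column in the
# tutorial file; Genboree needs the first column title to begin with "#".
RENAMES = {
    "sampleName": "#name",
    "name": "#name",
    "tag": "barcode",
    "type": "sample_type",  # avoid the column titles "type" and "state"
}


def fix_header(header_line: str) -> str:
    """Apply RENAMES to each column title in a tab-delimited header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    return "\t".join(RENAMES.get(col, col) for col in columns)


def fix_file(text: str) -> str:
    """Fix the header of a whole metadata file (line endings become newlines)."""
    lines = text.splitlines()
    if not lines:
        return text
    lines[0] = fix_header(lines[0])
    return "\n".join(lines) + "\n"
```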

7-Zip: zipping methods and large file transfers
– Compression and archiving of files
– Uncompressing in an easy-to-use format for PCs
– Demo: compressing .sff file(s)
(From: 7-zip.org)

Genboree Workflow
– Create Group
– Create Database
– Create Project
– Upload Files
– Create Samples (Sample Import using metadata file)
– Link Samples to Sequence Files (Sample File Linker)
– QC and Attach Sequences (Sequence Import)
– QIIME and RDP (both run on Sequence Import output)

Genboree
– URL:
– Workbench and Commons differences
Account
– How to create your account? (ublic-commons?faq_id=493)
Workshop Home: march-2014

Workbench
– Where is it?
Create a Group (Demo)
– Why? To serve as a project base
– How to share it with others? (commons?faq_id=494)
Create a Database (Demo)
– Why? To hold processed and pre-processed files
– Using folders to organize the space (commons?faq_id=491)
Create a Project (Demo)
– Why? To have a record of the major-level processes that you have used on your data
– Importance of tracking information for multiple users in a group (commons?faq_id=492)


Upload Files
– What to import (upload):
  – Metadata
  – .sff file(s)
– Can both metadata and .sff files be in one file? No. Upload them separately: .sff files will need unpacking, while metadata files will need converting. Shortcutting this step can cause odd problems down the line.
– Importing files and choosing to extract causes the system to queue the process, which may take a few moments.
– Now that I have it uploaded… how do I edit and remove files? (Demo)


Create Samples (Import)
– Import samples singly or in multiples
  – Creating and adding samples to a set
  – Import Behavior
  – Assign samples to a set
– What is a sample set? Why use them?
  – Grouping for downstream analysis
  – Makes Genboree faster to use (you don't have to move each file around)
– Editing sample information

Create Samples (Import)
Import samples singly or in multiples: Demo
– Creating and adding samples to a set
  – Input Window: Metadata file
  – Output Window: Target Database
  – Data > Samples & Sample Sets > Samples > Import Samples
  – Double-check your Input, Target, and Settings
– Import Behavior
  – Create New Record
  – Keep Existing
  – Merge and Update (use this one by default)
  – Replace Existing
– Assign Samples to new Sample Set
  – Name the folder, or leave it blank to not create a set
  – Samples can be added to a set later

Create Samples (Import)
– What is a sample set? Why use them?
  – Grouping for downstream analysis
  – Makes Genboree faster to use (you don't have to move each file around)
– Editing sample information
  – What isn't possible (right now)? Editing column titles; adding single samples de novo

Sample Set Management
– Demo: adding samples to a sample set
  – Input Window: Sample to be added
  – Output Window: Target Sample Set
  – Data > Samples & Sample Sets > Sample Sets > Add Sample to Sample Set
– Demo: editing Sample (or Sample Set) data
  – Input Window: Sample to be edited
  – Output Window: Blank
  – Data > Samples & Sample Sets > Samples > Edit Samples
– This is important for later stages: it makes Sequence Import easier and cleaner

Sample Set Management
Editing Sample (or Sample Set) data
– Move boxes before saving, or you will lose your edit.


Link Samples to Sequence Files
Sample File Linker tool
– The name is the opposite of the file positions required: the .sff file goes before the sample(s).
– Arrangement in the Input Window:
  – .sff, Sample Set
  or
  – .sff, Sample
  – .sff, Sample
  – .sff, Sample
– Output Window: Empty
– Demo: how to do it and how to check it has been done

Link Samples to Sequence Files
How to check your linked files?
– The prompt screen on linking
– The when complete
– The Sample Edit tool: look for the fileLocation column
– Demo: looking at the linked fileLocation
  – Input Window: Sample to be edited
  – Output Window: Blank
  – Data > Samples & Sample Sets > Samples > Edit Samples


Sequence Import
– Choose one or more samples to load sequences
  – Input Window: Sample(s) or Sample Set
  – Output Window: Target Database
  – Metagenome > Data Initialization > Import 16S rRNA Sequences
– Check quality of import
– Fixing the files when something has gone wrong
  – When is it possible?
  – When to start over?
– Download files from Genboree

Sequence Import
Choose one or more samples to load sequences (Demo)
– Input Window: Sample(s) or Sample Set
– Output Window: Target Database
– Metagenome > Data Initialization > Import 16S rRNA Sequences

Sequence Import: Check quality of import

Sequence Import: Fixing the files when something has gone wrong

Sequence Import
Fixing the files when something has gone wrong
– When is it possible?
  – Bad barcode?
  – Sample info wrong? (primers, region, direction)
  – Bad file?
– When to start over?

Sequence Import
Download files from Genboree
– Click on the file
– In the Details Window, choose Download
– Start with sequences_metrics_summary.xls: it is easy to open and has no compression

Sequence Import
When problems arise, check the:
– sample.metadata: does it match what you put in?
– fasta.result.tar.gz: look at the .fasta files
  – See barcodes
  – See primers
Use Notepad for metadata and BioEdit to open FASTA files (use WINE on Mac).
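When looking at the .fasta files, it can help to count how many reads start with each expected barcode. A minimal Python sketch, assuming the extracted FASTA still carries barcodes at the start of each read (as the slide suggests) and that barcodes are matched exactly; `read_fasta`, `tally_barcodes`, and the sample names are hypothetical.

```python
from collections import Counter


def read_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def tally_barcodes(text: str, barcodes: dict) -> Counter:
    """Count reads by the first expected barcode they start with ("no_match" otherwise)."""
    counts = Counter()
    for _, seq in read_fasta(text):
        seq = seq.upper()
        hit = next((name for name, bc in barcodes.items()
                    if seq.startswith(bc.upper())), "no_match")
        counts[hit] += 1
    return counts
```

A large "no_match" count is a hint to revisit the barcode and primer columns of the metadata.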


Break


Data Analysis - QIIME
– How to select samples for analysis
– Chimera removal and why you should be thinking about it
– Output
  – Downloading and organization
  – Making sense of the files

Data Analysis - QIIME: How to select samples for analysis

Data Analysis - QIIME
Selecting samples for analysis
– INPUT = One or more Sequence Import folders
  – All should be of the same variable region; ideally produced with the same primer and sequencing direction
– OUTPUT Targets = Your database (required), your project (optional)

Data Analysis - QIIME
Caveats:
– All samples in your input folder will be analyzed
  – This includes no-template controls and positive controls
  – The % variation explained by your PCoA may be influenced by the inclusion of these samples
– QIIME on Genboree is not currently set up to allow users to subsample their data
  – This can be problematic if sequencing depth varies substantially across samples
  – It does, however, perform a "rounding up" normalization step

A bit about sequencing depth
How deep should you go? There is no good answer.
– Strong biological patterns can be detected with low sequencing depth
  – 10s to 100s of sequences can sometimes be enough
  – 1000s tend to be the norm
– Subtle biological patterns tend to require greater sequencing depth for detection
– Sequencing depth can be dictated by:
  – Sample quality
  – The number of samples placed on a run
  – Project budget
(Kuczynski et al. Nature Methods 7:)

Unequal sequencing depth
What's the problem? Being certain that you are seeing the full view (…or at least equivalent glimpses) of your communities.

Unequal sequencing depth
What's the problem?
[Figure: unequal depth, same data set. Samples are colored by library size: Red ~4000, Orange ~5000, Yellow ~6000, Green 8,000-10,000, Blues 11,000-17,000. Avg Red = 5995 seqs; Avg Blue = seqs.]

Unequal sequencing depth
What's the problem?
[Figure: unequal depth (Avg Red = 5995 seqs; Avg Blue = seqs) compared with equal depth, where all libraries were sub-sampled to ~4000 reads.]
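Subsampling (rarefying) every library to the same depth, as in the equal-depth panel, can be done outside Genboree. A minimal Python sketch, assuming one sample's OTU counts are already loaded into a dictionary; `rarefy` is a hypothetical helper that draws reads without replacement, and it is not the "rounding up" normalization Genboree applies.

```python
import random
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int, seed: Optional[int] = None) -> Dict[str, int]:
    """Randomly subsample one sample's OTU counts to `depth` reads, without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds library size {total}")
    # One list entry per read, so every read is equally likely to be kept.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    kept = random.Random(seed).sample(pool, depth)
    result = {otu: 0 for otu in counts}
    for otu in kept:
        result[otu] += 1
    return result
```

Libraries smaller than the chosen depth cannot be rarefied to it, so picking a depth (such as the ~4000 reads on the slide) trades the number of samples kept against the depth analyzed.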

Data Analysis - QIIME
Chimera removal and why you should be thinking about it
– What is a chimeric sequence?
– How frequently do they occur?
– An example from real data
– Why should you think about chimeras?
– How to screen for chimeras using Genboree

What is a Chimeric Sequence?
– In Greek mythology: a creature that was an amalgam of multiple animals (body of a lion, head of a goat, tail resembling a snake)
– In your sequence data: the combination of multiple sequences during PCR to create a hybrid
– In sequence databases: a not-so-small nightmare of junk data
  – Mis-annotation
  – Enhanced "discovery" of novel organisms
(Chimera generation figure from: Haas et al. 2011, Genome Research 21:)

How frequently do chimeras occur?
– Schloss et al. 2011: with mock communities of known composition, ~8% of raw sequences were chimeric; incidence increased with sequencing depth
– Approaches for detection: multiple algorithms are available; Genboree uses ChimeraSlayer
– How it works:
  – The ends of each read (~30% of total length) are compared to a chimera-free reference database
  – Potential "parent" sequences are identified
  – Identity of the potential chimera to an in silico chimera is evaluated
(Schloss et al. PLoS ONE 6(12):e27310)
[Figure: toy alignments of a query against two candidate parents. Likely chimera: Query AATCGCGACCTGTGCTACACGGGTA matches Parent 1 AATCGCGACCTGTTTAACCGTAGGTC at its start and Parent 2 AAACGCTTACGGAGCTACACGGGTA at its end. A second panel shows a non-chimera.]
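The idea of comparing a query to two candidate parents can be shown with a deliberately simplified Python sketch. It is not ChimeraSlayer: it assumes the sequences are already aligned without gaps, compares them position by position, and flags the query when a two-parent fit (left part from one parent, right part from the other) beats the best single-parent fit by a chosen margin. The function names and the 0.1 margin are hypothetical; the three sequences are the toy example from the slide.

```python
def matches(a: str, b: str) -> int:
    """Identical positions in an ungapped, position-by-position comparison."""
    return sum(x == y for x, y in zip(a, b))


def best_breakpoint(query: str, left: str, right: str) -> tuple:
    """Best (matches, k) when query[:k] is compared to `left` and query[k:] to `right`."""
    best = (0, 0)
    for k in range(1, len(query)):
        score = matches(query[:k], left[:k]) + matches(query[k:], right[k:])
        if score > best[0]:
            best = (score, k)
    return best


def looks_chimeric(query: str, p1: str, p2: str, margin: float = 0.1) -> bool:
    """True if a two-parent fit beats the best single-parent fit by at least `margin`."""
    n = len(query)
    single = max(matches(query, p1), matches(query, p2)) / n
    two_parent = max(best_breakpoint(query, p1, p2)[0],
                     best_breakpoint(query, p2, p1)[0]) / n
    return two_parent - single >= margin


# Toy sequences from the slide.
query = "AATCGCGACCTGTGCTACACGGGTA"
parent1 = "AATCGCGACCTGTTTAACCGTAGGTC"
parent2 = "AAACGCTTACGGAGCTACACGGGTA"
print(best_breakpoint(query, parent1, parent2))  # prints (25, 13)
print(looks_chimeric(query, parent1, parent2))   # prints True
```

For the slide's example, the best breakpoint falls after position 13: positions 1-13 of the query match Parent 1 exactly and positions 14-25 match Parent 2 exactly, while neither parent alone explains the whole read.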

An example from real data
[Figure: alignment of chimeric sequences derived from Streptococcus (top, red) and Staphylococcus (bottom, black). Sequences were generated from 4 replicate PCR reactions/454 runs of V3V5 sequence. Chimeric alignment from: Haas et al. 2011, Genome Research 21:]

Why should you think about chimeras?
– Spurious results
  – They artificially increase estimates of richness and diversity
  – You may discover a "new" (but fake) species
– Should you trust all flagged chimeras? Most people do, but… buyer beware
  – False-positive rates are in the 1-4% range
  – Some taxa are poorly represented in reference databases
  – Prevotella and Acinetobacter are known to produce false-positive results in ChimeraSlayer
– How to verify (digging in to your QIIME output)
  – Obtain representative sequence(s) and verify their identity (e.g., BLAST vs. the NCBI nt database, RDP SeqMatch)
(Sogin et al. 2006 PNAS 103:)

How to screen chimeras in Genboree
– Run a QIIME job
  – INPUT = Sequence Import folder
  – OUTPUT Targets = Your database (required), your project (optional)

How to screen chimeras in Genboree
– Select "Remove Chimeras" in the Tool Settings dialogue box
– Provide a study name
– Provide a job name (TIP: add chimeras_removed to your job name so that your output reflects that you selected this option)
– Click SUBMIT

Data Analysis - QIIME
Output
– Downloading and organization
– Making sense of the files

How do I get my files out?
– Entire folders can be archived/downloaded
  – INPUT = Folder to be archived
  – OUTPUT = Database to house archive

How do I get my files out?
– Entire folders can be archived/downloaded
  – Provide an archive name
  – Choose your compression type
  – Decide if you want the directory structure to be preserved
  – SUBMIT

How do I get my files out?
– Single files, including archives, can be downloaded one by one
  – Click on your file of interest in the DATA SELECTOR window
  – Click on the "Click to Download File" link in the DETAILS window
  – Save the file to your computer or storage drive
  – Most file types will require decompression
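Once downloaded, the .tar.gz results (for example taxonomy.result.tar.gz) can be unpacked in bulk. A minimal Python sketch, assuming all archives sit in one local download folder; `extract_all` and the folder names are hypothetical. On Python versions that support tarfile extraction filters it uses the "data" filter, which rejects unsafe paths inside an archive.

```python
import tarfile
from pathlib import Path


def extract_all(download_dir: str, out_dir: str) -> list[str]:
    """Extract each *.tar.gz in download_dir into its own subfolder of out_dir."""
    # Use the safer "data" extraction filter where this Python supports it.
    options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.tar.gz")):
        target = Path(out_dir) / archive.name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target, **options)
        extracted.append(archive.name)
    return extracted
```

Each archive lands in a folder named after it (e.g. `taxonomy.result/`), which keeps QIIME and RDP outputs from overwriting one another.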

QIIME – making sense of the files
– fasta.result.tar.gz: multiple sequence alignment of your representative sequences file (rep seqs = the representative sequence for each OTU)
– jobFile.json: a log of the settings used by Genboree to run your analysis
– mapping.txt: a QIIME-compatible metadata file; includes barcode information
– otu.table: a spreadsheet of OTU-by-sample distributions
– phylogenetic.result.tar.gz: a phylogenetic tree of your rep seqs, plus additional files required for iTOL
– plots.result.tar.gz: figures and HTML files for all PCoA plots produced in your QIIME run
– raw.results.tar.gz: mapping file, OTU table, rep seqs file, and the distance matrices underlying all PCoA calculations
– repr_set.fasta.ignore: RDP classification (with confidence scores) of each rep seq
– sample.metadata: like the mapping.txt file, with additional file locations for Genboree
– settings.json: similar to the jobFile.json file
– taxonomy.result.tar.gz: taxonomic summaries per sample at the Kingdom, Phylum, Class, Order, Family, and Genus levels


Data Analysis - RDP
– How to select samples
– Output
  – Downloading and organization
  – Making sense of the files

Data Analysis - RDP
Selecting samples for analysis
– INPUT = One or more Sequence Import folders
  – All should be of the same variable region; ideally produced with the same primer and sequencing direction
– OUTPUT Targets = Your database (required), your project (optional)

Data Analysis - RDP
Caveats:
– All samples in your input folder will be analyzed
  – This includes no-template controls and positive controls
– RDP on Genboree does not pre-filter for chimeric sequences
– RDP on Genboree is not currently set up to allow users to subsample their data
  – Depending on your application, this may be problematic if sequencing depth varies substantially across samples
  – It does, however, perform a "rounding up" normalization step and presents data on a relative abundance basis

How do I get my files out?
– Same as for QIIME output: entire folders can be archived/downloaded (INPUT = folder to be archived, OUTPUT = database to house archive; provide an archive name, choose your compression type, decide whether to preserve the directory structure, SUBMIT)
– Single files, including archives, can be downloaded one by one via the "Click to Download File" link in the DETAILS window
– Most file types will require decompression

RDP – making sense of the files
– domain.result.tar.gz, phylum.result.tar.gz, class.result.tar.gz, order.result.tar.gz, family.result.tar.gz, genus.result.tar.gz: per sample summaries at various taxonomic levels, including raw counts and weighted values
– count.xlsx, count_normalized.xlsx: per sample summaries at various taxonomic levels, as raw counts or relative abundances (normalized)
– weighted.xlsx, weighted_normalized.xlsx: per sample summaries at various taxonomic levels, weighted by confidence of ID assignments (raw counts or normalized)
– png.result.tar.gz: all of the plots produced during your run (e.g., heatmaps, stacked bar graphs)
– Also included: sample.metadata, settings.json, count.result.tar.gz

Individual Time
– Confirm user accounts are created
– Confirm users know where the mock data or their own data sets are