Basic Microbiome Analysis with QIIME


Basic Microbiome Analysis with QIIME | Bryan White | 2015
PowerPoint by Casey Hanson

Exercise
In this exercise we will do the following:
1. Calculate sample diversity (𝛼-diversity), and test whether different sample types have different numbers of OTUs (species).
2. Calculate differences in microbial community structure (𝛽-diversity); in particular, we will compare OTU composition and abundance between samples and sample types.
3. Compute statistical support for observed differences between sample types.
4. Plot taxonomy composition across samples.
5. Test for potential microbial markers.

Step 0A: Accessing the IGB Biocluster
1. Open Putty.exe.
2. In the hostname textbox, type: biocluster.igb.illinois.edu
3. Click Open.
4. If a popup appears, click Yes.
5. Enter the login credentials assigned to you (for example, user class00).
Now you are all set!

Step 0B: Lab Setup
The lab is located in the following directory: /home/classroom/mayo/2015/06_Metagenomics/
This directory contains the data and the finished version of the lab (i.e. the version of the lab after the tutorial). Consult it if you are unsure about your runs.
You don't have write permissions to the lab directory. Create a working directory for this lab in your home directory to store your output. Note that ~ is a symbol in Unix paths referring to your home directory.
Copy the data files into your working directory.
Make sure you log in to a machine on the cluster using the qsub command. The exact syntax for this command is given in Step 0D. This particular command will log you in to a computer with 2 CPUs in an interactive session. You only need to do this once.

Step 0C: Local Files
For viewing and manipulating the files needed for this laboratory exercise, insert your flash drive. We denote the path to the flash drive as: [course_directory]
We will use the files found in: [course_directory]/06_Metagenomics/results/

Step 0D: Lab Setup
$ qsub -I -q classroom -l ncpus=2 # Log in to a computer on the cluster.
$ mkdir -p ~/06_Metagenomics/results # Make the results directory in our working directory.
# -p also creates ~/06_Metagenomics if it doesn't exist.
$ cp /home/classroom/mayo/2015/06_Metagenomics/data/* ~/06_Metagenomics/ # Copy data to your working directory.
$ cd ~/06_Metagenomics # Change directory to our working directory.
$ module load qiime # We will need QIIME for this lab.

Interstitial Cystitis
Interstitial cystitis (IC) is a chronic inflammation of the bladder. In this exercise, we will examine differences between the gut microbiota of women with and without IC to understand the effect IC has on the community.
Our data consists of 16S sequencing of stool samples from 8 women with IC and 7 without it.
Using QIIME 1.8.0 and this data, we will test the hypothesis that IC induces a significant change in gut microbiota. Additionally, we will examine whether or not there is a change in the community and which bacteria are implicated in such a change.

Step 1A: Dataset Characteristics (ICF.biom)
The ICF.biom file is an OTU observation file. It is a matrix of observed OTUs, or species, for each sample, annotated with their taxonomy.
The ICF.biom file was created using our own TORNADO pipeline for 16S reads: quality check, chimera check, alignment, taxonomy assignment, and clustering at 97% similarity to find OTUs.
The TORNADO pipeline can take from hours to days depending on the complexity of the project.

Step 1B: Dataset Characteristics (ICF.mapping.txt)
The mapping file contains metadata associated with the samples. Let us examine the file using the Unix cat command.
$ cat ICF.mapping.txt # Print file contents to screen.
Output (excerpt):
#SampleID  Barcode       Dx       SubjectID  Description
ICF-1      GGATCGCAGATC  Control  1          IC_fecal1
ICF-2      GCTGATGAGCTG  Control  2          IC_fecal2
ICF-3      AGCTGTTGTTTG  Control  3          IC_fecal3
ICF-4      GGATGGTGTTGC  IC       4          IC_fecal4
The Dx column is the most important column to us: it records whether each sample is a Control or an IC sample.
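The later steps group samples by the Dx column. As a minimal sketch of what that grouping means, the Python below parses a mapping file and collects sample IDs per Dx value. It assumes the file is tab-separated (the usual QIIME mapping format); the embedded text is just the four rows shown above, and `read_groups` is a hypothetical helper, not part of QIIME.

```python
import csv
import io

# Excerpt of ICF.mapping.txt as shown above, assumed to be tab-separated.
MAPPING_TEXT = (
    "#SampleID\tBarcode\tDx\tSubjectID\tDescription\n"
    "ICF-1\tGGATCGCAGATC\tControl\t1\tIC_fecal1\n"
    "ICF-2\tGCTGATGAGCTG\tControl\t2\tIC_fecal2\n"
    "ICF-3\tAGCTGTTGTTTG\tControl\t3\tIC_fecal3\n"
    "ICF-4\tGGATGGTGTTGC\tIC\t4\tIC_fecal4\n"
)


def read_groups(text, column="Dx"):
    """Return {category value: [sample IDs]} for one mapping-file column."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # the header line starts with '#'
    col = header.index(column)
    groups = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        groups.setdefault(row[col], []).append(row[0])
    return groups


groups = read_groups(MAPPING_TEXT)
for dx, samples in groups.items():
    print(dx, samples)
```

To run this on the real file, read its contents with `open("ICF.mapping.txt").read()` and pass the text to `read_groups`.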

Step 1C: Dataset Characteristics (ICF.tree)
The ICF.tree file is a Newick-formatted phylogenetic tree file. It contains the phylogenetic relationships between the OTUs found in our samples. It is another output of the 16S pipeline, and it is required for phylogeny-aware comparison metrics such as UniFrac and PD_whole_tree.

Step 1D: Dataset Characteristics (params.txt)
The params.txt file contains alternative parameters to run QIIME. Let us examine the file using the Unix cat command.
$ cat params.txt # Print file contents to screen.
Output:
beta_diversity:metrics bray_curtis,unweighted_unifrac,weighted_unifrac
alpha_diversity:metrics chao1,goods_coverage,observed_species,shannon,simpson,PD_whole_tree

Step 2: Get Basic Statistics
The first step is to get some basic statistics on our ICF.biom file. We will use the biom summarize-table command in QIIME to do this.
$ biom summarize-table -i ICF.biom -o results/summary.txt
$ cat results/summary.txt # Show stats.
Output:
Num samples: 15
Num observations: 260
Total count: 399985
Table density (fraction of non-zero values): 0.608
Table md5 (unzipped): be4b6e26ff80ca9ff173d6bbfeda162b
Counts/sample summary:
Min: 10267.0
Max: 48123.0

Step 3: Calculating 𝛼 Diversity
For this next step, let us measure the diversity of the samples. We will use the minimum count per sample from the previous step (10267) as the rarefaction depth so that, for comparison purposes, all samples have the same number of sequences.
We will use the alpha_rarefaction.py script in QIIME to do this. Results are located in: ~/06_Metagenomics/results/alpha_diversity
$ alpha_rarefaction.py -i ICF.biom -t ICF.tree -m ICF.mapping.txt -o results/alpha_diversity -p params.txt -e 10267
This calculation will take 5 to 7 minutes to complete.
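To make the idea of rarefying to an even depth concrete, here is a small sketch in plain Python. It subsamples one sample's OTU counts without replacement and then computes two of the metrics named in params.txt (observed_species and shannon). The counts are made up, the function names are hypothetical, and the base-2 logarithm for Shannon is an assumption about the convention in use; this illustrates the technique and is not QIIME's implementation.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` sequences without replacement from per-OTU counts."""
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


def observed_species(counts):
    """Number of OTUs seen at least once."""
    return sum(1 for n in counts if n > 0)


def shannon(counts, base=2):
    """Shannon index: -sum(p * log(p)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total, base) for n in counts if n > 0)


sample = [50, 30, 15, 4, 1, 0]  # made-up OTU counts for one sample
even = rarefy(sample, depth=40)
print(sum(even), observed_species(even), round(shannon(even), 3))
```

Rare OTUs can drop out when a deep sample is subsampled, which is why all samples are brought to the same depth before their OTU counts are compared.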

Step 4: Calculating 𝛽 Diversity
For this next step, let us compare samples using their composition.
We will use the beta_diversity_through_plots.py script in QIIME to do this. Results are located in: ~/06_Metagenomics/results/beta_diversity
We will use these results later in the tutorial.
$ beta_diversity_through_plots.py -i ICF.biom -t ICF.tree -m ICF.mapping.txt -o results/beta_diversity -p params.txt -e 10267
This calculation will take 1 to 5 minutes to complete.
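Of the three 𝛽 diversity metrics listed in params.txt, Bray-Curtis is the only one that does not need the tree, so it is easy to sketch by hand. The example below computes the Bray-Curtis dissimilarity between two samples from made-up OTU counts; `bray_curtis` is a hypothetical helper written for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Made-up OTU counts for two samples rarefied to the same depth.
sample_1 = [6, 4, 0]
sample_2 = [4, 4, 2]
print(bray_curtis(sample_1, sample_2))
```

A value of 0 means the two samples have identical counts, and 1 means they share no OTUs.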

Step 5: Taxonomy Computations
For this next step, we will create a graphical summary of the taxonomic composition of the samples using the summarize_taxa_through_plots.py script in QIIME. We will then do the same thing again, only this time merging the control and IC samples using the Dx column.
Results are located in:
~/06_Metagenomics/results/taxonomy (1st command)
~/06_Metagenomics/results/taxonomy_Dx (2nd command)
$ summarize_taxa_through_plots.py -i ICF.biom -m ICF.mapping.txt -o results/taxonomy
$ summarize_taxa_through_plots.py -i ICF.biom -m ICF.mapping.txt -o results/taxonomy_Dx -c Dx

Step 6: ANOVA Tests
ANOVA stands for Analysis of Variance. It is a standard suite of statistical tests aimed at explaining differences between groups of data.
We will use ANOVA in this step to see if there are any OTUs that explain the differences between sample categories. We will use the group_significance.py script in QIIME to do this.
The resulting file, ~/06_Metagenomics/results/ANOVA.txt, sorts the OTUs in the data according to how likely they are to be driving the differences between samples. The file includes probabilities (uncorrected and corrected), as well as abundance information and the lineage of each OTU.
$ group_significance.py -i ICF.biom -m ICF.mapping.txt -o results/ANOVA.txt -s ANOVA -c Dx
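For a single OTU, the ANOVA test statistic compares the variation between the group means with the variation inside each group. The sketch below computes the one-way ANOVA F statistic in plain Python for one OTU whose counts are made up; `anova_f` is a hypothetical helper, and turning F into a p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up counts of one OTU in control and IC samples.
control = [90, 110, 75, 100, 95, 85, 94]
ic = [5, 12, 0, 20, 8, 9, 11, 8]
print(round(anova_f([control, ic]), 2))
```

A large F means the group means differ by much more than the spread within each group would suggest.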

Statistical Tests
In this part of the exercise, we will test our hypotheses. In particular, if the control and IC samples cluster by group, the following tests will measure the significance of such clustering based on the metrics that we just calculated.

Step 7A: Statistical Tests - 𝛼 Diversity
In this step, we will see whether or not the IC and control samples differ significantly using the 𝛼 diversity results computed earlier.
We will use the compare_alpha_diversity.py script in QIIME to do this. The -d option sets the rarefaction depth at which the groups are compared. The results are written to: ~/06_Metagenomics/results/signif
$ compare_alpha_diversity.py -i results/alpha_diversity/alpha_div_collated/observed_species.txt -c Dx -o results/signif -d 10260 -m ICF.mapping.txt

Step 7B: Statistical Tests - 𝛼 Diversity
Let us take a look at the results file, ~/06_Metagenomics/results/signif/Dx_stats.txt, using the cat command.
$ cat results/signif/Dx_stats.txt
Output:
Group1   Group2  …  t stat      p-value
Control  IC      …  3.57527959  0.003
It seems that the categories are very different. We will confirm this later when looking at the diversity plots.
Note: your output may be slightly different.
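One common way to obtain a p-value for a t statistic without distributional assumptions is a Monte Carlo permutation test, which would also explain why the output can differ slightly between runs. The sketch below shows that general technique in plain Python. Whether compare_alpha_diversity.py was run in this mode, and with which t statistic variant, is an assumption; the observed_species values are made up and the function names are hypothetical.

```python
import math
import random


def t_stat(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    se = math.sqrt(ss / (na + nb - 2) * (1 / na + 1 / nb))
    if se == 0:
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def permutation_p(a, b, n_perm=999, seed=0):
    """Two-sided p-value: share of label shuffles with |t| >= observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled_t = t_stat(pooled[: len(a)], pooled[len(a):])
        if abs(shuffled_t) >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up observed_species values at the chosen rarefaction depth.
control = [120, 135, 128, 140, 131, 126, 138]
ic = [98, 105, 110, 92, 101, 99, 107, 95]
print(round(t_stat(control, ic), 3), permutation_p(control, ic))
```

With 999 permutations the smallest p-value this scheme can report is 0.001, so more permutations are needed to resolve smaller values.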

Step 8A: Statistical Tests - 𝛽 Diversity
In this step, we will see whether or not the IC and control samples differ significantly using the 𝛽 diversity results computed earlier. We will use the unweighted UniFrac distance matrix and the ANOSIM test.
We will use the compare_categories.py script in QIIME to do this. The result file is located in: ~/06_Metagenomics/results/anosim/anosim_results.txt
$ compare_categories.py --method anosim -i results/beta_diversity/unweighted_unifrac_dm.txt -m ICF.mapping.txt -c Dx -o results/anosim -n 9999

Step 8B: Statistical Tests - 𝛽 Diversity
Let us take a look at the results file, ~/06_Metagenomics/results/anosim/anosim_results.txt, using the cat command.
$ cat results/anosim/anosim_results.txt
Output:
Method name  R statistic  p-value  Number of permutations
ANOSIM       0.4069       0.0009   9999
Although the p-value is significant, the R statistic says that the clustering is only moderately strong.
Note: your output may be slightly different.
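The ANOSIM R statistic compares the ranks of between-group distances with the ranks of within-group distances: R = (mean between-group rank - mean within-group rank) / (M / 2), where M is the number of sample pairs. The plain-Python sketch below computes R for a tiny made-up distance matrix; `anosim_r` is a hypothetical helper, and the permutation step that produces the p-value is omitted.

```python
from itertools import combinations
from statistics import mean


def anosim_r(dm, labels):
    """ANOSIM R from a square distance matrix and one group label per sample."""
    pairs = list(combinations(range(len(labels)), 2))
    dists = [dm[i][j] for i, j in pairs]

    # Rank all pairwise distances (1 = smallest), averaging ranks over ties.
    order = sorted(range(len(dists)), key=lambda k: dists[k])
    ranks = [0.0] * len(dists)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and dists[order[end + 1]] == dists[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1
        pos = end + 1

    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    return (mean(between) - mean(within)) / (len(pairs) / 2)


# Made-up distances: samples 0 and 1 are controls, samples 2 and 3 are IC.
dm = [
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.9, 1.0],
    [0.7, 0.9, 0.0, 0.2],
    [0.8, 1.0, 0.2, 0.0],
]
print(anosim_r(dm, ["Control", "Control", "IC", "IC"]))
```

R close to 1 means every between-group distance is larger than every within-group distance, and R near 0 means grouping explains little, which is why 0.4069 reads as moderate separation.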

We will now analyze the files we generated during the 𝛼 and 𝛽 diversity runs and tests.
Note: the output you generated in the lab may be slightly different.

Step 9A: 𝛼 Diversity Results
Access the downloaded results directory: [course_directory]/06_Metagenomics/results
Inside the results directory, open the following file: alpha_diversity/alpha_rarefaction_plots/rarefaction_plots.html
Select observed_species as the metric and Dx as the category. A graph will be displayed.

Step 9B: 𝛼 Diversity Results
Control is significantly different from IC!

Step 10A: 𝛽 Diversity Results
Access the downloaded results directory: [course_directory]/06_Metagenomics/results
Inside the results directory, open the following HTML file: beta_diversity/unweighted_unifrac_emperor_pcoa_plot/index.html
This will open a 3D PCoA plot, based on unweighted UniFrac distances, colored by sample type (the Dx column: IC or Control).

Step 10B: 𝛽 Diversity Results
Rotate the plot to see if the points separate when viewed from other directions.
Identify individual samples using the ‘Key’ tab.

Step 10C: 𝛽 Diversity Results
Control and IC samples segregate, but only moderately. This is in agreement with the ANOSIM results (R = 0.4069, p = 0.0009, from Step 8B).

Step 11A: Taxonomy Results
Access the downloaded results directory: [course_directory]/06_Metagenomics/results
Inside the results directory, open the following HTML file: taxonomy/taxa_summary_plots/area_charts.html

Step 11B: Taxonomy Results
This is the taxonomy at the phylum level, for all samples.
Hover over each color to see which phylum it represents (colors may differ from this plot).

Step 11C: Taxonomy Results
These look like otherwise normal stool samples, with Firmicutes and Bacteroidetes dominating.
Note the Fusobacteria in sample 2, a control!

Step 11D: Taxonomy
Things get more complex as we go down the taxonomy hierarchy. This is the plot at the genus level, typical of stool samples.
Hover over each color to see its taxonomy information.

Step 11D: Taxonomy
There seems to be no obvious pattern (which is the usual case unless there’s something very wrong or a known pathogen is present).

Step 11E: Taxonomy
Let’s see if there is something hidden in the taxonomy. In the results directory, open the ANOVA.txt file. Below is the readout for one genus, Odoribacter, which is significant by its uncorrected P value (the FDR and Bonferroni corrected values are not below 0.05).
OTU             111
Test-Statistic  11.82051724
P               0.004407693
FDR_P           0.313682109
Bonferroni_P    1
Control_mean    92.71428571
IC_mean         9.125
taxonomy        k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Odoribacter;s__unclassified
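The gap between P and Bonferroni_P in this readout comes from multiple-testing correction: one test is run per OTU, so each raw p-value is multiplied by the number of tests and capped at 1. The sketch below reproduces that arithmetic, assuming all 260 observations reported in Step 2 were tested (the number of tests actually used by the script is an assumption); `bonferroni` is a hypothetical helper.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value: p times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


raw_p = 0.004407693  # P for OTU 111 in the readout above
n_tests = 260        # assumed: one test per OTU in ICF.biom (Step 2)
print(bonferroni(raw_p, n_tests))
```

Under this assumption the product is about 1.15, which is capped to 1 and agrees with the Bonferroni_P value in the readout.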

Step 11F: Taxonomy
Odoribacter has 0.3% mean abundance in controls and 0.02% mean abundance in IC. (The plot is taken from the bottom of area_charts.html.)
Indeed, it seems to be a good marker despite its low relative abundance. (Look at the abundances in the red vs. blue columns.)
Its absence seems correlated with IC (samples 4, 7, 8, 9, 10, 12, 14, 15).

Analysis Conclusions
Microbial composition and structure are significantly different in stool between IC patients and controls:
- IC stool microbiota is significantly less diverse.
- Overall IC microbiota is different (it clusters away from controls).
- Potential marker found: lack of Odoribacter is associated with IC.

Exercise Conclusions
Basic microbiome analysis:
- Calculate various diversity metrics for samples.
- Calculate statistical support for differences found between sample types.
- Plot taxonomy composition of samples.
- Run basic tests for potential microbial markers.