Canadian Bioinformatics Workshops www.bioinformatics.ca.

Slides:



Advertisements
Similar presentations
16S sequencing for microbiome studies Nicola Segata and Nick Loman
Advertisements

PREDetector : Prokaryotic Regulatory Element Detector Samuel Hiard 1, Sébastien Rigali 2, Séverine Colson 2, Raphaël Marée 1 and Louis Wehenkel 1 1 Bioinformatics.
Use of the genomic data o Reconstruction of metabolic properties o Nature’s Microbiome o NGS in Population Genetics.
Metabarcoding 16S RNA targeted sequencing
Determination of host-associated bacterial communities In the rhizospheres of maize, acorn squash, and pinto beans.
Determination of host-associated bacterial communities In the rhizospheres of maize, acorn squash, and pinto beans.
Peter Tsai Bioinformatics Institute, University of Auckland
Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)
Practical Bioinformatics Community structure measures for meta-genomics István Albert Bioinformatics Consulting Center Penn State.
1 Gene Finding Charles Yan. 2 Gene Finding Genomes of many organisms have been sequenced. We need to translate the raw sequences into knowledge. Where.
Utilizing Fuzzy Logic for Gene Sequence Construction from Sub Sequences and Characteristic Genome Derivation and Assembly.
Methods in Microbial Ecology
The Microbiome and Metagenomics
Zachary Bendiks. Jonathan Eisen  UC Davis Genome Center  Lab focus: “Our work focuses on genomic basis for the origin of novelty in microorganisms (how.
Metagenomic Analysis Using MEGAN4
Discussion on Metagenomic Data for ANGUS Course Adina Howe.
Molecular Microbial Ecology
Discovery of new biomarkers as indicators of watershed health and water quality Anamaria Crisan & Mike Peabody.
H = -Σp i log 2 p i. SCOPI Each one of the many microbial communities has its own structure and ecosystem, depending on the body environment it exists.
Probes can be designed in an evolutionary hierarchy.
Detecting binding sites for transcription factors by correlating sequence data with expression. Erik Aurell Adam Ameur Jakub Orzechowski Westholm in collaboration.
Accurate estimation of microbial communities using 16S tags Julien Tremblay, PhD
Fig Chapter 12: Genomics. Genomics: the study of whole-genome structure, organization, and function Structural genomics: the physical genome; whole.
Roadmap for Soil Community Metagenomics of DOE’s FACE & OTC Sites
Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:
Advancing Science with DNA Sequence Metagenome definitions: a refresher course Natalia Ivanova MGM Workshop September 12, 2012.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
1 Transcript modeling Brent lab. 2 Overview Of Entertainment  Gene prediction Jeltje van Baren  Improving gene prediction with tiling arrays Aaron Tenney.
Diversity and quantification of candidate division SR1 in various anaerobic environments James P. Davis and Mostafa Elshahed Microbiology and Molecular.
Current Challenges in Metagenomics: an Overview Chandan Pal 17 th December, GoBiG Meeting.
Metagenomic Analysis Using MEGAN4 Peter R. Hoyt Director, OSU Bioinformatics Graduate Certificate Program Matthew Vaughn iPlant, University of Texas Super.
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Elucidating factors behind pair wise distances discrepancies between short and near full-length sequences. We hypothesized that since the 16S rRNA molecule.
Advancing Science with DNA Sequence Natalia Ivanova MGM Workshop September 29, 2011 Metagenome analysis: use case.
KEY CONCEPT Biotechnology relies on cutting DNA at specific places.
Analyzing Time Course Data: How can we pick the disappearing needle across multiple haystacks? IEEE-HPEC Bioinformatics Challenge Day Dr. C. Nicole Rosenzweig.
The Microbiome and Metagenomics
Metagenomics at Second Genome
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Accurate estimation of microbial communities using 16S tags
Stretching Your Data Management Skills Chuck Humphrey University of Alberta Atlantic DLI Workshop 2003.
A Robust and Accurate Binning Algorithm for Metagenomic Sequences with Arbitrary Species Abundance Ratio Zainab Haydari Dr. Zelikovsky Summer 2011.
Human Genomics Higher Human Biology. Learning Intentions Explain what is meant by human genomics State that bioinformatics can be used to identify DNA.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Convenience Sample of 4 Adults and 6 Infants. Adults 4 visits over 2 weeks; infants 2 visits over 2 weeks Adult specimens: 1) plaque (by method, teeth,
Canadian Bioinformatics Workshops
Date of download: 6/23/2016 Copyright © 2016 McGraw-Hill Education. All rights reserved. Pipeline for culture-independent studies of a microbiota. (A)
Introduction Biodiversity is important in an ecosystem because it allows the species living in that ecosystem to adapt to changes made in the environment.
Discussion on Genomic/Metagenomic Data for ANGUS Course Adina Howe.
Canadian Bioinformatics Workshops
Date of download: 7/7/2016 Copyright © 2016 McGraw-Hill Education. All rights reserved. Pipeline for culture-independent studies of a microbiota. (A) DNA.
16S rRNA Experimental Design
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Biotechnology.
Presented By: Emily Lamoureux
Metagenomic Species Diversity.
Metagenomics: From Bench to Data Analysis 19-23rd September S rRNA-based surveys for Community Analysis: How Quantitative are they? Dr.
Micelle PCR reduces artifact formation in 16S microbiota profiling
Research in Computational Molecular Biology , Vol (2008)
Toward Next Generation Biodiversity Research
The African Soil Microbiology project
Workshop on the analysis of microbial sequence data using ARB
New genes can be added to an organism’s DNA.
H = -Σpi log2 pi.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Skin Microbiome Surveys Are Strongly Influenced by Experimental Design
Toward Accurate and Quantitative Comparative Metagenomics
Presentation transcript:

Canadian Bioinformatics Workshops

2Module #: Title of Module

Module 6 Microbiome biomarker discovery

What are biomarkers and how do we find them? bi·o·mark·er /ˈbīōˌmärkər/ Functional biomarkers Taxonomic biomarkers Biological functions that may be specific to a single organism or shared among multiple organisms Can be a specific species, but most often is a category of organisms – also called Operational Taxonomic Unit (OTU). Measureable biological property that can be indicative of some phenomena, such as an infection, disease, or environmental disturbance Anamaria Crisan 4

Module 6 bioinformatics.ca DISCOVERY VALIDATION o Bioinformatics Software - Takes the raw digitized genomic data and performs QC and biomarker quantification o Use math – applied statistical methods can help us find useful biomarkers o Design Primers – biological “hooks” that pick out our biomarker of interest from a sample o qPCR – measures how many times primers (hooks) manage to snag our biomarker of interest What are biomarkers and how do we find them? Anamaria Crisan

Module 6 bioinformatics.ca Extract & Sequence DNA Identify Potential Biomarkers ( OTUs or Functional Genes ) Validate Biomarker Abundance Select Useful Biomarkers Plan Analysis: “bio”? “mark”? Obtain Biological Samples What are biomarkers and how do we find them? Anamaria Crisan, Thea Van Rossum

Module 6 bioinformatics.ca A) Bacteria B) Viruses o Shotgun or 16S amplicon o Best studied, most methods developed o Shotgun or amplicon (RdRp, g23) o Can be challenging to get enough DNA o Host-specificity and population “bursts” hold promise What will put the “bio” in biomarker? C) Eukaryotes o Amplicon (18S, ITS) (large genomes make shotgun difficult) o Best studied, most methods developed Combinations! Thea Van Rossum

Module 6 bioinformatics.ca A) TAXONOMIC B) GENE-BASED o Can use amplicon or shotgun data o Strain-level diversity can lead to false positives/negatives o More variable across environments (for better or worse) o Requires shotgun data (DNA or RNA) o Need good sequencing depth to reach specialised genes o Domain-based gene architecture can be tricky What kind of biomarker do we want? C) OTHER? o Diversity metrics, using microbiome analysis to suggest other metabolic markers, etc Combinations! Thea Van Rossum, Fiona Brinkman

What is biomarker selection? The process for removing non-informative or redundant OTUs from an analysis. Anamaria Crisan, Fiona Brinkman

Module 6 bioinformatics.ca Anamaria Crisan Sample frequency Abundance OTU1OTU2

Module 6 bioinformatics.ca What makes a good biomarker? In this example we want to find biomarkers that separate between the red and blue class labels. Anamaria Crisan

Module 6 bioinformatics.ca How can biomarkers add value to current testing procedures? How can we make biomarkers easily accessible to those that routinely test water quality? 1 2 What makes a good biomarker? Anamaria Crisan

Module 6 bioinformatics.ca Biomarker selection relies on statistical techniques These techniques can range from the very simple (like a t-test) to more complex You can write your own statistical analysis (using programs like R, Matlab, Python, STATA, SAS etc.) OR you can use more complex methods developed by others: Popular metagenomics / microbiome methods include: LEfSE metagenomeSeq Whatever you choose to use it’s important to understand the statistical methods especially: Your assumptions about the data The statistical method’s assumptions about the data The statistical method’s limitations How to interpret your results from the output of the statistical method Anamaria Crisan

Module 6 bioinformatics.ca Biological Data is difficult Most biological data is correlated and, especially with microbial community data, quite sparse Statistical methods vary with respect to how correlation and data sparseness are taken into account. So how do others deal with the problem? – They ignore correlation and use only parametric methods (most common approach); use a hard filter to look at only abundant OTUs – Use more complex approaches that take into account correlation and sparseness BUT it can be more computationally intensive and time consuming to use these methods AND it can be difficult to understand and interpret what the method is doing Anamaria Crisan

Module 6 bioinformatics.ca So you might be wondering … what are these mysterious “statistical methods”?

Module 6 bioinformatics.ca Generally, statistical techniques either try to predict labels or continuous values Anamaria Crisan

Module 6 bioinformatics.ca And…they also generally belong to one of two categories (I lied a little : Semi-supervised methods also exist) Anamaria Crisan

Module 6 bioinformatics.ca Anamaria Crisan

Module 6 bioinformatics.ca 19 Anamaria Crisan

Module 6 bioinformatics.ca Note, the same techniques are called different things in different fields Anamaria Crisan

Module 6 bioinformatics.ca 21 The best approach to use depends on your research question

Module 6 bioinformatics.ca Now that we have biomarkers, we should validate them.

Module 6 bioinformatics.ca How do we validate our biomarkers? Once you ID the gene or taxonomic group you want to use as a biomarker, you need to design a test for it (q)PCR is a good option: 1.Identify biomarker specific sequence – If used marker-based tool to identify biomarker, you’re all set (e.g. MetaPhlAn2) – Otherwise, can cluster reads or align them to find conserved sequences – need to verify that “representative” sequence is still a good biomarker 2. Design primers (& probe) – PrimerProspector: designs primers from a sequence alignment – PrimerBLAST: designs primers specific to a clade Thea Van Rossum

Module 6 bioinformatics.ca Case Study – Categorical Biomarker from Bacterial Shotgun Data Example of fast track to marker identification and PCR test development – IDing markers of water quality (see for project) Bacterial shotgun data using Illumina HiSeq platform Comparing riverwater microbiomes at an agricultural unimpacted site (AUP) versus two impacted downstream sites (“at pollution” (APL) & “downstream” (ADS)) Using MetaPhlAn – Low sensitivity, high precision – Based on clade-specific gene sequences – 3000 reference genomes – Fast: 3 million reads ( bp) in 10 minutes Thea Van Rossum, Fiona Brinkman

Quality trimmed and normalised data across samples MOCK COMMUNITY VALIDATION Validated with mock community: DNA-free water spiked with DNA from lab-cultured bacteria 7% of reads were assigned to a species (low sensitivity) Of those, 84% were correctly assigned Thea Van Rossum

Module 6 bioinformatics.ca Taxonomic Classifications – Rural Site Prioritised high abundance taxa Use White’s non-parametric t-test with false discovery rate multiple test correction to find differentially abundant taxa (alternative: RandomForests) 26 Upstream At Site & Downstream Taxon 1 Taxon 2 Top 50 most “abundant” taxa Thea Van Rossum

Module 6 bioinformatics.ca 3. Identified sequences characteristic of taxa 57,016 reads assigned to Taxon 1 2,176 reads assigned to Taxon 2 Prioritised taxon 1 due to our research indicating high abundance taxa are most accurately predicted Extracted taxon-specific sequences from Metaphlan database – 607 sequences for Taxon 1 Aligned reads against these sequences Chose regions of Metaphlan sequences with most hits Thea Van Rossum

Metaphlan marker sequence Consensus sequence Highest Coverage = candidate marker sequence Thea Van Rossum, Fiona Brinkman

4. Designed qPCR primers and probes from marker sequence Used Primer3 for primer and probe design Checked relative occurrence rates of candidate primers and probes – Considered matches that are exact or have 1-2 mismatches – Chose sequences that minimize non-specific matches Confirmed we can amplify a product of the right size Upstream At site & downstream Forward primer Probe Reverse primer Next: iteratively test & improve qPCR Thea Van Rossum

Module 6 bioinformatics.ca “Fast track” Case Study Summary Initial analysis identified a PCR marker based on differential abundance of a bacterial species that is being used to pilot our iterative validation process Benefits of this approach – Fast: sequence data to PCR primers in a couple days – Simple: does not require large amounts of processing power Limitations of this approach – Depends on differential abundance of known bacteria (if the differential bacteria aren’t highly similar to those in the Metaphlan database, this approach will not work) – Depends on taxa, which have been shown to be more variable across environments than (gene) functions This is a “low hanging fruit” approach; a good first step Thea Van Rossum, Fiona Brinkman

Module 6 bioinformatics.ca Bacterial Shotgun – alternative methods Get predicted proteins Cluster Find differential clusters Design PCR More complete taxonomic analysis – Kraken, Discribinate Gene function analysis – MEGAN4 (SEED, KEGG) – RealtimeMetagenomics (FIGFams) Cluster based analysis: Thea Van Rossum

Module 6 bioinformatics.ca From Biomarker to Lab Test Identify discriminative taxa or functions Design primers (Primer-BLAST or IDT Realtime PCR Tool) Validate primers in silico (Primer Prospector, Primer-BLAST) Validate primers in vitro (qPCR) Identify informative region for primer design (CD-HIT) Primer 1Primer 2 Mike Peabody

Module 6 bioinformatics.ca Remember other markers… Community diversity could be a useful indicator of ecosystem health Microbiome analysis may suggest other types of screening tests… (metabolites, etc) Markers are only as good as the data they are based on, so design experiments carefully, include +ve and - ve controls 1 Anamaria Crisan, Fiona Brinkman

Module 6 bioinformatics.ca We are on a Coffee Break & Networking Session