The power of metagenomic read recruitment

Slides:



Advertisements
Similar presentations
Statistical Techniques I EXST7005 Multiple Regression.
Advertisements

Confidence Intervals for Population Means
DNA BLAST Lab.
Chapter 3 Producing Data 1. During most of this semester we go about statistics as if we already have data to work with. This is okay, but a little misleading.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Utilizing Fuzzy Logic for Gene Sequence Construction from Sub Sequences and Characteristic Genome Derivation and Assembly.
The Sorcerer II Global ocean sampling expedition Katrine Lekang Global Ocean Sampling project (GOS) Global Ocean Sampling project (GOS) CAMERA CAMERA METAREP.
Algebra Problems… Solutions Algebra Problems… Solutions © 2007 Herbert I. Gross Set 6 By Herb I. Gross and Richard A. Medeiros next.
Input for the Bayesian Phylogenetic Workflow All Input values could be loaded as text file or typing directly. Only for the multifasta file is advised.
Discussion on Metagenomic Data for ANGUS Course Adina Howe.
Todd J. Treangen, Steven L. Salzberg
H = -Σp i log 2 p i. SCOPI Each one of the many microbial communities has its own structure and ecosystem, depending on the body environment it exists.
BLAST: A Case Study Lecture 25. BLAST: Introduction The Basic Local Alignment Search Tool, BLAST, is a fast approach to finding similar strings of characters.
Genome alignment Usman Roshan. Applications Genome sequencing on the rise Whole genome comparison provides a deeper understanding of biology – Evolutionary.
Sources of Inherited Variation Mutations & Sexual Reproduction.
Advancing Science with DNA Sequence Metagenome definitions: a refresher course Natalia Ivanova MGM Workshop September 12, 2012.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
INCM 9201 Quantitative Methods Confidence Intervals for Means.
Julia N. Chapman, Alia Kamal, Archith Ramkumar, Owen L. Astrachan Duke University, Genome Revolution Focus, Department of Computer Science Sources
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 4 Designing Studies 4.2Experiments.
HW2: exome sequencing and complex disease Jacquemin Jonathan de Bournonville Sébastien.
A brief guide to sequencing Dr Gavin Band Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015 Africa Centre for Health.
Discussion on Genomic/Metagenomic Data for ANGUS Course Adina Howe.
BNFO 615 Usman Roshan. Projects and papers An opportunity to do hands on work Proposal presentations due by end of September Papers: present at least.
Plus: Exam Scoring How is it done. How many questions are there
Metagenomic Species Diversity.
US Customary Measurement System
SNP Detection Congtam Pham 2/24/04 Dr. Marth’s Class.
Nucleotide variation in the human genome
Test Taking Strategies
Of Sea Urchins, Birds and Men
Extract DNA and RNA from the same E. coli culture
Genome alignment Usman Roshan.
Least Squares Regression
Lecture 2.1.
Chapter 3: Displaying and Describing Categorical Data
University of California at San Diego
Identifying templates for protein modeling:
US Customary Measurement System
CSE182-L12 Gene Finding.
US Customary Measurement System
Module 8 Statistical Reasoning in Everyday Life
Stat 112 Notes 4 Today: Review of p-values for one-sided tests
CHAPTER 8 Estimating with Confidence
H = -Σpi log2 pi.
Analyze To study something closely and carefully. To learn the nature and relationship of the parts of something by a close and careful examination. Example:
CHAPTER 8 Estimating with Confidence
Comparing Two Groups Statistics 2126.
US Customary Measurement System
US Customary Measurement System
Warm-Up Study Notes from last week!.
In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining.
Chapter 7 Multifactorial Traits
Basic Local Alignment Search Tool
Playing Games.
US Customary Measurement System
PowerPoint Presentation Guide
CHAPTER 8 Estimating with Confidence
Icebreaker Difficulty = Low.
CHAPTER 8 Estimating with Confidence
BF528 - Genomic Variation and SNP Analysis
Biologically Inspired Computing: Operators for Evolutionary Algorithms
EGR 2131 Unit 12 Synchronous Sequential Circuits
External Assessment – 2018 Biology HL
US Customary Measurement System
LearnZillion Notes: --This is your hook. Start with a question to draw the student in. We want that student saying, “huh, how do you do X?” Try to be specific.
CHAPTER 8 Estimating with Confidence
SNPs and CNPs By: David Wendel.
Genome resolved metagenomics
Key NGS principles. Key NGS principles. A and B, identification of structural variants: Longer (paired-end or mate-pair) sequencing reads are more adept.
Presentation transcript:

The power of metagenomic read recruitment http://merenlab.org/momics

The power of metagenomic read recruitment to see the ‘obvious’ http://merenlab.org/momics

Metagenomic read recruitment is at the core of most things we will discuss, and most things you will do when you do your own ‘omics analyses. It is the strategy of aligning short reads in a metagenome to a reference context. This context could be an entire genome or a single contig.

Let’s say we have a reference context like this,

And a metagenome from an environment.

Metagenomic read recruitment will find all the reads that match to different parts of this genome. And we can display them like this to see where they match. Q: what do you think that blank area mean?

Does this help? So that area means that the reference genome we are using to recruit reads contains two genes that are not occurring in the naturally occurring population in this environment even though it is similar. Q: So what’s up with that highly covered section?

Let’s assome we have to more metagenomes from two more environments.

And let’s keep in mind where those genes are so it is easier to follow.

Here is environment #2. Q: How would you interpret this?

Here is the environment #3. Q: How would you interpret this one? Indeed, these short reads can tell us something about the nature of these populations in these environments regarding their occurrence and abundance. Abundance is often is correlated with ‘coverage’.

Coverage is a statistic that tells us the average number of short reads matching to each nucleotide position in a reference sequence. Q: What is the coverage of this genome in the second environment? Make a guess.

How about the third one?

Hmm. The average coverage of this genome is identical in these two environments. But is this an accurate representation of the ecology of these environments in the context of this genome? Clearly coverage can be misleading. Q: What else could we do to have a more balanced understanding?

Detection is a statistic that tells us about the fraction of nucleotides in the reference sequence that are covered by at least at 1X.

Aha. Much better.

The power of metagenomic read recruitment to see the ‘subtle’ http://merenlab.org/momics

Aha. Much better.

Let’s say we zoom in to this part

And take a much more careful look at the reads recruited. Q: Do you expect every single read to be identical to the reference context? It rarely occurs in fact. But sometimes due to reasons that are not related to biology…

For instance if there is a single read in this mapping results where there at that particular location you see an A instead of a T. Q: What would you think? This could be due to a sequencing error, right?

How about now?

What would this statistic tell you?

Next you see this.

How about now? What can we tell about the complexity of this environment?

Then let’s say we see this.

What would this mean? As you can see, there is so much we can learn about single-nucleotide variants. Temporal and spatial variation in SNVs can offer insights into evolutionary pressures acting upon environmental populations. We will talk much more about this in later weeks. Let’s go back to where we were now. Q: Where were we before we jumped into this side track?