This Must Be the Place: The Abundance and Distribution of Microbes using Maximum Entropy
Will Shoemaker

Outline
Microbial Abundance & Distribution
Entropy
MaxEnt Theory
  – General principle
  – Maximum Entropy Theory of Ecology
Previous Usage for Macrobes
Preliminary Trials on Microbes
Future Directions

Microbial Abundance & Distribution
~10^30 bacteria & archaea on Earth (Whitman et al., 1998)
Microbes play crucial roles in ecological functioning and human health
Our ability to infer community composition is increasing
Still, little is known about their patterns of abundance relative to macrobes

Current Studies
Conditionally rare taxa (CRT) contribute to microbial diversity
  – Low abundance
  – Presence is temporally variable
  – Most difficult taxa to detect
Emphasizes the importance of having a null model for abundance data (Shade & Gilbert, 2015)

Some Issues with Models
Models can make assumptions
  – e.g., trade-offs, life-history traits
Models can allow for parameter manipulation
  – e.g., the Unified Neutral Theory of Biodiversity (UNTB)
What about looking at patterns with a model based on what we know?

What Do We Know?
It is a good idea to start from the data
  – How do the data constrain our inference?
Large amounts of open-access microbial sequence data
  – e.g., JGI, MG-RAST, NCBI
  – Metadata are often of poor quality
Some constraints are easily inferred
  – N = number of individuals
  – S = number of species
These constraints are the basis for calculating our uncertainty in a distribution
  – i.e., our entropy
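Both state variables can be read straight off a sample's OTU table. A minimal Python sketch, assuming sequence reads stand in for individuals (a common but imperfect proxy for microbial data); the function name and the sample counts are hypothetical:

```python
def state_variables(otu_counts):
    """Return (N0, S0) for one sample.

    N0: total count (here, reads are assumed to stand in for individuals).
    S0: number of OTUs actually observed (count > 0).
    """
    n0 = sum(otu_counts.values())
    s0 = sum(1 for c in otu_counts.values() if c > 0)
    return n0, s0


# Hypothetical OTU counts for a single sample.
sample = {"OTU_1": 812, "OTU_2": 95, "OTU_3": 0, "OTU_4": 7, "OTU_5": 1}
print(state_variables(sample))  # (915, 4)
```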

What Is Entropy?
Let’s reframe. Q: What is information?
  – A: A reduction in uncertainty
  – i.e., a reduction in entropy
  – Entropy = uncertainty
But how is entropy calculated?
Quantifying information content relies on the frequency of events in a distribution (Jaynes, 1982)
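Shannon's formula turns the frequency of events into a number: H = -Σ p_i log p_i, where p_i is the relative frequency of event i. A minimal Python sketch (the function name `shannon_entropy` is ours, not from the talk) that computes H in bits from raw event counts:

```python
import math
from collections import Counter


def shannon_entropy(counts, base=2.0):
    """Shannon entropy H = -sum(p * log(p)) of a list of event counts.

    Zero counts are skipped, since p * log(p) -> 0 as p -> 0.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


# Frequencies of events in an observed sequence of coin tosses.
tosses = "HTHHTTHT"
print(shannon_entropy(list(Counter(tosses).values())))  # 4 heads, 4 tails -> 1.0
```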

Entropy & Fair Coins
What is the entropy of a fair coin? Let’s do the math.
But entropy depends on what data you have! What’s my entropy vs. yours?
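Doing the math for a fair coin, with p(heads) = p(tails) = 1/2 and logs in base 2 so that entropy is measured in bits:

```latex
H = -\sum_{i} p_i \log_2 p_i
  = -\left( \tfrac{1}{2}\log_2 \tfrac{1}{2} + \tfrac{1}{2}\log_2 \tfrac{1}{2} \right)
  = 1 \text{ bit}
```

If you already know the coin is two-headed, your p(heads) = 1 and your entropy for the same toss is 0 bits, which is the sense in which entropy depends on the information each observer has.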

So, What Did We Just Do?
We both calculated the maximum entropy for one coin toss
With incomplete information, you predicted the most uniform distribution
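A quick numerical check that the uniform guess is the maximum-entropy one: the Python sketch below scans the binary entropy H(p) = -p log2 p - (1 - p) log2(1 - p) over a grid of p values (the 0.01 grid spacing is an arbitrary choice).

```python
import math


def binary_entropy(p):
    """Entropy in bits of a coin with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Evaluate H(p) on the grid p = 0.00, 0.01, ..., 1.00 and keep the maximizer.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=binary_entropy)
print(best_p, binary_entropy(best_p))  # 0.5 1.0
```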

Why Maximum Entropy (MaxEnt)?
We just learned that information is useful
Maximize entropy = minimize commitment
Model all you know and nothing more
  – What you’re modeling is a set of constraints that must hold
Then choose the most uniform distribution consistent with those constraints
  – i.e., the maximum-entropy distribution
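A standard illustration of "model all you know and nothing more" is E. T. Jaynes's dice problem (our choice of example, not necessarily the one used in the talk): if all we know about a six-sided die is that its average roll is 4.5, maximizing entropy under that constraint gives p_i ∝ exp(-λ i), with the Lagrange multiplier λ chosen so the mean comes out right. A Python sketch that finds λ by bisection; the function name is hypothetical:

```python
import math


def maxent_die(mean, faces=range(1, 7), tol=1e-12):
    """MaxEnt distribution over the faces of a die, given only its mean roll.

    Maximizing entropy subject to normalization and a fixed mean gives
    p_i = exp(-lam * i) / Z. The implied mean decreases monotonically in the
    Lagrange multiplier lam, so lam can be found by bisection.
    """
    faces = list(faces)

    def probs(lam):
        weights = [math.exp(-lam * i) for i in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def implied_mean(lam):
        return sum(i * p for i, p in zip(faces, probs(lam)))

    if not min(faces) < mean < max(faces):
        raise ValueError("mean must lie strictly between the smallest and largest face")

    lo, hi = -50.0, 50.0  # wide enough bracket for a six-sided die
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if implied_mean(mid) > mean:
            lo = mid  # implied mean too high: increase lam
        else:
            hi = mid  # implied mean too low: decrease lam
    return probs((lo + hi) / 2)


p = maxent_die(4.5)
print([round(x, 4) for x in p])  # probabilities rise steadily toward face 6
```

With a mean of 3.5 the constraint adds no information beyond normalization, and the same function returns the uniform distribution.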

MaxEnt Example

What MaxEnt Is Not
It’s not “creating uncertainty”
  – The uncertainty is already in the data
The results are not “arbitrary”
  – They are constrained by the data
It’s not about reducing biology
  – “MaxEnt is most useful …where the observed frequencies do not agree with the maximum entropy predictions” (Jaynes, 2003)

Maximum Entropy Theory of Ecology (METE) & the Species Abundance Distribution (SAD)
One soft constraint:
  – Average abundance (N0/S0)
Based on Shannon’s information entropy
Uses two Lagrange multipliers to solve for the maximum-entropy SAD
Result: Fisher’s log-series distribution
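Written out, and assuming the form in which only N0 and S0 are specified, the METE SAD is an upper-truncated log-series whose single parameter β is fixed by the average-abundance constraint. In Harte's full theory β = λ1 + λ2, presumably the two Lagrange multipliers referred to above:

```latex
\Phi(n) = \frac{1}{Z}\,\frac{e^{-\beta n}}{n},
\qquad Z = \sum_{n=1}^{N_0} \frac{e^{-\beta n}}{n},
\qquad n = 1, \dots, N_0

\frac{N_0}{S_0} = \sum_{n=1}^{N_0} n\,\Phi(n)
  = \frac{\sum_{n=1}^{N_0} e^{-\beta n}}{\sum_{n=1}^{N_0} e^{-\beta n}/n}
```

For small β this is close to Harte's approximation Φ(n) ≈ e^{-βn} / (n ln(1/β)), i.e., Fisher's log-series.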

[Figure: observed vs. predicted abundance (White et al., 2012). Mammal Community Database (MCDB), R² = 0.83; North American Butterfly Count (NABC), R² = 0.93; Breeding Bird Survey (BBS), R² = 0.91; Christmas Bird Count (CBC), R² = 0.90]
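R² values in observed-vs-predicted plots like these measure how closely observed abundances fall around the 1:1 line, rather than the fit of a regression. A Python sketch of that statistic, R² = 1 - Σ(obs - pred)² / Σ(obs - mean(obs))², with log10 transformation on by default; the default, the function name `r2_one_to_one`, and the example abundances are our assumptions, not taken from White et al. (2012):

```python
import math


def r2_one_to_one(observed, predicted, log_transform=True):
    """Coefficient of determination of observed vs. predicted around the 1:1 line.

    Unlike a regression R^2, this can be negative when the predictions do
    worse than simply using the mean of the observations.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    if log_transform:
        observed = [math.log10(x) for x in observed]
        predicted = [math.log10(x) for x in predicted]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical rank-abundance values, most abundant first.
obs = [500, 120, 40, 10, 3, 1]
pred = [350, 150, 60, 20, 5, 1]
print(round(r2_one_to_one(obs, pred), 3))
print(round(r2_one_to_one(obs, pred, log_transform=False), 3))
```

Because squared deviations grow with abundance, a few very abundant taxa can dominate the raw-scale statistic, so raw and log-transformed r² can differ sharply for the same data.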

Limitations
Current METE can’t handle very large N0 (N0 > ~10^6)
A rapidly changing system may not be well described by state variables
Likely to fail in systems with heterogeneous habitats over large spatial scales

Preliminary Trials
Can METE explain OTU abundance?
  – i.e., without using metadata to infer the distribution
One well-maintained dataset
  – Human Microbiome Project (Human Microbiome Project Consortium, 2012)
  – 16S rRNA regions V3–V5
  – GI tract & skin microbiomes
Compare to another MaxEnt distribution
  – Geometric series
  – Hard-constrained by N0 & S0

Work Environment / Methods
METE package from the Weecology Lab (Xiao et al., 2013)
  – Used for the METE distribution & geometric series code
Estimated fit using custom IPython notebooks
  – Markdown documentation & visualization in line with Python code
  – Stored in a GitHub repo
  – Will make public once the analysis is complete

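SAD – GI Tract: METE
[Figure: observed vs. predicted abundance for raw and log-transformed data, each with its r²]
Abundant taxa skewing the fit

SAD – GI Tract: Geometric Series
[Figure: observed vs. predicted abundance for raw and log-transformed data, each with its r²]
Low-abundance taxa skewing the fit

SAD – Skin: METE
[Figure: observed vs. predicted abundance for raw and log-transformed data, each with its r²]
Abundant taxa skewing the fit

SAD – Skin: Geometric Series
[Figure: observed vs. predicted abundance for raw and log-transformed data, each with its r²]
Low-abundance taxa skewing the fit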

Summary
METE & the geometric series both fail to predict the SAD
  – Surprising given METE’s prior success with macrobes
METE over-predicted abundance for most taxa across sites
The geometric series under-predicted the few abundant taxa

What Use Is METE to Microbial Ecology?
Plenty of use as a null model
N0/S0 might constrain abundance at higher or lower scales
Potential use to compare OTU clustering methods
  – Large-scale microbiome / microbial community sequencing efforts
  – The most widely used algorithm, UCLUST, is the most sensitive to any change in parameters (Schmidt et al., 2014)

Future Directions
Apply toward traits and the effect of S0
  – Microbial trait-based modeling using METE
KEGG-annotated Global Ocean Sampling Expedition metagenomes (~80)
  – Examine microbial patterns using sub-OTU clustering
Minimum Entropy Decomposition
Research on the evolutionary ecology of quorum sensing in Janthinobacterium

Questions?

Works Cited
Human Microbiome Project Consortium (Methé, B. A., Nelson, K. E., Pop, M., Creasy, H. H., Giglio, M. G., Huttenhower, C., Gevers, D., Petrosino, J. F., et al.). A framework for human microbiome research. Nature 486, 215–221 (2012).
Gilbert, J. A., Steele, J. A., Caporaso, J. G., et al. Defining seasonal marine microbial community dynamics. The ISME Journal 6(2) (2012).
Harte, J. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics. Oxford University Press (2011).
Schmidt, T. S. B., Matias Rodrigues, J. F. & von Mering, C. Limits to robustness and reproducibility in the demarcation of operational taxonomic units. Environ. Microbiol. (2014).
Shade, A. & Gilbert, J. A. Temporal patterns of rarity provide a more complete view of microbial diversity. Trends Microbiol. 1–6 (2015).
White, E. P., Thibault, K. M. & Xiao, X. Characterizing species abundance distributions across taxa and ecosystems using a simple maximum entropy model. Ecology 93, 1772–1778 (2012).
Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: the unseen majority. Proc. Natl. Acad. Sci. U. S. A. 95, 6578–6583 (1998).
Xiao, X., McGlinn, D. J. & White, E. P. A strong test of the Maximum Entropy Theory of Ecology. arXiv preprint (2013).

Given data that impose any constraints on the problem, the probability distribution that maximizes the entropy is identical to the frequency distribution that can be realized in the greatest number of ways.
MaxEnt tells us which predictions are most likely given our information.
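The counting argument can be checked directly for coin tosses: the number of sequences of N tosses with k heads is the binomial coefficient C(N, k), which peaks at k = N/2 (the maximum-entropy split), and (1/N) ln C(N, k) approaches the entropy, in nats, of the frequencies k/N as N grows. A Python sketch, with the values of N chosen arbitrarily:

```python
import math


def multiplicity(n, k):
    """Number of distinct sequences of n tosses with exactly k heads."""
    return math.comb(n, k)


def entropy_nats(p):
    """Entropy (natural log) of a two-outcome distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


n = 100
# The frequency split that can be realized in the greatest number of ways.
k_best = max(range(n + 1), key=lambda k: multiplicity(n, k))
print(k_best)  # 50

# (1/N) ln W approaches the entropy of the observed frequencies as N grows.
for n in (10, 100, 1000, 10000):
    k = 3 * n // 10  # 30% heads
    print(n, math.log(multiplicity(n, k)) / n, entropy_nats(0.3))
```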

Discuss strong test of MaxEnt?


How Does METE Calculate Abundance?
1. Calculate β based on N0 and S0
   – Done for the range of 1 to S0
2. Calculate
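The first step can be sketched in Python for the N0/S0-only log-series form of the METE SAD: solve numerically for the β at which the mean abundance Σ e^(-βn) / Σ (e^(-βn)/n), over n = 1 to N0, equals N0/S0. The slide does not say how abundances are then calculated, so for the second step we assume one common approach: the predicted abundance of rank r is the smallest n whose cumulative probability reaches (S0 - r + 0.5)/S0. The function names and this quantile rule are our assumptions, not the Weecology METE package's API.

```python
import bisect
import math


def mete_pmf(beta, n0):
    """Upper-truncated log-series: Phi(n) proportional to exp(-beta * n) / n, n = 1..n0."""
    weights = [math.exp(-beta * n) / n for n in range(1, n0 + 1)]
    z = sum(weights)
    return [w / z for w in weights]


def solve_beta(n0, s0, lo=1e-12, hi=10.0, tol=1e-12):
    """Step 1: find beta such that the mean abundance under Phi equals n0 / s0.

    Assumes beta > 0, which requires s0 to exceed roughly ln(n0). The mean
    decreases monotonically in beta, so bisection applies. Each evaluation
    sums over all n0 abundances, which becomes slow for very large n0.
    """
    target = n0 / s0

    def mean_abundance(beta):
        return sum(n * p for n, p in enumerate(mete_pmf(beta, n0), start=1))

    if not mean_abundance(hi) <= target <= mean_abundance(lo):
        raise ValueError("no positive beta in the search bracket fits n0 / s0")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_abundance(mid) > target:
            lo = mid  # predicted mean too high: increase beta
        else:
            hi = mid  # predicted mean too low: decrease beta
    return (lo + hi) / 2


def mete_rad(n0, s0):
    """Step 2 (assumed quantile rule): predicted abundances for ranks 1..s0."""
    pmf = mete_pmf(solve_beta(n0, s0), n0)
    cdf, running = [], 0.0
    for p in pmf:
        running += p
        cdf.append(running)
    rad = []
    for rank in range(1, s0 + 1):
        q = (s0 - rank + 0.5) / s0  # rank 1 (most abundant) maps to the upper tail
        n = bisect.bisect_left(cdf, q) + 1  # smallest n with CDF(n) >= q
        rad.append(min(n, n0))
    return rad


print(mete_rad(1000, 50))  # hypothetical sample: N0 = 1000 reads, S0 = 50 OTUs
```

Because N0/S0 is only a soft constraint, the predicted abundances need not sum exactly to N0.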

Fairly Good Fit
(Figure from Harte, 2011, Maximum Entropy and Ecology)

Fairly Poor Fit
(Figure from Harte, 2011, Maximum Entropy and Ecology)