Modeling Crypto Occurrence, Using Lab-Specific Matrix Spike Recovery Data Michael Messner, Ph.D. Mathematical Statistician EPA Office of Ground Water.

Presentation transcript:

Modeling Crypto Occurrence, Using Lab-Specific Matrix Spike Recovery Data
Michael Messner, Ph.D.
Mathematical Statistician
EPA Office of Ground Water and Drinking Water
Standards and Risk Management Division

Outline
– Disclaimer
– Data Used
– Uncertainty in Crypto Numbers Spiked
– Model Building
– Preferred Model (Model 5)
– Results of Recovery Modeling
– Informing the Crypto Occurrence Model

Disclaimer
Views expressed in this presentation are the author's and are not necessarily those of the USEPA.

Data Used
Results were obtained from analyses of 1263 source water samples that were spiked with Cryptosporidium (matrix spike samples).
Dates range from Feb, 2004 to May
For each matrix spike sample, the data include:
– Organization (Lab ID)
– Sample volume filtered
– Sample volume spiked
– Number of Crypto measured
– Number of Crypto spiked
The fraction of volume spiked is found by dividing “Sample volume filtered” by “Sample volume spiked”.

Uncertainty in Crypto Numbers Spiked
– Spiking suspensions (“tubes”), provided by two vendors, were prepared using flow cytometry.
– Both vendors checked hundreds of their tubes by carefully counting the tubes’ oocysts.
– Based on data provided by one lab, a pooled estimate of relative standard deviation (RSD) is 1.35%.
– The other lab provided a histogram, rather than statistical summaries. The next slide shows that their precision appears to match that of the first lab.

Histogram of Lab 2 and Normal Density Function mu = 100, s = 1.35

Model Building
– All models assume that the number of oocysts counted is Binomial with parameters N (the exact number of oocysts in the spiked sample) and r, the probability that an oocyst in the sample will be observed and counted.
– All the models account for uncertainty in N, based on the 1.35% RSD.
– The basic modeling approach was to start simple with 2-parameter models, using log likelihood to gauge model quality.
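The counting model above can be sketched as a small simulation. This is only an illustration: the slides do not say how uncertainty in N was encoded, so the rounded normal draw around the nominal spike dose is an assumption, as are the function names and the example values (nominal dose of 100, r = 0.5).

```python
import random

RSD = 0.0135  # relative standard deviation of the spiked count, from the slides


def draw_spiked_count(nominal, rng):
    """Draw the true number of oocysts N around the nominal spike dose.

    Assumption: N is a rounded normal draw with sd = RSD * nominal.
    """
    n = round(rng.gauss(nominal, RSD * nominal))
    return max(n, 0)


def draw_count(n, r, rng):
    """Draw K ~ Binomial(n, r) as the sum of n Bernoulli(r) trials."""
    return sum(1 for _ in range(n) if rng.random() < r)


def simulate_assay(nominal, r, rng):
    """Simulate one matrix spike assay: returns (N, K)."""
    n = draw_spiked_count(nominal, rng)
    k = draw_count(n, r, rng)
    return n, k


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n, k = simulate_assay(nominal=100, r=0.5, rng=rng)
print(f"spiked N = {n}, counted K = {k}, apparent recovery = {k / n:.2f}")
```

With an RSD of 1.35%, N rarely strays more than a few oocysts from a nominal 100, so most of the spread in K comes from the binomial step and from variation in r.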

Models
– Model 1: r varies from assay to assay (both within and between labs) as a Beta random variable.
– Model 2: ln(r/(1-r)) = logit(r) varies from assay to assay as a normal random variable.
– Model 3: With probability z, r varies as a Beta random variable, but the rest of the time (1-z), r is exactly zero.
– Model 4: With probability z, logit(r) varies as a normal random variable, but the rest of the time (1-z), r is exactly zero.
– Model 5: Both the probability of zero recovery and expected value of logit(r) vary from lab to lab as a bivariate normal random variable. Covariance allows these two features to be related.
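To make the difference between Models 1 to 4 concrete, here is a minimal sketch of how a single assay's recovery r could be drawn under each one. The function names and all parameter values (a, b, mu, sigma, z) are hypothetical and chosen only for illustration; they are not estimates from the presentation.

```python
import math
import random


def inv_logit(x):
    """Map a logit-scale value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def r_model1(a, b, rng):
    """Model 1: r ~ Beta(a, b)."""
    return rng.betavariate(a, b)


def r_model2(mu, sigma, rng):
    """Model 2: logit(r) ~ Normal(mu, sigma)."""
    return inv_logit(rng.gauss(mu, sigma))


def r_model3(z, a, b, rng):
    """Model 3: with probability z, r ~ Beta(a, b); otherwise r is exactly 0."""
    return rng.betavariate(a, b) if rng.random() < z else 0.0


def r_model4(z, mu, sigma, rng):
    """Model 4: with probability z, logit(r) ~ Normal(mu, sigma); otherwise r is exactly 0."""
    return inv_logit(rng.gauss(mu, sigma)) if rng.random() < z else 0.0


rng = random.Random(2)
draws = [r_model4(z=0.9, mu=0.0, sigma=1.0, rng=rng) for _ in range(1000)]
zero_share = sum(1 for r in draws if r == 0.0) / len(draws)
print(f"share of exact-zero recoveries under Model 4: {zero_share:.2f}")
```

Models 3 and 4 differ from Models 1 and 2 only by the extra mixing parameter z, which puts a point mass at r = 0.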

Model 5 Hierarchy
High Level:
– Grand means (mu0 and mu1) of lab-specific parameters logit(r) & pr{r=0}
– Precision matrix R (R^-1 = var-covar matrix)
– Within-lab precision parameter phi0
Medium Level:
– Lab-specific averages of logit(r)
– Lab-specific pr{r=0}
Low Level:
– Sample-specific recoveries (product of nonzero recovery and an indicator of zero recovery)
– Data (not shown in the figure): K ~ dbinom(N,r)
Figure labels: Number spiked (Sp), Number counted (K)
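The three levels of the hierarchy can be read as a generative recipe, sketched below. Several choices here are assumptions rather than the author's specification: pr{r=0} is placed on the logit scale before the bivariate normal is applied, mu0 is paired with logit(r) and mu1 with pr{r=0}, and the precision matrix R is replaced by an equivalent pair of standard deviations plus a correlation (sd0, sd1, rho). All numeric values are hypothetical.

```python
import math
import random


def inv_logit(x):
    """Map a logit-scale value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_lab(mu0, mu1, sd0, sd1, rho, rng):
    """Medium level: draw one lab's (mean logit(r), Pr{r=0}).

    The pair is bivariate normal on the logit scale (assumption), built
    from two independent standard normals and the correlation rho.
    """
    e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    lab_mean_logit_r = mu0 + sd0 * e0
    lab_logit_p_zero = mu1 + sd1 * (rho * e0 + math.sqrt(1.0 - rho ** 2) * e1)
    return lab_mean_logit_r, inv_logit(lab_logit_p_zero)


def draw_sample_recovery(lab_mean_logit_r, p_zero, phi0, rng):
    """Low level: sample-specific recovery, exactly 0 with probability p_zero.

    phi0 is a precision, so the within-lab sd is 1 / sqrt(phi0).
    """
    nonzero_r = inv_logit(rng.gauss(lab_mean_logit_r, 1.0 / math.sqrt(phi0)))
    is_zero = rng.random() < p_zero
    return 0.0 if is_zero else nonzero_r


def draw_count(n, r, rng):
    """Data level: K ~ Binomial(n, r) as a sum of Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < r)


rng = random.Random(3)
lab_mean, p_zero = draw_lab(mu0=0.0, mu1=-3.0, sd0=0.5, sd1=1.0, rho=0.0, rng=rng)
r = draw_sample_recovery(lab_mean, p_zero, phi0=4.0, rng=rng)
k = draw_count(100, r, rng)
print(f"lab mean logit(r) = {lab_mean:.2f}, Pr{{r=0}} = {p_zero:.3f}, r = {r:.2f}, K = {k}")
```

Fitting the model runs this recipe in reverse: the MCMC sampler infers the high-level and lab-level parameters from the observed (Sp, K) pairs.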

WinBUGS Code

Results
– WinBUGS generates statistics about the model parameters and a Markov Chain Monte Carlo (MCMC) or “uncertainty” sample.
– An MCMC sample of size 10K takes about 4 min.

Results
– 0 is not in the interval for logit(r) or logit(z) → reject the hypothesis that the median probabilities for these are 0.5 (the probability at which the logit is 0).
– Covariance is not significant, so we can’t reject the notion that Pr{zero} is distributed independently of median recovery (when not zero).
– Can’t say that labs with poor recovery don’t also have a high probability of totally missing spiked oocysts.

Labs Differ w.r.t. Mean Logit(r)
[Figure: lab-specific posterior values of mean logit(r), with a marked Central Value]
Axis reference values: Logit(0.881) = 2, Logit(0.731) = 1, Logit(0.5) = 0, Logit(0.269) = -1, Logit(0.119) = -2
Annotated labs:
– Posterior median r = 26.5%; Average Recovery* = 24.2%
– Posterior median r = 55.9%; Average Recovery* = 62.4%
– Posterior median r = 64.3%; Average Recovery* = 65.3%
* (count/expected), averaged across samples
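The axis reference values are plain logit conversions and can be checked with a few lines. The helper names below are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Reference probabilities quoted on the slide's axis
for p in (0.881, 0.731, 0.5, 0.269, 0.119):
    print(f"logit({p}) = {logit(p):+.2f}")
```

`inv_logit` goes the other way: it converts a posterior median on the logit scale into the corresponding median recovery r.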

Labs Differ w.r.t. Pr{r=0}
– Lab found Crypto in all 60 spikes
– Lab found no Crypto in 5 of 76 spikes
– Lab found no Crypto in 17 of 223 spikes
– Lab found no Crypto in 4 of 22 spikes
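For a rough sense of scale, the raw share of all-zero spikes per lab can be computed directly from the counts above. The Beta(1, 1) posterior mean shown alongside is a stand-in chosen for illustration and is not the hierarchical estimate from Model 5. Treating "found no Crypto" as r = 0 is also a simplification, since a very small nonzero r could give a zero count too.

```python
# (spikes with no Crypto found, total spikes) for the four labs on the slide
zero_counts = [(0, 60), (5, 76), (17, 223), (4, 22)]


def raw_zero_fraction(zeros, total):
    """Observed share of spikes in which no Crypto was found."""
    return zeros / total


def beta_posterior_mean(zeros, total, a=1.0, b=1.0):
    """Posterior mean of Pr{r=0} under a Beta(a, b) prior (illustrative assumption)."""
    return (zeros + a) / (total + a + b)


for zeros, total in zero_counts:
    raw = raw_zero_fraction(zeros, total)
    post = beta_posterior_mean(zeros, total)
    print(f"{zeros:>3} of {total:>3}: raw = {raw:.3f}, Beta(1,1) posterior mean = {post:.3f}")
```

The lab with only 22 spikes has the highest raw fraction but also the least data, which is the kind of case where lab-to-lab pooling in a hierarchical model matters most.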

Informing the Occurrence Model
Okay, so what good is all this? Can use MCMC sample to inform our upcoming estimate of the Long-Term Rule’s (LT2’s) benefit.
– Public water systems are monitoring their source waters for Crypto.
– The new Crypto data, together with a model that accounts for lab-specific recovery, will produce better estimates of actual occurrence.
– Better occurrence estimates → better risk analyses → improved estimate of the benefit of treatment changes that result from LT2 implementation.

The funny thing about hierarchical models…
…is that, once you’ve tried one (and succeeded), you’ll see hierarchical models everywhere…
…which makes you wonder if you’re like that fellow with a hammer, to whom every problem looks like a nail.
Hierarchical modeling: Try it, you’ll like it.