An Ecological Trap for Ecologists: Zero-Modified Models. Western Mensurationists’ Meeting 2009, 06.23.2009. Tzeng Yih Lam, OSU; Manuela Huso,

Presentation transcript:

An Ecological Trap for Ecologists: Zero-Modified Models
^ Possible
Western Mensurationists’ Meeting 2009
Tzeng Yih Lam, OSU; Manuela Huso, OSU; Doug Maguire, OSU

Ecological Trap [ē-kə-ˈlä-ji-kəl ˈtrap]: A preference for falsely attractive habitat and a general avoidance of high-quality but less-attractive habitats. (Source: Wikipedia)

Cues for Zero-Modified Models
Possible Trap #1
Possible Trap #2
Discussions

The Cues & The Solutions
‘The Cues’: For rare species data,
• the marginal count frequency distribution contains a large number of zeros,
• Poisson and/or NB GLMs have poor fit.
‘The Solutions’: Zero-modified models, a general class of finite mixture models that account for excessive zeros:
• Zero-Inflated Models (ZI) [1],
• Hurdle Models (H) [2].
[1] Lambert (1992); [2] Mullahy (1986)
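A quick way to check the first cue is to compare the observed share of zeros with the share that a Poisson distribution with the same mean would produce. The sketch below is only illustrative: the function name `zero_excess` and the count vector are hypothetical, not data from the talk.

```python
import math

def zero_excess(counts):
    """Return (observed zero fraction, zero fraction expected under a
    Poisson distribution whose mean equals the sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)  # Poisson P(Y = 0) = exp(-lambda)
    return observed, expected

# Hypothetical rare-species counts on 100 plots (assumed for illustration).
counts = [0] * 80 + [1] * 8 + [2] * 6 + [3] * 4 + [5] * 2
observed, expected = zero_excess(counts)
print(f"observed zeros: {observed:.3f}, Poisson-expected zeros: {expected:.3f}")
```

An observed zero fraction well above the Poisson-expected one is the kind of cue the slide describes; it does not by itself say which zero-modified model is appropriate.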

Zero-Inflated Models (Poisson; ZIP)
• Two states: Perfect and Imperfect States,
• Finite Mixture Model (FMM) with 1 latent structure: an observation belongs to either state,
• Can also be specified as Zero-Inflated Negative Binomial (ZINB).
[Equation; annotation: Probability of Belonging to Perfect State]
Lambert (1992)
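The slide's equation is not present in the transcript. As a hedged stand-in, the sketch below writes out the standard ZIP probability mass function from Lambert (1992), taking `p` as the probability of belonging to the Perfect State and `lam` as the Poisson mean in the Imperfect State. The function names are assumptions made for illustration.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of observing count y with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, lam, p):
    """Zero-inflated Poisson pmf.

    With probability p the observation is in the Perfect State (always 0);
    otherwise it is a Poisson(lam) draw, which may also be 0.
    """
    if y == 0:
        return p + (1.0 - p) * poisson_pmf(0, lam)
    return (1.0 - p) * poisson_pmf(y, lam)

# Zeros come from both states, so P(Y = 0) exceeds the plain Poisson value.
print(zip_pmf(0, lam=0.3, p=0.5), poisson_pmf(0, 0.3))
```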

Hurdle Models (Poisson; HPOIS)
• Under an FMM framework comparable to ZI models, hurdle models have 2 latent structures [2]: an observation either crosses the ‘hurdle’ or not,
• All observations are in the Imperfect State,
• Can also be specified as Hurdle Negative Binomial (HNB).
[Equation; annotation: Probability of Crossing the Hurdle]
Mullahy (1986); [2] Baughman (2007)
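For comparison with the ZI case, here is a minimal sketch of the standard hurdle Poisson probability mass function (Mullahy 1986), assuming `pi` is the probability of crossing the hurdle and positive counts follow a zero-truncated Poisson with parameter `lam`. The function name is hypothetical.

```python
import math

def hurdle_poisson_pmf(y, lam, pi):
    """Hurdle Poisson pmf.

    All zeros arise from not crossing the hurdle (probability 1 - pi);
    counts y >= 1 follow a Poisson(lam) truncated at zero.
    """
    if y == 0:
        return 1.0 - pi
    truncated = math.exp(-lam) * lam ** y / math.factorial(y)
    truncated /= 1.0 - math.exp(-lam)  # renormalize over y >= 1
    return pi * truncated

print(hurdle_poisson_pmf(0, lam=0.3, pi=0.2))
```

Unlike the ZI form, the zero probability here depends only on `pi`, which is why the two parts of the likelihood can be handled separately.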

The Questions
Given a known data generating process (dgp):
(1) Is there any bias when the data are fitted to different ZI and H model specifications?
(2) Is there a universally best-fitting model?

The Simulation Study
4 Factors:
(1) LAMBDA (λ):
(2) INFLA (p):
(3) RATIO (Var/Mean):
(4) SAMPLE:
• For each of the 27 dgp (LAMBDA × INFLA × RATIO), generate 1000 sets of SAMPLE random counts,
• Fit each set to six model specifications: POIS, NB, ZIP, ZINB, HPOIS, HNB,
• Calculate mean %RBIAS for each parameter (λ, p, π) and compute AICc.
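The simulation loop can be sketched in a heavily simplified form. The example below is an assumption-laden illustration, not the authors' code: it simulates from a single zero-inflated Poisson dgp (no RATIO factor), fits only the plain Poisson model, whose MLE of λ is the sample mean, and takes %RBIAS to be 100 × (mean estimate − truth) / truth. The factor values and all function names are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) count (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rzip(n, lam, p, rng):
    """Draw n zero-inflated Poisson counts; p is the Perfect State probability."""
    return [0 if rng.random() < p else rpois(lam, rng) for _ in range(n)]

def pct_rbias(estimates, truth):
    """Mean percentage relative bias of a list of estimates (assumed definition)."""
    return 100.0 * (sum(estimates) / len(estimates) - truth) / truth

rng = random.Random(2009)
lam, p, sample, reps = 2.0, 0.4, 100, 1000  # hypothetical factor levels

# Fit the (misspecified) Poisson model to each simulated set.
lam_hat = [sum(s) / sample for s in (rzip(sample, lam, p, rng) for _ in range(reps))]
bias = pct_rbias(lam_hat, lam)
print(f"%RBIAS of lambda under POIS: {bias:.1f}")
```

Because the mean of this dgp is (1 − p)λ, the Poisson estimate of λ is biased by roughly −100p percent. Fitting the other five specifications would need a numerical optimizer and is omitted here.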

[Figure] SAMPLE = 100

[Figure] SAMPLE = 100: Bias at LAMBDA = 0.3

[Figure] SAMPLE = 100: Bias with ZIP & HPOIS

[Figure] SAMPLE = 100: Bias with ZINB & HNB

[Figure] SAMPLE = 100: Lowest AICc. Model labels in slide order: POIS NB HNB NB ZINB HNB ZIP ZINB HPOIS HNB ZIP ZINB HPOIS HNB NB ZINB HNB ZIP ZINB HPOIS HNB ZIP ZINB HPOIS HNB NB ZIP ZINB HPOIS HNB

Other Preliminary Key Findings
(1) Variance of estimated λ is highest when LAMBDA = 0.3,
(2) Variance of estimated λ decreases with increasing LAMBDA but increases with increasing RATIO and/or INFLA,
(3) Probability of the Perfect State, p, from ZI models has the largest (positive and negative) bias and variance at LAMBDA = 0.3,
(4) Overdispersion parameter, θ, requires SAMPLE ≥ 250 to achieve negligible bias.

Some Plausible Explanations
Maximum Likelihood Theory:
• Ingredient = a simulated set of counts,
• Optimize the parameter estimates to match the marginal count distribution.
Large bias and variance of λ and p at LAMBDA = 0.3:
• When there are either too many zeros or ones, the Binomial GLM seems to be unstable (Min and Agresti 2005) → unstable estimates of p → unstable estimates of λ and θ,
• There might not be enough information when LAMBDA = 0.3.
ZI model estimation is based on the EM algorithm; H models separately maximize the likelihood functions of π and λ.
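The separate maximization for hurdle models can be illustrated for the hurdle Poisson case. This is a minimal sketch under the standard formulation, not necessarily how the study fitted its models: the estimate of π is the share of nonzero counts, and λ solves the zero-truncated Poisson mean equation λ / (1 − e^(−λ)) = mean of the positive counts, found here by bisection. The function name and data are hypothetical.

```python
import math

def fit_hurdle_poisson(counts, tol=1e-10):
    """Return (pi_hat, lam_hat) for a hurdle Poisson model.

    The two likelihood parts separate: pi_hat uses only zero/nonzero
    status, lam_hat uses only the positive counts.
    """
    positives = [c for c in counts if c > 0]
    if not positives:
        raise ValueError("no positive counts: lambda is not estimable")
    pi_hat = len(positives) / len(counts)
    m = sum(positives) / len(positives)  # mean of the positive counts
    if m <= 1.0:
        raise ValueError("mean positive count must exceed 1")
    # Truncated mean lam / (1 - exp(-lam)) increases in lam; bracket the root.
    lo, hi = 1e-12, m
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    return pi_hat, 0.5 * (lo + hi)

# Hypothetical counts (assumed for illustration).
counts = [0] * 70 + [1] * 15 + [2] * 9 + [3] * 4 + [4] * 2
pi_hat, lam_hat = fit_hurdle_poisson(counts)
print(pi_hat, lam_hat)
```

The guard on the mean of the positive counts hints at the information problem noted above: when nearly all positive counts equal 1, the data say very little about λ.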

Perfect & Imperfect States
Perfect State in ecological context: a set of habitat conditions that do not host the species of interest.
Imperfect State in ecological context: a set of habitat conditions that host the species of interest, but where one may not find the species.
This does not directly differentiate sink & source, saturated & unsaturated habitat, fundamental & realized niche, etc.
[Diagram labels: Zero; Structural; Random; Accidental; Stochastic; Sampling; True; False]

Perfect & Imperfect States
A: “Did you smoke any cigarettes last week?” B: “No”; 0
A: “Are you a smoker?” B: “Yes”
Zero-Inflated Models: 2 states, Perfect and Imperfect. Main assumption: you do not know which state an observation belongs to.
Hurdle Models: 1 state, the Imperfect State.

A Priori Knowledge
A priori knowledge, such as species habitat range, will likely influence the model choice.
In ecology, scale matters …
[Diagram labels: Grains; Extent; POIS, NB, ZIP, ZINB, HPOIS, HNB]

Rarity
Modeling of rare species habitat association: Cunningham & Lindenmayer (2005) and many others…

What Do the Ecologists Need To Do?

The Great Escape (1963): Capt. Hilts (The Cooler King); 1961 British 650cc Triumphs

Escape from defining the types of zeros:
• There is no restriction on the threshold for mixing & hurdle,
• Change the current threshold from 0 → 1,
• Change perfect to near-perfect state (ZI models), changing the ecological implication of the models,
• N-mixture models (Royle 2004).
Escape from using ZI & H models, if one is uncomfortable with two-state processes:
• Small Area Estimation (Rao 2003),
• Extreme Value Model (Coles 2001).

Acknowledgement
Hayes Family Foundation Funds for Silviculture Alternatives; Dilworth Awards, OSU; Doug Maguire

Thank You for Listening! Any *Err… Question?

AICc, information theory:
• A more flexible model parameterization will have a better fit,
• Sample size is an issue when fitting ZINB and HNB models.
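The trade-off between flexibility and sample size is visible in the small-sample correction itself. Below is a minimal sketch of the usual AICc formula, AICc = −2 log L + 2k + 2k(k + 1) / (n − k − 1), applied to a Poisson fit. The helper names and the counts are hypothetical.

```python
import math

def poisson_loglik(counts, lam):
    """Poisson log-likelihood of the counts at mean lam."""
    return sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in counts)

def aicc(loglik, k, n):
    """Small-sample corrected AIC for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return -2.0 * loglik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical counts; the Poisson MLE of lambda is the sample mean.
counts = [0] * 80 + [1] * 8 + [2] * 6 + [3] * 4 + [5] * 2
lam_hat = sum(counts) / len(counts)
score = aicc(poisson_loglik(counts, lam_hat), k=1, n=len(counts))
print(f"AICc (POIS): {score:.2f}")
```

For a fixed log-likelihood, the correction term grows with k and shrinks with n, so the extra parameters of ZINB and HNB are penalized more heavily at small SAMPLE.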

A Priori Knowledge
A priori knowledge, such as species habitat range, extent and grains, will likely influence the model choice.