Estimating relative species abundance from partially-observed data
Melissa Dobbie | Statistician | November 2015
www.data61.csiro.au


Context
- Helicoverpa: a serious pest of grain legumes, summer grains and cotton
- 2 species:
  - Helicoverpa punctigera (Hp; Australian bollworm)
  - H. armigera (Ha; cotton bollworm)
- Historically controlled by insecticides => resistance + reductions in beneficial populations => allows other pests with fast life cycles to develop to damaging levels
- Bacillus thuringiensis (Bt)-expressing cotton introduced in 1996 to help control these pests => big uptake by growers
- Most research limited to field level
- Objective: determine how the configuration and composition of the landscape influence Helicoverpa population dynamics
Source: Cotton CRC website

Study design (figure)

Data generation: 2 stages
1. Total unclassified counts of eggs for both target species are recorded
2. Observed species proportions for a subset of the counted eggs are recorded (e.g. collected eggs are hatched and specimens classified)
Assumption: every observed egg is classified as exactly one of the two species of interest
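The two stages can be mimicked with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_unit`, the cap on collected eggs (`collect_cap`) and the fixed true species fraction are hypothetical and are not taken from the study.

```python
import random


def simulate_unit(total_eggs, true_ha_fraction, collect_cap, rng):
    """Simulate one sampling unit under a two-stage scheme (illustrative only)."""
    # Stage 1: only the unclassified total is recorded in the field.
    n_ha_true = round(total_eggs * true_ha_fraction)
    eggs = ["Ha"] * n_ha_true + ["Hp"] * (total_eggs - n_ha_true)

    # Stage 2: a subset of the counted eggs is collected and classified.
    # The cap on the subset size is an assumption made for this sketch.
    n_collected = min(total_eggs, collect_cap)
    collected = rng.sample(eggs, n_collected)  # sampling without replacement
    n_ha = collected.count("Ha")

    return {
        "total": total_eggs,        # stage 1 observation
        "collected": n_collected,   # stage 2 sample size
        "ha": n_ha,                 # classified as H. armigera
        "hp": n_collected - n_ha,   # classified as H. punctigera
    }


rng = random.Random(2015)  # fixed seed so the sketch is repeatable
unit = simulate_unit(total_eggs=40, true_ha_fraction=0.25, collect_cap=10, rng=rng)
print(unit)
```

The species split of the full count is never observed directly; only the total and the classified subset are available for modelling.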

Data summary (figure)

Exploratory data analysis (2) (figure)

Exploratory data analysis (3) (figure)

Methods: Naïve approach
1. Using the observed species proportions, partition the total unclassified count into relative species abundances.
2. Develop appropriate models for the relative abundance of each species
BUT:
- Fails to take into account small-sample discreteness
- Occurrence of zero observations is a problem
- Errors associated with observed proportions are ignored
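A minimal sketch of the naïve partition makes the zero-observation problem concrete. The function name `naive_partition` and the example counts are hypothetical, chosen only for illustration.

```python
def naive_partition(total_count, n_ha, n_hp):
    """Split an unclassified total using the observed species proportions."""
    n_classified = n_ha + n_hp
    if n_classified == 0:
        # No eggs were classified, so the observed proportion is undefined.
        return None
    ha = total_count * n_ha / n_classified
    return {"Ha": ha, "Hp": total_count - ha}


# Hypothetical sampling units: (total count, classified Ha, classified Hp)
units = [(40, 3, 7), (40, 0, 5), (12, 0, 0)]
for total, n_ha, n_hp in units:
    print(total, n_ha, n_hp, naive_partition(total, n_ha, n_hp))
```

In the second unit, a classified sample of five eggs with no Ha forces the Ha abundance to exactly zero for all 40 counted eggs, and in the third unit no estimate is available at all. Neither case carries any measure of the sampling error in the observed proportion.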

Methods: Proposed approach
1. Model the observed species proportions
2. a) Using the resulting predicted proportions, partition the total unclassified count into relative species abundances.
   b) Develop appropriate models for the relative abundance of each species
PROS:
- Covariates can be incorporated into each part of the model
- Better handles small-sample discreteness and small abundances
- Standard software and model fitting tools are readily available

Methods: Step 1. Model observed species proportions
Aim: Empirical model, to smooth and interpolate the observed proportions by using the covariate space. Interpretation not of interest.
Use logistic mixed effects model framework with
- Fixed effects: Land use, Crop development, Moon phase, etc.
  - Stepwise model selection on logistic regression model to reduce number of potential candidate predictors
- Random effects: mixture of spatial, temporal and spatial-temporal effects, guided by the hierarchical study design
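One way to write down a logistic mixed effects model of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the slides: $y_i$ is the number of eggs classified as Ha among the $m_i$ eggs classified in sampling unit $i$, $x_i$ holds the fixed-effect covariates, and $s(i)$ and $t(i)$ index the spatial and temporal groupings of unit $i$.

```latex
\begin{align*}
  y_i \mid p_i &\sim \operatorname{Binomial}(m_i,\, p_i), \\
  \operatorname{logit}(p_i) &= x_i^{\top}\beta + u_{s(i)} + v_{t(i)} + w_{s(i),\,t(i)}, \\
  u_s \sim N(0, \sigma_u^2), \quad
  v_t &\sim N(0, \sigma_v^2), \quad
  w_{s,t} \sim N(0, \sigma_w^2).
\end{align*}
```

The fitted values $\hat{p}_i$ from such a model would play the role of the predicted proportions used in Step 2. The actual random-effects structure used in the study follows its hierarchical design and may differ from this three-term form.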

Methods: Step 2. Model relative species abundance
Aim:
1. Predictive model: quantify the effect of landscape composition and configuration and other drivers on species abundance
2. Use standard software and existing model fitting tools
Use linear mixed effects model framework with
- Response (1 species): log(fitted proportion * total count + 0.5)
- Fixed effects: Land use, Crop development, Moon phase, etc.
  - RandomForest modelling used to identify important variables
- Random effects: mixture of spatial, temporal and spatial-temporal effects, guided by the hierarchical study design
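The Step 2 response can be computed directly from the Step 1 fitted proportion and the stage 1 total. The sketch below assumes hypothetical function names (`abundance_response`, `back_transform`); the simple inverse is included only for illustration and applies no bias correction.

```python
import math


def abundance_response(fitted_proportion, total_count, offset=0.5):
    """Return log(fitted proportion * total count + offset) for one species."""
    if not 0.0 <= fitted_proportion <= 1.0:
        raise ValueError("fitted_proportion must lie in [0, 1]")
    if total_count < 0:
        raise ValueError("total_count must be non-negative")
    # The 0.5 offset keeps the logarithm finite when the product is zero.
    return math.log(fitted_proportion * total_count + offset)


def back_transform(response, offset=0.5):
    """Invert the response to the abundance scale (plain inverse, illustrative)."""
    return math.exp(response) - offset


# Hypothetical fitted proportions for three units that each counted 40 eggs
for p_hat in (0.25, 0.02, 0.0):
    y = abundance_response(p_hat, 40)
    print(p_hat, y, back_transform(y))
```

Because the fitted proportion comes from a smooth model rather than a small classified sample, it is rarely exactly zero, so the response varies continuously even for units where the naïve partition would return a zero abundance.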

Results: Ha relative abundance (figure)

Results: Hp relative abundance (figure)

Discussion
- Our approach was a modest improvement on the naïve approach
- Simpler modelling approach: model observed species abundance?
  - Limited inference and generalization (the number of collected eggs was capped and depended on the number of eggs counted; both varied between sampling units and within and between seasons)
  - Inferences about covariates meaningful?
- More sophisticated and unified modelling approach: jointly model both stages of data generation?
  - Computationally challenging to fit
  - Bespoke programming would be required

Discussion
These types of data commonly occur in studies where immature life stages of a species are of interest, but the species cannot be identified until after further processing. Two examples of such studies arise
- In botany, where seeds are the observation unit but species identification necessarily occurs at a later stage, and
- In entomology, where species abundance is the primary focus but unclassified egg samples form the observation unit.

Acknowledgements
Bill Venables (CSIRO Data61), Cate Paull (CSIRO Agriculture), Nancy Schellhorn (CSIRO Agriculture)

Thank you
Analytics group
Melissa Dobbie, Statistician
www.data61.csiro.au