Model Selection for Selectivity in Fisheries Stock Assessments
André Punt, Felipe Hurtado-Ferro, Athol Whitten
13 March 2013; CAPAM Selectivity workshop

Overview
What is the problem we want to solve?
Can selectivity be estimated anyway?
Fleets and how we choose them
Example assessments
Alternative methods:
– fit diagnostics
– model selection and model weighting
What do simulation studies tell us?
Final thoughts

Definitions of Selectivity
Selectivity:
– Is the relative probability of being captured by a fleet (as a function of age / length)
– Depends on how “fleet” is defined
Selectivity is NOT:
– Gear selectivity
– Availability
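For reference, a minimal sketch of the two selectivity shapes most often contrasted later in the talk (asymptotic/logistic vs dome-shaped). The parameter names and the simplified double-normal form are illustrative assumptions, not the Stock Synthesis parameterization:

```python
import numpy as np

def logistic_selex(length, l50, l95):
    """Asymptotic (logistic) selectivity with 50% and 95% selection lengths."""
    return 1.0 / (1.0 + np.exp(-np.log(19.0) * (length - l50) / (l95 - l50)))

def dome_selex(length, peak, sd_asc, sd_desc):
    """Simplified dome-shaped selectivity: Gaussian limbs either side of a peak."""
    sd = np.where(length < peak, sd_asc, sd_desc)
    return np.exp(-0.5 * ((length - peak) / sd) ** 2)

lengths = np.arange(10, 101)
asymptotic = logistic_selex(lengths, l50=45.0, l95=60.0)  # illustrative values
dome = dome_selex(lengths, peak=55.0, sd_asc=10.0, sd_desc=15.0)
```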

Some of the key questions-I
Should there be multiple fleets and, if so, how do we choose them?
More fleets (may) make the assumption of time-invariant selectivity more valid.
More fleets lead to more parameters (and potentially model instability).

Some of the key questions-II
Given a fleet structure:
What functional form to assume?
Should selectivity change with time?
Parametric or non-parametric?

Some of the key questions-III
Given time-varying selectivity:
Blocked or unblocked?
Which parameters of the selectivity function (or all) should change?
[Figure: age-at-50% selectivity over time under annual deviations vs five-year blocks]

Caveat – Can selectivity be estimated anyway-I?
Selectivity is confounded with:
– Trends in recruitment (with time)
– Trends in natural mortality (with age / time)

Caveat – Can selectivity be estimated anyway-II?
[Figure: age-composition whose shape could equally reflect low or declining recruitment, low or declining selectivity, or high F]

Caveat – Can selectivity be estimated anyway-III?
[Figure: fit of various selectivity-related models to a theoretical age-composition]

Caveat – Can selectivity be estimated anyway-IV?
The Solution: MAKE ASSUMPTIONS:
– Natural mortality is time- and age-invariant
– Selectivity follows a functional form
– Selectivity is non-parametric, but there are penalties on changes in selectivity with age / length
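The third assumption is usually implemented by adding a smoothness penalty to the objective function; a minimal sketch, assuming a second-difference penalty on log selectivity-at-age with an illustrative tuning weight lam:

```python
import numpy as np

def smoothness_penalty(selex, lam=1.0):
    """Curvature penalty: sum of squared second differences of
    log selectivity-at-age, scaled by a tuning weight lam."""
    d2 = np.diff(np.log(selex), n=2)
    return lam * np.sum(d2 ** 2)

# Objective sketch: total = negative_log_likelihood + smoothness_penalty(selex_at_age)
```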

Example Stocks
Pink ling
Pacific sardine

Example Stocks (fleet structure)
Pink ling: one fleet or many?
Fleets:
– Trawl vs non-trawl
– Zones 10, 20, 30
– Onboard vs port samples

Sensitivity to Assumptions
Largest impacts:
– Is selectivity time-varying or static?
– Number of fleets / treatment of spatial structure
– Is selectivity asymptotic or dome-shaped?

Selection of Fleets
Definition:
– Ideally: a group of vessels fishing in the same spatio-temporal stratum, using the same gear, and with the same targeting practices
– In practice: depends on data availability, computational resources, model stability, and trends in monitored data

Fleets as areas-I
It is common to represent “space” by “fleets” (e.g. pink ling): what does this assume? Does it work?
Key assumptions:
– The population is fully mixed over its range
– Differences in age / length compositions are due to differences in selectivity

Fleets as areas-II (does it work?)
In theory “no” – in practice “perhaps”!
Cope and Punt (2011) Fish Res. 107: Clearly, the differences in length and age structure among regions are due to differences in population structure, not selectivity! Self-evidently, then, the approach is wrong.
Simulations suggest that treating fleets as areas can reduce bias (Hurtado-Ferro et al.) and that spatial models may perform better (if the data exist – and perhaps they don’t); but M probably isn’t age- and time-invariant either!

The State of the Art (as I see it)
– Disaggregate data when including them in any assessment (it is easy to aggregate the data when fitting the model).
– Test for fleet structure early in the model development process.
– Apply clustering-type methods to combine areas / gear types (not statistical tests, which will lead to 100s of fleets).

Residual Analysis
In principle this is easy:
– Plot the data
– Compute some statistics
– Compare alternative assumptions…
[Figure: example fits, EBS Tanner crab]
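“Compute some statistics” typically starts from Pearson residuals for the composition fits; a minimal sketch under an assumed multinomial sampling model:

```python
import numpy as np

def pearson_residuals(obs_prop, exp_prop, n_eff):
    """Pearson residuals for one composition under multinomial sampling:
    (observed - expected) divided by the expected standard deviation."""
    sd = np.sqrt(exp_prop * (1.0 - exp_prop) / n_eff)
    return (obs_prop - exp_prop) / sd
```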

We know how to do this for index data (well).
It gets trickier for compositional data (and hence selecting functional forms for selectivity).
[Figure: fits to aggregated length data for pink ling when selectivity is assumed to be independent of zone]

BUT! Evaluating mis-specification for compositional data is usually not this easy:
– The fit may be correct “on average” but there are clear problems.
– It may not be clear whether the model is mis-specified.

[Figures: two example composition fits – Is this acceptable? And this?]

Comparing time-varying and static selectivity can be even more challenging, because it depends on how much selectivity is allowed to vary [Maunder and Harley identify an approach based on cross-validation to help with this].
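The cross-validation idea can be sketched generically: hold out blocks of composition data, refit each candidate selectivity model, and score each model by its predictive log-likelihood on the held-out blocks. A schematic only, in which fit() and predictive_loglike() are hypothetical stand-ins for a real assessment's estimation and prediction steps:

```python
def cross_validate(models, data_blocks):
    """Score candidate selectivity models by out-of-sample fit; fit() and
    predictive_loglike() are hypothetical stand-ins, not a real API."""
    scores = {}
    for name, model in models.items():
        total = 0.0
        for k, held_out in enumerate(data_blocks):
            training = [b for i, b in enumerate(data_blocks) if i != k]
            total += model.fit(training).predictive_loglike(held_out)
        scores[name] = total  # higher predictive log-likelihood = better
    return scores
```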

Using profiles to identify mis-specification
Plot the negative log-likelihood [compositional data only] for each fleet to identify fleets whose compositional data are “unduly” informative.
[Figures: profiles for the spatially-disaggregated and spatially-aggregated models]
Fleets 2 and 13 (left) and 2 and 5 (right): fleet 13 (a) and 5 (b) are the same fleet and have only two length-frequencies… Should we learn this much?

Automatic Residual Analysis
Punt & Kinzey: NPFMC crab modelling workshop
Two-sample Kolmogorov-Smirnov test applied to artificial data sets.
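A minimal sketch of the idea: compare the observed residuals for a fleet with residuals from data sets simulated under the fitted model, using a two-sample Kolmogorov-Smirnov test (here via scipy; the residual arrays are placeholders, not output of any real assessment):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
observed_resid = rng.normal(0.3, 1.2, size=200)    # placeholder for real residuals
simulated_resid = rng.normal(0.0, 1.0, size=2000)  # placeholder for residuals from artificial data sets

result = ks_2samp(observed_resid, simulated_resid)
# A small result.pvalue flags residuals inconsistent with the fitted model.
```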

The State of the Art (as I see it)-I
Always:
– examine plots of residuals
– compare expected effective sample sizes with input values
But:
– Viewing plots of residuals can be difficult
– How to define / test for time-varying selectivity is tough
– Residual patterns in fits to compositions need not be due to choices related to selectivity
– There is no automatic approach for evaluating residual plots for compositional data
– No testing of methods based on residual plots has occurred (yet?)

The State of the Art (as I see it)-II
[Figures: aggregated compositions; observed vs expected compositions]

Model Selection
No-one would say that model selection (and model averaging) are not part of the tool box of analysts, BUT do we know how well they work for stock assessment models?
Model selection methods used:
– Maximum likelihood: F-tests / likelihood ratio tests; AIC, BIC, AICc
– Bayesian: DIC
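For reference, the standard forms of these criteria, computed from a minimized negative log-likelihood; a minimal sketch:

```python
import numpy as np

def aic(negloglike, k):
    """Akaike Information Criterion: k = number of estimated parameters."""
    return 2.0 * k + 2.0 * negloglike

def aicc(negloglike, k, n):
    """Small-sample corrected AIC: n = number of observations."""
    return aic(negloglike, k) + 2.0 * k * (k + 1.0) / (n - k - 1.0)

def bic(negloglike, k, n):
    """Bayesian Information Criterion."""
    return np.log(n) * k + 2.0 * negloglike
```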

Examples of Model Selection
AIC:
– Butterworth et al. [2003]: is selectivity for southern bluefin tuna time-varying?
– Butterworth & Rademeyer [2008]: is selectivity for Gulf of Maine cod dome-shaped or asymptotic?
DIC:
– Bogards et al. [2009]: is selectivity for North Sea spatially-varying or not?

Examples of Model Selection (Issues)
AIC, BIC and DIC are too subtle:
– Often fits for two models are negligibly different “by eye”, but highly “statistically significant” (ΔAIC > 200).
All these metrics depend on getting the likelihood “right”, in particular the effective sample sizes for the compositional data.

Model Selection and weights
So which model fits the data best? And if we accidentally copied the data file twice?
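The “copied the data file twice” question has a mechanical answer: duplicating the data doubles every log-likelihood, which doubles the ΔAIC values and pushes the Akaike weights toward the best-fitting model without adding any information. A toy illustration (the log-likelihood values are made up):

```python
import numpy as np

def akaike_weights(aic_values):
    """Akaike weights: relative support for each model in the set."""
    d = np.asarray(aic_values) - np.min(aic_values)
    w = np.exp(-0.5 * d)
    return w / w.sum()

negll = np.array([500.0, 500.5, 501.0])  # made-up fits for three models
k = 10                                    # same parameter count for each

print(akaike_weights(2 * k + 2 * negll))      # original data: ~[0.51, 0.31, 0.19]
print(akaike_weights(2 * k + 2 * 2 * negll))  # data counted twice: ~[0.66, 0.24, 0.09]
```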

Effective Sample Sizes-I
Many assessments:
– Pre-specify EffNs.
– Use the “McAllister-Ianelli” approach.
But residuals are seldom independent.
An alternative is Chris Francis’ approach, but that may fail when there is time-varying selectivity.
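The McAllister-Ianelli idea estimates the effective sample size implied by each composition as the ratio of expected multinomial variance to observed squared residuals; a minimal sketch of that ratio (tuning then iterates the input sample sizes toward a mean of these values):

```python
import numpy as np

def mcallister_ianelli_effn(obs_prop, exp_prop):
    """Effective sample size implied by one observed vs expected composition:
    expected multinomial variance over observed squared residuals."""
    return np.sum(exp_prop * (1.0 - exp_prop)) / np.sum((obs_prop - exp_prop) ** 2)
```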

Effective Sample Sizes-II
Maunder [2011] compared various likelihood formulations, including:
– Multinomial
– Fournier et al. (with observed rather than expected proportions)
– Punt-Kennedy (with observed proportions)*
– Dirichlet
– Iterative (essentially the “McAllister-Ianelli” method)
– Multivariate normal
[Figure: estimated effective sample size by formulation]
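For concreteness, the baseline in that comparison is the multinomial log-likelihood, in which an assumed input sample size scales the influence of each composition; a minimal sketch with constants dropped:

```python
import numpy as np

def multinomial_loglike(obs_prop, exp_prop, n_input, eps=1e-8):
    """Multinomial log-likelihood for one composition (constants dropped);
    n_input is the assumed sample size weighting the observation."""
    return n_input * np.sum(obs_prop * np.log(exp_prop + eps))
```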

AIC, BIC and Random Effects
Most (almost all) assessments use an “errors in variables” formulation of the likelihood function rather than the correct (marginal) likelihood.
How this impacts the performance of model selection methods is unknown.
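The equations on the original slide are not recoverable from the transcript, but the standard contrast is between the penalized (joint) objective and the marginal likelihood that AIC/BIC formally require; a reconstruction under that assumption, with $\eta$ the random effects (e.g., recruitment or selectivity deviations) and $\sigma$ their standard deviation:

$$L_{\mathrm{EIV}}(\theta,\eta) = L(\mathrm{data}\mid\theta,\eta)\,p(\eta\mid\sigma)
\qquad\text{vs.}\qquad
L_{\mathrm{marg}}(\theta) = \int L(\mathrm{data}\mid\theta,\eta)\,p(\eta\mid\sigma)\,d\eta$$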

The State of the Art (as I see it)
AIC, BIC, and DIC are commonly used. But:
– Do we need an analogue to the “1% rule”, as is the case for CPUE standardization?
– We need to get the effective sample sizes right! Using a likelihood function for which the effective sample size can be estimated is a good start!
– Performance also depends on treatment of random effects (recruitment, selectivity)
What is the value of looking at retrospective patterns? Can we identify when the cause of a retrospective pattern is definitely selectivity?

Simulation Testing
[Diagram: operating model → simulated data sets → assessment methods 1…n → model selection → performance measures]
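The diagram's flow can be written as a skeleton loop: simulate data from an operating model, fit each candidate method, apply the model-selection rule, and accumulate performance measures. A schematic only; operating_model, the methods, select() and performance() are hypothetical stand-ins, not a real framework:

```python
def simulation_test(operating_model, methods, select, performance, n_sims=100):
    """Skeleton of the simulation-testing workflow in the diagram."""
    results = []
    for sim in range(n_sims):
        data = operating_model.simulate(seed=sim)                 # one simulated data set
        fits = {name: m.fit(data) for name, m in methods.items()}
        chosen = select(fits)                                     # e.g. a min-AIC rule
        results.append(performance(chosen, operating_model.truth()))
    return results
```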

Simulation Testing
Caveats before we start:
– Simulations are only as good as the operating model:
  Most simulation studies assume that the likelihood function is known (as is M).
  Few simulation studies allow for over-dispersion.
  No simulation studies simulate the “meta” aspects of stock assessments (such as how fleets are selected).
– Avoid too many generalizations – most properties of estimators will be case-specific.

Overdispersion?
How often do the data generated in simulation studies look like this? How much does it matter?

Overview of Broad Results
– Getting selectivity assumptions wrong matters! HOWEVER, other factors (data quality, contrast, M) may be MORE important.
– Estimating time-varying selectivity when selectivity is static is safer than ignoring it when selectivity is time-varying.
– Model selection methods can discriminate among selectivity functions very well (do I really believe this – why then does it seem so hard in reality?)

The State of the Art (as I see it)
The structure of most (perhaps all) operating models is too simple and leads to simulated data sets looking “too good”.
– Andre’s suggestion: if you show someone 99 simulated data sets and the real data set, could they pick it out?
Future simulation studies should:
– Include model and fleet selection.
– Focus on length-structured models.
– Examine whether selectivity is length- or age-based.

Final Thoughts
Methods development:
– Non-additive models?
– State-space models?
Residuals and model selection:
– Weighting philosophy
Simulation studies:
– Standards for what constitutes a “decent” operating model?
– Compare methods for implementing time-varying selectivity (blocked vs annual)
– Consider length-structured models

Final Thoughts
Ignore “space” at your peril!
What about model mis-specification in general?

Final Points to Ponder!
Should guidelines be developed for when to:
– downweight compositional data rather than modelling time-varying selectivity
– fix selectivity and not estimate it!
– use retrospective patterns in model selection / bootstrapping
– conduct model selection when the selectivity pattern is “non-parametric”
– apply time-varying selectivity
Model selection:
– Fixing / estimating sigma
– trump AIC, BIC and DIC using “by eye” residual patterns?

Questions?
Support for this paper was provided by NOAA:
– The West Coast Groundfish project
– Development of ADMB libraries
– Simulation testing of assessment models