Introduction to Program MARK


Introduction to Program MARK
Stephen J. Dinsmore, Iowa State University

Lecture outline
Introduction: modeling and inference
Parameters
Model structure
The input file
PIMs, design matrices, and more in MARK
Analysis tips

Motivation for modeling
What are your goals? To just “analyze data”, or do you seek a deeper understanding of a complex process?
What questions are you interested in answering?
How will you use the information?
By definition, a model is an approximation of truth and not truth itself!

Introduction
Models, modeling, and estimation.
Process: capture, tag, release, recapture.
The “art” is balancing effort across each of these categories.

Population characteristics
Open versus closed populations
Assumptions
Results of assumption violations
Understanding this distinction is a critical step in the modeling process.

Methods for marking
Leg bands, neck collars
Standard ID tags
PIT tags
Radio collars/transmitters
Camera “traps”

Encounter techniques
Live resightings (mainly birds)
Live captures (sturgeon, many others)
Dead recoveries (waterfowl)

Summarizing encounters
Release and recapture data for each animal are summarized in an encounter history. A separate encounter history should be constructed for each animal. In most cases, encounter histories consist of strings of 1’s (animal was encountered) and 0’s (not encountered).

What can we estimate?
Survival (S), or apparent survival (φ)
Population size (N)
Emigration/immigration (γ″, γ′)
Movement probabilities (δ)
Reproduction/recruitment (F)
Rate of population change (λ)
Occupancy rate (ψ)

Models in MARK
Live encounters (Cormack-Jolly-Seber)
Dead recoveries (band recovery)
Joint live and dead encounters
Known fate (radio telemetry)
Closed captures
Robust design
Multi-strata
Pradel lambda models
Patch occupancy
Nest survival
And the list is still growing…

Features of MARK
Parameter estimation (model averaging)
Multiple attribute groups (age, sex classes)
Individual, group, and time covariates
Unequal time intervals
AIC model selection
Quasi-likelihood theory (over-dispersion)

Questions?

Basic data: encounter histories
LLLL: live recaptures. This example codes for 4 occasions.
LDLD: dead recoveries, joint live-dead encounters, and known fate. This example codes for 2 occasions.

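Live encounters
Possible outcomes: a released animal is alive at the next occasion with probability φ, or is dead or emigrated with probability 1-φ. A live animal is seen with probability p or not seen with probability 1-p.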

Live encounters
Example: 5 encounter occasions, LLLLL (1 = encountered, 0 = not encountered).
Estimate φ (apparent survival; from time 1 to time t-1) and p (conditional capture probability; from time 2 to time t).
The last φ and the last p are confounded unless one of them is constrained; only their product is estimable.

Live encounters
Example encounter history: 11111 → φ1p2φ2p3φ3p4φ4p5
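
The probability of any live-encounter history can be assembled mechanically: multiply φ for each interval up to the last sighting, p or 1-p for each of those occasions, and finally the probability that the animal is never seen again. A minimal Python sketch, assuming time-specific φ and p stored in plain lists; the function names and the parameter values are hypothetical, not MARK internals:

```python
from typing import Sequence


def never_seen_again(i: int, phi: Sequence[float], p: Sequence[float]) -> float:
    """Probability that an animal alive at occasion i (0-based) is not seen on any later occasion."""
    if i == len(p) - 1:
        return 1.0  # last occasion: nothing left to observe
    # Dies or emigrates before occasion i+1, or survives, is missed, and is never seen after.
    return (1 - phi[i]) + phi[i] * (1 - p[i + 1]) * never_seen_again(i + 1, phi, p)


def cjs_history_prob(history: str, phi: Sequence[float], p: Sequence[float]) -> float:
    """CJS probability of a 0/1 live-encounter history, conditional on first release.

    phi[j] is apparent survival from occasion j to j+1 (0-based);
    p[j] is capture probability at occasion j (p[0] is never used).
    """
    first, last = history.index("1"), history.rindex("1")
    prob = 1.0
    for j in range(first, last):
        prob *= phi[j]  # survive interval j
        prob *= p[j + 1] if history[j + 1] == "1" else 1 - p[j + 1]
    return prob * never_seen_again(last, phi, p)


phi = [0.8, 0.7, 0.6, 0.5]     # hypothetical phi1..phi4
p = [1.0, 0.9, 0.8, 0.7, 0.6]  # hypothetical p2..p5 (p[0] unused)
print(cjs_history_prob("11111", phi, p))  # phi1 p2 phi2 p3 phi3 p4 phi4 p5
```

The last factor is what distinguishes a history such as 11000 from 11111: after the final sighting the animal died, emigrated, or survived and was missed on every remaining occasion. Summing over every history that begins with a release at occasion 1 should give 1, which is a quick check on the bookkeeping.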

Model assumptions
Tagged individuals are representative of the population of interest.
Numbers of releases are known.
Tagging is accurate: no tag loss, no misread tags, etc.
Releases are “instantaneous” (relative to the time interval between releases).
Fates of individuals and cohorts are independent.
Individuals in each identifiable group (age or sex class, etc.) have the same survival and capture probability.

Dead recoveries
Possible outcomes: a released animal lives with probability S or dies with probability 1-S. A dead animal is reported with probability r or not reported with probability 1-r.

Dead recoveries
Example: 3 encounter occasions, LDLDLD.
Estimate survival (S) and reporting probability (r).

Dead recoveries
Example encounter histories:
100001 → S1S2(1-S3)r3
000010 → (1-S3)(1-r3) + S3
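
Both example terms follow one rule: survive each interval before the recovery interval, then die and be reported in it; a history with no recovery gets one minus the chance of being reported in any interval after release. A minimal Python sketch, assuming the S and r parameterization shown on these slides and a single release per animal; the function name and values are hypothetical:

```python
from typing import Sequence


def recovery_history_prob(history: str, S: Sequence[float], r: Sequence[float]) -> float:
    """Probability of an LDLD... dead-recovery history for an animal released once.

    S[j] is survival over interval j (after occasion j, 0-based);
    r[j] is the reporting probability for an animal that dies in interval j.
    """
    k = len(history) // 2
    release = history[0::2].index("1")  # L positions: even indices
    dead = history[1::2]                 # D positions: odd indices
    if "1" in dead:
        j = dead.index("1")
        alive = 1.0
        for m in range(release, j):
            alive *= S[m]                 # survive every interval before j
        return alive * (1 - S[j]) * r[j]  # die in interval j and be reported
    # Never reported: one minus the chance of being reported in any interval.
    reported, alive = 0.0, 1.0
    for j in range(release, k):
        reported += alive * (1 - S[j]) * r[j]
        alive *= S[j]
    return 1.0 - reported


S = [0.7, 0.6, 0.5]  # hypothetical S1..S3
r = [0.2, 0.3, 0.4]  # hypothetical r1..r3
print(recovery_history_prob("100001", S, r))  # S1 S2 (1-S3) r3
print(recovery_history_prob("000010", S, r))  # (1-S3)(1-r3) + S3
```

For a release in the last year, 1 - (1-S3)r3 expands to exactly (1-S3)(1-r3) + S3, so the general rule reproduces the slide's second term.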

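Joint live-dead
Possible outcomes: a released animal lives with probability S or dies with probability 1-S. A dead animal is reported with probability r or not reported with probability 1-r. A live animal is seen with probability p or not seen with probability 1-p.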

Joint live-dead
Example: 3 encounter occasions, LDLDLD.
Estimate survival (S), reporting probability (r), capture probability (p), and fidelity (F).

Known fate
Mainly used for radio-telemetry data, where capture probability is 1.0.
Possible outcomes: a released animal lives with probability S or dies with probability 1-S.

Known fate
Example: 4 encounter occasions, LDLDLDLD.
Estimate survival (S) only.

Known fate
Example encounter history: 10101010 → S1S2S3S4
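
Because detection is certain, a known-fate history is just a product of S and 1-S terms, one per monitored interval. A minimal Python sketch, assuming the common MARK known-fate coding of each LD pair ("10" alive at the start and survived, "11" alive at the start and died, "00" not monitored); check that coding against the MARK help file before relying on it. The function name and values are hypothetical:

```python
from typing import Sequence


def known_fate_prob(history: str, S: Sequence[float]) -> float:
    """Probability of a known-fate LDLD... history.

    Assumed coding for each interval's LD pair: "10" = alive at the start and
    survived, "11" = alive at the start and died, "00" = not monitored.
    """
    prob = 1.0
    for j in range(len(history) // 2):
        pair = history[2 * j: 2 * j + 2]
        if pair == "10":
            prob *= S[j]      # survived interval j
        elif pair == "11":
            prob *= 1 - S[j]  # died during interval j
        elif pair != "00":
            raise ValueError(f"unexpected LD pair {pair!r} in interval {j + 1}")
    return prob


S = [0.9, 0.8, 0.7, 0.6]  # hypothetical S1..S4
print(known_fate_prob("10101010", S))  # S1 S2 S3 S4
print(known_fate_prob("10101100", S))  # S1 S2 (1-S3), then no longer monitored
```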

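Closed captures
Possible outcomes: on each occasion, an animal not yet caught is seen with probability p or not seen with probability 1-p. Once caught, it is seen again with probability c (recapture) or not seen with probability 1-c.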

Closed captures
Example: 4 encounter occasions, LLLL.
Estimate initial capture (p) and recapture (c) probabilities and population size (N).

Closed captures
Example encounter histories:
1111 → p1c2c3c4
0101 → (1-p1)p2(1-c3)c4
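
The switch from p to c after the first capture is the whole trick behind these terms. A minimal Python sketch, assuming time-specific p and c held in lists; the function name and values are hypothetical:

```python
from typing import Sequence


def closed_history_prob(history: str, p: Sequence[float], c: Sequence[float]) -> float:
    """Probability of a closed-capture history for one animal in the population.

    p[j]: probability of first capture at occasion j (animals not yet caught).
    c[j]: probability of recapture at occasion j (animals already caught; c[0] unused).
    """
    prob, caught_before = 1.0, False
    for j, y in enumerate(history):
        q = c[j] if caught_before else p[j]  # recapture vs first-capture probability
        if y == "1":
            prob *= q
            caught_before = True
        else:
            prob *= 1 - q
    return prob


p = [0.3, 0.4, 0.5, 0.6]  # hypothetical p1..p4
c = [0.0, 0.5, 0.6, 0.7]  # hypothetical c2..c4 (c[0] unused)
print(closed_history_prob("1111", p, c))  # p1 c2 c3 c4
print(closed_history_prob("0101", p, c))  # (1-p1) p2 (1-c3) c4
```

The history 0000 gets (1-p1)(1-p2)(1-p3)(1-p4), the probability that an animal in the population is never caught; the never-caught animals are the ones that link the likelihood to N.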

Robust design
A useful model that incorporates features of both open and closed capture-recapture theory.
With k primary periods, it can estimate all k-1 survival rates, not just k-2 as with the CJS model (where the last survival and recapture probabilities are confounded).
Estimates population size for each primary sampling period.
Can estimate temporary emigration (γ″, γ′).

Robust design
Within each primary period: p, c, and N. Between primary periods: S, γ″, and γ′.

Summary
A quick introduction to basic models and their structure.
Next: introduction to MARK, including the input file, basic modeling (PIMs and design matrices), model selection, and inference.

Questions?

Getting started in MARK: the input file
Required: an encounter history and a frequency; each record always ends with a semicolon (;).
Optional: a comment area (/* comment */) and covariates.
Data can be coded as 1 individual per line, summarized with multiple individuals per line, or as m-arrays.

Examples of input data
CJS model: 1111110 1 0 ;
Robust design: 001100001001000000 1 0 0 0 0 0 84;
Nest survival: /*BYWG B-013 99-085 */ 14 18 21 1 1;
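
When raw data start as one row per animal, a short script can write records in the layout described above (history, one frequency per group, optional covariates, closing semicolon) and collapse identical histories. A minimal Python sketch; `mark_record` and `summarize` are hypothetical helpers, and MARK's full formatting rules (whitespace, negative frequencies, m-arrays) are not covered here:

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def mark_record(history: str, freqs: Sequence[int],
                covariates: Sequence[float] = (), comment: Optional[str] = None) -> str:
    """One input-file record: [comment] history, group frequencies, covariates, then ';'."""
    parts = [f"/* {comment} */"] if comment is not None else []
    parts.append(history)
    parts.extend(str(f) for f in freqs)
    parts.extend(str(x) for x in covariates)
    return " ".join(parts) + ";"


def summarize(animals: Iterable[tuple[str, int]], n_groups: int) -> list[str]:
    """Collapse one-animal-per-line data into one record per distinct history.

    animals: (history, group index) pairs; each record lists one frequency per group.
    """
    counts = Counter(animals)
    histories = sorted({h for h, _ in counts})
    return [mark_record(h, [counts[(h, g)] for g in range(n_groups)]) for h in histories]


print(mark_record("1111110", [1, 0]))
print(summarize([("1010", 0), ("1010", 1), ("1010", 0), ("0110", 1)], n_groups=2))
```

Summarized records are only appropriate when animals sharing a history also share covariate values; with individual covariates, one animal per line is the natural choice.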

Program MARK
Many of the procedures I’ll demonstrate can be done in more than one way in MARK! Examples include building models using PIMs or design matrices (I prefer the latter) and selecting model(s) for inference.

MARK vocabulary
PIMs (Parameter Index Matrices)
Design matrices
Link functions
AIC (Akaike’s Information Criterion)
Model selection

Model building in MARK
A priori biological hypotheses → PIM → design matrix (β) → link function → real parameters

About PIMs
PIMs provide one means of constraining the parameters in a model. There is a separate PIM for each parameter type in each group (for live-recapture data with 3 groups, there will be 3 PIMs for apparent survival and 3 PIMs for recapture probability).
Remember: the values in the PIMs are indices of the parameters to be estimated, NOT occasion numbers.
I recommend leaving the PIMs alone unless you need to add age effects.

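PIMs
Cohort 1: 1 → 2 → 3 → 4 → 5
Cohort 2: 2 → 3 → 4 → 5
Cohort 3: 3 → 4 → 5
Cohort 4: 4 → 5
Cohort 5: 5
Each arrow is a survival interval; the interval from occasion j to j+1 has survival φj.

PIMs
From the previous slide, this is what the PIM would look like in MARK for 5 occasions and 4 estimates of φ:
 1  2  3  4
    2  3  4
       3  4
          4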

More PIMs
All parameters different (each cell is a separate parameter):
 1  2  3  4  5
    6  7  8  9
      10 11 12
         13 14
            15
Constant (a single parameter):
 1  1  1  1  1
    1  1  1  1
       1  1  1
          1  1
             1

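More PIMs
Constant survival except for a separate value in interval 3:
 1  1  2  1  1
    1  2  1  1
       2  1  1
          1  1
             1
Two age classes: the first interval after marking is on the diagonal (parameters 1-5), and later intervals are time-specific (parameters 6-9):
 1  6  7  8  9
    2  7  8  9
       3  8  9
          4  9
             5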
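
The PIM layouts on these slides follow simple indexing rules, which can help when checking what MARK has built. A minimal Python sketch, assuming n survival intervals (occasions minus 1), with row i as the cohort released at occasion i+1 and column j as interval j+1; the function names are hypothetical, and MARK constructs its PIMs itself:

```python
def time_pim(n: int) -> list[list[int]]:
    """Time-specific survival: the parameter for interval j is shared by all cohorts."""
    return [[j + 1 for j in range(i, n)] for i in range(n)]


def constant_pim(n: int) -> list[list[int]]:
    """Constant survival: every cell points at parameter 1."""
    return [[1] * (n - i) for i in range(n)]


def full_pim(n: int) -> list[list[int]]:
    """Every cohort-by-interval cell gets its own parameter, numbered row by row."""
    rows, next_index = [], 1
    for i in range(n):
        rows.append(list(range(next_index, next_index + n - i)))
        next_index += n - i
    return rows


def two_age_pim(n: int) -> list[list[int]]:
    """First interval after release (diagonal) is cohort-specific; later intervals are time-specific."""
    return [[i + 1 if j == i else n + j for j in range(i, n)] for i in range(n)]


for row in time_pim(4):  # 5 occasions, 4 survival intervals
    print(row)
```

The two-age rule shows why age effects are added in the PIMs: the diagonal cells must point at different indices from the rest of their column before any design matrix can treat them differently.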

Design matrices
A useful way to further “constrain” the parameters as they appear in the PIMs.
The only way to introduce time trends (linear, etc.) and covariates into models.
The structure of the design matrix will depend on the constraints placed in the PIMs.

Design matrices
Basic concept: MARK allows the user to apply a linear regression model as a constraint on any parameter (e.g., survival) with the use of a design matrix. Here, the response variable (a rate such as survival) is expressed as a linear regression function of 1 or more factors.

Design matrices
The basic linear model is Y = Xβ + ε, where Y is the response variable (e.g., survival), X is the design matrix (here, “dummy” variables coded as 1’s and 0’s), β is the vector of coefficients (intercept and slopes), and ε is a vector of random error terms.

Design matrix: example
Suppose we want to determine whether male and female Mourning Doves have different survival rates. In linear regression, we have Yi = β0 + β1xi + εi. Each variate Yi is the sum of the intercept (β0), the product of the slope (β1) and the variable xi, and the random error term (εi). But what is xi? It is a “dummy” variable specifying sex (for example, 0 for male, 1 for female). The test is whether the slope (β1) is different from zero: if β1 is not different from 0, there is no sex effect.
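
In MARK the linear model is fitted on the link scale, so the sex example becomes a two-row design matrix whose product with β is passed through a link function. A minimal Python sketch, assuming the logit link and hypothetical β values (not estimates from any dataset); the helper names are illustrative:

```python
import math

# Rows: real parameters from the PIMs (phi for males, phi for females).
# Columns: beta0 (intercept) and beta1 (sex effect); x = 0 for males, 1 for females.
X = [
    [1, 0],  # male
    [1, 1],  # female
]


def linear_predictor(X: list[list[float]], beta: list[float]) -> list[float]:
    """Compute X times beta, one value per row (per real parameter)."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def inv_logit(eta: float) -> float:
    return 1 / (1 + math.exp(-eta))


beta = [1.0, -0.5]  # hypothetical estimates of beta0 and beta1
eta = linear_predictor(X, beta)
phi_male, phi_female = (inv_logit(e) for e in eta)
print(eta, phi_male, phi_female)
```

Setting beta1 to 0 gives both rows the same linear predictor and hence the same survival, which is the "no sex effect" model.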

Design matrices
In MARK, rows in the design matrix correspond to the parameters set in the PIMs, and columns correspond to the βi. You cannot add any structure to the design matrix that is missing from the PIMs (hence my preference for leaving the PIMs alone except to add age effects).

Link functions
The rates (e.g., survival) in the linear regression model must be transformed in MARK. Several transformations are available, each having different properties. In MARK, we will primarily use the logit and sin link functions. MARK is good at setting the default link function for you.
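
The two links mentioned here can be written in a few lines each. A minimal Python sketch, assuming the sin link takes the form θ = (sin(β) + 1)/2 as described in the MARK documentation (worth confirming there); the function names are illustrative:

```python
import math


def logit(theta: float) -> float:
    return math.log(theta / (1 - theta))


def inv_logit(beta: float) -> float:
    return 1 / (1 + math.exp(-beta))


def sin_link(theta: float) -> float:
    return math.asin(2 * theta - 1)


def inv_sin_link(beta: float) -> float:
    return (math.sin(beta) + 1) / 2


for theta in (0.1, 0.5, 0.95):
    print(theta, logit(theta), sin_link(theta))
```

Both inverse links map any real β into the interval from 0 to 1. The sin link reaches exactly 0 and 1 at finite β (±π/2), while the logit only approaches them as β goes to ±∞; that difference is plausibly why the sin link behaves better when estimates sit near a boundary.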

Likelihoods
$$L = \prod_{h} \Pr\{h\}^{n_h}, \qquad \ln L = \sum_{h} n_h \ln \Pr\{h\}$$
where h indexes the encounter histories and n_h is the number of animals observed with history h.
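
The log-likelihood is a count-weighted sum of log history probabilities, so any of the history probabilities above can be plugged in. A minimal Python sketch, assuming a constant-survival known-fate model with the LD coding "10" survived, "11" died; the counts are hypothetical and the grid search stands in for MARK's numerical optimizer:

```python
import math
from typing import Callable, Mapping


def log_likelihood(counts: Mapping[str, int], prob: Callable[[str], float]) -> float:
    """ln L = sum over histories h of n_h * ln Pr{h}."""
    return sum(n * math.log(prob(h)) for h, n in counts.items() if n > 0)


def known_fate_constant(history: str, S: float) -> float:
    """Pr{history} under constant survival, assuming the LD coding "10" survived, "11" died."""
    prob = 1.0
    for j in range(0, len(history), 2):
        pair = history[j:j + 2]
        if pair == "10":
            prob *= S
        elif pair == "11":
            prob *= 1 - S
    return prob


counts = {"1010": 6, "1011": 2, "1100": 2}  # hypothetical numbers of animals per history
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda s: log_likelihood(counts, lambda h: known_fate_constant(h, s)))
print(best)  # close to 14/18: 14 intervals survived, 4 deaths
```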

Covariates
Individual: some unique characteristic of each individual in the population, such as body mass at capture or fork length.
Group: a characteristic of the entire group, such as sex or age class.
Time: a time-specific characteristic, such as river flow or temperature.

Covariates
Every study should incorporate covariates! Some recommendations:
Individual covariates often apply to survival (e.g., mass at capture, size measures, fitness measures, habitat, etc.).
Group covariates can affect survival (e.g., weather) or capture probability (e.g., effort).
Time covariates often influence capture.

Goodness-of-fit
This is an area where further work is needed.
The best overall goodness-of-fit test is in Program RELEASE (included in MARK).
GOF for the CJS model is based on the results of Tests 2 and 3 in RELEASE.
There are no good GOF tests for complex models.
An ad hoc procedure is used for the robust design.

AIC model selection
AIC = Akaike’s Information Criterion, which comes from information theory and is one of many ways to objectively assess the relative support for the models in a set.
Remember: the AIC-best model is not “the model”, but rather the model within the set that had the best support, given the data.

AIC model selection
AIC = -2ln(L) + 2K, where L is the likelihood of the model and K is the number of parameters in that model.
A larger log-likelihood (that is, a smaller -2ln(L)) means a better fit. The +2K term is a “penalty” for adding parameters, which must be offset by an improvement in fit.
Message: there is an important trade-off between fit and number of parameters, and AIC provides an objective means of balancing it.
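
Ranking a model set takes three steps: compute AIC for each model, subtract the minimum to get ΔAIC, and turn the differences into Akaike weights, exp(-Δi/2) divided by the sum over all models. A minimal Python sketch with hypothetical log-likelihoods and parameter counts (not results from any real analysis):

```python
import math

# Hypothetical (ln L, K) pairs for three candidate models.
models = {
    "phi(.) p(.)": (-520.3, 2),
    "phi(t) p(.)": (-517.9, 5),
    "phi(.) p(t)": (-519.8, 5),
}


def aic(log_lik: float, k: int) -> float:
    return -2 * log_lik + 2 * k


scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores.values())
deltas = {name: s - best for name, s in scores.items()}
rel = {name: math.exp(-d / 2) for name, d in deltas.items()}
weights = {name: v / sum(rel.values()) for name, v in rel.items()}

for name in sorted(scores, key=scores.get):
    print(f"{name:12s} AIC={scores[name]:.1f} dAIC={deltas[name]:.1f} w={weights[name]:.3f}")
```

Here phi(t) p(.) has the largest log-likelihood but pays for three extra parameters, so phi(.) p(.) ranks first; this is the fit-versus-parameters trade-off in miniature.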

Quasi-likelihood theory
QAIC is a way to account for over-dispersion in the data. Over-dispersion results from a lack of independence, e.g., animals that travel in family groups.
In MARK, we use ad hoc procedures to estimate ĉ, a variance inflation factor.
Result: sampling variances are inflated (multiplied by ĉ), and QAIC = -2ln(L)/ĉ + 2K is used in place of AIC.
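
One simple ad hoc estimate of ĉ divides a goodness-of-fit chi-square (for example, RELEASE Tests 2 and 3 pooled) by its degrees of freedom; the estimate then rescales the likelihood in QAIC and inflates standard errors by √ĉ. A minimal Python sketch with hypothetical chi-square, df, and model values:

```python
import math


def c_hat(chi_square: float, df: int) -> float:
    """Variance inflation factor estimated as chi-square / df."""
    return chi_square / df


def qaic(log_lik: float, k: int, c: float) -> float:
    return -2 * log_lik / c + 2 * k


def inflated_se(se: float, c: float) -> float:
    """Variances are multiplied by c-hat, so standard errors are multiplied by sqrt(c-hat)."""
    return se * math.sqrt(c)


c = c_hat(30.0, 20)  # hypothetical pooled chi-square and df
print(c, qaic(-520.0, 3, c), inflated_se(0.05, c))
```

Dividing -2ln(L) by ĉ shrinks the likelihood differences between models relative to the 2K penalty, so over-dispersed data push QAIC toward simpler models.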

Model averaging
Which parameter estimate do you report when you have estimates from 10 models? The estimate from the best model? All estimates? Or some “average”?
Model averaging incorporates this model selection uncertainty into parameter estimates.
Best used when there are several competing models (ΔAIC < 2).
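
A common way to do the averaging weights each model's estimate by its Akaike weight and reports an unconditional standard error that adds the spread between models to each model's own variance. A minimal Python sketch of that widely used estimator (MARK's exact computations may differ in detail); the estimates, SEs, and AIC values are hypothetical:

```python
import math
from typing import Sequence


def akaike_weights(aics: Sequence[float]) -> list[float]:
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [x / total for x in rel]


def model_average(estimates: Sequence[float], ses: Sequence[float],
                  aics: Sequence[float]) -> tuple[float, float]:
    """Weighted estimate plus an unconditional SE that includes between-model spread."""
    w = akaike_weights(aics)
    avg = sum(wi * t for wi, t in zip(w, estimates))
    se = sum(wi * math.sqrt(s ** 2 + (t - avg) ** 2) for wi, t, s in zip(w, estimates, ses))
    return avg, se


# Hypothetical survival estimates from three models, with their SEs and AIC values.
print(model_average([0.72, 0.75, 0.69], [0.04, 0.05, 0.04], [1044.6, 1045.8, 1049.6]))
```

When all models agree on the estimate, the unconditional SE collapses to the ordinary SE; disagreement between models widens it.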

Number of parameters
With complex models, MARK can have a hard time correctly counting the number of parameters when parameter estimates are close to a boundary (e.g., near 0 or 1). The sin link function is best here; the logit link function sometimes performs poorly.
Message: always check that MARK has counted the parameters correctly.

Model notation
Describe models concisely (space is limited in MARK). Some basic nomenclature:
Full time variation (t)
Linear time trends (T)
No variation (.)
Additive effects (t+temperature)
Multiplicative effects (group*t)

Model notation: examples
φ(.) means φ1 = φ2 = φ3 = …
φ(t) means separate φ1, φ2, …, φt-1
φ(t+Mass) means φ1, φ2, …, φt-1 are each a function of body mass

Model notation
Keep model names simple but descriptive.
Parameters are written with sources of variation listed in parentheses:
φ(t) p(t)
φ(T+Weight) p(t+effort)
φ(t*group) p(T)

How do we “test” for effects?
For example, how would we know whether weight influenced the survival of birds? We need to consider models with and without weight.
Model selection results: are models with weight among the “best” in the model set?
Look at the β for weight: does its confidence interval overlap zero?
Likelihood ratio tests.
Remember that this is all conditional on the model set.

Developing candidate models
Inference is conditional on the set of models we consider.
Considerable effort should go into developing a concise set of models for consideration.
How many? Typically, 5-20 models will suffice.
Models should address realistic questions and should not include factors known to be unimportant.
Message: use what you already know.

Study design considerations
There is a trade-off between the number of animals marked and recapture probability.
What is an adequate sample size? Consider the question to be asked: are you estimating population size, survival, or lambda?

Discussion
Additional discussion topics:
Model assumptions and the results of assumption violations.
Developing the set of candidate models.
Selecting the appropriate model for analyses.
Others?
This afternoon: an example in MARK.

Models – a review

Patch occupancy
Presence-absence data.
Define a “patch”: ponds, islands, plots, etc.
Multiple visits to each site.
Assumes closure during the sampling period.
Parameters: Ψ, p, and ε and γ (robust design only).

Patch occupancy
Modeling details: handles missing visits (coded as “.” in the EH); covariates.
Robust design formulations: Psi and epsilon; Psi and gamma; Psi(1), epsilon, gamma.
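
Missing visits are easy to handle because a "." simply contributes no term. A minimal Python sketch of the single-season occupancy probability of one site's history as it is commonly written; MARK's implementation may differ in detail, and the function name and values are hypothetical:

```python
from typing import Sequence


def occupancy_history_prob(history: str, psi: float, p: Sequence[float]) -> float:
    """Single-season occupancy probability of one site's detection history.

    history uses '1' (detected), '0' (not detected), '.' (visit missed).
    psi: occupancy probability; p[j]: detection probability on visit j.
    """
    detected = False
    if_occupied = 1.0
    for j, y in enumerate(history):
        if y == ".":
            continue  # missed visit: no information, no term
        if y == "1":
            detected = True
            if_occupied *= p[j]
        else:
            if_occupied *= 1 - p[j]
    if detected:
        return psi * if_occupied
    # Never detected: occupied but missed on every visit, or not occupied at all.
    return psi * if_occupied + (1 - psi)


p = [0.5, 0.4, 0.3]  # hypothetical per-visit detection probabilities
print(occupancy_history_prob("101", 0.6, p))
print(occupancy_history_prob("1.1", 0.6, p))  # second visit missed
print(occupancy_history_prob("000", 0.6, p))
```

The all-zero history mixes two explanations (present but never detected, or absent), which is why repeated visits are needed to separate Ψ from p.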

Nest survival
Required data for each nest:
The day the nest was found (k).
The last day the nest was checked alive (l).
The last day the nest was checked (m).
Nest fate (0 = successful, 1 = failure) (f).
The number of nests with this encounter history.

Input file: example
Nest survival group=1;
1 5 11 1 1 71 1.24;
1 5 11 1 1 71 1.24;
6 9 9 0 1 60 1.90;
10 26 29 1 1 68 2.21;
11 19 24 1 1 65 1.88;
15 22 22 0 1 65 2.03;
Nest survival group=2;
4 8 8 0 1 72 1.99;
7 18 21 1 1 63 1.40;
17 30 30 0 1 67 1.77;
19 28 33 1 1 70 1.28;

Coding the data
Coding the triplet k, l, and m:
k=1, l=3, m=5, fate=1 → S1S2[1-S3S4]
k=1, l=3, m=3, fate=0 → S1S2
k=1, l=3, m=3, fate=1 is invalid (a nest can’t be alive and dead on day 3)
k=1, l=1, m=3, fate=1 → [1-S1S2]
k=1, l=1, m=1, fate=0 or 1 is invalid (the nest was active only on day 1)
See the MARK help file for more details.
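
The coding rules above reduce to one formula: survive from day k to day l, then either keep surviving to day m (successful) or fail somewhere between l and m. A minimal Python sketch, assuming daily survival rates indexed by 1-based day; the function names are hypothetical, the trailing fields of a data line are assumed to be covariates, and the parser handles data lines only (not the group header lines):

```python
from typing import Sequence


def nest_history_prob(k: int, l: int, m: int, fate: int, S: Sequence[float]) -> float:
    """Probability of one nest's (k, l, m, fate) record.

    S[i] is daily survival from day i to day i+1 (days are 1-based; S[0] is unused).
    fate: 0 = successful, 1 = failed.
    """
    if k == l == m:
        raise ValueError("nest observed active on a single day only")
    if fate == 1 and l == m:
        raise ValueError("a nest cannot be last seen alive and found failed on the same day")

    def survive(a: int, b: int) -> float:
        """Probability of surviving from day a to day b: S[a] * ... * S[b-1]."""
        out = 1.0
        for i in range(a, b):
            out *= S[i]
        return out

    if fate == 0:
        return survive(k, m)  # alive on every check through day m
    return survive(k, l) * (1 - survive(l, m))  # alive through day l, failed by day m


def parse_record(line: str) -> tuple[int, int, int, int, int, list[float]]:
    """Split 'k l m fate count cov1 cov2 ...;' into its fields."""
    k, l, m, fate, count, *covs = line.strip().rstrip(";").split()
    return int(k), int(l), int(m), int(fate), int(count), [float(x) for x in covs]


S = [1.0, 0.9, 0.8, 0.95, 0.85]  # hypothetical S1..S4 (S[0] unused)
print(nest_history_prob(1, 3, 5, 1, S))  # S1 S2 [1 - S3 S4]
print(parse_record("10 26 29 1 1 68 2.21;"))
```

In the full likelihood, each record's probability is raised to the power of its count (the fifth field).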

Model assumptions
Homogeneity of daily nest survival rates.
Nest fates are independent.
All visits to nests are recorded.
Nest discovery and subsequent checks do not influence nest survival.
Nest checks are independent of fate.
Nest fates are correctly determined.
Age of nest at discovery.

Estimate nesting success?
For constant nest survival, period success is the daily survival rate (DSR) raised to the power of the period length: success = DSR^(period length).
What happens if there is temporal variation in nest survival? Covariates? A combination of both?

Temporal variation
Which 10-day interval provides the “best” estimate of nest success?
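
With constant DSR every window of the same length gives the same success, but once DSR varies over the season the answer depends on where the window starts. A minimal Python sketch; the seasonal DSR values are hypothetical and the function name is illustrative:

```python
from typing import Sequence


def period_success(dsr: Sequence[float], start: int, length: int) -> float:
    """Probability of surviving `length` consecutive days beginning at index `start`."""
    if start < 0 or start + length > len(dsr):
        raise ValueError("window runs past the available daily survival rates")
    out = 1.0
    for s in dsr[start:start + length]:
        out *= s
    return out


constant = [0.95] * 40
# Hypothetical seasonal decline in daily survival rate (illustrative, not data from the slides).
seasonal = [0.98 - 0.001 * day for day in range(40)]

print(period_success(constant, 0, 20), 0.95 ** 20)  # equal up to rounding
for start in (0, 10, 20, 30):
    print(start, round(period_success(seasonal, start, 10), 4))
```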

Getting the “best” estimate
Need a start date for the “best” estimate: when?
Does a simple mean work?
What about bias between observed and true nest initiation dates? Use a Horvitz-Thompson estimator to correct for this bias.

Other considerations
Stage-specific survival: divide the EH into parts, e.g.
Incubation: 1 10 10 0 1;
Nestling: 10 20 22 1 1;
Nest age

Model-based predictions
MARK provides a regression equation that can be used for predictions:
logit(S_male) = 3.53 + 0.28*1, S_male = 0.9784
logit(S_female) = 3.53 + 0.28*0, S_female = 0.9716
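
Turning the regression equation into survival values only requires the inverse logit. A minimal Python sketch using the rounded coefficients printed on the slide:

```python
import math


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


beta0, beta1 = 3.53, 0.28  # coefficients as printed on the slide (rounded)
s_male = inv_logit(beta0 + beta1 * 1)
s_female = inv_logit(beta0 + beta1 * 0)
print(round(s_male, 4), round(s_female, 4))
```

With these rounded coefficients the results land within about 0.0001 of the slide's values; the small gap presumably comes from rounding of the printed βs.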