TRIM Workshop Arco van Strien Wildlife statistics Statistics Netherlands (CBS)

What is TRIM? TRends and Indices for Monitoring data. A computer program for the analysis of time series of count data with missing observations. Method: loglinear Poisson regression (GLM). Made for the production of wildlife statistics by Statistics Netherlands (Jeroen Pannekoek / freeware / version 3.0). Introduction

Why TRIM? To get better indices? No: GLMs in statistical packages (S-PLUS, Genstat, ...) may produce similar results. But statistical packages are often impractical for large datasets, and TRIM is easier to use.

The program of this workshop. Aim: a basic understanding of TRIM. (1) Basic theory of imputation. (2) How to use TRIM to impute missing counts and to assess indices etc. (3) Basic theory of the weighting procedure to cope with unequal sampling of areas, and how to use TRIM to weight particular sites.

INDEX: the total (= sum of all sites) for a year divided by the total of the base year.
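The index definition above can be sketched in a few lines. This is only an illustration: the site names and counts are made up, and the helper name yearly_indices is hypothetical, not part of TRIM.

```python
# Yearly index: total over all sites in a year divided by the base-year total.
# The counts below are made-up illustration data, not workshop data.
counts = {
    "site 1": [2, 4, 6],
    "site 2": [1, 2, 3],
}


def yearly_indices(counts, base_year=0):
    """Return one index per year, relative to the total of base_year."""
    n_years = len(next(iter(counts.values())))
    totals = [sum(site[y] for site in counts.values()) for y in range(n_years)]
    return [t / totals[base_year] for t in totals]


indices = yearly_indices(counts)
print(indices)  # [1.0, 2.0, 3.0]
```

The sketch assumes complete data; with missing counts the totals, and therefore the indices, are no longer comparable between years, which is why imputation is needed.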

Missing values affect indices Theory imputation

How to impute missing values? Estimate for site 2 in year 2? Site 1 suggests: twice the number of year 1 (site and year effect taken into account).

Another example: estimate for site 2 in year 2? Site 1 suggests: twice the number of year

And another example: estimate for site 2 in year 2? Site 1 suggests: three times as many as in year

Try this one... There is not a single solution (TRIM will report an error).

Difficult to guess the missing values here.

Estimating missing values by an iterative procedure (required when there are more than a few missing values).

Recalculate the margin totals and repeat the estimation of the missing value. First estimate of site 2, year 2: 1 × 4/7 = 0.6. Updated table: imputed cell 0.6, recalculated margin totals 4.6 and 1.6, grand total 7.6.

Second estimate of site 2, year 2: 1.6 × 4.6/7.6 ≈ 0.97. Repeat again: missing value = 1.22, 1.40, 1.54, etc., converging to 2.
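The iterative procedure on these slides can be sketched as follows. The 2 × 2 table is an assumption reconstructed from the margin totals quoted above (site 2 total 1, year 2 total 4, grand total 7); the function name is hypothetical, and the slides note that TRIM itself uses a much faster algorithm.

```python
# Table assumed from the quoted margin totals (not shown in the transcript):
#            year 1   year 2
#   site 1      2        4
#   site 2      1     missing
observed = [[2.0, 4.0], [1.0, None]]


def impute_from_margins(table, tol=1e-9, max_iter=10_000):
    """Repeatedly set each missing cell to row total * column total / grand total."""
    filled = [[0.0 if v is None else v for v in row] for row in table]
    missing = [(i, j) for i, row in enumerate(table)
               for j, v in enumerate(row) if v is None]
    history = []
    if not missing:
        return filled, history
    for _ in range(max_iter):
        row_tot = [sum(row) for row in filled]
        col_tot = [sum(col) for col in zip(*filled)]
        grand = sum(row_tot)
        new_vals = {(i, j): row_tot[i] * col_tot[j] / grand for i, j in missing}
        change = max(abs(new_vals[(i, j)] - filled[i][j]) for i, j in missing)
        for (i, j), value in new_vals.items():
            filled[i][j] = value
        history.append([new_vals[cell] for cell in missing])
        if change < tol:
            break
    return filled, history


filled, history = impute_from_margins(observed)
print(round(history[0][0], 2))  # first estimate, 1 * 4 / 7
print(round(filled[1][1], 4))   # value after convergence
```

Under this assumed table the estimate settles at 2, i.e. twice the year 1 count of site 2. The slide rounds intermediate values, so its printed sequence can differ slightly from the unrounded one computed here.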

To get proper indices, it is necessary to estimate (impute) missing values. Missing values may be estimated from the margin totals using an iterative procedure (taking into account both the site effect and the year effect). Note: TRIM uses a much faster algorithm to impute missing values. Assumption: year-to-year changes are similar for all sites (this assumption will be relaxed later!). Test this assumption using a goodness-of-fit test (X² test).

Expected counts per cell (in brackets): (2.8) (4.2) (1.2) (1.8). X²: compare expected counts with real counts per cell. X² is the summation of (counted - expected)² / expected over all cells: (2 - 1.8)²/1.8 + (4 - 4.2)²/4.2 etc., so X² = 0.08 with a p-value of 0.78. The model is not rejected (it fits, but note: cell values in this example are too small for a proper X² test).
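The X² calculation above can be checked with a short sketch. The observed counts (3, 4, 1, 2) are an assumption inferred from the expected values and the two terms quoted on the slide; the p-value uses the closed form for 1 degree of freedom, which is what a 2 × 2 table has.

```python
import math

# Observed counts assumed from the slide's expected values 2.8, 4.2, 1.2, 1.8
# (rows = sites, columns = years).
observed = [[3, 4], [1, 2]]


def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    return [[r * c / grand for c in col_tot] for r in row_tot]


def pearson_chi2(table):
    """Sum of (counted - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


chi2 = pearson_chi2(observed)
# For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), round(p_value, 2))  # 0.08 0.78
```

For tables with more degrees of freedom the erfc shortcut no longer applies and a general chi-square distribution function would be needed.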

Imputation without covariate (X² = 18 and p-value = 0.18).

Using a covariate: better imputations and indices (X² = 1.7, p = 0.99).

What is the best model? [Table of candidate models, marked "rejected" or "not rejected".] Both model 2 and model 3 are valid.

Summary of imputation theory. To get proper indices, it is necessary to impute missing values. Assumption: year-to-year changes are similar for all sites of the same covariate category. Test the assumption using a GOF test; if the p-value < 0.05, try better covariates. If these cannot be found, the resulting indices may be of low quality (and standard errors high). See also the FAQs!

The program of this workshop. Aim: a basic understanding of TRIM. (1) Basic theory of imputation. (2) How to use TRIM to impute missing counts and to assess indices etc. (3) Basic theory of the weighting procedure to cope with unequal sampling of areas, and how to use TRIM to weight particular sites. Using TRIM

Several statistical models (time effects, linear trend); statistical complications (overdispersion, serial correlation) taken into account; Wald tests to test significance; model versus imputed indices; interpretation of the slope.

Time effects model (skylark data) without covariate.

Time effects model with covariate (0 = total, 1 = dunes, 2 = heathland).

Linear trend model (uses the trend estimate to impute missing values).

Linear trend model with a changepoint at year 2.

Linear trend model with changepoints at years 2 and 3.

Linear trend model with all changepoints = time effects model. Use the linear trend model when: the data are too sparse for the time effects model; one is interested in testing trends, e.g. trends before and after a particular year (or let TRIM search stepwise for relevant changepoints). But be careful with simple linear models!

Statistical complications. Serial correlation: dependence of counts on those of earlier years (0 = no correlation). Overdispersion: deviation from the Poisson distribution (1 = Poisson). Run TRIM with overdispersion = on and serial correlation = on, else standard errors and statistical tests are usually invalid.
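The "1 = Poisson" scale for overdispersion can be illustrated with a common general estimator: the Pearson chi-square divided by the residual degrees of freedom. This is a sketch of that textbook estimator, not necessarily TRIM's exact computation; the counts, fitted values and number of parameters are made up.

```python
def overdispersion(observed, fitted, n_parameters):
    """Pearson chi-square divided by residual degrees of freedom.

    Values near 1 are what a Poisson model predicts; values well above 1
    indicate more variation than Poisson (overdispersion).
    """
    pearson = sum((o - f) ** 2 / f for o, f in zip(observed, fitted))
    degrees_of_freedom = len(observed) - n_parameters
    return pearson / degrees_of_freedom


# Hypothetical counts and model predictions for six site-by-year cells.
counts = [16, 4, 18, 2, 9, 13]
fitted = [10, 8, 12, 6, 9, 10]
dispersion = overdispersion(counts, fitted, n_parameters=2)
print(round(dispersion, 2))
```

In this made-up example the ratio is about 3, so standard errors based on a pure Poisson assumption would be too small.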

Running TRIM: features; TRIM command file; output: GOF (X²) test and Wald tests; output (fitted values, indices); indices, time totals; overall trend slope; Frequently Asked Questions; different models (linear trend model, changepoints, covariate).

What is the best model? Both model 2 and model 3 are valid. Model 3 is the most parsimonious model.

Model choice. The indices depend on the statistical model! TRIM allows you to search for the best model using the GOF test, Akaike's Information Criterion and Wald tests. In case of substantial overdispersion, one has to rely on the Wald tests.

Wald tests. Different Wald tests to test for the significance of: the trend slope parameters; changes in the slope; deviations from a linear trend; the effect of each covariate.

TRIM generates both model indices and imputed indices.

Imputed vs model indices. Imputed indices: summation of real counts plus, for missing counts, model predictions. Closer to real counts (more realistic course in time). Model indices: summation of model predictions for all sites. Often more stable. Usually model and imputed indices hardly differ!
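The difference between the two kinds of indices can be sketched as follows. The observed table and the model predictions are hypothetical numbers chosen for illustration; in practice the predictions would come from the fitted TRIM model.

```python
# None marks a missing count. Both tables are made-up illustration data.
observed = [[2, 4], [1, None]]           # rows = sites, columns = years
predicted = [[2.1, 3.9], [0.9, 2.0]]     # hypothetical model predictions


def imputed_totals(observed, predicted):
    """Yearly totals of real counts, with predictions only where counts are missing."""
    n_years = len(observed[0])
    return [
        sum(pred[y] if obs[y] is None else obs[y]
            for obs, pred in zip(observed, predicted))
        for y in range(n_years)
    ]


def model_totals(predicted):
    """Yearly totals of model predictions for all sites."""
    return [sum(col) for col in zip(*predicted)]


def to_index(totals, base_year=0):
    return [t / totals[base_year] for t in totals]


imputed_index = to_index(imputed_totals(observed, predicted))
model_index = to_index(model_totals(predicted))
print(imputed_index, model_index)
```

Here the two index series differ only slightly, in line with the remark that model and imputed indices usually hardly differ.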

TRIM computes both additive and multiplicative slopes, each with a standard error. Relation: additive slope = ln(multiplicative slope), e.g. ln(1.0497). Multiplicative parameters are easier to understand.

Interpretation of the multiplicative slope. A slope of 1.05 means a 5% increase a year. The confidence interval is the slope plus or minus 2 × the standard error, which here corresponds to a 2% to 8% increase a year, i.e. significantly different from 1.
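The slope arithmetic can be sketched as below. The multiplicative slope 1.0497 is the value quoted on the slide; the standard error of 0.015 is a hypothetical value, chosen only because it reproduces the quoted range of roughly 2% to 8%.

```python
import math

multiplicative_slope = 1.0497                     # value quoted on the slide
additive_slope = math.log(multiplicative_slope)   # slope on the log scale

# Hypothetical standard error (assumption, not taken from the slide).
se_multiplicative = 0.015

# Approximate 95% confidence interval: slope plus or minus 2 standard errors.
lower = multiplicative_slope - 2 * se_multiplicative
upper = multiplicative_slope + 2 * se_multiplicative

# The trend is significant when the interval excludes 1 (no change).
significant = not (lower <= 1.0 <= upper)

print(round(additive_slope, 4))
print(f"{(lower - 1) * 100:.0f}% to {(upper - 1) * 100:.0f}% change per year")
print(significant)
```

Reading the interval as yearly percentage change is what makes the multiplicative form easier to interpret than the additive one.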

Summary of the use of TRIM: choose between the time effects and linear trend models; include overdispersion and serial correlation in the models; use GOF and Wald tests for better models and indices, and to test hypotheses; choose between model and imputed indices; use the multiplicative slope.

The program of this workshop. Aim: a basic understanding of TRIM. (1) Basic theory of imputation. (2) How to use TRIM to impute missing counts and to assess indices etc. (3) Basic theory of the weighting procedure to cope with unequal sampling of areas, and how to use TRIM to weight particular sites. Weighting

Unequal sampling due to: (1) stratified random site selection, with oversampling of particular strata; weighting results in unbiased national indices. (2) Site selection by the free choice of observers, with oversampling of particular regions and attractive habitat types; weighting reduces the bias of indices.

To cope with unequal sampling: stratify the data, e.g. into regions and habitat types (strata are expected to have different indices and trends); weight strata according to (1) the number of sample sites in the stratum and (2) the area of the stratum, or weight by population size per stratum.

Weighting factor for each stratum: weighting factor for stratum i = total area of i / area of i sampled. [Figure: example strata; only the annotations "or 10" and "or 5" survive.]

Another example: weighting factor for stratum i = total area of i / area of i sampled. One stratum: 100/5 = 20 (or 4 as a relative weight); the other: 50/10 = 5 (or 1).
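The weighting factors in this example can be reproduced with a small sketch. The stratum names are hypothetical; the areas are the ones quoted above, and the relative weights are obtained by dividing by the smallest factor.

```python
def weight_factors(total_area, sampled_area):
    """Weighting factor per stratum = total area / area sampled."""
    return {s: total_area[s] / sampled_area[s] for s in total_area}


# Areas from the example; the stratum names are placeholders.
total_area = {"stratum A": 100, "stratum B": 50}
sampled_area = {"stratum A": 5, "stratum B": 10}

weights = weight_factors(total_area, sampled_area)
smallest = min(weights.values())
relative = {s: w / smallest for s, w in weights.items()}

print(weights)   # {'stratum A': 20.0, 'stratum B': 5.0}
print(relative)  # {'stratum A': 4.0, 'stratum B': 1.0}
```

Only the ratio between the weights matters for the index, which is why 20 and 5 can be replaced by 4 and 1.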

Weighting in TRIM: include a weight factor (different per stratum) in the data file for each site and year record; weight strata and combine the results to produce a weighted total (= run TRIM with weighting = on and covariate = on).
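How weights change a combined index can be sketched with complete, made-up data. The record layout, site names and counts are hypothetical, and this simple weighted sum ignores the model fitting that TRIM performs; it only shows the effect of giving one stratum a larger weight.

```python
# (site, stratum, weight, counts per year); illustration data only.
records = [
    ("d1", "dunes", 10, [3, 6]),
    ("h1", "heathland", 1, [20, 10]),
    ("h2", "heathland", 1, [10, 5]),
]


def weighted_index(records, base_year=0):
    """Index of the weighted yearly totals relative to the base year."""
    n_years = len(records[0][3])
    totals = [sum(weight * site_counts[y] for _, _, weight, site_counts in records)
              for y in range(n_years)]
    return [t / totals[base_year] for t in totals]


unweighted = weighted_index([(s, st, 1, c) for s, st, _, c in records])
weighted = weighted_index(records)
print(unweighted)  # declining: dominated by the heathland sites
print(weighted)    # [1.0, 1.25]: the dune site now counts ten times
```

In this example the unweighted total index declines while the weighted one increases, because the single dune site stands for a much larger area than its share of the sample suggests.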

Indices for Skylark, unweighted (0 = total index, 1 = dunes, 2 = heathland).

Indices for Skylark with weight factor for each dune site = 10 (0 = total index, 1 = dunes, 2 = heathland).

Final remarks. To facilitate the calculation of many indices on a routine basis: run TRIM in batch mode, using the TRIM Command Language (see manual); there is also an option to incorporate TRIM in your own automation system (Access, Delphi or similar) (not in manual).

That's all, but: if you have any questions about TRIM, see the manual, the FAQs in TRIM, or mail Arco van Strien. Success!