TigerStat ECOTS 2014. Understanding the population of rare and endangered Amur tigers in Siberia. [Gerow et al. (2006)] Estimating the Age distribution.


TigerStat ECOTS 2014

Real World Problem: Understanding the population of rare and endangered Amur tigers in Siberia [Gerow et al. (2006)]. Estimating the age distribution of the population is important to ensure sustainability.

Lab Materials

PLAYING THE GAME NOTE: NO TIGERS are hurt in the playing of this game

DURING GAME PLAY: Encourages thinking about the sample size. Encourages considering representativeness. DATA COLLECTED UPDATES.

Literature review: Article from NATURE on how to estimate the age of LIONS. Similar issue: how to ensure a sustainable population of lions.

Research question and plan: Do techniques for estimating lion age apply to tigers? To collect a sample and test the model, what issues must be considered? How many tigers to sample? What data should we collect? How do we use our data to answer the question? Lion model: percentage of black on the nose (sample of 63 females).

Looking at the data: Plot variables against AGE. What appears to be the best predictor? Produce a simple regression model for AGE. Is the predictor significant? What is the estimated coefficient?
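The "simple regression model for AGE" step can be sketched in a few lines of plain Python. This is only an illustration: the function name `fit_simple_regression` and the ten data points are made up for the example and are not values from the game or from the slides.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns a dict of summary values."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return {
        "intercept": b0,
        "slope": b1,
        "se_slope": se_b1,
        "t_slope": b1 / se_b1,          # compare with a t distribution, n - 2 df
        "r_squared": 1 - sse / sst,
    }

# Hypothetical values for illustration only (nose black as a proportion).
nose_black = [0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
age = [1.5, 2.0, 2.4, 3.1, 3.0, 4.2, 5.1, 6.5, 7.9, 9.8]

fit = fit_simple_regression(nose_black, age)
for name, value in fit.items():
    print(f"{name:>10s} = {value:.3f}")
```

The t statistic for the slope answers "is the predictor significant?", and the slope itself is the "estimated coefficient" the slide asks about.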

Looking at the SLOPE: How much variability is there in the estimated slopes? How much does this matter? Are all statistically significant? What does this mean? What is “practical significance” in this setting? What does your model predict for a tiger with 50% nose black? For 10%? 90%? How much of an increase in AGE does your model suggest for an increase of 25% nose black? How do your answers compare to your neighbor's?
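As a worked illustration of the 25% question, take the slope of 12.74 quoted for the one-student example in these slides, and assume nose black is recorded as a proportion between 0 and 1 (an assumption; the slides do not state the units). The intercept drops out of a difference between two predictions, so only the slope matters:

```latex
\widehat{\mathrm{AGE}}(x) = b_0 + b_1 x
\quad\Longrightarrow\quad
\Delta\widehat{\mathrm{AGE}} = b_1\,\Delta x = 12.74 \times 0.25 \approx 3.2 \text{ years}
```

Predictions at 10%, 50% and 90% nose black also need the intercept $b_0$, which each student reads from their own regression output.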

Looking at the MODEL: Produce some diagnostics for your simple regression model for AGE. What is the R² value? What does this tell you? Is the model appropriate? What issues (if any) do you see, and how would you propose fixing them? If there is an issue, how might sampling play a role in this? Ideas: DISTRIBUTION of slopes (easy to show with a histogram of class values). Recognition of what the significance level means (i.e. 5% type-1 error). Prediction vs. explaining.
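The "distribution of slopes" idea can be previewed with a small simulation before the class pools its results. The sketch below assumes a hypothetical true relationship (age = 1 + 10 × nose black plus normal noise) and hypothetical class and sample sizes; none of these numbers come from the game.

```python
import random
import statistics

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

def simulate_class_slopes(n_students=30, n_tigers=15, seed=1):
    """Each simulated student samples n_tigers tigers and fits a slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_students):
        x = [rng.uniform(0.05, 0.95) for _ in range(n_tigers)]
        # Hypothetical population model, chosen only for illustration.
        y = [1.0 + 10.0 * xi + rng.gauss(0.0, 1.5) for xi in x]
        slopes.append(slope(x, y))
    return slopes

slopes = simulate_class_slopes()
print(f"mean slope   = {statistics.fmean(slopes):.2f}")
print(f"sd of slopes = {statistics.stdev(slopes):.2f}")
print(f"min, max     = {min(slopes):.2f}, {max(slopes):.2f}")
```

A histogram of `slopes` plays the role of the class histogram: every student has a different sample, so the slopes scatter around the population value.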

Example: “One student” (15 tigers). Linear fit reasonable? [Stata regression output: age regressed on noseblack, F(1, 13); rows for noseblack and _cons with Coef., Std. Err., t, P>|t| and 95% Conf. Interval.]

Examining model fit: residuals, leverage, influence diagnostics. Pattern? Outlier? Influential point?
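For a one-predictor model, the residual, leverage and influence values can be computed directly. The following is a minimal sketch using the standard formulas for leverage and Cook's distance; the function name and the six data points (with a deliberately odd last tiger) are hypothetical.

```python
def influence_diagnostics(x, y):
    """Return (residual, leverage, Cook's distance) for each observation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)   # residual variance
    p = 2                                          # parameters: intercept, slope
    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx        # leverage
        cooks_d = e * e * h / (p * s2 * (1 - h) ** 2)
        rows.append((e, h, cooks_d))
    return rows

# Hypothetical sample: the last tiger has a dark nose but a young age.
nose_black = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9]
age = [2.0, 3.0, 4.0, 5.0, 6.0, 3.0]

diagnostics = influence_diagnostics(nose_black, age)
for i, (e, h, d) in enumerate(diagnostics):
    print(f"tiger {i}: residual={e:6.2f}  leverage={h:.3f}  cooks_d={d:.3f}")
```

A point far from the mean nose-black value has high leverage; it becomes influential when it also sits far from the line, which is what a large Cook's distance flags.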

Fit removing outlier: Slight increase in R². Slope coefficient decrease of 8% (from 12.74). [Stata regression output: age regressed on noseblack, F(1, 12); rows for noseblack and _cons with Coef., Std. Err., t, P>|t| and 95% Conf. Interval.]

REAL questions Enough evidence to reject model fit? Heteroskedasticity? Would you try a transformation (without having the Nature article)? What is the model used for – is it “good enough”? Is the data “good enough”? EVERY STUDENT HAS DIFFERENT DATA, DIFFERENT ISSUES and (potentially) DIFFERENT MODELS!!!!

Transform the data using the proposal from the Nature article. Easy to create a new variable in Excel or other software. Is the new model appropriate? What is the coefficient for the transformed variable? Use both models to predict the AGE for a tiger with 90% nose black. How do they compare? How do the CI and PI compare? Try for several different values: how much does the transformation matter?
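The comparison of the two models at 90% nose black can be sketched as follows. Two assumptions are made here: the transformation is taken to be the arcsine square-root form, arcsin(sqrt(p)), a common choice for proportions (the slides only say "arcsin"), and the 15 data points are invented for the example. The t critical value 2.160 is the 97.5th percentile for 13 degrees of freedom, matching a sample of 15 tigers.

```python
import math

T_CRIT_DF13 = 2.160  # 97.5th percentile of Student's t with 13 df (n = 15)

def arcsin_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1] (assumed form)."""
    return math.asin(math.sqrt(p))

def fit_line(x, y):
    """Least-squares line plus the quantities needed for interval estimates."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error (root MSE)
    return {"b0": b0, "b1": b1, "s": s, "n": n, "x_bar": x_bar, "sxx": sxx}

def predict(model, x0, t_crit):
    """Point prediction, 95% CI for the mean, and 95% PI for one individual."""
    y_hat = model["b0"] + model["b1"] * x0
    core = 1 / model["n"] + (x0 - model["x_bar"]) ** 2 / model["sxx"]
    half_ci = t_crit * model["s"] * math.sqrt(core)
    half_pi = t_crit * model["s"] * math.sqrt(1 + core)
    return y_hat, (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical sample of 15 tigers (nose black as a proportion, age in years).
nose_black = [0.05, 0.10, 0.12, 0.18, 0.22, 0.25, 0.30, 0.35,
              0.40, 0.48, 0.55, 0.62, 0.70, 0.80, 0.92]
age = [1.2, 1.8, 2.1, 2.6, 2.9, 3.4, 3.6, 4.3,
       4.9, 5.6, 6.4, 7.5, 8.4, 10.1, 12.8]

linear = fit_line(nose_black, age)
arcsin = fit_line([arcsin_sqrt(p) for p in nose_black], age)

lin_pred = predict(linear, 0.90, T_CRIT_DF13)
arc_pred = predict(arcsin, arcsin_sqrt(0.90), T_CRIT_DF13)

for label, (y_hat, ci, pi) in (("linear", lin_pred), ("arcsin", arc_pred)):
    print(f"{label}: age={y_hat:.2f}  CI=({ci[0]:.2f}, {ci[1]:.2f})  "
          f"PI=({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always wider than the confidence interval because it adds the variability of an individual tiger to the uncertainty in the fitted mean. Changing 0.90 to other values shows how much the transformation matters across the range.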

Fit using arcsin transformation: R² increases and the fit appears better. [Stata regression output: age regressed on t_noseblack, F(1, 13); rows for t_noseblack and _cons with Coef., Std. Err., t, P>|t| and 95% Conf. Interval.]

Predicting Ages: Implications if the model is applied to estimate age for the population of tigers? [Table of predicted ages by % black for the Linear and Arcsin models.] Interesting discussion of R² and prediction of individual tigers using the model here…

Sample of 27 Tigers (Tigger123). [Plot: original data fit and residuals.] [Stata regression output: age regressed on t_noseblack.] Transformed data fit excellent. Parameters similar to smaller data.

Sample of 70+ Tigers (ClaireBear). [Plot: original data fit and residuals.] [Stata regression output: age regressed on t_noseblack.] Transformed data fit excellent. Parameters similar to smaller data…but more change.

Opportunities: Would we have tried this transformation? How about others? Compare… Sample has more young tigers, particularly in the small sample: sampling issues? How do we avoid this? Implications if the model is applied to estimate age for the population of tigers? How can we do better in prediction? Role of R². Role of MODELS and use of data. Different samples for different students/groups: sampling distributions.

Enhancements: How to make sampling issues and statistical thinking more related to game play (tiger behavior and ease of tagging based on age and other factors; tiger population distribution). Richer data (missing, messy, more characteristics). Tiger behavior. “Gaming” tuning knobs: too easy/hard…balance of time to collect and student engagement. FUTURE possibilities for a RICH, IMMERSIVE ENVIRONMENT: other animals, disease spread, a lot more…

STUDENT EVALUATIONS (agree or strongly agree percentages)

Question | % Agree
Website/game instructions easy to understand | 97.5
Helped understand using regression to model real data | 85.2
Creativity can play a role in research | 91.3
Had a positive effect on my interest in statistics | 77.5
Helpful in showing the entire process for a research study | 79.8
How to integrate textbook material into real world problem | 77.5
Showing the importance of biases/other factors | 68.8
Importance of checking for data errors, outliers | 74.7
Showing there is more to statistical study than p-values | 88.9

In most questions, those not agreeing were neutral. Other questions also had positive results.

STUDENT EVALUATIONS “it helps students understand the material in a way that they can make it more memorable and meaningful to them” “it was fun and helpful in learning” “it was very fun and creative and then it was more interesting to do calculations” “It was a lot more fun then some of our other activities, and in my opinion helped a lot with the material we were working on. It was easier to connect the ideas. I'd recommend using it again.”

STUDENT EVALUATIONS Only 1 negative response Nearly all students recommended using the activity again FUN mentioned by most LEARNING mentioned by most

INSTRUCTOR EVALUATIONS: All planned to use again. Observed: student engagement and interest; positive learning gain. USED in a variety of ways: in-class and out-of-class data collection; Nature article included; as class activity, project, even as a midterm!!!

An EXAMPLE: The TigerStat activity was a success! 1. 2 lectures + 1 lab covering correlation, least squares estimation of the line, and sampling distributions / inference for a linear model. 2. A lecture where I went through a multivariate example (where the response needed a log transformation). 3. I assigned most of the lab for them to do (including the game), and then I had them write up just a small bit of it. The majority of the students really got it. I was impressed. For 1.5 weeks of presenting on linear models, they actually understood a lot of the details of model building, assessment, and interpretation. It was great!