TigerStat: An Immersive 3-D Game for Statistics Classes Rod Sturdivant, John Jackson, Kevin Cummiskey Department of Mathematical Sciences, USMA West Point.

Presentation transcript:


Playing Games with a Purpose: A New Approach to Teaching and Learning Statistics
This NSF Project involves:
– Developing interactive Web-based games
– Corresponding investigative laboratory modules
Project Goals:
– Effectively teach statistical thinking
– Present process of scientific inquiry to undergraduate students
Website for games/labs:
3 year grant (July 11 – June 14)
Co-PI: Shonda Kuiper (Grinnell College), Rod Sturdivant (West Point)
West Point contributors: John Jackson, Kevin Cummiskey, Billy Kaczynski, Rob Burks
NSF TUES DUE #

Pedagogical Points
Use of technology by incorporating games into the classroom designed to:
– Foster a sense of engagement [“hard fun”, Papert (1998)]
– Have a low threat of failure early on but create a challenging environment that grows with the students’ knowledge
– Create realistic, adaptable, and straightforward models representing current research in a variety of disciplines
– Provide an intrinsic motivation for students to want to learn

Why Games?
Games can do more than serve as a distraction or be played for fun.
“In addition to developing skills, play can also uniquely motivate students to develop basic competencies and interest in more specialized domains of knowledge by encouraging personal and social investments.” Jenkins (2005) [Henry Jenkins, Director of the Comparative Media Studies Program at Massachusetts Institute of Technology]
“There is no reason that a generation that can memorize over 100 Pokémon characters with all their characteristics, history and evolution can’t learn the names, populations, capitals and relationships of all the 101 nations in the world. It just depends on how it is presented.” Prensky (2001)

Why Games?
– Games lower the threat of failure.
– Games foster a sense of engagement through immersion.
– Games sequence tasks to allow early success. They maintain a threshold at which players feel challenged but not overwhelmed.
– Games link learning to goals and roles.
– Games create a social context that connects learners to others who share their interests.
– Games are multimodal.
– Games support early steps into a new domain.
– Games create simplified models of the world around us while maintaining realism in data (messy, missing, sampling bias).
– Games allow extension to a variety of more complex real world problems in a variety of disciplines.


Real World Problem
Understanding the population of rare and endangered Amur tigers in Siberia. [Gerow et al. (2006)]
Estimating the age distribution of the population is important to ensure sustainability.

Lab Materials
Laboratory Exercise #1: Simple Regression

Task: You are hired to develop models to use in estimating the age of a population of tigers.

The Bolshoy Kosha (Russian for big cat) Reserve is a newly created animal reserve that was uniquely developed to help endangered species prosper. This 10,000-acre wild animal reservation was selected because an abundance of Siberian tigers has been found in the area. The diverse terrain of the reserve provides a wide variety of habitats for many different species of animals. Since the tigers in this area are much more abundant than in any other area in the world, they are starting to draw a significant number of researchers. Your primary responsibility will be to help these researchers as they come to study the tigers and then incorporate the results of their research into a system to identify the best management practices for this reserve.

Establishing a simple model to estimate the age of a tiger.
While the exact age is not known for most of the tigers in your reserve, the ages of some tigers are known. These tigers have been carefully monitored by keeping them in a smaller research zone within the BK land area. To estimate the age of a tiger that is captured on your reserve, you will need to compare characteristics of the captured tiger to those of the tigers that live in the research zone (whose ages are known).

Read literature
Nature Article
Aging Lions in Eastern and Southern Africa by Karyl L. Whitman and Craig Packer

Research question and plan
– Do techniques for estimating lion age apply to tigers?
– To collect a sample and test the model, what issues must be considered?
– How many tigers to sample?
– What data should we collect?
– How do we use our data to answer the question?
Lion model: Percentage of black on the nose (Sample of 63 females)

Demonstration – TigerStat

New Release Addition
Gives students updates on what data have been collected DURING GAME PLAY
– encourages thinking about the sample size
– encourages considering representativeness

Example “Anonymous student” (15 tigers)
Linear fit reasonable?
[Stata output: regression of age on noseblack, F(1, 13); numeric values not recoverable]
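The simple regression of age on nose blackness can be sketched in a few lines. This is a minimal illustration, not the Stata analysis from the slide: the data values are invented for the example (they are not TigerStat data), and the function name `fit_simple_ols` is hypothetical.

```python
# Hypothetical illustration data: (proportion of nose that is black, age in years).
# These values are invented for this sketch; they are NOT the TigerStat sample.
nose_black = [0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]
age = [1.5, 2.0, 2.8, 3.1, 4.0, 5.2, 6.1, 7.9]


def fit_simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


intercept, slope, r2 = fit_simple_ols(nose_black, age)
print(f"age = {intercept:.2f} + {slope:.2f} * noseblack, R^2 = {r2:.3f}")
```

The slope and intercept correspond to the `noseblack` and `_cons` rows of a Stata `reg` table; standard errors and t statistics are omitted here for brevity.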

Examining model fit
Residuals, leverage, influence diagnostics
– Pattern?
– Outlier?
– Influential Point?
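One way to make these diagnostics concrete is to compute residuals, leverage, and Cook's distance directly for a simple regression. The sketch below uses the standard textbook formulas; the data are invented (the last point is deliberately placed far from the trend), and `regression_diagnostics` is a hypothetical helper name.

```python
import math

# Hypothetical data, invented for this sketch: the last tiger has a very dark
# nose but a young age, so it should stand out as influential.
nose_black = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9]
age = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 3.0]


def regression_diagnostics(x, y):
    """Per-point residual, leverage, studentized residual and Cook's distance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    rows = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        studentized = e / math.sqrt(s2 * (1 - leverage))
        # Cook's distance for a model with p = 2 parameters.
        cooks_d = studentized ** 2 * leverage / (2 * (1 - leverage))
        rows.append({"residual": e, "leverage": leverage,
                     "studentized": studentized, "cooks_d": cooks_d})
    return rows


diagnostics = regression_diagnostics(nose_black, age)
most_influential = max(range(len(diagnostics)),
                       key=lambda i: diagnostics[i]["cooks_d"])
for i, row in enumerate(diagnostics):
    print(i, {k: round(v, 3) for k, v in row.items()})
print("most influential point index:", most_influential)
```

A point with both a large residual and high leverage gets a large Cook's distance, which is the usual signal to refit without it and compare, as the next slide does.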

Fit removing outlier
– Slight increase in R² (from …)
– Slope coefficient decrease of 8% (from 12.74)
[Stata output: regression of age on noseblack with the outlier removed, F(1, 12); numeric values not recoverable]

REAL questions
– Enough evidence to reject model fit?
– Heteroskedasticity?
– Would you try a transformation (without having the Nature article)?
– What is the model used for – is it “good enough”?
– Is the data “good enough”?

Fit using arcsin transformation
[Stata output: regression of age on t_noseblack, F(1, 13); numeric values not recoverable]
R² to …, and fit appears better
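A minimal sketch of the transformed fit follows. It assumes that `t_noseblack` is the arcsine of the nose-black proportion; the slides do not spell out the exact form, and the arcsine of the square root is another common variant. The data are noise-free and generated from made-up coefficients (2.0 and 6.0) purely to show the mechanics.

```python
import math


def arcsine_transform(proportion):
    """Arcsine of a proportion in [0, 1] (assumed form of t_noseblack)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(proportion)


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical, noise-free data: ages generated from invented coefficients,
# so the fit on the transformed scale should recover them.
proportions = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
ages = [2.0 + 6.0 * math.asin(p) for p in proportions]

t_noseblack = [arcsine_transform(p) for p in proportions]
intercept, slope = fit_line(t_noseblack, ages)
print(f"age = {intercept:.2f} + {slope:.2f} * arcsin(noseblack)")
```

Fitting the same ages against the untransformed proportions would leave curvature in the residuals, which is the pattern the transformation is meant to remove.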

Predicting Ages
Implications if model applied to estimate age for population of tigers?
[Table: predicted age by % black under the Linear and Arcsin models; values not recoverable]
Interesting discussion of R² and prediction of individual tigers using the model here…
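The distinction between a good R² and a precise prediction for one tiger can be made explicit with the standard intervals for simple regression. The notation here is an assumption, not taken from the slides: x_0 is the (possibly transformed) nose-black value of a new tiger, s is the residual standard error, and S_xx is the sum of squared deviations of x about its mean.

```latex
% Confidence interval for the MEAN age of tigers with nose value x_0
\hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction interval for the age of ONE new tiger with nose value x_0
\hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% where \hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 and S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The extra 1 under the square root does not shrink as n grows, so even a model with high R² can give wide intervals for an individual animal.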

Sample of 27 Tigers (Tigger123)
[Stata output: regression of age on t_noseblack; numeric values not recoverable]
– Original data fit and residuals
– Transformed data fit excellent
– Parameters similar to smaller data

Sample of 70+ Tigers (ClaireBear)
[Stata output: regression of age on t_noseblack; numeric values not recoverable]
– Original data fit and residuals
– Transformed data fit excellent
– Parameters similar to smaller data…but more change

Opportunities
– Would we have tried this transformation? How about others? Compare…
– Sample has more young tigers…particularly in small sample - sampling issues? How do we avoid this?
– Implications if model applied to estimate age for population of tigers?
– How can we do better in prediction?
– Role of R²
– Role of MODELS and use of data
– Different samples for different students/groups – sampling distributions
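The last point, that each student's sample gives a different fitted model, can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 1,000 tigers, the coefficients (2.0 and 6.0), the noise level, and the helper names `slope_of` and `simulate_slopes` are all invented for illustration. The sample sizes 15 and 70 echo the student examples above.

```python
import math
import random
import statistics


def slope_of(sample):
    """Least-squares slope of age on arcsin(nose-black) for a list of (p, age) pairs."""
    xs = [math.asin(p) for p, _ in sample]
    ys = [a for _, a in sample]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate_slopes(population, sample_size, n_students, rng):
    """Each simulated student tags a simple random sample and fits a slope."""
    return [slope_of(rng.sample(population, sample_size))
            for _ in range(n_students)]


rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population: invented relationship plus Gaussian noise.
population = []
for _ in range(1000):
    p = rng.uniform(0.05, 0.95)
    population.append((p, 2.0 + 6.0 * math.asin(p) + rng.gauss(0, 1.0)))

small = simulate_slopes(population, 15, 500, rng)
large = simulate_slopes(population, 70, 500, rng)
print("SD of slopes, n = 15:", round(statistics.stdev(small), 3))
print("SD of slopes, n = 70:", round(statistics.stdev(large), 3))
```

The spread of slopes across students shrinks as the sample size grows, which is the sampling-distribution idea the class discussion can build on. Simple random sampling is assumed here; in the game, younger tigers may be easier to tag, which would bias the sample.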

Enhancements
How to make sampling issues and statistical thinking more related to game play
– Tiger behavior and ease of tagging based on age and other factors
– Tagged tiger data viewed during game play
Richer data (missing, messy, more characteristics)
Tiger behavior
“Gaming” tuning knobs – too easy/hard…balance of time to collect and student engagement
FUTURE possibilities for a RICH, IMMERSIVE ENVIRONMENT
– Other animals
– Disease spread
– A lot more…