Stat 112: Notes 1
Main topics of course:
–Simple Regression
–Multiple Regression
–Analysis of Variance
–Chapters 3-9 of textbook
Readings for Notes 1: Chapter
Also, Chapter 2 contains a review of material from Stat 111.
Monitoring Tiger Prey Abundance
The Siberian (Amur) tiger is a tiger subspecies found in the Russian Far East. Tigers in general are in trouble. At the beginning of the 20th century, there were around 100,000 tigers. Today, there are fewer than 6,000 tigers in the world, and only about 400 of them are Siberian tigers. The Sika deer is a staple of the Siberian tiger's diet, and it is also hunted by the local people. To balance the needs of the local people while ensuring there is adequate prey for tigers, local government managers need accurate estimates of the number of Sika deer in an area.
Estimating Deer Abundance
Counting Method: The number of deer in a plot can be determined accurately, but only with considerable time and work. It requires 3-5 expert field workers to monitor the plot and to classify whether deer tracks are moving into or out of the plot. Can "total tracks counted" be used to estimate the number of deer in the plot? Total tracks counted is much easier to collect.
Deer Density vs. Tracks Counted
A study was done in which deer density was determined by expert field workers over a range of plots. How would we estimate the deer density if we counted 1 track per square kilometer?
Simple Regression Model
How would we estimate the deer density if we counted 1 track per square kilometer? Idea: estimate the mean deer density when we count 1 track per square kilometer.
Simple Regression Setup:
–Y = outcome (deer density per square kilometer)
–X = explanatory variable (tracks counted per square kilometer)
–Note: the outcome is sometimes called the dependent variable, and the explanatory variable is sometimes called the independent or predictor variable.
Simple Regression Model: a model for the mean (expected value) of Y given X, denoted E(Y|X).
Simple Linear Regression Model
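The equation for this slide did not survive extraction. In standard notation, the simple linear regression model says the mean of Y is a straight-line function of X. The symbols β0 (intercept), β1 (slope), and e (error) are our labels and may differ from the textbook's:

```latex
E(Y \mid X) = \beta_0 + \beta_1 X,
\qquad\text{equivalently}\qquad
Y = \beta_0 + \beta_1 X + e, \quad E(e \mid X) = 0
```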
Using the Simple Linear Regression Model for Estimating Deer Density
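Once an intercept b0 and slope b1 have been estimated, answering the deer question amounts to plugging X = 1 into the fitted line. The sketch below illustrates this. The coefficient values are hypothetical and are not the estimates from the deer study.

```python
def estimated_mean(b0: float, b1: float, x: float) -> float:
    # Plug x into the fitted line: estimated E(Y | X = x) = b0 + b1 * x.
    return b0 + b1 * x

# Hypothetical coefficients for illustration only; NOT the deer study's estimates.
b0_hyp, b1_hyp = 0.5, 2.0
print(estimated_mean(b0_hyp, b1_hyp, 1.0))  # estimated density at 1 track per km^2; prints 2.5
```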
Estimating the Slope and Intercept
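The formulas on this slide were lost in extraction. A common way to estimate the slope and intercept is least squares, which gives b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. The following is a minimal Python sketch of those formulas. The function name `least_squares` is our own.

```python
def least_squares(xs, ys):
    """Return (b0, b1): the least squares intercept and slope for y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares of x about their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Toy data lying exactly on y = 1 + 2x.
print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # prints (1.0, 2.0)
```

In Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same least squares slope and intercept and can serve as a cross-check.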
Simple Linear Regression Using JMP
Use Analyze > Fit Y by X. Put the response variable in Y and the explanatory variable in X. Make sure X is continuous: click on the X column, click Cols > Column Info, and check that the Modeling Type is Continuous. Then click Fit Line in the menu under the red triangle next to Bivariate Fit of Y by X.
Residuals
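A residual is the observed Y minus the value predicted by the fitted line: e = Y - (b0 + b1·X). A positive residual means the point lies above the line, and a negative residual means it lies below. The data and coefficients in this sketch are made up for illustration.

```python
def residuals(xs, ys, b0, b1):
    # Residual = observed y minus fitted value on the line b0 + b1 * x.
    # Positive: point lies above the line; negative: below it.
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Toy data and a hypothetical fitted line, for illustration only.
print(residuals([0, 1, 2], [1, 4, 5], b0=1, b1=2))  # prints [0, 1, 0]
```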
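Root Mean Square Error
Technical Note: RMSE² is approximately the average squared residual. In simple regression the sum of squared residuals is divided by n - 2 rather than n. RMSE is close to, but not exactly equal to, the average absolute residual.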
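To make the technical note concrete, here is a sketch computing RMSE as the square root of the sum of squared residuals divided by n - 2. We assume this is the usual simple regression convention behind JMP's "Root Mean Square Error". The sketch sets it beside the plain average absolute residual. The residual values are made up.

```python
import math

def rmse(resids):
    # sqrt(sum of squared residuals / (n - 2)); two degrees of freedom
    # are used up estimating the intercept and the slope.
    n = len(resids)
    if n < 3:
        raise ValueError("need at least 3 residuals")
    sse = sum(e * e for e in resids)
    return math.sqrt(sse / (n - 2))

def mean_abs_residual(resids):
    # Plain average of |residual|, for comparison with RMSE.
    return sum(abs(e) for e in resids) / len(resids)

toy = [1.0, -1.0, 1.0, -1.0]   # made-up residuals
print(rmse(toy))               # sqrt(4 / 2), about 1.414
print(mean_abs_residual(toy))  # prints 1.0
```

With only four residuals, the n - 2 divisor makes a visible difference. With many observations, RMSE² gets close to the plain average squared residual. RMSE is never smaller than the average absolute residual.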
Poverty and MDs
Do states with more poverty tend to have fewer doctors? Which states have an unusually high number of doctors given their poverty rate, or an unusually low number of doctors given their poverty rate?
Residuals in JMP
Saving the residuals in JMP:
–After fitting the line using Fit Y by X, click the red triangle next to Linear Fit and click Save Residuals. A column containing the residuals is added to the data table.
Sorting the residuals:
–Click the Tables menu, then click Sort. Click the name of the column with the residuals, click By, and then click Sort.
Labeling observations:
–To label an observation in the graph, click the row with the observation, then click the Rows menu and click Label. By default, JMP uses the observation number as the label. To make JMP use the state name as the label instead, click the state column, then click the Cols menu and click Label.
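Sorting in ascending order puts the most negative residuals at the top and the most positive at the bottom. An alternative is to rank by absolute value, which flags unusual observations in both directions at once. The sketch below takes that approach. The labels play the role of the state column, and both they and the residuals are hypothetical, not the Poverty-MD data.

```python
def most_unusual(labels, resids, k=3):
    # Pair each label with its residual and rank by magnitude, so large
    # positive and large negative residuals are both flagged.
    pairs = sorted(zip(labels, resids), key=lambda p: abs(p[1]), reverse=True)
    return pairs[:k]

# Hypothetical labels and residuals, for illustration only.
print(most_unusual(["A", "B", "C", "D"], [0.5, -3.0, 2.0, -0.1], k=2))
# prints [('B', -3.0), ('C', 2.0)]
```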
Residuals for Poverty-MD Data
Summary for Notes 1
Regression Model: a model for the mean of an outcome Y given a value of the explanatory variable X, E(Y|X).
Simple Linear Regression Model: E(Y|X) = β0 + β1X.
Regression models are useful for:
–Predicting Y from X
–Understanding the association between Y and X
–Identifying observations that are unusual in their relationship between Y and X (residuals with large magnitude)