Download presentation
Presentation is loading. Please wait.
Published byRigoberto Hubbart Modified over 9 years ago
1
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6, 9.1 Least squares line Interpreting coefficients Prediction Cautions Inference for slope, correlation
2
Statistics: Unlocking the Power of Data Lock 5 ANOVA is used to test for an association between a)Two categorical variables b)One categorical and one quantitative variable c)Two quantitative variables Review
3
Statistics: Unlocking the Power of Data Lock 5 A 2 test is used to test for an association between a)Two categorical variables b)One categorical and one quantitative variable c)Two quantitative variables Review
4
Statistics: Unlocking the Power of Data Lock 5 MODELING
5
Statistics: Unlocking the Power of Data Lock 5 Can you estimate the temperature on a summer evening, just by listening to crickets chirp? We will fit a model to predict temperature based on cricket chirp rate Crickets and Temperature
6
Statistics: Unlocking the Power of Data Lock 5 Crickets and Temperature Response Variable, y Explanatory Variable, x
7
Statistics: Unlocking the Power of Data Lock 5 Linear Model A linear model predicts a response variable, y, using a linear function of explanatory variables Simple linear regression predicts on response variable, y, as a linear function of one explanatory variable, x
8
Statistics: Unlocking the Power of Data Lock 5 Regression Line Goal: Find a straight line that best fits the data in a scatterplot
9
Statistics: Unlocking the Power of Data Lock 5 Equation of the Line The estimated regression line is InterceptSlope Slope: increase in predicted y for every unit increase in x Intercept: predicted y value when x = 0
10
Statistics: Unlocking the Power of Data Lock 5 Regression in StatKey
11
Statistics: Unlocking the Power of Data Lock 5 Regression in RStudio
12
Statistics: Unlocking the Power of Data Lock 5 Regression Model Which is a correct interpretation? a) The average temperature is 37.68 b) For every extra 0.23 chirps per minute, the predicted temperate increases by 1 degree c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute d) For every extra 0.23 chirps per minute, the predicted temperature increases by 37.68
13
Statistics: Unlocking the Power of Data Lock 5 Units It is helpful to think about units when interpreting a regression equation y units x units degrees chirps per minute degrees/ chirps per min
14
Statistics: Unlocking the Power of Data Lock 5 Prediction The regression equation can be used to predict y for a given value of x If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is
15
Statistics: Unlocking the Power of Data Lock 5 Prediction
16
Statistics: Unlocking the Power of Data Lock 5 Prediction If the crickets are chirping about 180 times per minute, your best guess at the temperature is (a) 60 (b) 70 (c) 80
17
Statistics: Unlocking the Power of Data Lock 5 Prediction The intercept tells us that the predicted temperature when the crickets are not chirping at all is 37.68 . Do you think this is a good prediction? (a) Yes (b) No
18
Statistics: Unlocking the Power of Data Lock 5 Regression Caution 1 Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!) If none of the x values are anywhere near 0, then the intercept is meaningless!
19
Statistics: Unlocking the Power of Data Lock 5 Duke Rank and Duke Shirts a)positively associated b)negatively associated c)not associated d)other Are the rank of Duke among schools applied to and the number of Duke shirts owned
20
Statistics: Unlocking the Power of Data Lock 5 Regression Caution 2 Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear ALWAYS PLOT YOUR DATA! The regression line/equation should only be used if the association is approximately linear
21
Statistics: Unlocking the Power of Data Lock 5 Regression Caution 3 Outliers (especially outliers in both variables) can be very influential on the regression line ALWAYS PLOT YOUR DATA! http://illuminations.nctm.org/LessonDetai l.aspx?ID=L455
22
Statistics: Unlocking the Power of Data Lock 5 Life Expectancy and Birth Rate Coefficients: (Intercept) LifeExpectancy 83.4090 -0.8895 Which of the following interpretations is correct? (a)A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy (b)Increasing life expectancy by 1 year will cause birth rate to decrease by 0.89 (c)Both (d)Neither
23
Statistics: Unlocking the Power of Data Lock 5 Regression Caution 4 Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable)
24
Statistics: Unlocking the Power of Data Lock 5 Explanatory and Response Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response
25
Statistics: Unlocking the Power of Data Lock 5 Regression Line How do we find the best fitting line???
26
Statistics: Unlocking the Power of Data Lock 5 Predicted and Actual Values
27
Statistics: Unlocking the Power of Data Lock 5 Predicted and Actual Values
28
Statistics: Unlocking the Power of Data Lock 5 Residual The residual is also the vertical distance from each point to the line
29
Statistics: Unlocking the Power of Data Lock 5 Residual Want to make all the residuals as small as possible. How would you measure this?
30
Statistics: Unlocking the Power of Data Lock 5 Least Squares Regression Least squares regression chooses the regression line that minimizes the sum of squared residuals
31
Statistics: Unlocking the Power of Data Lock 5 Least Squares Regression
32
Statistics: Unlocking the Power of Data Lock 5 Sample to Population Everything we have done so far is based solely on sample data Now, we will extend from the sample to the population
33
Statistics: Unlocking the Power of Data Lock 5 Intercept Slope Simple Linear Model Random error
34
Statistics: Unlocking the Power of Data Lock 5 Inference for the Slope
35
Statistics: Unlocking the Power of Data Lock 5 Inference for Slope Give a 95% confidence interval for the true slope. Is the slope significantly different from 0? (a) Yes (b) No
36
Statistics: Unlocking the Power of Data Lock 5 Confidence Interval
37
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Test
38
Statistics: Unlocking the Power of Data Lock 5 Test for a correlation between temperature and cricket chirps (r = 0.9906). Correlation Test
39
Statistics: Unlocking the Power of Data Lock 5 Two Quantitative Variables The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical! They are equivalent ways of testing for an association between two quantitative variables.
40
Statistics: Unlocking the Power of Data Lock 5 Small Samples The t-distribution is only appropriate for large samples (definitely not n = 7)! We should have done inference for the slope using simulation methods...
41
Statistics: Unlocking the Power of Data Lock 5
42
r = 0 Challenge: If the correlation between x and y is 0, what would the regression line be?
43
Statistics: Unlocking the Power of Data Lock 5 To Do Read Section 2.6, 9.1 Do HW 7 (due Monday, 3/31) NO LATE HOMEWORK ACCEPTED – SOLUTIONS WILL BE POSTED IMMEDIATELY AFTER CLASS TO HELP YOU PREPARE FOR EXAM 2
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.