Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6, 9.1 Least squares line Interpreting.


1 Statistics: Unlocking the Power of Data Lock 5. STAT 101, Dr. Kari Lock Morgan. Simple Linear Regression, SECTION 2.6, 9.1: Least squares line; Interpreting coefficients; Prediction; Cautions; Inference for slope, correlation

2 Review: ANOVA is used to test for an association between (a) two categorical variables (b) one categorical and one quantitative variable (c) two quantitative variables

3 Review: A χ² test is used to test for an association between (a) two categorical variables (b) one categorical and one quantitative variable (c) two quantitative variables

4 MODELING

5 Crickets and Temperature: Can you estimate the temperature on a summer evening, just by listening to crickets chirp? We will fit a model to predict temperature based on cricket chirp rate.

6 Crickets and Temperature: Response variable, y (temperature); explanatory variable, x (cricket chirp rate)

7 Linear Model: A linear model predicts a response variable, y, using a linear function of explanatory variables. Simple linear regression predicts one response variable, y, as a linear function of one explanatory variable, x.

8 Regression Line: Goal: find a straight line that best fits the data in a scatterplot.

9 Equation of the Line: The estimated regression line is ŷ = intercept + slope · x. Slope: increase in predicted y for every unit increase in x. Intercept: predicted y value when x = 0.

10 Regression in StatKey

11 Regression in RStudio

12 Regression Model: Which is a correct interpretation? (a) The average temperature is 37.68° (b) For every extra 0.23 chirps per minute, the predicted temperature increases by 1 degree (c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute (d) For every extra 0.23 chirps per minute, the predicted temperature increases by 37.68°

13 Units: It is helpful to think about units when interpreting a regression equation. The predicted y and the intercept are in y units (degrees), x is in x units (chirps per minute), and the slope is in y units per x unit (degrees per chirp per minute).

14 Prediction: The regression equation can be used to predict y for a given value of x. If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is the value the regression equation predicts at x = 140.
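A minimal Python sketch of this prediction, assuming the fitted line uses the intercept (37.68) and slope (0.23) quoted on the Regression Model slide. The function name `predict_temperature` is hypothetical, and the coefficients are rounded, so the result is approximate.

```python
# Coefficients quoted in the slides (rounded):
# intercept in degrees, slope in degrees per chirp per minute.
INTERCEPT = 37.68
SLOPE = 0.23


def predict_temperature(chirps_per_minute: float) -> float:
    """Predicted temperature from the fitted line: intercept + slope * x."""
    return INTERCEPT + SLOPE * chirps_per_minute


prediction = predict_temperature(140)
print(f"Predicted temperature at 140 chirps/min: {prediction:.2f} degrees")
```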

15 Prediction

16 Prediction: If the crickets are chirping about 180 times per minute, your best guess at the temperature is (a) 60° (b) 70° (c) 80°

17 Prediction: The intercept tells us that the predicted temperature when the crickets are not chirping at all is 37.68°. Do you think this is a good prediction? (a) Yes (b) No

18 Regression Caution 1: Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!). If none of the x values are anywhere near 0, then the intercept is meaningless!
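One way to enforce this caution in code is to refuse predictions outside the observed x range. The sketch below is only an illustration: the function `predict_within_range` is hypothetical, and the range of 80 to 200 chirps per minute is an assumed placeholder, since the slides do not give the actual range of the cricket data.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Return intercept + slope * x, but refuse to extrapolate beyond [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "refusing to extrapolate"
        )
    return intercept + slope * x


# ASSUMPTION: placeholder range of observed chirp rates (not from the slides).
X_MIN, X_MAX = 80, 200

# 140 chirps/min is inside the assumed range, so a prediction is returned.
inside = predict_within_range(140, 37.68, 0.23, X_MIN, X_MAX)

# 0 chirps/min is far outside the assumed range, so the call raises.
try:
    predict_within_range(0, 37.68, 0.23, X_MIN, X_MAX)
    extrapolated = True
except ValueError as err:
    extrapolated = False
    print(err)
```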

19 Duke Rank and Duke Shirts: Are the rank of Duke among schools applied to and the number of Duke shirts owned (a) positively associated (b) negatively associated (c) not associated (d) other

20 Regression Caution 2: Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear. ALWAYS PLOT YOUR DATA! The regression line/equation should only be used if the association is approximately linear.
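To illustrate, the Python sketch below fits a least squares line to made-up data that follow a perfect curve (y = x²). The helper `least_squares` is a hypothetical name, using the standard formulas slope = Sxy / Sxx and intercept = ȳ − slope · x̄. The computation runs without complaint and returns a flat line, even though x determines y exactly.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data with a perfect but curved (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

intercept, slope = least_squares(xs, ys)
print(f"fitted line: y-hat = {intercept:.2f} + {slope:.2f} x")
```

A slope of zero here does not mean "no association"; it means a straight line is the wrong summary, which only a plot would reveal.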

21 Regression Caution 3: Outliers (especially outliers in both variables) can be very influential on the regression line. ALWAYS PLOT YOUR DATA! http://illuminations.nctm.org/LessonDetail.aspx?ID=L455
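A small Python sketch of how much one point can matter, using made-up data (not from the slides) and a hypothetical `least_squares` helper. Five points lie exactly on a line with slope 2; adding a single point that is unusual in both x and y flips the sign of the fitted slope.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
_, slope_clean = least_squares(xs, ys)

# Add one point that is an outlier in both x and y.
_, slope_with_outlier = least_squares(xs + [20], ys + [0])

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_with_outlier:.2f}")
```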

22 Life Expectancy and Birth Rate. Coefficients: (Intercept) = 83.4090, LifeExpectancy = -0.8895. Which of the following interpretations is correct? (a) A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy (b) Increasing life expectancy by 1 year will cause birth rate to decrease by 0.89 (c) Both (d) Neither

23 Regression Caution 4: Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease. Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable).

24 Explanatory and Response: Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response.
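The Python sketch below illustrates the asymmetry with made-up data; `least_squares` and `correlation` are hypothetical helper names using the standard formulas. Swapping the roles of x and y leaves the correlation unchanged but gives a different line: the line for predicting x from y, rewritten in terms of y, does not have the same slope as the line for predicting y from x.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) for predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def correlation(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 8, 6, 10]

_, slope_y_on_x = least_squares(xs, ys)  # x explanatory, y response
_, slope_x_on_y = least_squares(ys, xs)  # y explanatory, x response

# Slope of the second line when it is solved for y instead of x.
implied_slope = 1 / slope_x_on_y

print(f"r(x, y) = {correlation(xs, ys):.2f}, r(y, x) = {correlation(ys, xs):.2f}")
print(f"slope predicting y from x: {slope_y_on_x:.2f}")
print(f"slope implied by predicting x from y: {implied_slope:.2f}")
```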

25 Regression Line: How do we find the best fitting line?

26 Predicted and Actual Values

27 Predicted and Actual Values

28 Residual: The residual (observed y minus predicted y) is also the vertical distance from each point to the line.

29 Residual: We want to make all the residuals as small as possible. How would you measure this?

30 Least Squares Regression: Least squares regression chooses the regression line that minimizes the sum of squared residuals.
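A short Python sketch of this criterion, using made-up data and hypothetical helper names. It computes the least squares line with the standard closed-form formulas (slope = Sxy / Sxx, intercept = ȳ − slope · x̄), then checks that nudging the intercept or slope in any direction increases the sum of squared residuals.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (observed y - predicted y)^2 for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 8, 6, 10]

intercept, slope = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)

# Sum of squared residuals for eight nearby lines (every nudge except "no change").
nudged = [
    sum_squared_residuals(xs, ys, intercept + da, slope + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

print(f"least squares line: y-hat = {intercept:.2f} + {slope:.2f} x")
print(f"sum of squared residuals: {best:.2f}; smallest nudged value: {min(nudged):.2f}")
```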

31 Least Squares Regression

32 Sample to Population: Everything we have done so far is based solely on sample data. Now, we will extend from the sample to the population.

33 Simple Linear Model: y = intercept + slope · x + random error

34 Inference for the Slope

35 Inference for Slope: Give a 95% confidence interval for the true slope. Is the slope significantly different from 0? (a) Yes (b) No

36 Confidence Interval

37 Hypothesis Test
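The confidence interval and hypothesis test for the slope can be sketched in Python as follows. Everything here is an assumption for illustration: the data are made up (n = 30), the function name `slope_inference` is hypothetical, and the standard error uses the usual textbook formula SE = s / sqrt(Sxx) with s² = SSE / (n − 2), which the slides likely read from software output instead. The multiplier 2.048 is the 97.5th percentile of a t-distribution with 28 degrees of freedom, taken from a t-table.

```python
import math


def slope_inference(xs, ys, t_star):
    """Return (slope, standard error, t-statistic, confidence interval) for the slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope
    t_stat = slope / se  # tests H0: true slope = 0
    ci = (slope - t_star * se, slope + t_star * se)
    return slope, se, t_stat, ci


# Made-up sample: a line with slope 0.5 plus a fixed repeating "noise" pattern.
noise = [0.5, -1.0, 0.3, 0.8, -0.6]
xs = list(range(1, 31))
ys = [2 + 0.5 * x + noise[i % 5] for i, x in enumerate(xs)]

T_STAR = 2.048  # t* for 95% confidence with n - 2 = 28 degrees of freedom
slope, se, t_stat, (ci_low, ci_high) = slope_inference(xs, ys, T_STAR)

print(f"slope = {slope:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"95% CI for the true slope: ({ci_low:.3f}, {ci_high:.3f})")
```

If the interval excludes 0, or equivalently if |t| exceeds t*, the slope is significantly different from 0 at the 5% level.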

38 Correlation Test: Test for a correlation between temperature and cricket chirps (r = 0.9906).
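A minimal Python sketch of the test statistic, using the standard formula t = r · sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The value r = 0.9906 comes from the slide; n = 7 is an assumption inferred from the later Small Samples slide, and `correlation_t_statistic` is a hypothetical name.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared to a t-distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


R = 0.9906  # correlation quoted on the slide
N = 7       # ASSUMPTION: sample size inferred from the Small Samples slide

t_stat = correlation_t_statistic(R, N)
print(f"t = {t_stat:.2f} with {N - 2} degrees of freedom")
```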

39 Two Quantitative Variables: The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical! They are equivalent ways of testing for an association between two quantitative variables.
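This equivalence can be checked numerically. The Python sketch below uses made-up data and the standard textbook formulas (slope standard error s / sqrt(Sxx); correlation statistic r · sqrt(n − 2) / sqrt(1 − r²)); the variable names are illustrative, not from the slides.

```python
import math

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 8, 6, 10]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# t-statistic for the slope: slope divided by its standard error.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = slope / se_slope

# t-statistic for the correlation.
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"t for slope:       {t_slope:.4f}")
print(f"t for correlation: {t_corr:.4f}")
```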

40 Small Samples: The t-distribution is only appropriate for large samples (definitely not n = 7)! We should have done inference for the slope using simulation methods...
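One simulation method that fits here is a randomization test for the slope: shuffle the y values to break any association, refit the slope each time, and see how often a shuffled slope is at least as extreme as the observed one. The Python sketch below is an illustration under stated assumptions: the seven data points are made up (they are not the cricket data), the helper `fitted_slope` is hypothetical, and the number of shuffles and the seed are arbitrary choices.

```python
import random


def fitted_slope(xs, ys):
    """Least squares slope Sxy / Sxx for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Made-up sample with n = 7 (NOT the cricket data).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3, 5, 4, 8, 9, 10, 13]
observed = fitted_slope(xs, ys)

rng = random.Random(0)  # fixed seed so the run is reproducible
n_sims = 5000
shuffled = ys[:]
count = 0
for _ in range(n_sims):
    rng.shuffle(shuffled)  # break any x-y association (the null hypothesis)
    # Small tolerance so floating-point ties with the observed slope still count.
    if abs(fitted_slope(xs, shuffled)) >= abs(observed) - 1e-12:
        count += 1

p_value = count / n_sims  # two-sided: share of shuffles at least as extreme
print(f"observed slope = {observed:.3f}, randomization p-value = {p_value:.4f}")
```

A bootstrap confidence interval for the slope would follow the same pattern, resampling (x, y) pairs with replacement instead of shuffling y.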


42 r = 0 Challenge: If the correlation between x and y is 0, what would the regression line be?

43 To Do: Read Sections 2.6 and 9.1. Do HW 7 (due Monday, 3/31). NO LATE HOMEWORK ACCEPTED: SOLUTIONS WILL BE POSTED IMMEDIATELY AFTER CLASS TO HELP YOU PREPARE FOR EXAM 2

