Chapter 4 Regression Analysis: Exploring Associations between Variables
Copyright © 2014 Pearson Education, Inc. All rights reserved.

Learning Objectives
- Be able to write a concise and accurate description of an association between two continuous variables based on a scatterplot.
- Understand how to use a regression line to summarize a linear association between two continuous variables.
- Interpret the intercept and slope of a regression line in context and know how to use the regression line to predict mean values of the response variable.
- Critically evaluate a regression model.

4.1 Visualizing Variability with a Scatterplot

Scatterplots
- Used to investigate whether two numerical variables have a positive association, a negative association, or no association.
- Example: in states where women tend to marry at an older age, men also tend to marry at an older age.

Positive Trend
- Older cars tend to have more miles than newer cars.
- Newer cars tend to have fewer miles than older cars.
- There is a positive association between car age and the number of miles the car has been driven.

Negative Trend
- Countries with higher literacy rates tend to have fewer births per woman.
- Countries with lower literacy rates tend to have more births per woman.
- There is a negative association between literacy rate and births per woman.

No Trend
- There is no trend between the speed and age of a marathon runner.
- Knowing the age of a marathon runner does not help predict the runner’s speed.
- There is no association between a marathon runner’s age and speed.

Strength of Association
- If, for each value of x, there is a small spread of y values, then there is a strong association between x and y.
- If, for each value of x, there is a large spread of y values, then there is a weak association or none between x and y.
- If there is a strong (weak) association between x and y, then x is a good (poor) predictor of y.

Strength of Association

Linear Trends
- A trend is linear if there is a straight line from which the points, in general, do not stray far.
- Linear trends are the easiest to work with.
- There is a positive linear association between the number of searches for “Vampire” and the number of searches for “Zombie”.

Other Shapes
- Nonlinear associations can also occur, but they are covered in more advanced statistics courses.
- Only use the techniques from this chapter when there is a linear trend.

Summary of Analysis of the Scatterplot
- Look to see if there is a trend or association.
- Determine the strength of the trend. Is the association strong or weak?
- Look at the shape of the trend. Is it linear? Is it nonlinear?

Writing Clear Descriptions Based on Association
- Good:
  - People who have higher salaries tend to travel farther on vacation.
  - A person who has a high salary is predicted to travel far on vacation.
- Bad:
  - Because they have higher salaries, they travel farther.
  - A person with a high salary will travel farther on vacation.

4.2 Measuring Strength of Association with Correlation

The Correlation Coefficient r
- The correlation coefficient is a number, r, that measures the strength and direction of the linear association between two numerical variables.
- -1 ≤ r ≤ 1
- If r is close to 1, there is a strong positive linear association.
- If r is close to -1, there is a strong negative linear association.
- If r is close to 0, there is a weak linear association or none.
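The slides leave the computation of r to StatCrunch. As a minimal Python sketch, the standard computing formula $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$ can be coded directly. The function name `pearson_r` and the car data are hypothetical, made up only to show the arithmetic.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: car age (years) and miles driven (thousands), made up.
ages = [1, 2, 3, 4, 5]
miles = [12, 25, 33, 49, 61]
print(round(pearson_r(ages, miles), 3))  # 0.996
```

A value this close to 1 would be described as a strong positive linear association.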

Positive Correlation

Weak or No Correlation

Negative Correlation

Interpreting Correlation
- The correlation between the daily numbers of swim suits and ski jackets sold in an apparel store is r = -0.96.
- There is a strong negative correlation between daily swim suit sales and daily ski jacket sales.
- On days with strong swim suit sales, ski jacket sales are predicted to be weak.
- This does not mean that people who buy swim suits are causing potential ski jacket buyers not to buy.

Using StatCrunch to Find r
- Enter the data.
- Stat → Regression → Simple Linear
- Select the variables.
- Calculate.

Switching x and y
- r for life expectancy, women vs. men: r = 0.977
- r for life expectancy, men vs. women: r = 0.977
- Switching x and y has no effect on r.

Correlation, Arithmetic, and Units
- Multiplying all x’s or all y’s by a positive constant does not change r. (Multiplying by a negative constant changes only the sign of r.)
- Adding the same constant to all x’s or all y’s does not change r.
- Changing units, such as in → cm or ºF → ºC, does not change r.
- r is unitless.

Correlation and Linearity and Outliers
- Use the correlation coefficient to interpret the data only when there is a linear relationship.
- An outlier can strongly influence the correlation.

4.3 Modeling Linear Trends

Least Squares Regression Line
- The regression line is the “best fit” line for the data.
- It is the line that minimizes the sum of the squared vertical distances from the points to the line.
- It is useful only when the data show a linear trend.
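As a sketch of what "least squares" means, the slope and intercept below use the standard closed-form formulas $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$. The code then checks that nudging either coefficient makes the sum of squared vertical distances (SSE) larger. The data points and the helper names `least_squares` and `sse` are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (x-bar, y-bar)
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared vertical distances (residuals) from the points to a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.1, 4.0, 5.4, 6.5, 7.6, 9.2]

a, b = least_squares(xs, ys)
best = sse(xs, ys, a, b)

# Moving the intercept or slope away from the least squares values raises the SSE.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(xs, ys, a + da, b + db) > best

print(f"y-hat = {a:.3f} + {b:.3f}x, SSE = {best:.4f}")
```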

StatCrunch and the Regression Line
- Enter the data.
- Select the x and y variables.
- Stat → Regression → Simple Linear
- Select: Plot the Fitted Line
- Calculate.

Using the Regression Line
- Predict the revenue per day when the company spends $500 per month on ads (both ad spending and revenue are measured in hundreds of dollars).
- Predicted Rev = 1.8 + 1.2(5) = 7.8
- The company’s daily revenue is predicted to be $780 when it spends $500 per month on ads.
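The prediction is just arithmetic on the fitted line. In this sketch, the function name `predicted_revenue` is hypothetical. The units follow from the slide: $500 enters the equation as 5, and 7.8 corresponds to $780, so both variables are taken to be in hundreds of dollars. The last line also shows the slope's meaning: raising x by 1 raises the prediction by exactly the slope.

```python
def predicted_revenue(ad_hundreds):
    """Predicted daily revenue (in $100s) from the line 1.8 + 1.2x.

    ad_hundreds is monthly ad spending in hundreds of dollars.
    """
    return 1.8 + 1.2 * ad_hundreds

pred = predicted_revenue(500 / 100)   # $500 per month -> x = 5
print(f"${pred * 100:.0f} per day")   # $780 per day

# Slope: the change in the prediction when x goes up by 1 ($100 more on ads).
print(round(predicted_revenue(6) - predicted_revenue(5), 10))  # 1.2, i.e. $120
```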

Interpreting the Slope
- The slope is the coefficient of x in the regression line equation.
- Slope = rise/run: if x increases by 1, the predicted value of y changes by the slope, on average.
- The slope is only meaningful if the data follow a linear model.

Interpreting the Slope
- The slope is 1.2.
- If x is increased by 1, y has an average increase of 1.2.
- For every additional $100 per month the company spends on ads, its daily revenue is predicted to be $120 higher, on average.

Interpreting the y-intercept
- The y-intercept is the predicted value of y when x is 0.
- Use the y-intercept to interpret the data only when:
  - It makes sense to have a value of 0 for x.
  - The calculated y-intercept value is meaningful.
  - The data include values equal to or close to 0.

Interpreting the y-intercept
- The y-intercept is 1.8.
- If the company spends no money ($0) on advertising, it is predicted to have an average daily revenue of $180.

Why Not to Use the y-intercept
- A sample of high school freshmen and sophomores resulted in a regression equation that relates age (x, in years) to height in inches: predicted height = -9.2 + 4.9x
- The y-intercept is -9.2.
- A height of -9.2 inches is meaningless.
- The sample only included teenagers. An age of 0 years is too far from the ages in the sample.
- The slope is meaningful: among high school freshmen and sophomores, students who are one year older tend to be 4.9 inches taller, on average.

Correlation is Not Causation
- A strong correlation is not evidence of a cause-and-effect relationship.
- Do not use words such as “causes”, “makes”, “will”, or “because” when stating conclusions based on regression analysis.
- Do use words such as “predict”, “tends”, and “on average”.

More on the Regression Line
- The equation does change when x and y are switched.
- If the linear model is a “good fit” for the data, then the mean value of y for each x will lie close to the regression line.

4.4 Evaluating the Linear Model

Nonlinear Data
- If you can’t imagine a line, don’t try to find one.
- If the association is not linear, don’t attempt to find or interpret r or the equation of the least squares regression line.

Slope and Causation
- Predicted Salary = 22,000 + 8,000 (College Years)
- Wrong: Each year in college results in an additional salary increase of $8,000.
- Wrong: A person with one more year of college education will earn an extra $8,000.
- Correct: On average, people with one more year of college education tend to earn an extra $8,000.

Beware of Outliers
- Outliers can have a strong effect on both the correlation and the equation of the regression line.
- An outlier that strongly affects the regression line is called an influential point.
- When an influential point is present, perform the regression analysis both with and without it.

Example of an Influential Point

Regression of Aggregate Data
- Regression on aggregate data means that each point represents the mean of all the y-values for a given x-value.
- When using aggregate data, be sure to include the word “mean” in all interpretations.

Aggregate Data
- There is a weak correlation between individual students’ math SAT scores and critical reading SAT scores.
- There is a strong correlation between states’ mean math SAT scores and states’ mean critical reading SAT scores.

Don’t Extrapolate
- Only use the regression line to predict y-values for x-values that are within or near the range of the data.
- Predicted Height = 31.78 + 2.45 (Age)
- Predict for a 50-year-old:
- 31.78 + 2.45(50) = 154.28 inches
- The predicted height of a 50-year-old man is over 12 feet, which is absurd: an age of 50 is far outside the ages used to fit the line.
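One way to build the "don't extrapolate" rule into a prediction is to refuse inputs outside the observed range of x. In this sketch, the function name `predict_height` and the list `sample_ages` are hypothetical. The slide does not give the ages in its sample, so the childhood ages below are an assumption. The check is also stricter than the slide's "within or near the range".

```python
def predict_height(age, observed_ages):
    """Predicted height (inches) from the line 31.78 + 2.45*age.

    Raises ValueError for ages outside the range of observed_ages,
    a hypothetical stand-in for the ages in the original sample.
    """
    lo, hi = min(observed_ages), max(observed_ages)
    if not lo <= age <= hi:
        raise ValueError(
            f"age {age} is outside the data range [{lo}, {hi}]; "
            "the prediction would be an extrapolation"
        )
    return 31.78 + 2.45 * age

sample_ages = [2, 4, 6, 8, 10, 12]  # hypothetical ages, not from the slide
print(round(predict_height(10, sample_ages), 2))

try:
    predict_height(50, sample_ages)
except ValueError as err:
    print("refused:", err)
```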

Coefficient of Determination r²
- r² measures the proportion of the variation in the response variable, y, that can be explained by the explanatory variable, x.
- r² is used to help determine which explanatory variable would be best for making predictions about the response variable.

Coefficient of Determination Example
- 60.5% of the variation in the value of cars can be explained by the age of the car. The other 39.5% cannot be explained by the age of the car.

Chapter 4 Case Study

Scatterplot of City Government Income vs. Private Meter Income Without Brinks
- Weak positive linear association
- Predicted Collection = 688497 + 145.5 (City Income)

Are Brinks Employees Stealing from Parking Meters?
- New York City contracted Brinks to collect parking meter money. The city suspects that employees are keeping some of it.
- Data are available on the monthly meter collections of honest (non-Brinks) collectors vs. the city’s total income for that month.

Predicted vs. Actual Brinks Collection
- Predicted Collection = 688497 + 145.5 (City Income)
- One month, City Income was $7016 and Brinks collected $1,330,143.
- 688497 + 145.5(7016) = $1,709,325
- Discrepancy: 1,709,325 - 1,330,143 = $379,182
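The discrepancy is the residual for that month with its sign reversed: residual = actual - predicted = -$379,182. A short sketch of the arithmetic follows, with a hypothetical helper `predicted_collection` wrapping the slide's equation.

```python
def predicted_collection(city_income):
    """Predicted monthly collection from the line 688497 + 145.5x."""
    return 688_497 + 145.5 * city_income

predicted = predicted_collection(7016)
actual = 1_330_143
shortfall = predicted - actual   # positive: the actual collection fell below the prediction

print(f"predicted ${predicted:,.0f}, actual ${actual:,}, shortfall ${shortfall:,.0f}")
# predicted $1,709,325, actual $1,330,143, shortfall $379,182
```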

Comparing Brinks vs. Honest Employees
- Conclusion: Income when Brinks is working is clearly lower than when the honest employees are working.

Chapter 4 Guided Exercise 1

Does the Cost of a Flight Depend on the Distance?
- How much would it cost to fly 500 miles?
- Use a complete regression analysis.

Create a Scatterplot
- Since the cost tends to increase as mileage increases and there is no apparent strong curvature, the linear model is appropriate.

The Regression Line
- Interpret the slope: 0.08 (0.0796, rounded).
- For every additional mile, the predicted price is $0.08 higher, on average.
- Interpret the y-intercept: 163 (162.60, rounded).
- This is the predicted price for a 0-mile flight. The y-intercept is meaningless here, because a 0-mile flight is not a real trip.

Answer the Question
- How much would it cost to fly 500 miles?
- Predicted Cost = 162.60 + 0.0796 (miles)
- 162.60 + 0.0796(500) = 202.40
- A 500-mile flight is predicted to cost $202.40.

Chapter 4 Guided Exercise 2

Test Scores: Slope
- The summary statistics for the midterm (x) and final exam (y) scores are:
  - Midterm: Mean = 75, Standard Dev. = 10
  - Final: Mean = 75, Standard Dev. = 10
  - r = 0.7, n = 20
- First find the slope: $b = r \frac{s_y}{s_x} = 0.7 \times \frac{10}{10} = 0.7$

Test Scores: y-intercept
- Midterm: Mean = 75, Standard Dev. = 10
- Final: Mean = 75, Standard Dev. = 10
- r = 0.7, n = 20, b = 0.7
- Then find the y-intercept a from the equation: $a = \bar{y} - b\bar{x} = 75 - 0.7(75) = 22.5$

Test Scores: Regression Line
- Midterm: Mean = 75, Standard Dev. = 10
- Final: Mean = 75, Standard Dev. = 10
- r = 0.7, n = 20, b = 0.7, a = 22.5
- Write out the equation: Predicted = a + bx
- Predicted Final Score = 22.5 + 0.7(Midterm Score)
- Use the equation to predict the final score for a midterm score of 95%.
- Predicted Final = 22.5 + 0.7(95) = 89
- The prediction is closer to the mean of 75 than the midterm score of 95 is, because the slope is less than 1 (regression toward the mean).
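The three steps of this exercise (slope from $b = r\,s_y/s_x$, intercept from $a = \bar{y} - b\bar{x}$, then prediction) can be packaged as a small function. This is a sketch; the name `line_from_summary` is hypothetical, and the inputs are the summary statistics given above.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Least squares intercept a and slope b from summary statistics."""
    b = r * sd_y / sd_x          # slope: b = r * (s_y / s_x)
    a = mean_y - b * mean_x      # intercept: the line passes through (x-bar, y-bar)
    return a, b

# Midterm is x, final is y.
a, b = line_from_summary(mean_x=75, sd_x=10, mean_y=75, sd_y=10, r=0.7)
predicted_final = a + b * 95
print(round(a, 2), round(b, 2), round(predicted_final, 2))  # 22.5 0.7 89.0
```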

