Note: In this chapter, we only cover sections 10-1 through 10-3


1 Note: In this chapter, we only cover sections 10-1 through 10-3

2 Sections 10-1 & 10-2 Correlation

3 Paired Data In this chapter, we will be looking at paired data (x’s and y’s). We will be given paired data to study to see if a relationship exists between the 2 variables. We will be using some variable x to help us predict values of another variable called y.

4 Definition A correlation exists between two variables when there appears to be some pattern relating them. Correlation analysis is the statistical technique used to determine the strength of the relationship between two variables.

5 Correlation Eventually we will be testing to see if there is a correlation between two different variables. For instance: Is there a correlation between the amount of time a student studies and the student’s grade? Is there a correlation between your gas bill and the number of gallons your gas tank holds? It is possible to have a positive correlation, negative correlation, or no correlation between the two variables. One way to determine the correlation is by looking at a scatter diagram.

6 Definition A scatter diagram (or scatter plot) is a graph in which the pairs of data are plotted as points on a graph. Each subject is a dot on the graph. One set of data provides the x-coordinate, and the other provides the y-coordinate.

7 Positive Linear Correlation Notice when you read left to right, the points go in the upward direction. Figure panels: positive correlation, strong positive correlation, perfect positive correlation.

8 Negative Linear Correlation Notice when you read left to right, the points go in the downward direction. Figure panels: negative correlation, strong negative correlation, perfect negative correlation.

9 No Linear Correlation Notice when you read left to right, there is no specific pattern. Figure panels: no correlation, nonlinear correlation. In the nonlinear figure, there is a pattern but it is not linear. We will not discuss nonlinear relationships in this class.

10 Scatter Diagram on the Calculator
Let’s discuss how we can create a scatter diagram on our calculator, so we can look at data and determine if there is a relationship. You will need to begin by plugging your data into L1 and L2. Go to STAT, 1: Edit, enter the x’s under L1 and the y’s under L2.

11 Scatter Diagram on the Calculator
Once you have entered your data into the lists, hit 2nd, then Y = . The STAT PLOTS screen appears. Hit 1 to select Plot1. The following screen will appear: You will want to turn on the plots, so make sure On is chosen. You need to choose the Type, which is the first picture, a scatterplot. Make sure the right lists appear for x and y. The Mark should be on the first choice, the square. Once you have everything entered in correctly, you will hit ZOOM, 9: ZoomStat and your scatter diagram will appear.

12 Example The following data represent the weights of cars and their highway miles per gallon. Use a scatter diagram to investigate whether or not there is a relationship between these two sets of data. Enter both sets of lists into your calculator, and follow the directions in the previous slides. Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25

13 Solution This graph has each car plotted, with its weight as the x-coordinate, and its highway miles per gallon as the y-coordinate. Just by looking at the scatter diagram, it does look like the plots are going in the downward direction when reading left to right. But is this close enough to a line to be meaningful? There is a better way to measure a relationship instead of just “eyeballing” it.

14 Linear Correlation Coefficient
The linear correlation coefficient measures the strength of the linear relationship between paired x and y values.

15 Notation for the Linear Correlation Coefficient
r represents the linear correlation coefficient for a sample. ρ (rho) represents the linear correlation coefficient for a population.

16 Properties of the Linear Correlation Coefficient r
1. –1 ≤ r ≤ 1: r is between –1 and 1. –1 is a perfect negative correlation, 1 is a perfect positive correlation. 2. If there is no linear correlation between the variables, then r = 0 (for sample data, r will be close to 0). 3. It should be rounded to 3 decimal places. 4. r is taken from a sample, so it is a statistic.
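As an illustration of these properties, here is a minimal sketch that computes r by hand with the usual product-moment formula. The function name `pearson_r` and all of the data are hypothetical and are not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only to show the range of r
xs = [1, 2, 3, 4, 5]
perfect_up = [2 * x + 1 for x in xs]     # points exactly on a rising line
perfect_down = [10 - 3 * x for x in xs]  # points exactly on a falling line
noisy = [2, 1, 4, 3, 5]                  # rising, but not exactly on a line

r_up = pearson_r(xs, perfect_up)
r_down = pearson_r(xs, perfect_down)
r_noisy = round(pearson_r(xs, noisy), 3)  # rounded to 3 decimal places
```

With these made-up lists, the two exact lines sit at the ends of the allowed range and the noisy list lands strictly between 0 and 1.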

17 Positive Correlation Note: All three of these are positive correlations, so r is positive in all three cases. The better the correlation, the closer r is to 1.

18 Negative Correlation Note: All three of these are negative correlations, r is negative in all three cases.

19 No Correlation Note: Because there is no linear correlation, r is 0 (or very close to 0).

20 Correlation Coefficient
Here is the general idea of where the value of r should fall depending on if there is a positive, negative, or no correlation. The closer to 1, the stronger the positive correlation. The closer to –1, the stronger the negative correlation. The closer to 0, the weaker the positive/negative correlation.

21 Correlation Coefficient on the Calculator
Let’s discuss how we can find the correlation coefficient r on our calculator, so we can see if our data has a positive, a negative, or no correlation. Enter the data into L1 and L2. Hit STAT, go over to TESTS, and choose E: LinRegTTest (on some calculators this may be choice F), then press Enter. This screen appears: Enter the two lists where your data is located. Freq should be 1. For now, none of the other information matters. Go down to Calculate and hit ENTER.

22 Correlation Coefficient on the Calculator
After you hit ENTER, the following screen will appear: You will have to scroll down to find r. Remember to round r to 3 decimal places.

23 Proportion of Variation
You may have noticed that when you find the correlation coefficient, the calculator also gives you r². This is the coefficient of determination, or proportion of variation. It tells you the proportion of the variation in y that is explained by the relationship between x and y.
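The "proportion of variation explained" reading can be checked numerically. The sketch below uses hypothetical data (not from the slides) and compares the squared correlation coefficient with 1 minus the ratio of unexplained variation to total variation around the least-squares line.

```python
import math

# Hypothetical paired data, used only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

r = sxy / math.sqrt(sxx * syy)

# Least-squares line y-hat = a + b*x
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left unexplained by the line (sum of squared residuals)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - sse / syy  # proportion of variation explained

r_squared = r ** 2
```

For this made-up data both quantities come out to 0.64, so 64% of the variation in y is explained by the line.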

24 Example Earlier, we created this scatter diagram and stated, by looking at it, that it appeared to show a relationship. Find the correlation coefficient and the coefficient of determination. Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25

25 Solution The coefficient of determination is r² = 0.669. This means that 66.9% of the variation in the mpg of these cars is explained by the relationship of a car’s weight to its mpg. Some (about 33.1%) of the differences in mpg are due to other factors. The correlation coefficient is r = –0.818, which means there seems to be a negative correlation between the weight of the cars and the highway miles per gallon. But is this correlation strong enough to be meaningful?
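The two percentages on this slide follow directly from the rounded value of r. A worked check, using only the numbers quoted above:

```latex
r^2 = (-0.818)^2 \approx 0.669 \quad (66.9\%\ \text{explained}),
\qquad
1 - r^2 \approx 1 - 0.669 = 0.331 \quad (33.1\%\ \text{due to other factors})
```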

26 How strong is strong enough?
Setting a cutoff point for a strong correlation is difficult. The cutoff is different depending on the sample size and the significance level you want. We want to use the sample data to determine if the population has a linear correlation. We will do this by setting up a hypothesis test. As with our other hypothesis tests, we will use our calculators. The calculator gives a P-value without the need of a test statistic.

27 Hypothesis Testing We can use r to estimate ρ and then do hypothesis testing on the value of ρ. We will be using the LinRegTTest on our calculator to test our claim. Check out the next slides to learn more about the testing process.

28 Formal Hypothesis Test
For this hypothesis test, the parameter will be ρ. Remember: –1 ≤ r ≤ 1 and –1 ≤ ρ ≤ 1. Just like r, ρ is always between –1 and 1. If ρ = 0, then there is NO linear correlation between the variables.

29 5 Steps to Hypothesis Tests
In each problem, you should include the following steps. 1. Set up the hypotheses with the correct parameter. Label which one is the claim. 2. State what input screen on the calculator you used and the P-value. (Round to 3 sig. digits) 3. Decide to reject or fail to reject H0. 4. Decide whether to support or fail to support H1. 5. Interpret the conclusion about the original claim.
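Steps 3 and 4 are mechanical once the P-value is known. A minimal sketch of that decision rule follows; the function name `decide` is hypothetical, and rejecting when the P-value is less than or equal to the significance level is the usual convention rather than something the slides spell out.

```python
def decide(p_value, alpha=0.05):
    """Return the step 3 and step 4 decisions for a given P-value."""
    # Convention assumed here: reject H0 when P-value <= alpha
    if p_value <= alpha:
        return ("reject H0", "support H1")
    return ("fail to reject H0", "fail to support H1")

# 0.192 is the P-value the slides report for the test-score example
step3, step4 = decide(0.192, alpha=0.05)
```

Step 5, interpreting the conclusion in terms of the original claim, still has to be written in words for each problem.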

30 Null and Alternative Hypotheses
Note we use ρ as our parameter. H0: ρ = 0 (no linear correlation) H1: ρ ≠ 0 (linear correlation) We want to know if there is a linear correlation or not. ρ = 0 would mean no correlation, so ρ ≠ 0 would mean there is a correlation. Note: It is possible to have ρ > 0 or ρ < 0 in the alternative hypothesis, but we will not be using these.

31 Hypothesis Test on the Calculator
The LinRegTTest that you used to find r is also the hypothesis test. Instead of looking for the value of r this time, we need to know the P-value because we need to compare it to the level of significance to determine whether to reject or fail to reject the null hypothesis. So after plugging in the appropriate information (see next slide), find the P-value. Remember to round to 3 significant digits!

32 LinRegTTest on Calculator
Let’s discuss how we can find the P-value on our calculator, so we can determine whether to reject or fail to reject our null hypothesis. As before, enter the data into L1 and L2. Hit STAT, go over to TESTS, and choose E: LinRegTTest (on some calculators this may be choice F), then press Enter. This screen appears: Enter the two lists where your data is located. Freq should be 1, choose ≠ for the alternative hypothesis, always leave RegEQ blank, then go to Calculate and hit ENTER. The following screen will appear: The P-value should be on the screen that appears. Remember to round it to 3 significant digits. Let’s do an example!

33 Example Earlier, we saw from a scatter diagram that the following data appeared to have a linear correlation. Is it strong enough to say about the whole population? At the 0.05 level of significance, do the data below provide sufficient evidence that weight and hwy mpg of a car are linearly related? Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25 We will set up a formal hypothesis test to determine if the evidence of a correlation is sufficient.

34 Solution H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation)
LinRegTTest, P-value < 0.05, so we REJECT H0. Rejecting H0 means that we SUPPORT H1. There is sufficient evidence to support a linear correlation between weight and highway miles per gallon in all cars. Note: When there is a correlation, look at r to decide if the relationship is positive or negative. Here, r = –0.818, so it is a negative correlation.
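The slides read the P-value off the calculator. For readers who want to see roughly where such a number comes from, the sketch below uses the standard t statistic for a correlation, t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, and integrates the t density numerically. It assumes n = 10 (the number of car weights listed) and the rounded r = –0.818, so the result is only an approximation of what the calculator would show. The helper names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed P-value, using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|
    return 1 - central_area

r, n = -0.818, 10  # rounded r from the slides; n assumed from the ten weights
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_value = two_tailed_p(t_stat, n - 2)
```

Under these assumptions the P-value comes out well below 0.05, which agrees with the decision to reject H0.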

35 Example The following data represent employee test scores and performance ratings. X: Test 10 4 15 11 14 9 12 17 5 18 16 3 Y: Rating 31 27 30 26 38 21 29 36 33 25 Test at the 0.05 level of significance for a linear correlation.

36 Solution Test at the 0.05 level of significance for a linear correlation. H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation) LinRegTTest, P-value = 0.192. Because 0.192 > 0.05, we FAIL to REJECT H0. Failing to reject H0 means we fail to support H1. There is NOT sufficient evidence to support a linear correlation between test scores and performance ratings for all employees.

37 Regression
In this section we cover p. 536-541.

38 Linear Regression Once we know there is a linear correlation, we need to know what that correlation is. For that, we need the linear equation that models the data. In this section, we will discuss regression, more specifically, a regression equation. If we determine there is a correlation between x and y, we can make predictions by using the regression equation. First, some review on equations of lines.

39 Definitions y-Intercept: The point where the line crosses the y-axis. The point where x is equal to zero. Slope: How much y will change every time you increase the value of x by one unit. You may have heard the term “rise over run” or the “change in y over the change in x.”

40 Positive and Negative Slope
When reading a graph from left to right, if the points on the line run uphill the line has a positive slope (which indicates a positive correlation). If you read the graph left to right and the points run downhill the line has a negative slope (which indicates a negative correlation). **Note: You have to read left to right, NOT right to left.**

41 Slope POSITIVE SLOPE NEGATIVE SLOPE

42 Finding Slope and Y-Intercept
On the next slide, you are given a set of points. If we plot those points and connect them, we will have a graph. We can find the y-intercept by looking at the graph or by looking at the set of data. (Remember it is when x is zero or where it crosses the y-axis.) We can find the slope by determining the change in y over the change in x. See next slide

43 Graph the set of points
Given a set of points, graph them: x y 1 3 2 5 7 The y-intercept: (0, 1). Slope: the change in y is 2 and the change in x is 1. Hence, the slope is 2/1 = 2.

44 Relationship Between x and y
The relationship between x and y-coordinates of a line can be expressed with a Linear Equation y = (slope)x + (y-intercept) Typically it is written y = mx + b where m is the slope and b is the y-intercept. For instance, if given the equation y = 3x – 2, the slope of this equation would be 3 and the y-intercept would be –2 or (0, –2). If given the equation y = –5x + 12, the slope would be –5 and the y-intercept would be 12 or (0, 12).

45 Linear Equation
Looking at our last example, we had found the y-intercept to be (0,1) and the slope to be 2/1 or 2. We could write the equation of the line as y = 2x + 1
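The same slope and equation can be recovered in code from two points on the line. This is a minimal sketch: it uses the y-intercept (0, 1) and the point (1, 3), which is one unit to the right and two units up, and the function names are hypothetical.

```python
def slope(p1, p2):
    """Change in y over change in x between two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

y_intercept_point = (0, 1)  # where the line crosses the y-axis
next_point = (1, 3)         # x increases by 1, y increases by 2

m = slope(y_intercept_point, next_point)  # slope
b = y_intercept_point[1]                  # y-intercept value

def line(x):
    """Evaluate y = m*x + b."""
    return m * x + b
```

Evaluating `line` at other x values gives more points on y = 2x + 1.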

46 Determine the Slope and y-intercept
1. y = 5 + 3x 2. y = –2x – 3 3. y = 17 4. y = 0.24x See next slide for solutions.

47 Solutions
1. y = 5 + 3x: Slope: 3, y-intercept: (0, 5). Notice that when the order is changed, we must find the x term to find the slope. 2. y = –2x – 3: Slope: –2, y-intercept: (0, –3). 3. y = 17: Slope: 0, y-intercept: (0, 17). 4. y = 0.24x: Slope: 0.24, y-intercept: (0, 0).

48 Definition Regression Equation (or Least-Squares Line)
An equation expressing a relationship between x and y variables taken from sample data. The symbol for this equation is ŷ (y-hat). This equation allows us to make estimates. It cannot give exact values because it is only based on sample data.

49 Linear Regression Let’s take a look at a couple of examples to see how we can find a linear equation from our calculator. We can find the linear equation by using LinRegTTest and scrolling down to the a and b values. The typical equation of a straight line y = mx + b is expressed in the form y = a + bx in your TI-83/84 (and in the statistics world in general). Be careful how you write equations due to this difference! If the equation is found using LinRegTTest, a is the y-intercept, and b is the slope! a and b should be rounded to at least 3 significant digits

50 Example We found in the last section that the data below indicate that weight and hwy mpg of a car are linearly related. Answer the following questions. Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25 a. Use LinRegTTest to find the regression equation. b. Interpret the value of the slope of this equation. c. Predict the hwy mpg for a car that weighs 3000 pounds.

51 Solution
a. Use LinRegTTest to find the regression equation. When you plug the data into your lists and use LinRegTTest, a screen should appear similar to this one. Remember a is your y-intercept and b is your slope value. Write the linear equation in the form ŷ = a + bx. Note that 3 sig digits is different from 3 decimal places, and the significant digits may start on the left of the decimal point. In this case, 3 sig digits means a rounds to 34.7, but more decimal places are OK if you wish.

52 Solution b. Interpret the value of the slope of this equation. In the regression equation, our slope is the b value, the coefficient of x (look for the x), and here it is negative. We are comparing weight of cars (our x values) and highway miles per gallon (our y values). Slope is the change in y when x is increased by 1 unit. This is called the marginal change. So, we can interpret the slope as: For every increase of 1 pound, the highway miles per gallon decrease by the size of the slope, |b|. Note: We always say increase for the x’s. Here we say decrease for the y’s because our slope is negative.

53 Solution c. Predict the hwy mpg for a car that weighs 3000 pounds.
Remember our x-values represent the weight in pounds. To find the predicted hwy mpg, we plug in 3000 for x in our regression equation and find our predicted value ŷ. Round predictions at least as far as the original y data; we often go one place farther. What this means: If a car weighs 3000 pounds, it will get approximately 21.0 highway miles per gallon.

54 If There is No Correlation
When LinRegTTest determines that there is NOT sufficient evidence for a linear correlation, then the regression equation should NOT be used to make estimates and predictions about the population. In these cases, the best estimate you can make is to give the average. So no matter what x value you are given, you would have to give the average y value as the prediction. You can find this by hand, or use 1-Var Stats. What you are saying is, “This equation is not a good one. The best estimate I can give you is the average.”

55 Example In the last section, we found that these data did NOT provide sufficient evidence of a linear correlation. Find the regression equation. Estimate the rating of a person with a test score of 8. X: Test 10 4 15 11 14 9 12 17 5 18 16 3 Y: Rating 31 27 30 26 38 21 29 36 33 25

56 Solution Find the regression equation.
Estimate the rating of a person with a test score of 8. Because it was determined that there is not a linear correlation, the regression equation should not be used. It would be unreliable to use for estimates about the population, and is worthless to help us predict future values. We will need to find the average performance rating to use as our estimate. See next slide…

57 Solution
b. Predict the rating of a person with a test score of 8. Use 1-Var Stats with the list you put your y’s in (probably L2) and look for the mean. (It will be called x̄, even though it is really ȳ.) STAT, CALC, 1: 1-Var Stats L2 (2nd 2). The average rating is about 29.3. This means that no matter which test score we are looking at (including 8), the best estimate for the rating will be about 29.3.

58 Rules to Follow with Linear Regression
Only use the regression equation if the hypothesis test indicates a linear correlation. If there is no correlation, your best estimate is ȳ, the mean of the y values. Only plug in x values that are within the range of the sample data. (Some books allow x’s slightly outside but very close.) You never know when a strange jump may happen just outside of the range you looked at. Only use the regression equation to make predictions about the original population. (If the sample was all men, don’t use the equation to talk about women.) If your data is old, make sure it is still valid. Don’t use an outdated equation.
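The first two rules can be written as a small guard around the regression equation. This is a sketch under stated assumptions: the function name `predict`, its parameters, and every number below are hypothetical, and the rules about the population and the age of the data still need human judgment.

```python
def predict(x, a, b, x_min, x_max, y_mean, significant):
    """Estimate y for a given x while respecting the rules above."""
    if not significant:
        # No evidence of linear correlation: the best estimate is the mean of y
        return y_mean
    if not (x_min <= x <= x_max):
        # Do not extrapolate outside the range of the sample x values
        raise ValueError("x is outside the range of the sample data")
    return a + b * x

# Hypothetical line y-hat = 0.6 + 0.8x, fitted on x values from 1 to 5
line_args = dict(a=0.6, b=0.8, x_min=1, x_max=5, y_mean=3.0)

with_correlation = predict(4, significant=True, **line_args)
without_correlation = predict(4, significant=False, **line_args)
```

Calling `predict(9, significant=True, **line_args)` raises a `ValueError`, because 9 lies outside the sampled range.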

59 Example—Putting it All Together
The following are heights and weights of 9 female supermodels: We determined in the last section that these data have a linear correlation. Determine if there is a linear correlation at the 0.05 significance level, and if so, find the regression equation. Interpret the slope. Estimate the weight of a female supermodel who is 69 in tall. Can we estimate the weight of a female supermodel who is 62 in tall? Can we estimate the weight of a male supermodel who is 67 in tall? Can we estimate the weight of a math teacher who is 69 in tall? Height (in) 71 70.5 72 70 66.5 Weight (lb) 125 119 128 127 105 123 115

60 Solution Determine if there is a linear correlation at the 0.05 significance level, and if so, find the regression equation. Set up the hypothesis test to test for linear correlation. H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation) LinRegTTest, P-value < 0.05, reject H0. Rejecting H0 means we support H1. There is sufficient evidence to support a linear correlation between height and weight of supermodels. This is a positive correlation, and the regression equation comes from the same LinRegTTest output.

61 Solution Interpret the slope. The slope indicates that a supermodel should weigh an extra 3.88 lb when her height is increased by 1 in. (Or, when a supermodel’s height increases by 1 in, her weight increases by about 3.88 lb.) Estimate the weight of a female supermodel who is 69 in tall.

62 Solution Can we estimate the weight of a female supermodel who is 62 in tall? No, our data only included models between 66.5 and 72 in tall, so we cannot use the equation to predict the weight of a model who is 62 in tall. Can we estimate the weight of a male supermodel who is 67 in tall? No, the data only included female supermodels, so we cannot use it to predict weights for males. Can we estimate the weight of a math teacher who is 69 in tall? Not unless she is a supermodel!

