HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 12.1.

1 Section 12.1 Scatter Plots and Correlation. With the quality added value you’ve come to expect from D.R.S., University of Cordele

2 Types of Relationships: Plot (x, y) data points and think about whether x and y are somehow related. [Figure: four example scatter plots labeled Strong Linear Relationship, Weak Linear Relationship, Non-Linear Relationship, and No Relationship]

3 Table 12.2: Sample of NFL Quarterbacks (2011–2012 Season)

Quarterback      Number of Passing Touchdowns   2012 Base Salary (in Millions of Dollars)   Quarterback Rating
Drew Brees       46                             3.0                                         110.6
Michael Vick     18                             12.5                                        84.9
Philip Rivers    27                             10.2                                        88.7
Tony Romo        31                             0.825                                       102.5
Aaron Rodgers    45                             8.0                                         122.5
Jay Cutler       13                             7.7                                         85.7
Alex Smith       17                             5.0                                         90.7
Eli Manning      29                             1.75                                        92.9
Tim Tebow        12                             2.1                                         72.9
Tom Brady        39                             0.95                                        105.6

Source: Yahoo! Sports. “NFL - Statistics by Position.” http://sports.yahoo.com/nfl/stats/byposition?pos=QB&conference=NFL&year=season_2011&sort=49&timeframe=All (20 May 2012).
Source: Spotrac.com. “NFL Player Contracts, Salaries, and Transactions.” http://www.spotrac.com/nfl/ (2 Oct. 2012).

4 Example 12.1: Creating a Scatter Plot to Identify Trends in Data. Use the data from Table 12.2 to produce a scatter plot that shows the relationship between the base salary of an NFL quarterback and the number of touchdowns the quarterback has thrown in one season. Solution: We might expect the number of touchdowns a quarterback throws in one season to influence his salary. Taking this into consideration, we will place the number of touchdowns on the x-axis and the base salary on the y-axis.

5 Scatter Plot of (touchdowns, salary) on TI-84. Put Touchdowns in list L1 and Salary in list L2. Y=: old algebra plots should be cleared out of there. 2ND STAT PLOT: all should be “Off” to start with. 1:Plot1: On, choose Type, Lists L1 and L2, Mark. (Remember 2ND 1 and 2ND 2 to put in list names?) ZOOM 9:ZoomStat. If there is an unexplainable error, use 2ND MEM 7 1 2 to clear all and then retype the lists of data. TRACE with Left Arrow and Right Arrow to explore it.
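For readers without a TI-84 at hand, the same (touchdowns, salary) picture can be approximated in software. What follows is only a minimal text-mode sketch in Python using the Table 12.2 values; the function name `text_scatter` and the grid size are illustrative assumptions, not part of the Hawkes materials, and a real plotting library would normally be used instead.

```python
# Touchdowns (x) and 2012 base salary in millions of dollars (y), from Table 12.2.
touchdowns = [46, 18, 27, 31, 45, 13, 17, 29, 12, 39]
salary = [3.0, 12.5, 10.2, 0.825, 8.0, 7.7, 5.0, 1.75, 2.1, 0.95]


def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings forming a crude character-grid scatter plot.

    Hypothetical helper for illustration only.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value into a column/row index, like ZoomStat fits the window.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


for line in text_scatter(touchdowns, salary):
    print(line)
```

Each `*` is one quarterback, with touchdowns increasing to the right and salary increasing upward.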

6 Example 12.1: Creating a Scatter Plot to Identify Trends in Data (cont.) Is there any apparent relationship between the number of passing touchdowns and the salary? __________________

7 Scatter Plot of (touchdowns, rating) on TI-84. Ratings go in list L3; Touchdowns are still in L1 and Salary in L2. Type the Ratings into list L3 if you haven’t already done so. 2ND STAT PLOT, 1:Plot1: change to Lists L1 and L3. ZOOM 9:ZoomStat. TRACE with Left Arrow and Right Arrow to explore it.

8 Example 12.2: Creating a Scatter Plot to Identify Trends in Data (cont.) Is there any apparent relationship between the number of passing touchdowns and the QB Rating? _____________. It appears to be a _______ relationship with _______ slope.

9 Example 12.3: Determining Whether a Scatter Plot Would Have a Positive Slope, Negative Slope, or Not Follow a Straight-Line Pattern. Determine whether the points in a scatter plot for the two variables are likely to have a positive slope, negative slope, or not follow a straight-line pattern. a. The number of hours you study for an exam and the score you make on that exam _________________ b. The price of a used car and the number of miles on the odometer _____________________________ c. The pressure on a gas pedal and the speed of the car _____________________________________ d. Shoe size and IQ for adults ___________________

10 Scatter Plots and Correlation. The Pearson correlation coefficient, ρ, is the parameter that measures the strength of a linear relationship between two quantitative variables in a population. The correlation coefficient for a sample is denoted by r. It always takes a value between −1 and 1, inclusive. ρ is the Greek letter “rho”. Practice writing the rho character here:

12 –1 ≤ r ≤ 1. Close to –1 means a strong negative correlation. Close to 0 means no linear correlation. Close to 1 means a strong positive correlation.

13 Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator. Calculate the correlation coefficient, r, for the data from Table 12.2 relating touchdowns thrown and base salaries. Solution: The data we need from Table 12.2 are reproduced in the following table. (They should already be in your calculator’s lists.) But we will not dig into the details of that awful formula! The TI-84 has built-in goodies.

14 Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator (cont.)

NFL Quarterbacks
Number of Passing Touchdowns (in list L1)   Base Salary in Millions of Dollars (in list L2)
46                                          3.0
18                                          12.5
27                                          10.2
31                                          0.825
45                                          8.0
13                                          7.7
17                                          5.0
29                                          1.75
12                                          2.1
39                                          0.95

Do you expect r to be close to −1? Close to 0? Close to 1?

15 Example 12.4: Calculating the Correlation Coefficient Using a TI-83/84 Plus Calculator (cont.) It’s STAT, TESTS, ALPHA F (ALPHA E on the 83/Plus). Put in the list names. VARS, Y-VARS, 1, 1 will be useful later. Highlight Calculate, press ENTER, and use the down arrow to find r = _____________. Repeat for the lists for Passing Touchdowns and QB Rating. In that case, r = _______.
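As a cross-check on the calculator, the sample correlation coefficient can also be computed directly from the definition. This is a minimal Python sketch under the assumption that the standard Pearson formula (sum of cross-deviations divided by the square root of the product of the sums of squared deviations) is the one the slides have in mind; the helper name `pearson_r` is illustrative. For touchdowns versus salary, the magnitude should agree with the 0.251 quoted later in Example 12.6.

```python
import math

# Data from Table 12.2.
touchdowns = [46, 18, 27, 31, 45, 13, 17, 29, 12, 39]            # list L1
salary = [3.0, 12.5, 10.2, 0.825, 8.0, 7.7, 5.0, 1.75, 2.1, 0.95]  # list L2
rating = [110.6, 84.9, 88.7, 102.5, 122.5, 85.7, 90.7, 92.9, 72.9, 105.6]  # list L3


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_salary = pearson_r(touchdowns, salary)
r_rating = pearson_r(touchdowns, rating)
print(f"touchdowns vs. salary: r = {r_salary:.3f}")
print(f"touchdowns vs. rating: r = {r_rating:.3f}")
```

The two printed values correspond to the two blanks on the slide, so the sketch can be used to check the calculator work.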

16 Use TI-84 LinRegTTest for a full Hypothesis Test (more than just getting the correlation coefficient, r) The next few slides describe the use of LinRegTTest. It’s STAT, TESTS, ALPHA F (ALPHA E on the 83/Plus) This description is about the full hypothesis test to determine “Is the relationship significant?” The outputs include the value of r, the correlation coefficient, which is of greatest interest at this early point in our study. The Hawkes materials talk about the LinReg feature but I’m recommending the LinRegTTest instead because you get more information for about the same effort.

18 LinRegTTest inputs (not identical to the quarterback example!). Here are the inputs. Xlist and Ylist: where you put the data (shortcut: 2ND 2 puts L2). Freq: 1 (unless…). β & ρ: ≠ 0 (this is the Alternative Hypothesis). RegEq: VARS, right arrow to Y-VARS, 1, 1 (just put it in for later). Highlight “Calculate” and press ENTER.

19 LinRegTTest Outputs, first screen (from a different problem) t= the t statistic value for this test (the formula is in the book)

20 LinRegTTest Outputs, second screen (from a different problem). b: later, for Regression. s: much later, for advanced Regression.

21 Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Testing Linear Relationships for Significance:
Significant Linear Relationship (Two-Tailed Test). H0: ρ = 0 (implies that there is no significant linear relationship). Ha: ρ ≠ 0 (implies that there is a significant linear relationship). (Now they’re getting into the Hypothesis Testing we saw a brief preview of earlier in this set of slides.)
Significant Negative Linear Relationship (Left-Tailed Test). H0: ρ ≥ 0 (implies that there is no significant negative linear relationship). Ha: ρ < 0 (implies that there is a significant negative linear relationship).
Significant Positive Linear Relationship (Right-Tailed Test). H0: ρ ≤ 0 (implies that there is no significant positive linear relationship). Ha: ρ > 0 (implies that there is a significant positive linear relationship).
This is the one we use the most. Be aware that this one exists.

22 Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Test Statistic for a Hypothesis Test for a Correlation Coefficient: The test statistic for testing the significance of the correlation coefficient is given by $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$, where r is the sample correlation coefficient and n is the number of data pairs in the sample. The number of degrees of freedom for the t-distribution of the test statistic is given by n − 2. TI-84 LinRegTTest will calculate this value for us.
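To make the test statistic concrete, here is a small Python sketch. It assumes the standard form of the statistic, t = r / sqrt((1 − r²) / (n − 2)) with n − 2 degrees of freedom; the function name `correlation_t_statistic` is hypothetical. The example inputs are the touchdowns and salary values (n = 10, |r| = 0.251 as quoted in Example 12.6, with the negative sign worked out from Table 12.2 rather than quoted from the slides).

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing whether the population correlation is 0.

    r is the sample correlation coefficient and n the number of data pairs.
    """
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r / math.sqrt((1 - r * r) / df)
    return t, df


# Illustrative inputs: touchdowns vs. salary for the ten quarterbacks.
t, df = correlation_t_statistic(-0.251, 10)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

This is the same t value that LinRegTTest reports on its first output screen, up to rounding of r.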

23 Testing the Correlation Coefficient for Significance Using Hypothesis Testing. Rejection Regions for Testing Linear Relationships:
Significant Linear Relationship (Two-Tailed Test): Reject the null hypothesis, H0, if $|t| \ge t_{\alpha/2}$.
Significant Negative Linear Relationship (Left-Tailed Test): Reject the null hypothesis, H0, if $t \le -t_{\alpha}$.
Significant Positive Linear Relationship (Right-Tailed Test): Reject the null hypothesis, H0, if $t \ge t_{\alpha}$.
But we will use the p-value method because LinRegTTest gives us a p-value and the experiment specifies the α.

24 Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant. Use a hypothesis test to determine if the linear relationship between the number of parking tickets a student receives during a semester and his or her GPA during the same semester is statistically significant at the 0.05 level of significance. Refer to the data presented in the following table.

GPA and Number of Parking Tickets
Number of Tickets   0    0    0    0    1    1    1    2    2    2    3    3    5    7    8
GPA                 3.6  3.9  2.4  3.1  3.5  4.0  3.6  2.8  3.0  2.2  3.9  3.1  2.1  2.8  1.7

25 Example 12.7 Use the TI-84 LinRegTTest to perform the hypothesis test. Use the p-value method: The LinRegTTest gives you a p-value. If the p-value is < the given Level of Significance α = 0.05, then REJECT the null hypothesis; conclude that there IS a significant linear relationship. Otherwise, Fail To Reject – no significant relationship. And you can disregard most or all of the by-hand detail that is in the book and in the online Help.

26 Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Solution. Step 1: State the null and alternative hypotheses. We wish to test the claim that a significant linear relationship exists between the number of parking tickets a student receives during a semester and his or her GPA during the same semester. Thus, the hypotheses are stated as follows. H0: ρ = 0 (there is no significant linear relationship). Ha: ρ ≠ 0 (there is a significant linear relationship).

27 Example 12.7: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Step 2: Determine which distribution to use for the test statistic, and state the level of significance. We will use the t-test statistic presented previously in this section along with a significance level of α = 0.05 to perform this hypothesis test. Step 3: Gather data and calculate the necessary sample statistics. (Do LinRegTTest)

28 Example 12.7 Hypothesis Test, concluded. Compare p = _____ vs. α = ______ Decision: { Reject / Fail to Reject } the Null Hypothesis. Conclusion about Significant Linear Relationship: Conclusion in Plain English:
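The p-value method used in Example 12.7 can also be sketched in Python as a check on the LinRegTTest output. This is an approximation under stated assumptions, not the calculator's actual algorithm: the two-tailed p-value is obtained by numerically integrating the Student t density with Simpson's rule, and the helper names (`pearson_r`, `t_pdf`, `two_tailed_p_value`) are illustrative.

```python
import math

# Data from Example 12.7.
tickets = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 5, 7, 8]
gpa = [3.6, 3.9, 2.4, 3.1, 3.5, 4.0, 3.6, 2.8, 3.0, 2.2, 3.9, 3.1, 2.1, 2.8, 1.7]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|] (steps even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


alpha = 0.05
n = len(tickets)
df = n - 2
r = pearson_r(tickets, gpa)
t = r / math.sqrt((1 - r * r) / df)
p = two_tailed_p_value(t, df)
reject = p < alpha

print(f"r = {r:.3f}, t = {t:.3f}, df = {df}, p = {p:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The printed p-value is what gets compared with α in the blanks above; the decision rule is the same one stated for LinRegTTest.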

29 Correlation does not imply Causation! If there seems to be a Correlation, it doesn’t necessarily mean that changes in one variable cause changes in the other variable. 1. There might be a lurking variable that affects both. 2. Or the two might be completely unrelated, and the mathematical indication of a strong correlation is merely coincidental. Extreme examples can be seen at the Spurious Correlations web site (www.tylervigen.com).

30 Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant. An online retailer wants to research the effectiveness of its mail-out catalogs. The company collects data from its eight largest markets with respect to the number of catalogs (in thousands) that were mailed out one fiscal year versus sales (in thousands of dollars) for that year. The results are as follows.

Number of Catalogs Mailed and Sales
Number of Catalogs (in Thousands)   2     3     3     3     4     4     5     6
Sales (in Thousands)                $126  $98   $255  $394  $107  $122  $334  $403

31 Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Use a hypothesis test to determine if the linear relationship between the number of catalogs mailed out and sales is statistically significant at the 0.01 level of significance. Step 1: Hypotheses: H0: ___________ meaning _____________________. Ha: ___________ meaning _____________________. Step 2: Decision to use the t distribution and level of significance _____ = 0.01

32 Example 12.8: Performing a Hypothesis Test to Determine if the Linear Relationship between Two Variables Is Significant (cont.) Step 3: Gather data and calculate the necessary sample statistics. Using a TI-83/84 Plus calculator, enter the values for the numbers of catalogs mailed (x) in L1 and the sales values (y) in L2. Run LinRegTTest. Step 4: Conclusion: { Reject / Fail to Reject } the Null Hypothesis. Interpretation:

33 Coefficient of Determination. The coefficient of determination, r², is a measure of the proportion of the variation in the response variable (y) that can be associated with the variation in the explanatory variable (x). This too is reported to you in the LinRegTTest outputs.

34 Example 12.9: Calculating and Interpreting the Coefficient of Determination. If the correlation coefficient for the relationship between the numbers of rooms in houses and their prices is r = 0.65, how much of the variation in house prices can be associated with the variation in the numbers of rooms in the houses? Solution: Recall that the coefficient of determination tells us the amount of variation in the response variable (house price) that is associated with the variation in the explanatory variable (number of rooms).

35 Example 12.9: Calculating and Interpreting the Coefficient of Determination (cont.) Thus, the coefficient of determination for the relationship between the numbers of rooms in houses and their prices will tell us the proportion or percentage of the variation in house prices that can be associated with the variation in the numbers of rooms in the houses. Also, recall that the coefficient of determination is equal to the square of the correlation coefficient.

36 Example 12.9: Calculating and Interpreting the Coefficient of Determination (cont.) Since we know that the correlation coefficient for these data is r = 0.65, we can calculate the coefficient of determination as r² = _____ Thus, approximately _____% of the variation in house prices can be associated with the variation in the numbers of rooms in the houses.
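The arithmetic in Example 12.9 is a single squaring step, which can be checked with a short Python sketch; the variable names are illustrative.

```python
r = 0.65            # correlation coefficient given in Example 12.9
r_squared = r ** 2  # coefficient of determination

print(f"r^2 = {r_squared:.4f}")
print(f"About {r_squared * 100:.2f}% of the variation in house prices "
      "can be associated with the variation in the number of rooms.")
```

Note that r² is always between 0 and 1, and it loses the sign of r, so it says nothing about whether the relationship is positive or negative.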

37 Testing the Correlation Coefficient for Significance Using Critical Values of the Pearson Correlation Coefficient to Determine the Significance of a Linear Relationship. A sample correlation coefficient, r, is statistically significant if $|r| > r_{\alpha}$, where $r_{\alpha}$ is the critical value from the table. (Why is this discussion here? Sometimes they give you a shred of a problem that gives some summary results and you have to use a printed table to make the determination. That’s the only time you’ll need to do this, for a few of those kinds of problems. In “real life”, in large problems, the LinRegTTest p-value is compared to alpha.)

38 Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship. Use the critical values in Table I to determine if the correlation between the number of passing touchdowns and base salary from Example 12.4 is statistically significant. Use a 0.05 level of significance. Solution: Begin by finding the critical value for α = 0.05 with n = 10 in Table I. Find the value in the table where the row for n = 10 intersects the column for α = 0.05.

39 Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship (cont.)

n     α = 0.05   α = 0.01
6     0.811      0.917
7     0.754      0.875
8     0.707      0.834
9     0.666      0.798
10    0.632      0.765
11    0.602      0.735
12    0.576      0.708

INTERPRETATION: “If the absolute value of my sample’s correlation coefficient, |r|, is at least as big as the value you look up in this table, then YES, significant linear relationship. Otherwise, no, no significant linear relationship.”

40 Example 12.6: Using a Table of Critical Values to Determine Significance of a Linear Relationship (cont.) Thus, $r_{\alpha} = 0.632$. Comparing this critical value to the absolute value of the correlation coefficient we found for the data in Example 12.4, we have 0.251 < 0.632, and thus $|r| < r_{\alpha}$. Therefore, the linear relationship between the variables is not statistically significant at the 0.05 level of significance. Thus, we do not have sufficient evidence, at the 0.05 level of significance, to conclude that a linear relationship exists between the number of passing touchdowns during the 2011–2012 season and the 2012 base salary of an NFL quarterback.
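The table lookup in Example 12.6 can be mirrored in a few lines of Python. This sketch only covers the excerpt of Table I shown in the example (n = 6 through 12), and the names `CRITICAL_R` and `is_significant` are hypothetical. It treats a result as significant when |r| is strictly greater than the critical value, which is an assumption about how a tie at the boundary would be handled.

```python
# Critical values of the Pearson correlation coefficient, copied from the
# excerpt of Table I in Example 12.6: {n: {alpha: critical value}}.
CRITICAL_R = {
    6: {0.05: 0.811, 0.01: 0.917},
    7: {0.05: 0.754, 0.01: 0.875},
    8: {0.05: 0.707, 0.01: 0.834},
    9: {0.05: 0.666, 0.01: 0.798},
    10: {0.05: 0.632, 0.01: 0.765},
    11: {0.05: 0.602, 0.01: 0.735},
    12: {0.05: 0.576, 0.01: 0.708},
}


def is_significant(r, n, alpha):
    """True when |r| exceeds the tabulated critical value for n and alpha."""
    try:
        critical = CRITICAL_R[n][alpha]
    except KeyError:
        raise ValueError(f"no table entry for n={n}, alpha={alpha}") from None
    return abs(r) > critical


# Touchdowns vs. base salary: |r| = 0.251 with n = 10 at the 0.05 level.
print(is_significant(0.251, 10, 0.05))
```

For the quarterback data this prints `False`, matching the conclusion that the linear relationship is not statistically significant at the 0.05 level.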

41 Correlation Coefficient in Excel

42 More with Excel. That’s about all that can be done with basic Excel. There is an advanced feature on the Data tab, in the Data Analysis add-in. It gets into the Regression topic in the next lesson.
