Note: In this chapter, we only cover sections 10-1 through 10-3


Sections 10-1 & 10-2 Correlation

Paired Data In this chapter, we will be looking at paired data (x’s and y’s). We will be given paired data to study to see if a relationship exists between the 2 variables. We will be using some variable x to help us predict values of another variable called y.

Definition A correlation exists between two variables when there appears to be some pattern relating them. Correlation analysis is the statistical technique used to determine the strength of the relationship between two variables.

Correlation Eventually we will be testing to see if there is a correlation between two different variables. For instance: Is there a correlation between the amount of time a student studies and the student’s grade? Is there a correlation between your gas bill and the number of gallons your gas tank holds? It is possible to have a positive correlation, negative correlation, or no correlation between the two variables. One way to determine the correlation is by looking at a scatter diagram.

Definition A scatter diagram (or scatter plot) is a graph in which the pairs of data are plotted as points on a graph. Each subject is a dot on the graph. One set of data provides the x-coordinate, and the other provides the y-coordinate.

Positive Linear Correlation Notice that when you read left to right, the points go in the upward direction. [Figure panels: positive correlation, strong positive correlation, perfect positive correlation]

Negative Linear Correlation Notice that when you read left to right, the points go in the downward direction. [Figure panels: negative correlation, strong negative correlation, perfect negative correlation]

No Linear Correlation Notice that when you read left to right, there is no specific pattern. [Figure panels: no correlation, nonlinear correlation] In the nonlinear figure, there is a pattern but it is not linear. We will not discuss nonlinear relationships in this class.

Scatter Diagram on the Calculator Let’s discuss how we can create a scatter diagram on our calculator, so we can look at data and determine if there is a relationship. You will need to begin by plugging your data into L1 and L2. Go to STAT, 1: Edit, enter the x’s under L1 and the y’s under L2.

Scatter Diagram on the Calculator Once you have entered your data into the lists, hit 2nd, then Y = . The STAT PLOTS screen appears. Hit 1 to select Plot1. The following screen will appear: You will want to turn on the plots, so make sure On is chosen. You need to choose the Type, which is the first picture, a scatterplot. Make sure the right lists appear for x and y. The Mark should be on the first choice, the square. Once you have everything entered in correctly, you will hit ZOOM, 9: ZoomStat and your scatter diagram will appear.

Example The following data represent the weights of cars and their highway miles per gallon. Use a scatter diagram to investigate whether or not there is a relationship between these two sets of data. Enter both sets of lists into your calculator, and follow the directions in the previous slides. Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25

Solution This graph has each car plotted, with its weight as the x-coordinate and its highway miles per gallon as the y-coordinate. Just by looking at the scatter diagram, it does look like the points are going in the downward direction when reading left to right. But is this close enough to a line to be meaningful? There is a better way to measure a relationship than just “eyeballing” it.

Linear Correlation Coefficient The linear correlation coefficient measures the strength of the linear relationship between paired x and y values.

Notation for the Linear Correlation Coefficient r represents the linear correlation coefficient for a sample. ρ (rho) represents the linear correlation coefficient for a population.

Properties of the Linear Correlation Coefficient r 1. –1 ≤ r ≤ 1: r is between –1 and 1. –1 is a perfect negative correlation, 1 is a perfect positive correlation. 2. If variables are independent (no correlation), then r = 0. 3. It should be rounded to 3 decimal places. 4. r is taken from a sample, so it is a statistic.

Positive Correlation Note: All three of these are positive correlations, so r is positive in all three cases. The better the correlation, the closer r is to 1.

Negative Correlation Note: All three of these are negative correlations, r is negative in all three cases.

No Correlation Note: Because there is no linear correlation, r = 0.

Correlation Coefficient Here is the general idea of where the value of r should fall depending on if there is a positive, negative, or no correlation. The closer to 1, the stronger the positive correlation. The closer to –1, the stronger the negative correlation. The closer to 0, the weaker the positive/negative correlation.

Correlation Coefficient on the Calculator Let’s discuss how we can find the correlation coefficient r on our calculator, so we can see if our data has a positive, a negative, or no correlation. Enter the data into L1 and L2. Hit STAT, go over to TESTS, and choose E: LinRegTTest (on some calculators this may be choice F), then press Enter. This screen appears: Enter the two lists where your data is located. Freq should be 1. For now, none of the other information matters. Go down to Calculate and hit ENTER.

Correlation Coefficient on the Calculator After you hit ENTER, the following screen will appear: You will have to scroll down to find r. Remember to round r to 3 decimal places.
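For readers without the calculator, here is a minimal Python sketch of the same computation, assuming the standard formula for the sample correlation coefficient (the slides do not show the formula itself). The function name `pearson_r` and the study-time data are hypothetical and are not the car data from the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data (illustration only, NOT from the slides)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]
r = pearson_r(hours, scores)
print(round(r, 3))  # rounded to 3 decimal places, as the slides require
```

A value near 1 indicates a strong positive linear correlation and a value near –1 a strong negative one, matching the properties listed earlier.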

Proportion of Variation You may have noticed that when you find the correlation coefficient, the calculator also gives you r². This is the coefficient of determination, or proportion of variation. It tells you the proportion of the variation in y that is explained by the relationship between x and y.

Example Earlier, we created this scatter diagram and stated, by looking at it, that it appeared to show a relationship. Find the correlation coefficient and the coefficient of determination. Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25

Solution The coefficient of determination is r² = 0.669. This means that 66.9% of the variation in the mpg of these cars is explained by the relationship of a car’s weight to its mpg. Some (about 33.1%) of the differences in mpg are due to other factors. The correlation coefficient is r = –0.818, which means there seems to be a negative correlation between the weight of the cars and the highway miles per gallon. But is this correlation strong enough to be meaningful?
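As a quick check of the arithmetic in this solution, the short Python sketch below squares the reported r and subtracts from 1. Only the value r = –0.818 comes from the slide; the variable names are illustrative.

```python
r = -0.818  # correlation coefficient reported in the solution

r_squared = round(r ** 2, 3)           # coefficient of determination
unexplained = round(1 - r_squared, 3)  # share of variation due to other factors

print(r_squared, unexplained)  # 0.669 0.331
```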

How strong is strong enough? Setting a cutoff point for a strong correlation is difficult. The cutoff is different depending on the sample size and the significance level you want. We want to use the sample data to determine if the population has a linear correlation. We will do this by setting up a hypothesis test. As with our other hypothesis tests, we will use our calculators. The calculator gives a P-value without the need of a test statistic.

Hypothesis Testing We can use r to estimate ρ and then do hypothesis testing on the value of ρ. We will be using the LinRegTTest on our calculator to test our claim. Check out the next slides to learn more about the testing process.

Formal Hypothesis Test For this hypothesis test, the parameter will be ρ. Remember: –1 ≤ r ≤ 1 and –1 ≤ ρ ≤ 1. Just like r, ρ is always between –1 and 1. If ρ = 0, then there is NO linear correlation between the variables.

5 Steps to Hypothesis Tests In each problem, you should include the following steps. 1. Set up the hypotheses with the correct parameter. Label which one is the claim. 2. State what input screen on the calculator you used and the P-value. (Round to 3 sig. digits) 3. Decide to reject or fail to reject H0. 4. Decide whether to support or fail to support H1. 5. Interpret the conclusion about the original claim.

Null and Alternative Hypotheses Note we use ρ as our parameter. H0: ρ = 0 (no linear correlation) H1: ρ ≠ 0 (linear correlation) We want to know if there is a linear correlation or not. ρ = 0 would mean no correlation, so ρ ≠ 0 would mean there is a correlation. Note: It is possible to have ρ > 0 or ρ < 0 in the alternative hypothesis, but we will not be using these.

Hypothesis Test on the Calculator The LinRegTTest that you used to find r is also the hypothesis test. Instead of looking for the value of r this time, we need to know the P-value because we need to compare it to the level of significance to determine whether to reject or fail to reject the null hypothesis. So after plugging in the appropriate information (see next slide), find the P-value. Remember to round to 3 significant digits!

LinRegTTest on Calculator Let’s discuss how we can find the P-value on our calculator, so we can determine whether to reject or fail to reject our null hypothesis. As before, enter the data into L1 and L2. Hit STAT, go over to TESTS, and choose E: LinRegTTest (on some calculators this may be choice F), then press Enter. This screen appears: Enter the two lists where your data is located. Freq should be 1, choose ≠ for the alternative hypothesis, always leave RegEQ blank, then go to Calculate and hit ENTER. The following screen will appear: The P-value should be on the screen that appears. Remember to round it to 3 significant digits. Let’s do an example!

Example Earlier, we saw from a scatter diagram that the following data appeared to have a linear correlation. Is it strong enough to generalize to the whole population? At the 0.05 level of significance, do the data below provide sufficient evidence that weight and hwy mpg of a car are linearly related? Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25 We will set up a formal hypothesis test to determine if the evidence of a correlation is sufficient.

Solution H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation) LinRegTTest, P-value = 0.00382 0.00382 < 0.05, so we REJECT H0 Rejecting H0 means that we SUPPORT H1 There is sufficient evidence to support a linear correlation between weight and highway miles per gallon in all cars. Note: When there is a correlation, look at r to decide if the relationship is positive or negative. Here, r = –0.818, so it is a negative correlation.

Example The following data represent employee test scores and performance ratings. X: Test 10 4 15 11 14 9 12 17 5 18 16 3 Y: Rating 31 27 30 26 38 21 29 36 33 25 Test at the 0.05 level of significance for a linear correlation.

Solution Test at the 0.05 level of significance for a linear correlation. H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation) LinRegTTest, P-value = 0.192 0.192 > 0.05, so we FAIL to REJECT H0 Failing to reject H0 means we fail to support H1 There is NOT sufficient evidence to support a linear correlation between test scores and performance ratings for all employees.
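The decision step in both examples follows the same rule: compare the P-value to the level of significance. A small Python sketch of that rule is below. The function name `decide` is hypothetical, and treating a P-value exactly equal to the significance level as a rejection is an assumption (the slides only show strict comparisons). The two P-values are the ones reported in the examples.

```python
def decide(p_value, alpha=0.05):
    """Decision for H0: rho = 0 versus H1: rho != 0."""
    # Assumption: a P-value at or below alpha leads to rejecting H0
    if p_value <= alpha:
        return "reject H0: sufficient evidence of a linear correlation"
    return "fail to reject H0: not sufficient evidence of a linear correlation"

print(decide(0.00382))  # car weight vs. highway mpg example
print(decide(0.192))    # test score vs. performance rating example
```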

Regression In this section we cover p. 536-541

Linear Regression Once we know there is a linear correlation, we need to know what that correlation is. For that, we need the linear equation that models the data. In this section, we will discuss regression, more specifically, a regression equation. If we determine there is a correlation between x and y, we can make predictions by using the regression equation. First, some review on equations of lines.

Definitions y-Intercept: The point where the line crosses the y-axis, that is, the point where x is equal to zero. Slope: How much y will change every time you increase the value of x by one unit. You may have heard the term “rise over run” or the “change in y over the change in x.”

Positive and Negative Slope When reading a graph from left to right, if the points on the line run uphill the line has a positive slope (which indicates a positive correlation). If you read the graph left to right and the points run downhill the line has a negative slope (which indicates a negative correlation). **Note: You have to read left to right, NOT right to left.**

Slope POSITIVE SLOPE NEGATIVE SLOPE

Finding Slope and Y-Intercept On the next slide, you are given a set of points. If we plot those points and connect them, we will have a graph. We can find the y-intercept by looking at the graph or by looking at the set of data. (Remember it is when x is zero or where it crosses the y-axis.) We can find the slope by determining the change in y over the change in x. See next slide

Given a set of points: x: 0, 1, 2, 3; y: 1, 3, 5, 7. Graph the set of points. The y-intercept: (0, 1). Slope: the change in y is 2 and the change in x is 1. Hence, the slope is 2/1 = 2.

Relationship Between x and y The relationship between x and y-coordinates of a line can be expressed with a Linear Equation y = (slope)x + (y-intercept) Typically it is written y = mx + b where m is the slope and b is the y-intercept. For instance, if given the equation y = 3x – 2, the slope of this equation would be 3 and the y-intercept would be –2 or (0, –2). If given the equation y = –5x + 12, the slope would be –5 and the y-intercept would be 12 or (0, 12).

Linear Equation Looking at our last example, we had found the y-intercept to be (0, 1) and the slope to be 2/1 or 2. We could write the equation of the line as y = 2x + 1
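The same slope and y-intercept can be found from any two points on the line. The sketch below is a minimal Python illustration using the points (0, 1) and (1, 3), which lie on y = 2x + 1; the function name `line_through` is hypothetical.

```python
def line_through(p1, p2):
    """Slope m and y-intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no defined slope")
    m = (y2 - y1) / (x2 - x1)  # change in y over change in x
    b = y1 - m * x1            # value of y when x is zero
    return m, b

m, b = line_through((0, 1), (1, 3))
print(f"y = {m:g}x + {b:g}")  # y = 2x + 1
```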

Determine the Slope and y-intercept 1. y = 5 + 3x 2. y = –2x – 3 3. y = 17 4. y = 0.24x See next slide for solutions.

Solutions 1. y = 5 + 3x: slope 3, y-intercept (0, 5). Notice that when the order is changed, we must find x to find the slope. 2. y = –2x – 3: slope –2, y-intercept (0, –3). 3. y = 17: slope 0, y-intercept (0, 17). 4. y = 0.24x: slope 0.24, y-intercept (0, 0).

Definition Regression Equation (or Least-Squares Line) An equation expressing a relationship between x and y variables taken from sample data. The symbol for this equation is ŷ (y-hat). This equation allows us to make estimates. It cannot give exact values because it is only based on sample data.

Linear Regression Let’s take a look at a couple of examples to see how we can find a linear equation from our calculator. We can find the linear equation by using LinRegTTest and scrolling down to the a and b values. The typical equation of a straight line y = mx + b is expressed in the form y = a + bx in your TI-83/84 (and in the statistics world in general). Be careful how you write equations due to this difference! If the equation is found using LinRegTTest, a is the y-intercept, and b is the slope! a and b should be rounded to at least 3 significant digits
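For comparison with the calculator output, here is a minimal Python sketch that computes a and b, assuming the standard least-squares formulas (slope b is the sum of deviation products divided by the sum of squared x deviations, and intercept a is the mean of y minus b times the mean of x). The slides do not show these formulas, and the function name `least_squares` and the sample points are illustrative only.

```python
def least_squares(xs, ys):
    """Return (a, b) for the regression line y-hat = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b

# Illustrative points that lie exactly on y = 1 + 2x
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y-hat = {a:g} + {b:g}x")  # y-hat = 1 + 2x
```

Note that the result is written in the a + bx order used by the calculator, with the intercept first.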

Example We found in the last section that the data below indicate that weight and hwy mpg of a car are linearly related. Answer the following questions. Weight of Car (lbs) 2948 3536 3472 2782 3766 4367 2649 2526 2665 3374 Highway mpg 23 19 20 14 16 21 25 a. Use LinRegTTest to find the regression equation. b. Interpret the value of the slope of this equation. c. Predict the hwy mpg for a car that weighs 3000 pounds.

Solution a. Use LinRegTTest to find the regression equation. When you plug the data into your lists and use LinRegTTest, a screen should appear similar to this one. Remember a is your y-intercept and b is your slope value. Linear Equation: ŷ = 34.7 – 0.00456x. Note that 3 sig digits is different from 3 decimal places, and the significant digits may start on the left of the decimal point. In this case, 3 sig digits means 34.7359 rounds to 34.7, but more decimal places is ok if you wish.

Solution b. Interpret the value of the slope of this equation. Because the equation is ŷ = 34.7 – 0.00456x, we know our slope is –0.00456 (look for the x). We are comparing weight of cars (our x values) and highway miles per gallon (our y values). Slope is the change in y when x is increased by 1 unit. This is called the marginal change. So, we can interpret the slope as: For every increase of 1 pound, the highway miles per gallon decrease by 0.00456. Note: We always say increase for the x’s. Here we say decrease for the y’s because our slope is negative.

Solution c. Predict the hwy mpg for a car that weighs 3000 pounds. Remember our x-values represent the weight in pounds. If our equation is ŷ = 34.7 – 0.00456x, then to find the predicted hwy mpg, we can plug in 3000 for x and find our predicted value ŷ: ŷ = 34.7 – 0.00456(3000) = 21.02 ≈ 21.0. Round predictions at least as far as the original y data; we often go one place farther. What this means: If a car weighs 3000 pounds, it will get approximately 21.0 highway miles per gallon.
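The prediction in part c can be checked with a few lines of Python. This sketch assumes the regression equation has intercept 34.7 (from the rounding note in part a) and slope –0.00456 (from part b); the function name `predict_mpg` is hypothetical.

```python
def predict_mpg(weight_lb, a=34.7, b=-0.00456):
    """Predicted highway mpg from y-hat = a + b*x, with x in pounds."""
    return a + b * weight_lb

print(round(predict_mpg(3000), 1))  # 21.0

# Marginal change: one extra pound changes the prediction by the slope
print(round(predict_mpg(3001) - predict_mpg(3000), 5))  # -0.00456
```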

If There is No Correlation When LinRegTTest determines that there is NOT sufficient evidence for a linear correlation, then the regression equation should NOT be used to make estimates and predictions about the population. In these cases, the best estimate you can make is to give the average. So no matter what x value you are given, you would have to give the average y value as the prediction. You can find this by hand, or use 1-Var Stats. What you are saying is, “This equation is not a good one. The best estimate I can give you is the average.”

Example In the last section, we found that these data did NOT provide sufficient evidence of a linear correlation. Find the regression equation. Estimate the rating of a person with a test score of 8. X: Test 10 4 15 11 14 9 12 17 5 18 16 3 Y: Rating 31 27 30 26 38 21 29 36 33 25

Solution Find the regression equation. Estimate the rating of a person with a test score of 8. Because it was determined that there is not a linear correlation, the regression equation should not be used. It would be unreliable to use for estimates about the population, and is worthless to help us predict future values. We will need to find the average performance rating to use as our estimate. See next slide…

Solution b. Predict the rating of a person with a test score of 8. Use 1-Var Stats with the list you put your y’s in (probably L2) and look for the mean. (It will be called x̄, even though it is really ȳ.) STAT, CALC, 1: 1-Var Stats, L2 (2nd 2). The average rating is 29.3. This means that no matter which test score we are looking at (including 8), the best estimate for the rating will be about 29.3.

Rules to Follow with Linear Regression 1. Only use the regression equation if the hypothesis test indicates a linear correlation. If there is no correlation, your best estimate is ȳ, the mean of the y values. 2. Only plug in x values that are within the range of the sample data. (Some books allow x’s slightly outside but very close.) You never know when a strange jump may happen just outside of the range you looked at. 3. Only use the regression equation to make predictions about the original population. (If the sample was all men, don’t use the equation to talk about women.) 4. If your data is old, make sure it is still valid. Don’t use an outdated equation.
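The first two rules can be expressed as a short Python sketch. Everything here is illustrative: the function name `best_estimate`, its parameters, and the sample data are hypothetical, and the P-value is assumed to come from a test such as LinRegTTest run beforehand.

```python
def best_estimate(x, xs, ys, a, b, p_value, alpha=0.05):
    """Estimate y for a given x following the rules for linear regression."""
    # Rule 1: without evidence of a linear correlation, use the mean of y
    if p_value > alpha:
        return sum(ys) / len(ys)
    # Rule 2: only predict inside the range of the sample x values
    if not (min(xs) <= x <= max(xs)):
        raise ValueError("x is outside the range of the sample data")
    return a + b * x

# Hypothetical sample and regression coefficients (not from the slides)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(best_estimate(4, xs, ys, a=1, b=2, p_value=0.01))  # uses the equation
print(best_estimate(4, xs, ys, a=1, b=2, p_value=0.30))  # falls back to the mean
```

The remaining rules (same population, data still valid) depend on judgment about where the data came from, so they are not something a function like this can check.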

Example—Putting it All Together The following are heights and weights of 9 female supermodels: We determined in the last section that these data have a linear correlation. Determine if there is a linear correlation at the 0.05 significance level, and if so, find the regression equation. Interpret the slope. Estimate the weight of a female supermodel who is 69 in tall. Can we estimate the weight of a female supermodel who is 62 in tall? Can we estimate the weight of a male supermodel who is 67 in tall? Can we estimate the weight of a math teacher who is 69 in tall? Height (in) 71 70.5 72 70 66.5 Weight (lb) 125 119 128 127 105 123 115

Solution Determine if there is a linear correlation at the 0.05 significance level, and if so, find the regression equation. Set up the hypothesis test to test for linear correlation. 1. H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation) LinRegTTest, P-value = 0.0103 0.0103 < 0.05, reject H0 Rejecting H0 means we support H1 There is sufficient evidence to support a linear correlation between height and weight of supermodels. This is a positive correlation with equation

Solution Interpret the slope. The slope indicates that a supermodel should weigh an extra 3.88 lb when her height is increased by 1 in. (Or, when a supermodel’s height increases by 1 in, her weight increases by about 3.88 lb.) Estimate the weight of a female supermodel who is 69 in tall.

Solution Can we estimate the weight of a female supermodel who is 62 in tall? No, our data only included models between 66.5 and 72 in tall, so we cannot use the equation to predict the weight of a model who is 62 in tall. Can we estimate the weight of a male supermodel who is 67 in tall? No, the data only included female supermodels, so we cannot use it to predict weights for males. Can we estimate the weight of a math teacher who is 69 in tall? Not unless she is a supermodel!