Published by Clifford York. Modified over 8 years ago.
1
Section 9-1 – Correlation A correlation is a relationship between two variables. The data can be represented by ordered pairs (x,y) where x is the independent (or explanatory) variable and y is the dependent (or response) variable. There are several types of correlations that can be ascertained by graphing a scatter-plot of the ordered pairs and looking at the pattern. If the dots tend to run upward from left to right in a more or less linear fashion, the correlation is positive. If the dots tend to run downward from left to right in a more or less linear fashion, the correlation is negative. If the dots tend to be scattered all over the graph with no real pattern, the correlation is non-existent. If the dots form a pattern other than a line (parabola, for example), the correlation is non-linear.
2
Section 9-1 – Correlation The correlation coefficient is a measure of the strength and direction of a linear relationship between two variables. The sample correlation coefficient is denoted by the letter r. The population correlation coefficient is denoted by ρ, the Greek letter Rho (pronounced “row”). The correlation coefficient runs from -1 to 1; the closer the value is to either end, the stronger the relationship is. A correlation coefficient of 0 would signify absolutely no linear relationship. A correlation coefficient of -1 would signify a completely linear negative relationship. A correlation coefficient of 1 would signify a completely linear positive relationship.
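As a rough illustration of what r measures, here is a minimal Python sketch of the standard sample correlation formula. The function name `pearson_r` and the small data sets are made up for illustration; they are not taken from the slides or the textbook.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfectly linear upward pattern, a perfectly linear
# downward pattern, and a parabola (a strong pattern, but not a linear one).
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(rising, falling, parabola)
```

The parabola case is worth noticing: the points follow a clear pattern, yet r comes out at 0 because r only measures linear relationships.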
3
Section 9-1 – Correlation While there is a formula for finding the value of r, we are going to use the calculator to find it for us. Steps for graphing a scatter-plot and finding the correlation coefficient on the calculator: 1) Turn StatPlot on: press 2nd Y=, select Plot 1, turn it on, and make sure that it is reading L1 and L2. 2) STAT-EDIT, enter data points. Use L1 for the x-values and L2 for the y-values.
4
Section 9-1 – Correlation 4) Set your window: press WINDOW. Set x-min to a number less than the smallest x-value in your list. Set x-max to a number greater than the largest x-value in your list. Repeat for y. 5) Hit the GRAPH key to look at your scatter-plot. 6) To find the correlation coefficient, press STAT-TEST-F. The calculator will give you the values of r² and r. We’ll talk more about r² later, but for now we are looking at r to quantify the strength of the relationship. This test also gives you the equation of the line of regression, as well as an abundance of other information that we will use later.
5
Section 9-1 – Correlation Once we have a number that represents the strength of the relationship, we need to determine whether or not this relationship is significant. This is necessary to determine whether the regression line can be used for predicting y-values. There are two ways to determine if the relationship is significant. Using the Pearson Correlation Coefficient chart (Table 11, found on page A28 in the back of your book) is probably the quickest and easiest way to do this, but it can only be used if the desired α level is 0.05 or 0.01. If α is some other value, you will have to conduct a hypothesis test (the second method of determining whether a relationship is significant).
6
Section 9-1 – Correlation To use the chart, simply use the number of data pairs (n) for the row and the α value for the column to find the critical value. If the absolute value of r is greater than the critical value, the relationship is significant. If you prefer to run a hypothesis test, or if α is not 0.05 or 0.01, you will need to first write the hypotheses. To test whether there is any correlation at all, the hypotheses are H0: ρ = 0 and Ha: ρ ≠ 0. Notice that this means that the null hypothesis states that there is NO linear relationship.
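The chart's decision rule can be sketched in a few lines of Python. This is only an illustration. The helper names are hypothetical, and the values n = 8 and r = 0.913 are assumed for the example. The critical r is derived here from a two-tailed critical t-value using the standard relationship r = t / sqrt(t² + n − 2), rather than copied from Table 11. The t-value 2.447 (6 degrees of freedom, α = 0.05, two-tailed) is taken from a standard t-table.

```python
import math

def critical_r(t_critical, n):
    """Critical value of r implied by a two-tailed critical t with n - 2 degrees of freedom."""
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)

def is_significant(r, r_critical):
    """Chart rule: the relationship is significant when |r| exceeds the critical value."""
    return abs(r) > r_critical

# Assumed example: n = 8 data pairs, alpha = 0.05, critical t = 2.447 (df = 6)
r_crit = critical_r(2.447, 8)
print(round(r_crit, 3))
print(is_significant(0.913, r_crit))   # a strong r clears the bar
print(is_significant(-0.5, r_crit))    # a moderate negative r does not
```

Because the rule uses the absolute value of r, a strong negative correlation is judged the same way as a strong positive one.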
8
Section 9-1 – Correlation Fortunately, STAT-TEST-F (LinRegTTest) will also give us the t-score, as well as the r value and the p value. Enter data into L1 and L2, then run STAT-TEST-F, making sure to indicate whether you are running a left, right, or two-tailed test. Set Freq: to 1 and leave RegEQ: blank. We find the rejection region by using the t-distribution chart, just like we did in Chapter 7, except that instead of using n − 1 for degrees of freedom, we use n − 2 (because we have two variables now). We can also use the invT function on the calculator (2nd VARS 4). Just remember to use n − 2 for your degrees of freedom. If conducting a t-test is still problematic for you, refer back to the Chapter 7 notes.
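For anyone who wants to check the calculator's t-score by hand, here is a minimal Python sketch of the standard t-statistic for a correlation coefficient, t = r / sqrt((1 − r²) / (n − 2)). The function names are hypothetical, the values r = 0.913 and n = 8 are assumed for illustration, and the critical value 2.447 (6 degrees of freedom, α = 0.05, two-tailed) comes from a standard t-table, not from the calculator output.

```python
import math

def correlation_t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| is 1 or more")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def reject_null_two_tailed(t, t_critical):
    """Two-tailed rule: reject H0 when t falls in either tail of the rejection region."""
    return abs(t) > t_critical

# Assumed example values
r, n = 0.913, 8
t = correlation_t_statistic(r, n)
print(round(t, 2))
print(reject_null_two_tailed(t, 2.447))
```

For a left-tailed or right-tailed test, the comparison would use only one tail, with the matching one-tailed critical value.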
9
Section 9-1 – Correlation Correlation and Causation It is important to remember that just because two variables are related does not necessarily mean that one causes the other. There are 4 possibilities: 1)A direct cause-and-effect relationship between the variables. x causes y. For example, spending more money on advertising results in more sales. 2)A reverse cause-and-effect relationship between the variables. y causes x. For example, maybe more time between Old Faithful eruptions causes the next one to last longer, instead of the other way around.
10
Section 9-1 – Correlation 3) A third, as yet unknown, variable may be causing both x and y. The Chapter Opener on page 495 shows a positive correlation between a movie’s budget and its ticket sales. Which one causes the other? Maybe they are both caused by the actors who star in the movies. Big stars demand more money to appear in films (budget goes up). Big stars draw more people to the theaters to see their movies (ticket sales go up). Maybe they are both caused by the hype generated by the movie studio prior to the release of the movie. Advertising causes the budget to go up. Advertising may lure more people into the theater to see the movie (ticket sales go up).
11
Section 9-1 – Correlation 4)The variables only appear to be related; it’s a coincidence. For example, there may be a strong positive correlation between the number of coyotes living in an area and the number of families owning more than two cars in that same area, but it is highly unlikely that one causes the other. The relation would probably be due to coincidence.
12
Section 9-1 – Correlation Example 3 (Page 498) Old Faithful, located in Yellowstone National Park, is the world’s most famous geyser. The duration (in minutes) of several of Old Faithful’s eruptions and the times (in minutes) until the next eruption are shown in the table below. Display the data in a scatterplot and determine whether there appears to be a positive or negative linear correlation or no linear correlation at all.
13
Section 9-1 – Correlation Example 3 (Page 498) STAT-Edit (1): enter the Duration (x) values into L1 and the Time (y) values into L2. Press 2nd Y= and turn Stat Plot on. Select Scatterplot (first option). Make sure that the correct lists are being used.
Duration, x   1.8   1.82  1.90  1.93  1.98  2.05  2.13  2.30  2.37  2.82  3.13  3.27  3.65
Time, y       56    58    62    56    57    [?]   60    57    61    73    76    77    [?]
Duration, x   3.78  3.83  3.88  4.10  4.27  4.30  4.43  4.47  4.53  4.55  4.60  4.63
Time, y       79    85    80    89    90    89    [?]   86    89    86    92    91
14
Section 9-1 – Correlation Example 3 (Page 498) Window: set x-min to something less than the smallest x-value in your data set. Set x-max to something greater than the largest x-value in your data set. Repeat for y. Graph: this plot appears to show a positive linear correlation.
17
Advertising $ (in thousands)    2.4  1.6  2.0  2.6  1.4  1.6  2.0  2.2
Company Sales (in thousands)    225  184  220  240  180  184  186  215
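The calculator's r for this advertising and sales table can be checked with a short Python sketch. It uses the computational (sums) form of the correlation formula. The function name `pearson_r_from_sums` is hypothetical, and the variable names are chosen for illustration only.

```python
import math

# Data from the table above (both in thousands)
advertising = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
sales = [225, 184, 220, 240, 180, 184, 186, 215]

def pearson_r_from_sums(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_from_sums(advertising, sales)
print(round(r, 3))  # about 0.913, a strong positive linear correlation
```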
19
Assignments: Classwork: Pages 507-508 #1-14 All; Page 517 #1-8 All. Homework: Pages 508-511 #15-28 All; Pages 517-520 #14-28 Evens.
23
x     y
2.4   225
1.6   184
2.0   220
2.6   240
1.4   180
1.6   184
2.0   186
2.2   215
26
Assignments: Classwork: Page 531 #1-8 All. Homework: Pages 531-534 #9-24 All.