Module 20: Correlation

This module focuses on calculating, interpreting, and testing hypotheses about the Pearson Product Moment Correlation Coefficient.


Reviewed 05 June 05 / MODULE 20

Correlation
In Module 19, we examined how two variables, x and y, relate to each other by using simple linear regression. In that context, x was the independent variable and y was the dependent variable. Typical independent variables are measures of time, including age, whereas typical dependent variables are continuous measurements such as blood cholesterol level. The general assumption is that there is a separate normal distribution of the dependent variable y for each value of the independent variable x. Further, we need to assume that these separate normal distributions all have the same population variance.

Clearly these assumptions are quite restrictive in that we are often interested in the relationship between two variables, x and y, where it is not at all clear which should be labeled the independent variable and which the dependent one. An example is the relationship between blood cholesterol level and blood pressure level.

For this situation, we have another tool to measure and test hypotheses about the relationship between these two variables. The tool is correlation and we focus here only on what is usually called the Pearson Product Moment Correlation Coefficient. There are other measures of correlation which we will not discuss here. There are restrictions for the use of this correlation tool as well, which include the basic assumption that the x and y variables together have a joint frequency distribution which is called the bivariate normal distribution. This distribution looks like a three-dimensional bell in a manner similar to the way a normal distribution for one variable looks like a cross section of a bell.

The degree of association or correlation between two variables is measured by the correlation coefficient. This is done in a manner similar to that for other population parameters and estimates of these parameters obtained by calculating statistics from samples. That is, there is a value for the population parameter for this coefficient which is estimated by selecting a random sample and calculating the appropriate coefficient using the data from this sample. We can also use the information from the sample to test hypotheses about the population.

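The population parameter for the Pearson Product Moment Correlation Coefficient is defined as $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$, where $\sigma_{xy}$ is the population covariance of x and y, and $\sigma_x$ and $\sigma_y$ are their population standard deviations. It is typically called rho, for the Greek letter that represents it. The estimate of ρ calculated from the sample data is the statistic $r = SS(xy) / \sqrt{SS(x)\,SS(y)}$.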
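As a rough illustration of how the sample statistic r estimates the population parameter ρ, the sketch below draws a random sample from a bivariate normal distribution with a known ρ and computes r from the sums of squares and cross products, r = SS(xy) / sqrt(SS(x) SS(y)). The helper names `pearson_r` and `simulate_bivariate_normal`, the seed, the sample size, and ρ = 0.6 are all assumptions chosen for this example; none of them comes from the module.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation r = SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


def simulate_bivariate_normal(n, rho, seed):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # Mixing z1 with an independent z2 gives corr(x, y) = rho.
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


# Hypothetical population value and sample size, for illustration only.
xs, ys = simulate_bivariate_normal(n=20000, rho=0.6, seed=1)
r = pearson_r(xs, ys)
print(f"population rho = 0.6, sample r = {r:.3f}")
```

With a large sample the printed r should land close to the chosen ρ; with a sample as small as n = 11 it can vary considerably from one sample to the next, which is why a hypothesis test is needed.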

Fecal Fat and Urinary Oxalate Example
Fecal Fat (g/24 hr) and Urinary Oxalate (mg/24 hr) secreted by a random sample of n = 11 persons

Scatter plot for Fecal Fat and Urinary Oxalate Data

Calculations for Regressing Urinary Oxalate on Fecal Fat

Regression Tools

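So we can calculate the slope $b = SS(xy) / SS(x)$ and the intercept $a = \bar{y} - b\bar{x}$. The straight line depicting the regression relationship of y on x is $\hat{y} = a + bx$. At x = 40, the regression estimate for y is $\hat{y} = a + 40b$.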
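The regression calculation can be sketched in a few lines, assuming the usual least-squares formulas b = SS(xy) / SS(x) and a = ȳ − b·x̄. The data below are made up for illustration and are not the fecal fat and urinary oxalate measurements; the function name `fit_line` is likewise hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = ss_xy / ss_x
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, NOT the values from the module's example.
xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [12.0, 19.0, 33.0, 38.0, 53.0]

a, b = fit_line(xs, ys)
y_hat_40 = a + b * 40.0  # regression estimate of y at x = 40
print(f"a = {a:.2f}, b = {b:.2f}, estimate at x = 40: {y_hat_40:.2f}")
```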

Test for regression of Urinary Oxalate on Fecal Fat
1. The hypothesis: H0: β = 0 vs. H1: β ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random normal samples for the y-variable from populations defined by the x-variable
4. The test statistic: ANOVA F ratio, F = MS(Reg) / MS(Res)
5. The critical region: Reject H0: β = 0 if the value calculated for F > F0.95(1, 9) = 5.12
6. The result: SS(Reg) = b SS(xy) = 1.59 (2,766.00) = 4,…; SS(Total) = SS(y) = 16,…; SS(Res) = 16,… − 4,… = 12,…
7. The conclusion: Accept H0: β = 0 since F < 5.12

Correlation Tools
The estimate of the correlation coefficient is: $r = SS(xy) / \sqrt{SS(x)\,SS(y)}$

The Correlation Coefficient
r measures linear association
r takes values in the range −1 ≤ r ≤ +1
r → +1 implies strong positive linear association
r → −1 implies strong negative linear association
r → 0 implies no linear association
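The last property deserves care: r near 0 rules out a linear association, not every association. The sketch below uses three small made-up data sets (an exact increasing line, an exact decreasing line, and a symmetric parabola) to show the extreme values of r. The helper name `pearson_r` and the data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation r = SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_positive = pearson_r(xs, [2.0 * x + 1.0 for x in xs])  # exact increasing line
r_negative = pearson_r(xs, [-3.0 * x for x in xs])       # exact decreasing line
r_parabola = pearson_r(xs, [x ** 2 for x in xs])         # y depends on x, but not linearly

print(r_positive, r_negative, r_parabola)
```

In the parabola case y is completely determined by x, yet the positive and negative cross products cancel and r comes out at zero, so a scatter plot is still worth inspecting before interpreting r.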

Correlation Hypothesis Testing
The hypothesis of interest deals with whether there is linear association between x and y. If there is no such association, we would have ρ = 0. Hence, the hypotheses of interest are H0: ρ = 0 vs. H1: ρ ≠ 0, which we can test by using the test statistic:
$t = r\sqrt{n-2} / \sqrt{1-r^2}$
Note that this calculation requires only the sample estimate r of the correlation coefficient ρ and the sample size n, and that we need to use the t distribution with n − 2 degrees of freedom.
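A minimal sketch of this test, assuming the usual form of the statistic, t = r·sqrt(n − 2) / sqrt(1 − r²), compared against a two-sided critical value read from a t table. The function names `correlation_t` and `reject_null` and the value r = 0.5 are hypothetical; the critical value 2.262 is the tabulated 0.975 quantile of the t distribution with 9 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


def reject_null(t, critical_value):
    """Two-sided rule: reject H0 when t is not between -critical and +critical."""
    return abs(t) > critical_value


# Hypothetical sample: r = 0.5 from n = 11 pairs, so df = 9.
t = correlation_t(r=0.5, n=11)
print(f"t = {t:.3f}, reject H0: {reject_null(t, critical_value=2.262)}")
```

Only r and n enter the calculation, so the test can be applied to a published correlation without access to the raw data.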

Test of Correlation between Urinary Oxalate and Fecal Fat, n = 11, r = …
1. The hypothesis: H0: ρ = 0 vs. H1: ρ ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random sample from bivariate normal distribution
4. The test statistic: $t = r\sqrt{n-2} / \sqrt{1-r^2}$
5. The critical region: Reject H0: ρ = 0 if the value calculated for t is not between ± t0.975(9) = 2.262
6. The result: r = …, n = 11, t = 1.77
7. The conclusion: Accept H0: ρ = 0 since t = 1.77 is between ± t0.975(9) = 2.262
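The correlation test and the earlier regression test reach the same conclusion on these data, and that is not a coincidence. A sketch of why, using the standard identity r² = SS(Reg) / SS(Total) (an assumption here, since it is not stated on the slides) together with SS(Res) = SS(Total) − SS(Reg):

```latex
t^2 = \frac{r^2\,(n-2)}{1-r^2}
    = \frac{\dfrac{SS(\mathrm{Reg})}{SS(\mathrm{Total})}\,(n-2)}{\dfrac{SS(\mathrm{Res})}{SS(\mathrm{Total})}}
    = \frac{SS(\mathrm{Reg})/1}{SS(\mathrm{Res})/(n-2)}
    = F
```

The critical values agree in the same way: 2.262² ≈ 5.12, which is the F0.95(1, 9) cutoff used in the regression test. The reported t = 1.77 corresponds to F ≈ 1.77² ≈ 3.13, which is below 5.12.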

Test of Correlation for Tono-Pen vs. Goldmann intraocular pressure, n = 40, r = …
1. The hypothesis: H0: ρ = 0 vs. H1: ρ ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random sample from bivariate normal distribution
4. The test statistic: $t = r\sqrt{n-2} / \sqrt{1-r^2}$
5. The rejection region: Reject H0: ρ = 0 if t is not between ± t0.975(38) = 2.02
6. The result: n = 40, r = …, r² = 0.44, t = 5.44
7. The conclusion: Reject H0: ρ = 0 since t = 5.44 is not between ± 2.02
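As an arithmetic check on this result, the test statistic can be recomputed from the rounded values n = 40 and r² = 0.44 given above, taking the positive root because the reported t is positive. This is only a sketch: since r² is rounded to two decimals, the value obtained differs slightly from the reported t = 5.44.

```latex
t = \sqrt{\frac{r^2\,(n-2)}{1-r^2}}
  = \sqrt{\frac{0.44 \times 38}{0.56}}
  \approx \sqrt{29.86}
  \approx 5.46
```

Either value lies far outside ± 2.02, so the conclusion does not depend on the rounding.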

Example: AJPH, 1995; 85:

Source: AJPH, 1995; 85:

Test of Correlation between infant mortality rate and gross domestic product, n = 17, r = …
1. The hypothesis: H0: ρ = 0 vs. H1: ρ ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random sample from bivariate normal distribution
4. The test statistic: $t = r\sqrt{n-2} / \sqrt{1-r^2}$
5. The rejection region: Reject H0: ρ = 0 if t is not between ± t0.975(15) = 2.13
6. The result: n = 17, r = …
7. The conclusion: Reject H0: ρ = 0 since t = … is not between ± t0.975(15) = 2.13

Test of correlation hypothesis for life expectancy for males and females, n = 17, r = …
1. The hypothesis: H0: ρ = 0 vs. H1: ρ ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random sample from bivariate normal distribution
4. The test statistic: $t = r\sqrt{n-2} / \sqrt{1-r^2}$
5. The rejection region: Reject H0: ρ = 0 if t is not between ± t0.975(15) = 2.13
6. The result: n = 17, r = …, t = 3.49
7. The conclusion: Reject H0: ρ = 0 since t = 3.49 is not between ± t0.975(15) = 2.13

Example: AJPH, 1997; 87:

Correlation between Mortality and Social Mistrust, n = 39, r = …
1. The hypothesis: H0: ρ = 0 vs. H1: ρ ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random sample from bivariate normal distribution
4. The test statistic: $t = r\sqrt{n-2} / \sqrt{1-r^2}$
5. The rejection region: Reject H0: ρ = 0 if t is not between ± t0.975(37) = 2.02
6. The result: n = 39, r = …, t = 7.8
7. The conclusion: Reject H0: ρ = 0 since t = 7.8 is not between ± t0.975(37) = 2.02

Example: AJPH, 1998; 88:

Source: AJPH, 1998; 88:
Note: Even a very few outliers can have a large impact on the location of the line.
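The influence of outliers can be seen in a small sketch. The data below are made up for illustration and are not taken from the cited article: five points with no linear trend, followed by the same points plus one extreme observation. The helper name `pearson_r` is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation r = SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data: five points with no linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 3.0, 1.0, 2.0]
r_without = pearson_r(xs, ys)

# The same points plus a single observation far from the rest.
r_with = pearson_r(xs + [20.0], ys + [20.0])

print(f"r without outlier = {r_without:.3f}, r with outlier = {r_with:.3f}")
```

A single point is enough to move r from essentially zero to a value close to 1 in this example, and the fitted line is pulled toward that point in the same way, so checking the scatter plot before testing is a sensible precaution.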

Test for Correlation between Gonorrhea rate and Chlamydia rate, n = 322, r = …
1. The hypothesis: H0: ρ = 0 vs. H1: ρ ≠ 0
2. The α level: α = 0.05
3. The assumptions: Random sample from bivariate normal distribution
4. The test statistic: $t = r\sqrt{n-2} / \sqrt{1-r^2}$
5. The rejection region: Reject H0: ρ = 0 if t is not between ± t0.975(320) ≈ 2.00
6. The result: n = 322, r = …
7. The conclusion: Reject H0: ρ = 0 since t = … is not between ± t0.975(320) = 2.00