Chapter 11: Association between two variables



CONTENTS
§1 Simple linear correlation: r
§2 Rank correlation: r_s
§3 Association between two categorical variables (r, τ)
§4 Case discussion

Terminology
association (关联性)
correlation (相关)
intensity (密切程度, 强度)
correlation coefficient (相关系数)
Pearson correlation coefficient (Pearson 相关系数)
product moment correlation coefficient (积矩相关系数)
rank correlation coefficient (秩相关系数)
Spearman correlation coefficient (Spearman 相关系数)

§1 Simple linear correlation
1. Concept and descriptive statistics of the relationship between two continuous variables
2. Statistical inference about the correlation coefficient
3. Notes in application

1. Concept and descriptive statistics of the relationship between two continuous variables
The linear relationship between X and Y is studied, where X and Y are random variables following a normal distribution and both are measured on the same subject.

Cases on studying relationships
Relationship between salt intake (X) and blood pressure (Y);
Relationship between blood pressure (X) and body mass index (BMI) (Y);
Relationship between height (X) and weight (Y);
Relationship between blood pressure (X) and age (Y);
···

Correlational techniques
Correlation: to determine the strength and direction of the relationship between different variables.
Regression: to make predictions from one variable to the other, based on the functional relationship between the variables.

Table 11-1 Thrombin concentration (u/ml) and blood clotting time (seconds) in 15 healthy adults (Example 11-1)
Columns: subject; thrombin concentration (u/ml); clotting time (s).

Figure 11-1 Scatter plot of relationship between thrombin concentration and clotting time

Figure 11-2 Common types of relationship between two variables: (a) positive correlation; (b) perfect positive correlation; (c) negative correlation; (d) perfect negative correlation; (e) zero correlation; (f) non-linear correlation.

Simple linear correlation coefficient (r)
Synonyms: Pearson correlation coefficient; product moment correlation coefficient.
Definition: a statistical index that describes the intensity and the direction of the association between two variables.

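Symbol: r is the sample statistic; ρ is the population parameter.
Equation for calculating the correlation coefficient:
$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{l_{XY}}{\sqrt{l_{XX}\, l_{YY}}}$$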

0 < r < 1: positive correlation
−1 < r < 0: negative correlation
r = 1: complete positive correlation
r = −1: complete negative correlation
r = 0: zero linear correlation
Intensity: absolute value of r. Direction: sign of r.

Intensity: absolute value of r
|r| → 1: stronger linear association
|r| → 0: weaker linear association
Direction: sign of r
+: positive correlation
−: negative correlation

Procedure for calculating the correlation coefficient
1) Draw a scatter plot and check for a linear trend.
2) Calculate r.

Calculation table. Columns: subject i; thrombin concentration x (u/ml); clotting time y (seconds); x²; y²; x × y. Last row: sum.

l_XX = 0.404, l_YY = …, l_XY = …
2) Calculation of r: $r = l_{XY} / \sqrt{l_{XX}\, l_{YY}} = -0.926$. X and Y have a strong negative linear relationship.
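The two-step calculation above (sums of squares and cross-products, then r) can be sketched in Python. This is a minimal illustration only: the function name `pearson_r` and the five data pairs are hypothetical and are not the values of Table 11-1, which are not reproduced in this transcript.

```python
import math

def pearson_r(x, y):
    """Return (l_xx, l_yy, l_xy, r) from the sums of x, y, x^2, y^2 and x*y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # l_XX = sum(x^2) - (sum x)^2 / n, and likewise for l_YY and l_XY
    l_xx = sum(v * v for v in x) - sum_x ** 2 / n
    l_yy = sum(v * v for v in y) - sum_y ** 2 / n
    l_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    r = l_xy / math.sqrt(l_xx * l_yy)
    return l_xx, l_yy, l_xy, r

# Hypothetical illustrative data (NOT the values of Table 11-1)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

l_xx, l_yy, l_xy, r = pearson_r(x, y)
print(f"l_XX={l_xx:.3f}  l_YY={l_yy:.3f}  l_XY={l_xy:.3f}  r={r:.3f}")
```

A negative l_XY gives a negative r, which is the pattern reported for thrombin concentration and clotting time.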

2. Statistical inference about the correlation coefficient: hypothesis test
1) Establish the hypotheses and determine the significance level α
H0: ρ = 0, no linear association between X and Y
H1: ρ ≠ 0, a linear association between X and Y exists
α = 0.05, the two-sided probability of a type I error

2) Calculate the test statistic
Method 1: t-test
$$t = \frac{r}{\sqrt{(1-r^2)/(n-2)}}, \qquad \nu = n-2$$
For Example 11-1, ν = 15 − 2 = 13. From the t distribution table, the critical value is t_{0.05/2(13)} = 2.160 < |t| = 8.874, so P < 0.05 and the correlation coefficient is statistically significant at α = 0.05. Thrombin concentration and clotting time are negatively related.

Method 2: consult Appendix Table 13 on page 486. The critical value of r is r_{0.05(13)} = 0.514 < |r| = 0.926, so P < 0.05. The conclusion is that there is a linear association between clotting time and thrombin concentration.
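Both methods can be checked with a short Python sketch. It assumes the usual t statistic for H0: ρ = 0 and takes the critical value 2.160 from the slide rather than computing it; the function names are illustrative. With the rounded r = −0.926 the statistic comes out near 8.84, slightly below the 8.874 quoted above, which was presumably computed from an unrounded r.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def critical_r(t_crit, df):
    """Critical |r| that corresponds to a critical t value with df degrees of freedom."""
    return t_crit / math.sqrt(t_crit * t_crit + df)

n = 15
r = -0.926        # rounded value quoted on the slides
t_crit = 2.160    # t_{0.05/2(13)}, as quoted from the t table

t = t_statistic(r, n)
r_crit = critical_r(t_crit, n - 2)

print(f"t = {t:.2f}, reject H0: {abs(t) > t_crit}")
print(f"critical r = {r_crit:.3f}, reject H0: {abs(r) > r_crit}")
```

Solving the t formula for r shows why the two methods always agree: the tabled critical r is the critical t re-expressed on the scale of r.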

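R.A. Fisher's Z transformation
$$z = \frac{1}{2}\ln\frac{1+r}{1-r}$$

Calculation of the CI for ρ: r → (Fisher's transformation) → z, which is approximately normally distributed → CI for Z → (back-transformation) → CI for ρ.
$$z \pm \frac{u_{\alpha/2}}{\sqrt{n-3}}, \qquad \rho = \frac{e^{2z}-1}{e^{2z}+1}$$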

Example 3 (continuation of Example 2): the researcher obtained a sample correlation coefficient r = 0.82 (P < 0.01) and wants to estimate the strength of the correlation further.

99% CI for Z: (0.186, 2.134).
Conclusion: the 99% CI of the correlation coefficient between forearm length and height is (0.184, 0.972).
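The interval in Example 3 can be approximated with the sketch below. The sample size is not stated in this transcript; n = 10 is an assumption, chosen because it reproduces the width of the quoted interval for Z. The function name `fisher_ci` is illustrative. Small differences in the third decimal are expected, since the slide appears to use rounded intermediate values.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.99):
    """Confidence interval for rho via Fisher's z transformation.

    Returns ((z_low, z_high), (rho_low, rho_high)).
    """
    z = math.atanh(r)                      # same as 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    u = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    z_low, z_high = z - u * se, z + u * se
    # tanh is the inverse transformation, (e^{2z} - 1) / (e^{2z} + 1)
    return (z_low, z_high), (math.tanh(z_low), math.tanh(z_high))

# r from Example 3; n = 10 is an ASSUMPTION (not given in the transcript)
(z_low, z_high), (rho_low, rho_high) = fisher_ci(0.82, 10, confidence=0.99)
print(f"99% CI for Z:   ({z_low:.3f}, {z_high:.3f})")
print(f"99% CI for rho: ({rho_low:.3f}, {rho_high:.3f})")
```

The interval for ρ is asymmetric around r = 0.82 because the back-transformation compresses values near 1.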

§2 Spearman rank correlation
1. Calculation of the Spearman correlation coefficient
2. Statistical inference about the Spearman correlation coefficient

Terminology: rank (秩, 等级); rank correlation coefficient (秩相关系数).
Spearman rank correlation is a nonparametric method, applied when the two variables are distributed far from normal.

1. Spearman rank correlation coefficient (r_s)
1) Rank the values of each of the two variables according to their magnitude.
2) Calculate the Spearman rank correlation coefficient based on the ranks.

Table 11-2 Hemorrhage degrees and thrombocyte counts (10^9/L) in 12 children with acute leukemia
Columns: (1) patient; (2) platelet count; (3) rank p_x; (4) (p_x)²; (5) bleeding degree; (6) rank p_y; (7) (p_y)²; (8) p_x × p_y. Last row: total.
For equal ranks, the mean rank is used instead. The six patients with bleeding degree '–' share ranks 1 to 6, so each receives the mean rank (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

Calculation of r_s (numerical values are from Table 11-2; see page 212)
Columns: (1) patient; (2) platelet count; (3) rank p_x; (4) (p_x)²; (5) bleeding degree; (6) rank q_y; (7) (q_y)²; (8) p_x × q_y. Last row: total.
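The ranking rule (mean rank for ties) and the calculation of r_s from the ranks can be sketched as follows. It assumes r_s is obtained by applying the product-moment formula to the two rank columns, which matches the squared-rank and cross-product columns of the table. The function names and the six data pairs are hypothetical, not the values of Table 11-2; bleeding degree is coded 0, 1, 2 purely for illustration.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2.0   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman r_s: the product-moment formula applied to the ranks."""
    p, q = average_ranks(x), average_ranks(y)
    n = len(p)
    l_pp = sum(v * v for v in p) - sum(p) ** 2 / n
    l_qq = sum(v * v for v in q) - sum(q) ** 2 / n
    l_pq = sum(a * b for a, b in zip(p, q)) - sum(p) * sum(q) / n
    if l_pp == 0 or l_qq == 0:
        raise ValueError("r_s is undefined when one variable is constant")
    return l_pq / math.sqrt(l_pp * l_qq)

# Six tied lowest values share ranks 1..6, so each gets 3.5
print(average_ranks([0, 0, 0, 0, 0, 0, 1, 2]))

# Hypothetical illustrative data (NOT the values of Table 11-2)
platelets = [120, 150, 90, 200, 60, 180]
bleeding = [0, 0, 1, 0, 2, 0]
rs = spearman_rs(platelets, bleeding)
print(f"r_s = {rs:.3f}")
```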

Explanation of the Spearman rank correlation coefficient r_s
(1) −1 ≤ r_s ≤ 1, with a meaning similar to that of r.
(2) Difference between r_s and r: r_s is calculated from the ranks, whereas r is calculated from the original data values, so in general r_s ≠ r.

2. Statistical inference about the Spearman rank correlation coefficient r_s
1) Set up the hypotheses and determine the significance level: H0: ρ_s = 0; H1: ρ_s ≠ 0; α = 0.05 (two-sided).
2) Calculate the test statistic and obtain the critical value:
a) if n ≤ 50, consult Appendix Table 14 (critical values of r_s);
b) if n > 50, calculate t by equations 11-5 and 11-6.
For a): r_s = −0.422, n = 12. From Appendix Table 14 on page 487, the critical value is r_{s,0.05(12)} = 0.587 > |r_s| = 0.422, so P > 0.05 and we fail to reject H0.

b) Calculating t by equations 11-5 and 11-6 (if n > 50): for illustration, r_s = −0.422 from the example above is used to calculate the t value.
3) Conclusion: there is no statistically significant association between hemorrhage degree and thrombocyte count. (Both methods lead to the same conclusion.)
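Equations 11-5 and 11-6 are not reproduced in this transcript. The sketch below assumes they are the usual t approximation for a rank correlation, t = r_s·sqrt((n − 2)/(1 − r_s²)) with ν = n − 2, and applies it to r_s = −0.422, n = 12. The critical value 2.228 is the standard tabled t_{0.05/2(10)}; the function name is illustrative.

```python
import math

def spearman_t(rs, n):
    """Assumed form of the t approximation for r_s (nu = n - 2)."""
    return rs * math.sqrt((n - 2) / (1.0 - rs * rs))

rs, n = -0.422, 12
t_crit = 2.228                      # tabled t_{0.05/2(10)}
t_value = spearman_t(rs, n)

print(f"t = {t_value:.3f}, nu = {n - 2}")
print("reject H0" if abs(t_value) > t_crit else "fail to reject H0")
```

Under this assumption |t| stays below the critical value, in line with the table-based result in a).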

§3 Association between two categorical variables (r, τ)
1. Association measures for a 2 × 2 table of cross-classified data
2. Association measures for a 2 × 2 table of pair-designed data
3. Association measures for an R × C table of cross-classified data

1. Association measures for a 2 × 2 table of cross-classified data
Table 11-3 Diarrhea and feeding patterns of infants

Feeding pattern       Diarrhea: Yes   Diarrhea: No   Total
Artificial feeding    30              10             40
Breast feeding        17              25             42
Total                 47              35             82

Table 11-4 Data layout for a 2 × 2 cross classification (observed count: A_ij; probability: π_ij; i, j = 1, 2)

Variable X (i-th row)   Variable Y (j-th column)        Total
                        Y1            Y2
X1                      A11 (π11)     A12 (π12)         n1. (π1.)
X2                      A21 (π21)     A22 (π22)         n2. (π2.)
Total                   n.1 (π.1)     n.2 (π.2)         n (π = 1.0)

Observed number in cell (i, j): A_ij, i, j = 1, 2. Under H0, π_i. ≈ n_i./n and π_.j ≈ n_.j/n.
Under independence, the joint probability of a particular combination of results is given by the multiplication rule:
$$\pi_{ij} = \pi_{i\cdot} \times \pi_{\cdot j} \qquad (11\text{-}7)$$
or, as an expected number:
$$T_{ij} = \frac{n_{i\cdot} \times n_{\cdot j}}{n} \qquad (11\text{-}8)$$
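Equation (11-8) can be applied directly to the observed counts of the feeding-pattern example (30, 10, 17, 25). The Python sketch below is illustrative; the function name `expected_counts` is not from the source.

```python
def expected_counts(observed):
    """Expected counts under independence: T_ij = n_i. * n_.j / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts: rows = artificial / breast feeding, columns = diarrhea yes / no
observed = [[30, 10],
            [17, 25]]

expected = expected_counts(observed)
for row in expected:
    print([round(t, 2) for t in row])
# [22.93, 17.07]
# [24.07, 17.93]
```

The same function works unchanged for an R × C table, since it only uses row and column totals.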

1) Hypothesis test
H0: the two variables are independent
H1: the two variables are associated

Observed counts, with expected counts in parentheses:

Feeding pattern       Diarrhea: Yes   Diarrhea: No   Total
Artificial feeding    30 (22.93)      10 (17.07)     40
Breast feeding        17 (24.07)      25 (17.93)     42
Total                 47              35             82

2) Test statistic:
$$\chi^2 = \sum \frac{(A-T)^2}{T}$$
3) Pearson's contingency coefficient:
$$r = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
There is a weak association.
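The test statistic and the contingency coefficient for this table can be recomputed with a short sketch. It assumes the uncorrected Pearson χ² statistic and the contingency coefficient sqrt(χ²/(χ² + n)); whether the slide applied a continuity correction is not visible in this transcript. The critical value 3.841 is the standard tabled χ²_{0.05(1)}; function names are illustrative.

```python
import math

def chi_square(observed):
    """Uncorrected Pearson chi-square for a two-way table; returns (chi2, n)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, a in enumerate(row):
            t = row_totals[i] * col_totals[j] / n   # expected count T_ij
            chi2 += (a - t) ** 2 / t
    return chi2, n

def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient."""
    return math.sqrt(chi2 / (chi2 + n))

observed = [[30, 10],
            [17, 25]]
chi2, n = chi_square(observed)
coef = contingency_coefficient(chi2, n)

print(f"chi-square = {chi2:.2f} (critical value 3.841 for 1 df)")
print(f"contingency coefficient = {coef:.2f}")
```

A coefficient of roughly 0.33 is consistent with the slide's description of a weak association, even though the test itself is significant at α = 0.05.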

2. Association measures for a 2 × 2 table of pair-designed data
Table 11-5 Results for Bacillus diphtheriae in two culture media (from Example 11-7)
Rows: culture medium A; columns: culture medium B; Total row: 24, 32, 56.

1) Hypothesis test
H0: the results of the two media are independent
H1: the results of the two media are associated
2) Test statistic: computed from Table 11-5 (Total row: 24, 32, 56).
3) Pearson's contingency coefficient: there is a weak association.

3. Association measures for an R × C table of cross-classified data
Table 11-6 Cross classification by type of thyroid enlargement and ancestral residence
Rows: ancestral residence (A, B, C, Total); columns: type of thyroid enlargement (widespread, nodular, mixed, Total).
Question: is the type of thyroid enlargement associated with ancestral residence?

1) Establish the hypotheses
H0: the two variables are independent
H1: the two variables are associated
2) Test statistic: formula (7-10)
3) Contingency coefficient: formula (11-9)
4) Conclusion: the type of thyroid enlargement is associated with ancestral residence.

Notes in application
1. r = 0 does not mean that there is no relationship (the relationship might be non-linear).
2. Correlation analysis is not suitable when the levels of either variable X or Y are artificially selected.
3. Outliers can influence the correlation coefficient heavily.
4. Correlation ≠ cause-effect association; correlation ≠ intrinsic association.
5. Statistical significance and intensity of correlation are different things. Statistical significance of the correlation coefficient means that the probability of obtaining such an r from a population with ρ = 0 is small. Intensity of correlation is measured by the absolute value of r.

The degree of correlation is influenced by extreme values (outliers).
(a) Zero correlation changed to strong correlation.

(b) Strong correlation changed to zero correlation.
Note: a scatter diagram can help you find the outliers.
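Both situations can be reproduced with small constructed data sets. The numbers below are hypothetical, chosen so that a single added point changes r from 0 to about 0.9 in case (a) and from 1 to 0 in case (b); they are not the data behind the original figures.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    l_xx = sum((a - mean_x) ** 2 for a in x)
    l_yy = sum((b - mean_y) ** 2 for b in y)
    return l_xy / math.sqrt(l_xx * l_yy)

# (a) A 3 x 3 grid of points has r = 0; one extreme point creates a strong r.
grid_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
grid_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
r_a_before = pearson_r(grid_x, grid_y)
r_a_after = pearson_r(grid_x + [10], grid_y + [10])

# (b) Five points on a straight line have r = 1; one extreme point removes it.
line_x = [1, 2, 3, 4, 5]
line_y = [1, 2, 3, 4, 5]
r_b_before = pearson_r(line_x, line_y)
r_b_after = pearson_r(line_x + [9], line_y + [1])

print(f"(a) r before = {r_a_before:.3f}, after adding (10, 10) = {r_a_after:.3f}")
print(f"(b) r before = {r_b_before:.3f}, after adding (9, 1)  = {r_b_after:.3f}")
```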

A survey on the relationship between students' height and family income in a primary school: scatter plots in each stratum and after pooling.

You may miss another type of relationship. The absence of a linear correlation (P > α) does not mean that there is no relationship.
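A constructed example makes the point concrete: when y is an exact function of x but the relationship is symmetric and curved, r is zero. The data below are hypothetical (y = x² on a symmetric range of x).

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    l_xx = sum((a - mean_x) ** 2 for a in x)
    l_yy = sum((b - mean_y) ** 2 for b in y)
    return l_xy / math.sqrt(l_xx * l_yy)

# Hypothetical data: y is completely determined by x, but not linearly
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]

r_curved = pearson_r(x, y)
print(f"r = {r_curved:.3f}")   # zero linear correlation despite a perfect relationship
```

This is why the scatter plot comes first in the calculation procedure: r alone cannot distinguish "no relationship" from "non-linear relationship".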

The SAS CORR procedure
PROC CORR DATA=SAS-data-set;
    VAR variable1 variable2;
RUN;

SUMMARY
1. Simple linear correlation coefficient: r
2. Spearman rank correlation coefficient: r_s
3. Association between two categorical variables (r or τ)

Assignments
1. Patterns of relationship between two continuous variables.
2. Properties of the simple linear correlation coefficient r.
3. How many kinds of correlation coefficients are there? What type of variables is required for each of them?
4. 1, 5, 6.

Reading materials
(1) Health Statistics (《卫生统计学》), chief editor: Fang Jiqian (方积乾), Chapter 11 (written by Qiu Xiaoqiang and Wang Tong).
(2) Biostatistical Analysis, Chapter 19: 19.1, 19.2, 19.9.

Thanks for your attention!