Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics: Correlation

Presentation transcript:

L10.1 Correlation
The underlying principle of correlation analysis
Measuring the strength of a correlation
Assumptions
Confidence intervals and hypothesis testing
Comparing correlations
Non-parametric correlations
Power in correlation analysis

L10.2 The underlying principle of correlation analysis
Correlation measures the extent to which two variables covary, in particular the strength of the linear association between them.
No causal relationship is implied, so there is no distinction between dependent and independent variables.
[Figure: scatterplot of X1 against X2]

L10.3 When do we use correlation?
Do use it to determine the strength of association between two variables.
Do not use it if you want to predict the value of X given Y, or vice versa.
[Figure: two panels, X1 vs X2 (Correlation) and X vs Y (Regression)]

L10.4 Simple linear correlation versus simple linear regression
Calculations are the same.
In correlation analysis, one must sample randomly both X and Y.
Correlation deals with association (importance).
Regression deals with prediction (intensity).

L10.5 Lab example: fork length and round weight of sturgeon
Since the two variables are not causally related, use correlation to measure the strength of association.

L10.6 Regression: fork length and age of sturgeon
The two variables are causally related.
The relationship between the two provides an estimate of growth rates, and we can use the relationship to predict the size of sturgeon of a given age.

L10.7 Measuring the strength of a correlation
The test statistic is the product-moment correlation coefficient r.
[Figure: scatterplot of X1 against X2]
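A minimal Python sketch of the product-moment correlation coefficient named on slide L10.7, assuming the standard definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares). The function name `pearson_r` and the data are hypothetical and are not taken from the lecture.

```python
import math

def pearson_r(x1, x2):
    """Product-moment correlation between two equal-length sequences."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    sxx = sum((a - mean1) ** 2 for a in x1)
    syy = sum((b - mean2) ** 2 for b in x2)
    # Note: raises ZeroDivisionError if either variable has zero variance.
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the lecture).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x1, x2)
print(round(r, 3), round(r ** 2, 3))
```

For these made-up values r is about 0.775, so the coefficient of determination is 0.6. The function is symmetric in its two arguments, which reflects the lack of a dependent/independent distinction in correlation.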

L10.8 Measuring the strength of a correlation
r always lies between -1 and 1.
r² is the coefficient of determination, which measures the proportion of the variance in X1 (or X2) "explained" by variation in X2 (or X1).
[Figure: scatterplots of X1 against X2 for r = 0.9, r = 0.5, r = 0, r = -0.5 and r = -0.9]

L10.9 Assumptions of correlation analysis I: Bivariate normality
For each value of X1, the X2 values are normally distributed, and vice versa.
[Figure: bivariate distributions for r = 0.8 and r = 0]

L10.10 Assumptions of correlation analysis II: Homoscedasticity
The variance of X1 is independent of the value of X2, and vice versa.
But the variances of X1 and X2 need not be equal.
[Figure: scatterplots of X1 against X2, homoscedastic and heteroscedastic]

L10.11 Assumptions of correlation analysis III: Linearity
The relationship between X1 and X2 is linear.
[Figure: scatterplots of X1 against X2, linear and nonlinear]

L10.12 Violation of assumptions: fork length and age of sturgeon
The relationship between fork length and age appears non-linear.
The variance in fork length appears to increase with age.

L10.13 If parametric correlation assumptions aren’t met...
Try transforming the data (e.g. log transform).
Try a non-parametric correlation analysis.

L10.14 Confidence intervals for correlation coefficients
Confidence limits are computed for the Z-transformed correlation and then converted back to the untransformed scale.
[Figure: scatterplots of X1 against X2 illustrating a smaller CI and a larger CI]
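The formulas for slide L10.14 are not present in this transcript. The sketch below assumes the usual Fisher approach: transform r to z = 0.5 ln((1 + r)/(1 - r)), take z plus or minus a normal deviate times 1/sqrt(n - 3), and back-transform each limit with the hyperbolic tangent. The function name `correlation_ci` and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence limits for a correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)                              # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)                    # standard error of z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed normal deviate
    lower_z = z - z_crit * se
    upper_z = z + z_crit * se
    # Back-transform the limits to the correlation scale.
    return math.tanh(lower_z), math.tanh(upper_z)

# Hypothetical inputs: r = 0.6 from n = 28 observations.
low, high = correlation_ci(0.6, 28)
print(round(low, 3), round(high, 3))
```

Because the interval is built on the z scale, it is asymmetric around r after back-transformation, and it narrows as n grows, which is the contrast the slide's two panels illustrate.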

L10.15 Hypothesis testing I
H0: ρ = 0
Calculate the standard error of the correlation coefficient, then the t statistic (r divided by its standard error), and compare it to the t-distribution with N - 2 df.
[Figure: scatterplots of X1 against X2 illustrating Reject H0 and Accept H0, observed versus expected]
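A hedged sketch of the test on slide L10.15, assuming the standard formulas: the standard error of r is sqrt((1 - r²)/(N - 2)) and t = r divided by that standard error. The function name `correlation_t` and the inputs are hypothetical, and the critical value is supplied from a t table because the Python standard library has no t-distribution.

```python
import math

def correlation_t(r, n):
    """Return (t, df) for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    df = n - 2
    se_r = math.sqrt((1 - r ** 2) / df)   # standard error of r
    return r / se_r, df

# Hypothetical inputs: r = 0.6 from N = 27 observations.
t, df = correlation_t(0.6, 27)
t_crit = 2.060   # two-tailed critical t for alpha = 0.05 and 25 df, read from a t table
print(round(t, 2), df, abs(t) > t_crit)
```

Here t = 3.75 with 25 df exceeds the tabled critical value, so H0 would be rejected for these made-up numbers.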

L10.16 Hypothesis testing II
H0: ρ = ρ0, where ρ0 is a specified value.
Transform r and ρ0 to Z-transformed values, calculate the normal deviate Z (using the standard error 1/sqrt(N - 3)), and compare it to the standard normal (Z) distribution.
[Figure: scatterplots of X1 against X2 illustrating Reject H0 and Accept H0, observed versus expected]
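The calculation for slide L10.16 can be sketched as follows, assuming the usual form Z = (z - ζ0) · sqrt(N - 3), where z and ζ0 are the Fisher transforms of r and the hypothesized ρ0. The function name `z_test_rho0` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_rho0(r, rho0, n):
    """Normal deviate and two-tailed P for H0: rho = rho0."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)          # Fisher transform of the sample correlation
    zeta0 = math.atanh(rho0)   # Fisher transform of the hypothesized value
    z_stat = (z - zeta0) * math.sqrt(n - 3)
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_two_tailed

# Hypothetical inputs: r = 0.6, N = 28, tested against rho0 = 0.3.
z_stat, p = z_test_rho0(0.6, 0.3, 28)
print(round(z_stat, 3), round(p, 3))
```

With these made-up inputs Z is about 1.92 and the two-tailed probability is slightly above 0.05, so H0 would not be rejected at the 5% level.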

L10.17 Comparing 2 correlations
H0: ρ1 = ρ2
Transform r1 and r2 to Z-transformed values z1 and z2, calculate the normal deviate for their difference, and compare it to the Z distribution.
[Figure: scatterplots of X1 against X2 with correlations r1 and r2, illustrating Reject H0 and Accept H0]
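A sketch of the two-sample comparison on slide L10.17, assuming the standard statistic Z = (z1 - z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)). The function name `compare_two_correlations` and the input values are hypothetical illustrations, not data from the lecture.

```python
import math
from statistics import NormalDist

def compare_two_correlations(r1, n1, r2, n2):
    """Normal deviate and two-tailed P for H0: rho1 = rho2."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("both sample sizes must exceed 3")
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transforms
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z_stat = (z1 - z2) / se_diff
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_two_tailed

# Hypothetical inputs.
z_stat, p = compare_two_correlations(0.78, 98, 0.84, 95)
print(round(z_stat, 2), round(p, 2))
```

For these inputs Z is about -1.20 and P is about 0.23, so the two correlations would not be judged different.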

L10.18 Comparing multiple correlations
H0: ρi = ρj = ρk = … based on ni, nj, nk, … observations.
Z-transform every ri to zi, calculate the test statistic, and compare it to the χ² distribution with df = k - 1, where k is the number of correlations being compared.
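The test statistic for slide L10.18 is not in the transcript. The sketch below assumes the usual homogeneity statistic, χ² = Σ(ni - 3)zi² - [Σ(ni - 3)zi]² / Σ(ni - 3), computed here in the algebraically identical form Σ(ni - 3)(zi - zw)², where zw is the weighted mean of the zi. The function name `homogeneity_chi2` and the inputs are hypothetical.

```python
import math

def homogeneity_chi2(rs, ns):
    """Chi-square statistic and df for H0: all population correlations are equal."""
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two correlations with matching sample sizes")
    if any(n <= 3 for n in ns):
        raise ValueError("every sample size must exceed 3")
    zs = [math.atanh(r) for r in rs]      # Fisher transforms
    ws = [n - 3 for n in ns]              # weights n_i - 3
    z_w = sum(w * z for w, z in zip(ws, zs)) / sum(ws)   # weighted mean z
    chi2 = sum(w * (z - z_w) ** 2 for w, z in zip(ws, zs))
    return chi2, len(rs) - 1

# Hypothetical inputs: three correlations, each from 28 observations.
chi2, df = homogeneity_chi2([0.5, 0.6, 0.7], [28, 28, 28])
chi2_crit = 5.991   # critical chi-square for alpha = 0.05 and 2 df, from a table
print(round(chi2, 3), df, chi2 > chi2_crit)
```

With these made-up values the statistic is about 1.27 on 2 df, well below the tabled critical value, so H0 would be accepted.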

L10.19 Computing common correlations
If H0: ρi = ρj = ρk = … is accepted, then each ri estimates the same (population) correlation ρ.
To estimate ρ, first calculate the weighted Z-score zw, then back-transform zw to the correlation scale.
[Figure: scatterplot of X1 against X2 with correlations r1, r2 and r3; Accept H0]
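A minimal sketch of the common-correlation estimate on slide L10.19, assuming the usual weighting zw = Σ(ni - 3)zi / Σ(ni - 3) followed by the inverse Fisher transform. The function name `common_correlation` and the inputs are hypothetical.

```python
import math

def common_correlation(rs, ns):
    """Pooled estimate of a shared population correlation from several samples."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("need matching, non-empty lists of correlations and sample sizes")
    if any(n <= 3 for n in ns):
        raise ValueError("every sample size must exceed 3")
    ws = [n - 3 for n in ns]   # weights n_i - 3
    z_w = sum(w * math.atanh(r) for w, r in zip(ws, rs)) / sum(ws)   # weighted z
    return math.tanh(z_w)      # back-transform to the correlation scale

# Hypothetical inputs: three correlations, each from 28 observations.
print(round(common_correlation([0.5, 0.6, 0.7], [28, 28, 28]), 3))
```

Averaging on the z scale rather than averaging the ri directly gives larger samples more weight and avoids the skew of r near -1 or 1.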

L10.20 Non-parametric correlations
Use when one or more assumptions are not met.
Essentially a parametric correlation of the ranks.
The most common statistic is the Spearman rank correlation.
[Figure: X1 against X2, and Rank X1 against Rank X2]
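The idea that a rank correlation is "a parametric correlation of the ranks" can be sketched directly: rank each variable, then apply the product-moment formula to the ranks. This is an illustrative implementation under the common assumption that tied values receive the average of their ranks; the function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x1, x2):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(x1, x2):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x1), average_ranks(x2))

# Made-up data: a monotonic but strongly nonlinear relationship.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 4, 9, 16, 25]
print(spearman_r(x1, x2), round(pearson_r(x1, x2), 3))
```

Because only the ordering matters, the rank correlation is 1 for any strictly increasing relationship, even when the raw product-moment correlation is below 1.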

L10.21 Power and sample size in correlation
If we test H0: ρ = 0 with sample size n, we can determine the power 1 - β by using the Z-transformation of the critical value of r (for the given α) and of the true correlation ρ (z ρ), estimated by the sample correlation r (z r).
[Figure: scatterplot of X1 against X2; probability distribution of Z]

L10.22 Power and sample size in correlation
Once Z β(1) is determined, we can calculate the probability of obtaining a Z-value of this size or greater, i.e. β. Power is then 1 - β.
[Figure: scatterplot of X1 against X2; probability distribution of Z]

L10.23 Power and sample size in correlation: an example
For the correlation of wing length and tail length in a sample of 12 birds, the calculation gives a power of 1 - β = 0.98.
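The intermediate steps of this power calculation are not in the transcript. The sketch below assumes the common textbook form Z β(1) = (z ρ - z α) · sqrt(n - 3), where z α is the Fisher transform of the critical r for the chosen α, and β is the probability of a normal deviate at least as large as Z β(1). The correlation of 0.870 is an assumed input that does not appear in the transcript; the critical value 0.576 is the tabled two-tailed 5% value of r for 10 df. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def correlation_power(rho, n, r_crit):
    """Approximate power of the test of H0: rho = 0.

    rho    -- correlation assumed to hold in the population
    n      -- sample size
    r_crit -- critical value of r for the chosen alpha and n - 2 df (from a table)
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z_rho = math.atanh(abs(rho))       # Fisher transform of the assumed correlation
    z_alpha = math.atanh(r_crit)       # Fisher transform of the critical r
    z_beta = (z_rho - z_alpha) * math.sqrt(n - 3)
    beta = 1 - NormalDist().cdf(z_beta)   # P(Z >= z_beta)
    return 1 - beta

# Assumed inputs: rho = 0.870 (not given in the transcript), n = 12, r_crit = 0.576.
print(round(correlation_power(0.870, 12, 0.576), 2))
```

Under these assumptions the result rounds to 0.98, the figure reported on the slide.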

L10.24 Minimal sample size
Given a desired power 1 - β, how large a sample is required to reject H0: ρ = 0 when it is false and the true correlation is a specified nonzero value?
[Figure: scatterplots of X1 against X2, observed versus expected; Reject H0?]

L10.25 Minimal sample size: an example
We want to reject H0: ρ = 0 99% of the time when |ρ| > 0.5 and α(2) = .05.
So β(1) = .01, and for ρ = .50 the calculation shows that a sample size of at least 64 should be used.
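The formula behind this example is missing from the transcript. A sketch under the assumption that it is the usual approximation n = ((Z β(1) + Z α) / ζ0)² + 3, where ζ0 is the Fisher transform of the specified correlation and the Z terms are normal deviates. The function name `min_sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def min_sample_size(rho0, power, alpha=0.05, two_tailed=True):
    """Smallest n expected to reject H0: rho = 0 with the given power."""
    if not 0 < abs(rho0) < 1:
        raise ValueError("rho0 must be nonzero and strictly between -1 and 1")
    nd = NormalDist()
    z_beta = nd.inv_cdf(power)                 # one-tailed deviate for beta = 1 - power
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = nd.inv_cdf(1 - tail)             # deviate for the significance level
    zeta0 = math.atanh(abs(rho0))              # Fisher transform of the specified correlation
    n = ((z_beta + z_alpha) / zeta0) ** 2 + 3
    return math.ceil(n)                        # round up to a whole observation

# Slide values: power 0.99, two-tailed alpha 0.05, correlation 0.5.
print(min_sample_size(0.5, 0.99))
```

With the slide's inputs the unrounded value is a little under 64, so rounding up reproduces the slide's answer of at least 64.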

L10.26 Power and sample size in comparing 2 correlations
The power of a test for a difference between two correlation coefficients is 1 - β, where β is the one-tailed probability of the normal deviate Z β(1).
[Figure: scatterplots of X1 against X2 with correlations r1 and r2, illustrating Reject H0 and Accept H0]

L10.27 An example
What is the power to detect a difference?
From a table of normal deviates we obtain β, so power = 0.22.
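The inputs and formula for this example did not survive in the transcript. The sketch below assumes the common form Z β(1) = |z1 - z2| / sqrt(1/(n1 - 3) + 1/(n2 - 3)) - Z α, with β taken as the probability of a normal deviate at least as large as Z β(1). The correlations and sample sizes used are hypothetical illustrations, and the function name is an assumption.

```python
import math
from statistics import NormalDist

def power_two_correlations(r1, n1, r2, n2, alpha=0.05):
    """Approximate power of the two-tailed test of H0: rho1 = rho2."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("both sample sizes must exceed 3")
    nd = NormalDist()
    z1, z2 = math.atanh(r1), math.atanh(r2)           # Fisher transforms
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z_alpha = nd.inv_cdf(1 - alpha / 2)               # two-tailed normal deviate
    z_beta = abs(z1 - z2) / se_diff - z_alpha
    beta = 1 - nd.cdf(z_beta)                         # P(Z >= z_beta)
    return 1 - beta

# Hypothetical inputs (not taken from the slide).
print(round(power_two_correlations(0.78, 98, 0.84, 95), 2))
```

For these assumed inputs the power comes out near 0.22: even with samples of almost 100 each, a difference between correlations of 0.78 and 0.84 would be detected only about one time in five.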