1
Regression vs. Correlation
Both:
  Two variables
  Continuous data
Regression:
  Change in X causes change in Y
  Independent and dependent variables, or
  Predict Y based on X
Correlation:
  No dependence (causation) assumed
  Estimate the degree to which two variables vary together
2
Correlation: more on bivariate statistics
  No dependence (causation) assumed
  Can call the variables X and Y, or X1 and X2
  Are two variables independent, or do they covary?
3
Purpose of investigator | Nature of variables: Y random, X fixed | Nature of variables: both random
Establish and estimate dependence of Y upon X; describe functional relationship or predict Y from X | Model I regression | Model II regression, with few exceptions, e.g. prediction
Establish and estimate association (interdependence) between X and Y | Meaningless | Correlation coefficient; significance only if normally distributed
Adapted from Sokal & Rohlf, pg 559
4
Visualize Correlation
[Figure: two scatterplots, x-axis X1, y-axis Y (X2)]
  Positive: increase in X associated with increase in Y
  Negative: increase in X associated with decrease in Y
5
No correlation
[Figure: two scatterplots, x-axis X1, y-axis Y (X2); one with points in a vertical band, one with points in a horizontal band]
6
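Pearson product-moment correlation coefficient

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{\sum \left[(X - \bar{X})(Y - \bar{Y})\right]}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{\sum \left[(X - \bar{X})(Y - \bar{Y})\right]}{\sqrt{SS_X \, SS_Y}} $$

Here x = X - Xbar and y = Y - Ybar are deviations from the means, so the numerator is the summed products of deviations of x and y.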
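As a rough illustration of the formula above, the following sketch computes r directly from the summed products of deviations. The function name `pearson_r` and the five data points are made up for this example; they are not from the lecture's monitoring data.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as (sum of products of deviations) / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Summed products of deviations of x and y
    sum_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    # Sums of squared deviations (SS_X and SS_Y)
    ss_x = sum((x - xbar) ** 2 for x in xs)
    ss_y = sum((y - ybar) ** 2 for y in ys)
    return sum_xy / math.sqrt(ss_x * ss_y)

# Hypothetical illustration data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here the sum of products is 6, SS_X is 10 and SS_Y is 6, so r = 6 / sqrt(60).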
7
Equivalent calculations (1)

$$ r = \frac{\sum xy}{(n-1)\, s_x s_y} $$

where the numerator is the summed products of deviations, s_x = SD of X and s_y = SD of Y.
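A small check, under the assumption that SD means the sample standard deviation (n - 1 in the denominator), that the SD form gives the same r as the sums-of-squares form. The data are hypothetical.

```python
import math
import statistics

# Hypothetical illustration data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = statistics.mean(x), statistics.mean(y)

# Summed products of deviations
sum_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))

# Form 1: divide by (n - 1) * s_x * s_y, using sample SDs
r_sd = sum_xy / ((n - 1) * statistics.stdev(x) * statistics.stdev(y))

# Form 2: divide by sqrt(SS_X * SS_Y)
ss_x = sum((a - xbar) ** 2 for a in x)
ss_y = sum((b - ybar) ** 2 for b in y)
r_ss = sum_xy / math.sqrt(ss_x * ss_y)

print(round(r_sd, 4), round(r_ss, 4))  # 0.7746 0.7746
```

The two agree because s_x = sqrt(SS_X / (n - 1)), so (n - 1) s_x s_y = sqrt(SS_X SS_Y).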
8
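Equivalent calculations (2)

$$ r^2 = \frac{\text{regression SS}}{\text{total SS}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} $$

$$ r = \sqrt{r^2} = \sqrt{\frac{\text{regression SS}}{\text{total SS}}} $$

The square root gives only the magnitude of r; its sign is the sign of the regression slope.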
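To make the link to regression concrete, this sketch fits a least-squares line to hypothetical data, forms the regression and total sums of squares, and recovers r from their ratio. Taking the sign of r from the slope is an addition made here, since the square root alone is never negative.

```python
import math

# Hypothetical illustration data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope b and intercept a
sum_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ss_x = sum((xi - xbar) ** 2 for xi in x)
b = sum_xy / ss_x
a = ybar - b * xbar

# Predicted values and the two sums of squares
y_hat = [a + b * xi for xi in x]
regression_ss = sum((yh - ybar) ** 2 for yh in y_hat)
total_ss = sum((yi - ybar) ** 2 for yi in y)

r_squared = regression_ss / total_ss
r = math.copysign(math.sqrt(r_squared), b)  # sign taken from the slope

print(round(r_squared, 4), round(r, 4))  # 0.6 0.7746
```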
9
Testing significance:
  H0: ρ = 0, where ρ is the true population parameter estimated by r
  Assumes that data come from a bivariate normal distribution
10
$$ t = \frac{r}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{n - 2}} \quad (\text{SE of } r) $$

Reject null if |t calc| > t_{α(2), ν}, with ν = n - 2 degrees of freedom.
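A worked sketch of this test, using the temp-DO correlation reported in the SAS output of this lecture (r = -0.21792, n = 99). The function name `t_for_r` is made up for the example. The critical value of about 1.985 for α(2) = 0.05 and ν = 97 is taken from a standard t table and should be treated as approximate.

```python
import math

def t_for_r(r, n):
    """Return (t, s_r) for testing H0: rho = 0."""
    s_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    return r / s_r, s_r

# r and n for temp vs. DO, from the correlation output in this lecture
r, n = -0.21792, 99
t_calc, s_r = t_for_r(r, n)

# Assumed tabled critical value for alpha(2) = 0.05, nu = n - 2 = 97
t_crit = 1.985
reject = abs(t_calc) > t_crit

print(round(s_r, 4), round(t_calc, 3), reject)
```

With these values t is roughly -2.2, so the null is rejected at the 0.05 level, in line with the p-value of 0.0302 that SAS reports for this pair.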
11
data start;
  infile 'C:\Documents and Settings\cmayer3\My Documents\teaching\Biostatistics\Lectures\monitoring data for corr.csv' dlm=',' DSD;
  input year day site $ depth temp DO spCond turb pH Kpar secchi alk Chla;
options ls=180;
proc print;

/* Correlations on raw data */
data one; set start;
options ls=100;
proc corr;
  var temp DO spCond turb pH Kpar secchi alk Chla;

/* Create new variables by transformation */
data two; set start;
  lnturb=log(turb);
  lnsecchi=log(secchi);
  lgturb=log10(turb);
  lgsecchi=log10(secchi);
  sqturb=sqrt(turb);
  sqsecchi=sqrt(secchi);
proc print;

/* Correlations on transformed data */
data three; set two;
proc corr; var lnturb lnsecchi;
proc corr; var lgturb lgsecchi;
proc corr; var sqturb sqsecchi;

/* Plot raw and transformed */
data four; set two;
options ls=100;
proc plot;
  plot turb*secchi;
  plot lnturb*lnsecchi;
  plot lgturb*lgsecchi;
  plot sqturb*sqsecchi;
run;
12
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

              temp        DO    spCond      turb        pH      Kpar    secchi       alk      Chla
temp       1.00000  -0.21792   0.06538  -0.14523   0.35328  -0.23911   0.15689   0.11311   0.37612
                      0.0302    0.5202    0.1515    0.0003    0.1541    0.1209    0.3895    0.0001
                99        99        99        99        99        37        99        60        99
DO        -0.21792   1.00000   0.01542  -0.21550   0.50679  -0.24013  -0.06504   0.15790   0.38699
            0.0302              0.8796    0.0322    <.0001    0.1523    0.5224    0.2282    <.0001
                99        99        99        99        99        37        99        60        99
spCond     0.06538   0.01542   1.00000   0.48214  -0.29017   0.78394  -0.51332   0.74021   0.21367
            0.5202    0.8796              <.0001    0.0036    <.0001    <.0001    <.0001    0.0337
                99        99        99        99        99        37        99        60        99
turb      -0.14523  -0.21550   0.48214   1.00000  -0.33727   0.89941  -0.50336   0.47441   0.07208
            0.1515    0.0322    <.0001              0.0006    <.0001    <.0001    0.0001    0.4783
                99        99        99        99        99        37        99        60        99
pH         0.35328   0.50679  -0.29017  -0.33727   1.00000  -0.56355   0.14049  -0.14061   0.61033
            0.0003    <.0001    0.0036    0.0006              0.0003    0.1654    0.2839    <.0001
                99        99        99        99        99        37        99        60        99
Kpar      -0.23911  -0.24013   0.78394   0.89941  -0.56355   1.00000  -0.76680   0.85542   0.04579
            0.1541    0.1523    <.0001    <.0001    0.0003              <.0001    <.0001    0.7878
                37        37        37        37        37        37        37        29        37
secchi     0.15689  -0.06504  -0.51332  -0.50336   0.14049  -0.76680   1.00000  -0.49649  -0.30918
            0.1209    0.5224    <.0001    <.0001    0.1654    <.0001              <.0001    0.0018
                99        99        99        99        99        37        99        60        99
alk        0.11311   0.15790   0.74021   0.47441  -0.14061   0.85542  -0.49649   1.00000   0.12410
            0.3895    0.2282    <.0001    0.0001    0.2839    <.0001    <.0001              0.3448
                60        60        60        60        60        29        60        60        60
Chla       0.37612   0.38699   0.21367   0.07208   0.61033   0.04579  -0.30918   0.12410   1.00000
            0.0001    <.0001    0.0337    0.4783    <.0001    0.7878    0.0018    0.3448
                99        99        99        99        99        37        99        60        99
13
Nonparametric statistics
  Sometimes called distribution-free statistics because they do not require that the data fit a normal distribution.
  Many nonparametric procedures are based on ranked data.
  Data are ranked by ordering them from lowest to highest and assigning them, in order, the integer values from 1 to the sample size.
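The ranking step described above can be sketched as follows. Giving tied values the mean of the ranks they span is a common convention assumed here, not something the slide states. The function names and the turbidity and Secchi values are hypothetical. The last lines compute a Spearman-style coefficient as the Pearson r of the ranks.

```python
import math

def rank(values):
    """Rank from lowest (1) to highest (n); ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson r from summed products of deviations."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sum_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    ss_x = sum((x - xbar) ** 2 for x in xs)
    ss_y = sum((y - ybar) ** 2 for y in ys)
    return sum_xy / math.sqrt(ss_x * ss_y)

# Hypothetical turbidity and Secchi depth readings
turb = [2.0, 8.0, 4.0, 16.0]
secchi = [3.0, 1.0, 2.0, 0.5]

print(rank(turb))        # [1.0, 3.0, 2.0, 4.0]
print(rank([5, 5, 1]))   # [2.5, 2.5, 1.0]
r_s = pearson_r(rank(turb), rank(secchi))
print(r_s)               # -1.0
```

Because the rank orders here are exact opposites, the coefficient is -1 even though the raw values are not linearly related.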
14
Some Commonly Used Statistical Tests

Normal theory based test | Corresponding nonparametric test | Purpose of test
t test for independent samples | Mann-Whitney U test; Wilcoxon rank-sum test | Compares two independent samples
Paired t test | Wilcoxon matched pairs signed-rank test | Examines a set of differences
Pearson correlation coefficient | Spearman rank correlation coefficient | Assesses the linear association between two variables
One way analysis of variance (F test) | Kruskal-Wallis analysis of variance by ranks | Compares three or more groups
Two way analysis of variance | Friedman two way analysis of variance | Compares groups classified by two different factors

From: http://www.tufts.edu/~gdallal/npar.htm
15
Data transformations
  Data transformation can "correct" deviation from normality and uneven variance (heteroscedasticity)
  See chapter 13 in Zar
  Pretty much... whatever works, works.
  Some common ones are:
    for % or proportion, use the arcsine of the square root
    log10 for density (#/m²)
  The right transformation can allow you to use parametric statistics
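A minimal sketch of the two transformations named above. The function names and input values are hypothetical. The arcsine result is in radians, and percentages are assumed to be converted to proportions (0 to 1) first.

```python
import math

def arcsine_sqrt(p):
    """Arcsine of the square root of a proportion p (0 <= p <= 1), in radians."""
    return math.asin(math.sqrt(p))

def log10_density(d):
    """Base-10 log of a density such as #/m^2; d must be greater than 0."""
    return math.log10(d)

# Hypothetical values: 25% cover and 1000 individuals per m^2
percent_cover = 25.0
t_cover = arcsine_sqrt(percent_cover / 100)  # convert % to proportion first
t_density = log10_density(1000)

print(round(t_cover, 4), t_density)  # 0.5236 3.0
```

Note that `math.log10` raises an error for a density of zero, so zero counts need separate handling before this transformation is applied.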