Regression vs. Correlation
Both:
    Two variables
    Continuous data
Regression:
    Change in X causes change in Y (independent and dependent variables), or
    Predict Y based on X
Correlation:
    No dependence (causation) assumed
    Estimate the degree to which two variables vary together
Correlation: more on bivariate statistics
    No dependence (causation) assumed
    Can call the variables X and Y, or X1 and X2
    Are the two variables independent, or do they covary?
Nature of variables and purpose of investigator

Purpose of investigator | Y random, X fixed | Both random
Establish and estimate dependence of Y upon X; describe functional relationship or predict Y from X | Model I regression | Model II regression (with few exceptions, e.g. prediction)
Establish and estimate association (interdependence) between X & Y | Meaningless | Correlation coefficient (significance test valid only if normally distributed)

Adapted from Sokal & Rohlf, pg 559
Visualize Correlation
[Figure: two scatterplots of Y (X2) against X1]
    Positive: increase in X associated with increase in Y
    Negative: increase in X associated with decrease in Y
No correlation
[Figure: two scatterplots of Y (X2) against X1; panels labeled "vertical" and "horizontal"]
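Pearson product-moment correlation coefficient

\[
r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
  = \frac{\sum xy}{\sqrt{SS_X \, SS_Y}}
\]

Here x = X - \bar{X} and y = Y - \bar{Y} are deviations from the means, so \sum xy is the summed products of deviations of x & y.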
Equivalent calculations (1)

\[
r = \frac{\sum xy}{(n - 1)\, s_x s_y}
\]

where s_x = SD of X and s_y = SD of Y.
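As a quick check that the two ways of computing r agree, here is a minimal Python sketch using only the standard library. The data values and the function names (`pearson_r`, `pearson_r_from_sd`) are made up for illustration and are not from the lecture's monitoring dataset.

```python
import math

# Hypothetical toy data (illustration only).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]


def sums_of_deviations(xs, ys):
    """Return (sum of cross-products, SS of X, SS of Y) of deviations from the means."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sum_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    ss_x = sum((x - xbar) ** 2 for x in xs)
    ss_y = sum((y - ybar) ** 2 for y in ys)
    return sum_xy, ss_x, ss_y


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(SS_X * SS_Y)."""
    sum_xy, ss_x, ss_y = sums_of_deviations(xs, ys)
    return sum_xy / math.sqrt(ss_x * ss_y)


def pearson_r_from_sd(xs, ys):
    """r = sum(xy) / ((n - 1) * s_x * s_y), using sample standard deviations."""
    n = len(xs)
    sum_xy, ss_x, ss_y = sums_of_deviations(xs, ys)
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    return sum_xy / ((n - 1) * s_x * s_y)


r1 = pearson_r(X, Y)
r2 = pearson_r_from_sd(X, Y)
print(round(r1, 4), round(r2, 4))
```

For these toy values the summed cross-products are 6, SS_X is 10 and SS_Y is 6, so both forms give r = 6 / sqrt(60).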
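Equivalent calculations (2)

\[
r^2 = \frac{\text{regression SS}}{\text{total SS}}
    = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}
\]

\[
r = \sqrt{r^2} = \sqrt{\frac{\text{regression SS}}{\text{total SS}}}
\]

The square root gives only the magnitude of r; its sign is the sign of the regression slope.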
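The relation between r squared and the regression sums of squares can be checked numerically. The sketch below, in standard-library Python, fits an ordinary least-squares line to made-up toy data (not the lecture's dataset) and compares regression SS / total SS with the square of the product-moment r. All variable names are illustrative.

```python
import math

# Hypothetical toy data (illustration only).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

xbar = sum(X) / len(X)
ybar = sum(Y) / len(Y)

sum_xy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
ss_x = sum((x - xbar) ** 2 for x in X)
total_ss = sum((y - ybar) ** 2 for y in Y)

# Least-squares line: Yhat = a + b * X
b = sum_xy / ss_x
a = ybar - b * xbar
y_hat = [a + b * x for x in X]

# Regression SS: variation of the fitted values around the mean of Y.
regression_ss = sum((yh - ybar) ** 2 for yh in y_hat)

r_squared = regression_ss / total_ss
# Recover r from r^2, restoring the sign from the slope.
r = math.copysign(math.sqrt(r_squared), b)

r_direct = sum_xy / math.sqrt(ss_x * total_ss)
print(round(r_squared, 4), round(r, 4), round(r_direct, 4))
```

Here the slope is 0.6, regression SS is 3.6 and total SS is 6, so r squared is 0.6 and both routes give the same r.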
Testing significance

\[
H_0: \rho = 0
\]

where \rho is the true population parameter.
Assumes that the data come from a bivariate normal distribution.
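\[
t = \frac{r}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{n - 2}} \quad \text{(SE of } r\text{)}
\]

Reject the null if |t_calc| > t_{\alpha(2), \nu}, with \nu = n - 2 degrees of freedom.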
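A minimal Python sketch of this test follows. The function name `t_test_for_r` and the example values of r and n are hypothetical; the critical value is passed in by hand from a t table, because the Python standard library has no t distribution.

```python
import math


def t_test_for_r(r, n, t_critical):
    """Two-tailed test of H0: rho = 0.

    r          -- sample correlation coefficient
    n          -- number of (X, Y) pairs
    t_critical -- two-tailed critical t for n - 2 df, looked up in a table
    Returns (standard error of r, t statistic, reject H0?).
    """
    s_r = math.sqrt((1.0 - r ** 2) / (n - 2))
    t_calc = r / s_r
    return s_r, t_calc, abs(t_calc) > t_critical


# Hypothetical example: r = sqrt(0.6) from n = 5 pairs.
# The two-tailed critical t at alpha = 0.05 with 3 df is 3.182.
s_r, t_calc, reject = t_test_for_r(math.sqrt(0.6), 5, 3.182)
print(round(s_r, 4), round(t_calc, 4), reject)
```

With only five pairs, even r of about 0.77 gives t of about 2.12, which does not exceed 3.182, so the null hypothesis is not rejected in this toy case.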
data start;
  infile 'C:\Documents and Settings\cmayer3\My Documents\teaching\Biostatistics\Lectures\monitoring data for corr.csv' dlm=',' DSD;
  input year day site $ depth temp DO spCond turb pH Kpar secchi alk Chla;
options ls=180;
proc print;

/* Correlations on raw data */
data one; set start;
options ls=100;
proc corr; var temp DO spCond turb pH Kpar secchi alk Chla;

/* Create new variables by transformation */
data two; set start;
  lnturb=log(turb);
  lnsecchi=log(secchi);
  lgturb=log10(turb);
  lgsecchi=log10(secchi);
  sqturb=sqrt(turb);
  sqsecchi=sqrt(secchi);
proc print;

/* Correlations on transformed data */
data three; set two;
proc corr; var lnturb lnsecchi;
proc corr; var lgturb lgsecchi;
proc corr; var sqturb sqsecchi;

/* Plot raw and transformed */
data four; set two;
options ls=100;
proc plot;
  plot turb*secchi;
  plot lnturb*lnsecchi;
  plot lgturb*lgsecchi;
  plot sqturb*sqsecchi;
run;
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
[SAS PROC CORR output: correlation matrix for temp, DO, spCond, turb, pH, Kpar, secchi, alk and Chla; many of the reported probabilities are <.0001]
Nonparametric statistics
    Sometimes called distribution-free statistics because they do not require that the data fit a normal distribution.
    Many nonparametric procedures are based on ranked data.
    Data are ranked by ordering them from lowest to highest and assigning them, in order, the integer values from 1 to the sample size.
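The ranking step described here can be sketched in a few lines of Python. The function name `ranks` is illustrative. Giving tied values the average of the ranks they span is a common convention that is assumed here; the slide itself does not say how ties are handled.

```python
def ranks(values):
    """Rank values from lowest (rank 1) to highest (rank n).

    Tied values share the average of the ranks they occupy
    (an assumed convention, not stated in the lecture).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' across a run of tied values.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


print(ranks([10, 30, 20]))  # no ties
print(ranks([5, 5, 1]))     # the two 5s share ranks 2 and 3
```

Without ties the ranks are simply the integers 1 to n, as the slide describes.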
Some Commonly Used Statistical Tests

Normal theory based test | Corresponding nonparametric test | Purpose of test
t test for independent samples | Mann-Whitney U test; Wilcoxon rank-sum test | Compares two independent samples
Paired t test | Wilcoxon matched pairs signed-rank test | Examines a set of differences
Pearson correlation coefficient | Spearman rank correlation coefficient | Assesses the linear association between two variables
One way analysis of variance (F test) | Kruskal-Wallis analysis of variance by ranks | Compares three or more groups
Two way analysis of variance | Friedman two way analysis of variance | Compares groups classified by two different factors

From:
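To illustrate one row of the table, the Spearman rank correlation can be computed as the Pearson coefficient applied to ranks. The Python sketch below uses made-up data with no ties, so a simple ranking is enough; the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation: sum(xy) / sqrt(SS_X * SS_Y)."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sum_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    ss_x = sum((x - xbar) ** 2 for x in xs)
    ss_y = sum((y - ybar) ** 2 for y in ys)
    return sum_xy / math.sqrt(ss_x * ss_y)


def simple_ranks(values):
    """Ranks 1..n from lowest to highest; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(xs, ys):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(simple_ranks(xs), simple_ranks(ys))


# Hypothetical data: Y always increases with X, but not along a straight line.
X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 100]

r_pearson = pearson_r(X, Y)
r_spearman = spearman_r(X, Y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because the ranks of X and Y agree exactly, the Spearman coefficient is 1, while the Pearson coefficient on the raw values is lower because the relationship is not a straight line.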
Data transformations
    Data transformation can "correct" deviation from normality and uneven variance (heteroscedasticity).
    See chapter 13 in Zar.
    Pretty much... whatever works, works.
    Some common ones:
        For % or proportion data, use the arcsine (asin) of the square root
        For density (#/m^2), use log10
    The right transformation can allow you to use parametric statistics.
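The two transformations named on this slide can be written as small Python helpers. The function names and sample values are hypothetical. The sketch assumes percentages have already been converted to proportions between 0 and 1, and that densities are strictly positive, since log10 of zero is undefined.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine square-root transform for a proportion in [0, 1]; result in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("expected a proportion between 0 and 1")
    return math.asin(math.sqrt(proportion))


def log10_density(density):
    """Base-10 log transform for a strictly positive density (e.g. #/m^2)."""
    if density <= 0:
        raise ValueError("log10 needs a strictly positive value")
    return math.log10(density)


# Hypothetical values for illustration.
proportions = [0.05, 0.50, 0.95]
densities = [10.0, 1000.0, 25000.0]

transformed_p = [arcsine_sqrt(p) for p in proportions]
transformed_d = [log10_density(d) for d in densities]
print([round(v, 3) for v in transformed_p])
print([round(v, 3) for v in transformed_d])
```

After transforming, the same correlation and plotting steps would be run on the new variables to see whether the fit to normality and the evenness of variance have improved.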