Correlation and Covariance


1 Correlation and Covariance

2 Overview
[Figure: grid of plot types by variable type (Continuous, Categorical): Histogram, Scatter, Boxplot. Predictor variable on the x-axis; outcome (dependent variable), Height, on the y-axis.]

3 Correlation
[Figure: two panels, titled "Covariance is high: r ≈ 1" and "Covariance is low: r ≈ 0".]

4 Things to Know about the Correlation
- It varies between -1 and +1; 0 means no linear relationship.
- It is an effect size (Cohen's conventional benchmarks): ±.1 = small effect, ±.3 = medium effect, ±.5 = large effect.
- Coefficient of determination, r²: squaring r gives the proportion of variance in one variable that is shared with the other.
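A minimal R sketch of these three facts, using small made-up vectors (hypothetical values, not from the deck). The helper `effect_size_label()` is hypothetical and simply applies the ±.1/.3/.5 benchmarks to |r|; the "negligible" label for |r| < .1 is an added assumption.

```r
# Toy data (hypothetical values, chosen only for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r <- cor(x, y)    # Pearson correlation, always between -1 and +1
r_squared <- r^2  # coefficient of determination: share of variance in common

# Hypothetical helper applying the benchmarks to the absolute value of r
effect_size_label <- function(r) {
  a <- abs(r)
  if (a >= 0.5) "large"
  else if (a >= 0.3) "medium"
  else if (a >= 0.1) "small"
  else "negligible"   # assumption: label for |r| below .1
}

round(r, 3)           # 0.775
round(r_squared, 3)   # 0.6
effect_size_label(r)  # "large"
```

Here r² = 0.6, so in this toy example 60% of the variance in y is shared with x.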

5 Variables
[Figure: one dependent variable, Y (Height), and several independent variables, the X's (X1, X2, X3, X4).]

6

7 Little Correlation

8 Correlation Is for Linear Relationships
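A short R illustration, assuming a made-up symmetric input: y is a perfect (but curved) function of x, yet Pearson's r comes out as zero because the relationship is not linear.

```r
# Hypothetical example: y is completely determined by x, but not linearly
x <- -5:5
y <- x^2

cor(x, y)  # essentially 0: Pearson's r only measures linear association
```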

9 Outliers Can Skew Correlation Values
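A minimal sketch with hypothetical toy values: five points with no linear relationship give r = 0, and adding one extreme point pulls r close to 1.

```r
# Hypothetical toy data with no linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 2, 1, 2)
r_clean <- cor(x, y)            # essentially 0

# Add a single extreme point far from both means
x_out <- c(x, 20)
y_out <- c(y, 20)
r_outlier <- cor(x_out, y_out)  # about 0.98

round(c(clean = r_clean, with_outlier = r_outlier), 2)
```

Plotting the data before reporting r is the usual safeguard against this.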

10 Correlation and Regression Are Related
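One way to see the connection, sketched in R with hypothetical toy data: in a simple linear regression the slope equals r × (s_y / s_x), and the regression R² equals r².

```r
# Toy data (hypothetical values)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r <- cor(x, y)
fit <- lm(y ~ x)

slope <- unname(coef(fit)["x"])
slope_from_r <- r * sd(y) / sd(x)  # slope = r * (s_y / s_x)

all.equal(slope, slope_from_r)          # TRUE
all.equal(summary(fit)$r.squared, r^2)  # TRUE: R^2 of simple regression = r^2
```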

11 Covariance
[Figure: Y and X values for each person.]
Persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.

12 Covariance
1. Calculate the error (deviation) between the mean and each subject's score for the first variable (x).
2. Calculate the error (deviation) between the mean and each subject's score for the second variable (y).
3. Multiply these error values.
4. Add these products to get the sum of the cross-product deviations.
The covariance is the average of the cross-product deviations (dividing by N - 1 for sample data):
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
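The four steps map directly onto R; this sketch uses hypothetical toy vectors and checks the manual result against the built-in `cov()`, which also divides by N - 1.

```r
# Toy data (hypothetical values)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

dev_x <- x - mean(x)                    # step 1: deviations for x
dev_y <- y - mean(y)                    # step 2: deviations for y
cross_products <- dev_x * dev_y         # step 3: multiply paired deviations
sum_cp <- sum(cross_products)           # step 4: sum of cross-product deviations
cov_manual <- sum_cp / (length(x) - 1)  # "average" using N - 1

cov_manual  # 1.5
cov(x, y)   # 1.5, same result from the built-in function
```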

13 Covariance
Age  Income  Education
7    4       3
4    1       8
6    3       5
8    6       1
8    5       7
7    2       9
5    3       3
9    5       8
7    4       5
8    2       2
9    5       2
8    4       2
9    2       3
8    4       7
3    1       4
3    1       3
8    2       6
1    2       5
3    1       7
6    3       3
Do Age and Income vary the same way relative to their own means? Their covariance is 2.47.
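Reading each three-digit group on this slide as one row (one digit each for Age, Income, and Education) reproduces the slide's 2.47 as the sample covariance of Age and Income. A sketch in R under that reading:

```r
# The 20 rows of the table above (Age, Income, Education)
scores <- data.frame(
  Age       = c(7, 4, 6, 8, 8, 7, 5, 9, 7, 8, 9, 8, 9, 8, 3, 3, 8, 1, 3, 6),
  Income    = c(4, 1, 3, 6, 5, 2, 3, 5, 4, 2, 5, 4, 2, 4, 1, 1, 2, 2, 1, 3),
  Education = c(3, 8, 5, 1, 7, 9, 3, 8, 5, 2, 2, 2, 3, 7, 4, 3, 6, 5, 7, 3)
)

cov(scores$Age, scores$Income)  # 2.473684, the 2.47 on the slide
cov(scores)                     # full 3 x 3 covariance matrix
```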

14 Limitations of Covariance
- It depends upon the units of measurement. For example, the covariance of two variables measured in miles might be 4.25, but if the same scores are converted to kilometres, the covariance becomes 11 (each variable is multiplied by about 1.609, so the covariance grows by about 1.609² ≈ 2.59).
- One solution: standardize it (normalize the data) by dividing by the standard deviations of both variables.
- The standardized version of covariance is known as the correlation coefficient. It is not affected by a change of units such as miles to kilometres:
$$r = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
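A sketch of the miles-to-kilometres point with hypothetical toy values (not the 4.25 example above): rescaling both variables multiplies the covariance by the square of the conversion factor, while the correlation stays the same.

```r
# Toy data in miles (hypothetical values)
x_miles <- c(1, 2, 3, 4, 5)
y_miles <- c(2, 4, 5, 4, 5)

km_per_mile <- 1.609344
x_km <- x_miles * km_per_mile
y_km <- y_miles * km_per_mile

cov(x_miles, y_miles)  # 1.5
cov(x_km, y_km)        # 1.5 * 1.609344^2, about 3.885: covariance changes with units

cor(x_miles, y_miles)  # about 0.775
cor(x_km, y_km)        # identical: correlation does not depend on units

# Standardizing the covariance by both standard deviations gives r
cov(x_km, y_km) / (sd(x_km) * sd(y_km))
```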

15 The Correlation Coefficient

16 Correlation
[Figure: two panels, titled "Covariance is high: r ≈ 1" and "Covariance is low: r ≈ 0".]

17 Correlation

18 Inter-item (inter-variable) correlations need to be greater than .30.
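To check the .30 rule of thumb on a data set, one option is to scan the correlation matrix for pairs above the threshold. The helper `strong_pairs()` and the toy data below are hypothetical, a sketch rather than a standard function.

```r
# Hypothetical helper: list variable pairs whose |r| exceeds a threshold
strong_pairs <- function(data, threshold = 0.30) {
  r <- cor(data)
  # upper.tri() keeps each pair once and skips the diagonal (r = 1)
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, "row"]],
             var2 = colnames(r)[idx[, "col"]],
             r    = r[idx])
}

# Toy data: a and b are correlated; c is uncorrelated with both
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, 5, 4, 5),
                  c = c(2, 1, 2, 1, 2))
strong_pairs(toy)  # one row: a, b, r about 0.775
```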

19 Data Structures
Framework source: Hadley Wickham

    # Character vector
    b <- c("one", "two", "three")
    # Numeric vector
    a <- c(1, 2, 5.3, 6, -2, 4)
    # Matrix: 5 rows x 4 columns, filled column by column
    y <- matrix(1:20, nrow = 5, ncol = 4)
    # Data frame: built from vectors of equal length
    d <- c(1, 2, 3, 4)
    e <- c("red", "white", "red", NA)
    f <- c(TRUE, TRUE, TRUE, FALSE)
    mydata <- data.frame(d, e, f)
    names(mydata) <- c("ID", "Color", "Passed")
    # List: named elements of different types
    w <- list(name = "Fred", age = 5.3)

20 Correlation Matrix
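In R, passing a whole numeric data frame to `cor()` returns the correlation matrix in one call. A small sketch with hypothetical toy columns, showing the two structural properties every correlation matrix has:

```r
# Toy data (hypothetical values)
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, 5, 4, 5),
                  c = c(5, 3, 4, 1, 2))
m <- cor(toy)   # every pairwise Pearson r

round(m, 2)
isSymmetric(m)  # TRUE: r(a, b) equals r(b, a)
diag(m)         # each equals 1: a variable correlates perfectly with itself
```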

21 Correlation and Covariance

22 Revisiting the Height Dataset

23 Galton: Height Dataset
The cor() function does not handle factors:

    cor(heights)
    # Error in cor(heights) : 'x' must be numeric

Initial workaround: create a data frame without the factors.

    # h is assumed to refer to the heights data frame
    h2 <- data.frame(h$father, h$mother, h$avgp, h$childNum, h$kids)
    cor(h2)

Later we will RECODE the variable into 0 and 1. Excel's CORREL() does not handle factors either.
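An alternative to listing columns by hand is to keep only the numeric columns with `sapply(..., is.numeric)`. This sketch uses a small hypothetical stand-in for the heights data (made-up values and column names), not the Galton data itself.

```r
# Hypothetical toy version of a height data set with one factor column
heights <- data.frame(
  father = c(70, 69, 72, 68, 71, 67),
  mother = c(64, 63, 66, 62, 65, 61),
  gender = factor(c("F", "M", "M", "F", "M", "F")),
  height = c(64, 70, 72, 62, 71, 61)
)

# cor() refuses a data frame that contains a factor
msg <- tryCatch(cor(heights), error = function(e) conditionMessage(e))
msg  # "'x' must be numeric"

# Keep only the numeric columns instead of listing them by hand
num_only <- heights[sapply(heights, is.numeric)]
round(cor(num_only), 2)
```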

24 Histogram of Correlation Coefficients
[Figure: histogram of correlation coefficients.]

25 Correlation Matrix: Both Types

    library(car)
    scatterplotMatrix(heights)

[Figure: scatterplot matrix; zoom in on Gender.]

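26 Correlation Matrix for Continuous Variables
Using chart.Correlation() from the PerformanceAnalytics package:

    library(PerformanceAnalytics)
    chart.Correlation(num2)  # num2: the continuous (numeric) variables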

27 Categorical: Revisit the Box Plot
Factor (categorical) variables work with box plots; however, some functions, such as cor(), are not set up to handle factors.
Note that there is an equation here: Y = mX + b.
The correlation will depend on the spread of the distributions.
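One way to read the Y = mX + b line for a two-group box plot, sketched in R with hypothetical toy data: when the group is coded 0/1, the fitted intercept b is the mean of group 0 and the slope m is the difference between the two group means.

```r
# Hypothetical toy data: group coded 0/1 and a continuous outcome
g <- c(0, 0, 0, 1, 1, 1)
y <- c(62, 64, 63, 70, 71, 72)

fit <- lm(y ~ g)
coef(fit)  # intercept b = 63 (mean of group 0), slope m = 8

# With a 0/1 predictor, the line passes through both group means
c(mean0 = mean(y[g == 0]), mean1 = mean(y[g == 1]))
```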

28 Manual Calculation: Note That the Standard Deviation Is Lower
With a variable coded 0 and 1, the deviations from the mean are small, so the standard deviation is lower, whereas the continuous variable has a lot of variation (spread).
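A sketch of the manual calculation in R with hypothetical toy values: the 0/1 variable's deviations are only ±0.5, so its standard deviation is small, and the manual Pearson formula still matches `cor()`.

```r
g <- c(0, 0, 0, 1, 1, 1)        # 0/1 coded variable (hypothetical)
y <- c(62, 64, 63, 70, 71, 72)  # continuous variable (hypothetical)

sd(g)  # about 0.548: deviations from the mean are only +/- 0.5
sd(y)  # about 4.47: much more spread

# Manual Pearson r: sum of cross-products / ((N - 1) * s_x * s_y)
n <- length(g)
r_manual <- sum((g - mean(g)) * (y - mean(y))) / ((n - 1) * sd(g) * sd(y))
all.equal(r_manual, cor(g, y))  # TRUE
```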

29 Categorical: Recode!
Gender is recoded as 0 = Female, 1 = Male. CORREL() does not work with factor variables; after recoding, the formula now works.
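The same recode in R, as a minimal sketch: the toy data frame and the new column name `male` are hypothetical; `ifelse()` turns the factor into a numeric 0/1 column that `cor()` accepts.

```r
# Hypothetical toy data with gender stored as a factor
heights <- data.frame(
  gender = factor(c("Female", "Male", "Male", "Female", "Male", "Female")),
  height = c(64, 70, 72, 62, 71, 61)
)

# Recode: 0 = Female, 1 = Male
heights$male <- ifelse(heights$gender == "Male", 1, 0)

cor(heights$male, heights$height)  # now works: both columns are numeric
```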

30 Correlation: Continuous & Discrete
More examples of cor.test()
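A minimal `cor.test()` sketch on hypothetical toy data (a 0/1 variable against a continuous one). Besides the estimate, the returned object carries a p-value and a confidence interval.

```r
g <- c(0, 0, 0, 1, 1, 1)        # discrete (0/1) variable (hypothetical)
y <- c(62, 64, 63, 70, 71, 72)  # continuous variable (hypothetical)

ct <- cor.test(g, y)  # Pearson by default
ct$estimate  # same value as cor(g, y)
ct$p.value   # test of H0: true correlation is 0
ct$conf.int  # 95% confidence interval for r
```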

31 Correlation → Regression

32 Summary
[Figure: grid of outcome (dependent variable, y-axis) type versus predictor variable (x-axis) type, each Continuous or Categorical, with example variables Parents, Height, and Gender (coded 0/1). Plots: Histogram, Scatter, Boxplot, Bar, Pie, Mosaic, Cross Table. Descriptive statistics: Mean, Median, Standard Deviation; Frequency, Proportions. Regression models: Linear Regression, Logistic Regression.]

