Correlation and Covariance


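Overview
Outcome, dependent variable (Y-axis): Height (continuous), viewed on its own as a histogram.
Predictor variable (X-axis): a continuous predictor is plotted against height as a scatter plot; a categorical predictor is plotted as a boxplot.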

Independent Variables
Y: Height (the dependent variable)
X1, X2, X3, X4: the independent variables (the X's)

Correlation Matrix for Continuous Variables
PerformanceAnalytics package:
    chart.Correlation(num2)

Calculating ‘Error’
A deviation is the difference between the mean and an actual data point. Deviations can be calculated by taking each score and subtracting the mean from it: $\text{deviation}_i = x_i - \bar{x}$
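The subtraction described above can be sketched in R. This is a minimal illustration that assumes the five scores are 1, 2, 3, 3, 4, a set consistent with the mean of 2.6 and the squared-deviation total of 5.20 reported in these slides; the variable names are illustrative.

```r
# Assumed scores: chosen to match the mean (2.6) shown in the slides.
scores <- c(1, 2, 3, 3, 4)

score_mean <- mean(scores)          # arithmetic mean of the scores
deviations <- scores - score_mean   # each score minus the mean

# One row per score: the score, the mean, and the deviation from the mean.
print(data.frame(score = scores, mean = score_mean, deviation = deviations))
```

Scores below the mean get negative deviations and scores above it get positive ones.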


Use the Total Error?
Could we take the error (deviation) between the mean and each data point and add them up?

Score   Mean   Deviation
1       2.6    -1.6
2       2.6    -0.6
3       2.6     0.4
3       2.6     0.4
4       2.6     1.4
Total           0

Sum of Squared Errors
We could add the deviations to find the total error, but deviations cancel out because some are positive and others negative. Therefore, we square each deviation. Adding these squared deviations gives the sum of squared errors (SS): $SS = \sum_i (x_i - \bar{x})^2$
Why not just use the absolute value?

Sum of Squared Errors

Score   Mean   Deviation   Squared Deviation
1       2.6    -1.6        2.56
2       2.6    -0.6        0.36
3       2.6     0.4        0.16
3       2.6     0.4        0.16
4       2.6     1.4        1.96
Total                      5.20
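A short R sketch of why the raw deviations are not summed directly. It assumes the scores 1, 2, 3, 3, 4, which reproduce the mean of 2.6 and the total of 5.20 in the table; the sum of absolute deviations is included only as a point of comparison for the question raised on the slide.

```r
# Assumed scores: they reproduce the mean (2.6) and SS (5.20) in the table.
scores <- c(1, 2, 3, 3, 4)
deviations <- scores - mean(scores)

total_error <- sum(deviations)    # positives and negatives cancel (about 0)
ss <- sum(deviations^2)           # sum of squared errors
sum_abs <- sum(abs(deviations))   # alternative: sum of absolute deviations

print(round(total_error, 10))
print(ss)        # 5.2
print(sum_abs)   # 4.4
```

Both squaring and taking absolute values stop the cancellation; the slides proceed with the squared version, which leads to the variance.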

Standard Deviation
The variance is measured in units squared. This isn’t a very meaningful metric, so we take the square root of the variance. This is the standard deviation (s): $s = \sqrt{s^2}$

Variance
The sum of squares is a good measure of overall variability, but it depends on the number of scores. We calculate the average variability by dividing by the number of scores (n). This value is called the variance ($s^2$). For a sample estimate, the divisor is n - 1 rather than n.
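The variance and standard deviation can be checked by hand in R. This sketch assumes the scores 1, 2, 3, 3, 4 (consistent with SS = 5.20) and shows both divisors, because R's built-in var() and sd() divide by n - 1, not n.

```r
# Assumed scores, consistent with the sum of squares of 5.20.
scores <- c(1, 2, 3, 3, 4)
n <- length(scores)
ss <- sum((scores - mean(scores))^2)   # sum of squared errors

var_n <- ss / n               # divide by the number of scores: 1.04
var_sample <- ss / (n - 1)    # sample variance, as computed by var(): 1.3
sd_sample <- sqrt(var_sample) # square root returns to the original units

print(c(var_n = var_n, var_sample = var_sample, sd_sample = sd_sample))
print(c(var_builtin = var(scores), sd_builtin = sd(scores)))
```

The second line of output should match the n - 1 figures in the first, which is a quick way to confirm which divisor a given tool uses.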

Same Mean, Different Standard Deviation

Temperature Variation Across Cities
[Figure: count of hours by temperature for Austin, Las Vegas, San Diego, San Francisco, and Tampa Bay.]
Source: http://ramnarasimhan.files.wordpress.com/2012/12/temperature_10_degrees.png

Covariance
[Figure: Y and X plotted for each person.]
Persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.

Covariance
1. Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
2. Calculate the error [deviation] between the mean and their score for the second variable (y).
3. Multiply these error values.
4. Add these values to get the sum of cross-product deviations.
The covariance is the average cross-product deviation (dividing by n - 1 for a sample): $\text{cov}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
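The steps above can be sketched in R as follows. The two vectors are illustrative values, not data from the slides; they were chosen so that the covariance comes out at 4.25. The manual result is compared with R's cov(), which uses the n - 1 divisor.

```r
# Illustrative (hypothetical) paired scores for five subjects.
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dx <- x - mean(x)            # step 1: deviations for the first variable
dy <- y - mean(y)            # step 2: deviations for the second variable
cross_products <- dx * dy    # step 3: multiply the paired deviations

# Steps 4 and 5: sum the cross-products, then average with n - 1.
cov_manual <- sum(cross_products) / (length(x) - 1)
cov_builtin <- cov(x, y)

print(cov_manual)    # 4.25
print(cov_builtin)   # 4.25
```

A positive value means that subjects above the mean on x tend also to be above the mean on y.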

Covariance
Do they VARY the same way relative to their own means?
[Table: Age, Income, and Education scores; values as extracted: 7 4 3 1 8 6 5 2 9 2.47]

Limitations of Covariance
Covariance depends upon the units of measurement. E.g. the covariance of two variables measured in miles might be 4.25, but if the same scores are converted to kilometres, the covariance is 11.
One solution: standardize it! Normalize the data by dividing by the standard deviations of both variables.
The standardized version of covariance is known as the correlation coefficient. It is unaffected by a change of scale such as miles to kilometres.
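The unit dependence can be demonstrated with a small R sketch. The scores are hypothetical values chosen to give a covariance of 4.25, and the conversion factor of 1.609 km per mile is an approximation; covariance scales by the square of that factor when both variables are converted, while the correlation does not change.

```r
# Hypothetical scores, treated as distances in miles.
miles_x <- c(5, 4, 4, 6, 8)
miles_y <- c(8, 9, 10, 13, 15)

km_per_mile <- 1.609               # approximate conversion factor
km_x <- miles_x * km_per_mile
km_y <- miles_y * km_per_mile

cov_miles <- cov(miles_x, miles_y) # 4.25
cov_km <- cov(km_x, km_y)          # 4.25 * 1.609^2, roughly 11

# Standardize: divide the covariance by both standard deviations.
r_miles <- cov_miles / (sd(miles_x) * sd(miles_y))
r_km <- cov_km / (sd(km_x) * sd(km_y))

print(c(cov_miles = cov_miles, cov_km = cov_km))
print(c(r_miles = r_miles, r_km = r_km))   # the two correlations agree
```

Dividing by the standard deviations cancels the conversion factor, which is why the correlation is the same in either unit.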

The Correlation Coefficient
$r = \frac{\text{cov}(x, y)}{s_x s_y}$

Things to Know about the Correlation
It varies between -1 and +1
0 = no linear relationship
It is an effect size
±.1 = small effect
±.3 = medium effect
±.5 = large effect
Coefficient of determination, r^2
By squaring the value of r you get the proportion of variance in one variable shared by the other.
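A brief R sketch of these points, using hypothetical scores. The helper effect_size_label() is an illustrative function written for this example (not part of any package), and its "negligible" label for |r| below .1 is an assumption, since the slide does not name that range. The squared correlation is compared with the R-squared of a simple linear regression, which is the same quantity.

```r
# Hypothetical paired scores.
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

r <- cor(x, y)      # Pearson correlation, always between -1 and +1
r_squared <- r^2    # coefficient of determination

# With one predictor, the regression R-squared equals r squared.
lm_r_squared <- summary(lm(y ~ x))$r.squared

# Illustrative helper applying the slide's thresholds to |r|.
effect_size_label <- function(r) {
  a <- abs(r)
  if (a >= 0.5) "large"
  else if (a >= 0.3) "medium"
  else if (a >= 0.1) "small"
  else "negligible"   # assumed label; not named in the slide
}

print(c(r = r, r_squared = r_squared, lm_r_squared = lm_r_squared))
print(effect_size_label(r))
```

The sign of r carries the direction of the relationship; the label depends only on its magnitude.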

Correlation
Covariance is high: r ~ 1
Covariance is low: r ~ 0

Correlation

Correlation Need inter-item/variable correlations > .30

Data Structures (Framework Source: Hadley Wickham)

Numeric Vector:
    a <- c(1,2,5.3,6,-2,4)
Character Vector:
    b <- c("one","two","three")
Matrix:
    y <- matrix(1:20, nrow=5, ncol=4)
Dataframe:
    d <- c(1,2,3,4)
    e <- c("red", "white", "red", NA)
    f <- c(TRUE,TRUE,TRUE,FALSE)
    mydata <- data.frame(d,e,f)
    names(mydata) <- c("ID","Color","Passed")
List:
    w <- list(name="Fred", age=5.3)

Correlation Matrix

Correlation and Covariance

Revisiting the Height Dataset

Galton: Height Dataset
The cor() function does not handle Factors (Excel's correl() does not either):
    cor(heights)
    Error in cor(heights) : 'x' must be numeric
Initial workaround: create a data.frame without the Factors:
    h2 <- data.frame(h$father,h$mother,h$avgp,h$childNum,h$kids)
Later we will RECODE the variable into 0 and 1.
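An alternative to listing the numeric columns by hand is to select them programmatically. The sketch below uses a small made-up data frame, heights_demo, whose column names loosely mirror the height data; the values are hypothetical and are not the Galton data.

```r
# Hypothetical stand-in for the height data (values are made up).
heights_demo <- data.frame(
  father = c(78.5, 75.5, 75.0, 69.0, 68.0, 65.0),
  mother = c(67.0, 66.5, 64.0, 66.0, 60.0, 63.0),
  gender = factor(c("M", "F", "M", "F", "M", "F")),
  height = c(73.2, 69.0, 71.0, 63.0, 66.5, 61.0)
)

# cor() on the full data frame fails because 'gender' is a Factor.
cor_error <- tryCatch(cor(heights_demo), error = function(e) e)
print(conditionMessage(cor_error))

# Keep only the numeric columns, then build the correlation matrix.
numeric_cols <- sapply(heights_demo, is.numeric)
h_numeric <- heights_demo[numeric_cols]
cor_matrix <- cor(h_numeric)
print(round(cor_matrix, 2))
```

This keeps the workaround valid if columns are added or renamed, at the cost of silently dropping every non-numeric column.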

Histogram of Correlation Coefficients
[Figure: histogram with an x-axis running from -1 to +1.]

Correlations Matrix: Both Types
Zoom in on Gender
    library(car)
    scatterplotMatrix(heights)

Correlation Matrix for Continuous Variables
PerformanceAnalytics package:
    chart.Correlation(num2)

Categorical: Revisit Box Plot
Correlation will depend on the spread of the distributions.
Note there is an equation here: Y = mx + b
Factors/Categorical variables work with boxplots; however, some functions are not set up to handle Factors.

Manual Calculation: Note Stdev is Lower
Note that with values of 0 and 1 the deltas from the mean are small, so the standard deviation is lower, whereas the continuous variable has a lot of variation (spread).

Categorical: Recode!
Gender recoded as 0/1, with 0 = Female.
Formula now works!
@correl does not work with Factor variables.
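A minimal R sketch of the recode, using a made-up data frame (heights_demo, hypothetical values). It assumes the Factor has the two levels "F" and "M", with 0 for "F" as stated above and 1 for the other level.

```r
# Hypothetical stand-in for the height data (values are made up).
heights_demo <- data.frame(
  gender = factor(c("M", "F", "M", "F", "M", "F")),
  height = c(73.2, 69.0, 71.0, 63.0, 66.5, 61.0)
)

# Recode the Factor into a numeric 0/1 variable: 0 = Female, 1 otherwise.
heights_demo$gender01 <- ifelse(heights_demo$gender == "F", 0, 1)

# cor() now works because both inputs are numeric.
r_gender_height <- cor(heights_demo$gender01, heights_demo$height)
print(r_gender_height)

# The 0/1 variable has a much smaller spread than the continuous one.
print(c(sd_gender01 = sd(heights_demo$gender01),
        sd_height = sd(heights_demo$height)))
```

Swapping which level is coded 0 flips the sign of the correlation but not its magnitude.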

Correlation: Continuous & Discrete More examples of cor.test()
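As one possible illustration of cor.test(), the sketch below runs it on a continuous pair and on a continuous variable against a 0/1 recode. The data frame is hypothetical (made-up values); cor.test() defaults to the Pearson method.

```r
# Hypothetical data (values are made up); gender01 is a 0/1 recode.
heights_demo <- data.frame(
  father   = c(78.5, 75.5, 75.0, 69.0, 68.0, 65.0),
  gender01 = c(1, 0, 1, 0, 1, 0),
  height   = c(73.2, 69.0, 71.0, 63.0, 66.5, 61.0)
)

# Continuous vs continuous.
test_father <- cor.test(heights_demo$father, heights_demo$height)

# Discrete (0/1) vs continuous.
test_gender <- cor.test(heights_demo$gender01, heights_demo$height)

# Each result carries the estimate, a p-value, and a confidence interval.
print(test_father$estimate)
print(test_father$p.value)
print(test_father$conf.int)
print(test_gender$estimate)
```

With so few rows the confidence intervals will be wide, so the example shows the mechanics only.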

Overview
Too many variables are difficult to handle, and processing all that data takes computing power.
Principal components analysis seeks to identify and quantify the underlying components by analyzing the original, observable variables.
In many cases, we can wind up working with just a few (on the order of, say, three to ten) principal components or factors instead of tens or hundreds of conventionally measured variables.

Principal Components Analysis
Which component explains the most variance?
[Figure: observable variables X1, X2, X3 and component vectors Z1, Z2, Z3.]
Image Source: http://www.quora.com/Machine-Learning/What-is-an-eigenvector-of-a-covariance-matrix
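One way to see which component explains the most variance is with base R's prcomp(). The sketch below uses simulated data (not the height data): x1 and x2 share a hypothetical latent factor and x3 is independent noise. All names are illustrative.

```r
set.seed(1)   # make the simulation reproducible

# Simulated data: x1 and x2 share a latent factor; x3 is unrelated noise.
n <- 100
latent <- rnorm(n)
sim <- data.frame(
  x1 = latent + rnorm(n, sd = 0.3),
  x2 = latent + rnorm(n, sd = 0.3),
  x3 = rnorm(n)
)

# Center and scale so the analysis is based on the correlation matrix.
pca <- prcomp(sim, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# The component variances are the eigenvalues of the correlation matrix.
print(round(eigen(cor(sim))$values, 3))
print(round(pca$sdev^2, 3))
```

prcomp() returns components ordered by decreasing variance, so the first entry of var_explained answers the question on the slide for this simulated data.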

Principal Components Analysis

Principal Components

Correlation -> Regression
http://en.wikipedia.org/wiki/Genetics