Correlation and Covariance

Presentation transcript:

Correlation and Covariance

Overview
[Diagram: plot types (histogram, scatter plot, boxplot) by variable type (continuous, categorical). Predictor variable on the x-axis; outcome (dependent variable), height, on the y-axis.]

Correlation
Covariance is high: r ≈ 1
Covariance is low: r ≈ 0

Things to Know about the Correlation
It varies between -1 and +1; 0 = no relationship.
It is an effect size: ±.1 = small effect, ±.3 = medium effect, ±.5 = large effect.
Coefficient of determination, r²: by squaring the value of r you get the proportion of variance in one variable shared by the other.
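To make r and r² concrete, here is a minimal sketch in R. The data are made up for illustration and are not from the slides; the check against lm() relies on the standard fact that, for a one-predictor linear regression, R² equals the squared Pearson correlation.

```r
# Toy data (made up for illustration; not from the slides)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r <- cor(x, y)        # Pearson correlation, always between -1 and +1
r_squared <- r^2      # coefficient of determination: proportion of shared variance

# For a single predictor, the regression R-squared is the same number as r^2
lm_r_squared <- summary(lm(y ~ x))$r.squared

print(r)
print(r_squared)
print(lm_r_squared)
```

On the effect-size scale quoted above, an r of this size would count as a large effect.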

Variables
[Diagram: dependent variable Y (height); independent variables X1, X2, X3, X4.]

Little Correlation

Correlation is For Linear Relationships
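A small sketch of why this matters, using made-up values: y below is completely determined by x, yet the Pearson correlation is zero because the relationship is a curve, not a line.

```r
# Toy example (not from the slides): a perfect but nonlinear relationship
x <- -3:3
y <- x^2            # y depends entirely on x, but along a parabola

r <- cor(x, y)      # Pearson r only picks up the linear part of a relationship
print(r)
```

A scatter plot (for example plot(x, y)) would reveal the pattern that r misses.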

Outliers Can Skew Correlation Values
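As an illustration with made-up numbers, a single extreme point can flip the sign of r. The first ten points below rise together; the eleventh is a hypothetical outlier added for the demonstration.

```r
# Toy data (made up): ten points with a clear upward trend
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
r_without <- cor(x, y)

# Add one hypothetical outlier: large x, very low y
x_out <- c(x, 11)
y_out <- c(y, -40)
r_with <- cor(x_out, y_out)

print(r_without)   # strongly positive
print(r_with)      # pulled below zero by the single outlier
```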

Correlation and Regression Are Related
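One concrete link, sketched on made-up data: in a simple linear regression of y on x, the fitted slope equals r multiplied by sd(y) / sd(x). The variable names and values below are illustrative only.

```r
# Toy data (made up for illustration; not from the slides)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

fit <- lm(y ~ x)                              # least-squares line y = m*x + b
slope_lm <- coef(fit)[["x"]]                  # slope m from the regression
slope_from_r <- cor(x, y) * sd(y) / sd(x)     # same slope rebuilt from r

print(slope_lm)
print(slope_from_r)
```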

Covariance
[Plot of Y against X.] Persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.

Covariance
Calculate the error [deviation] between the mean and each subject's score for the first variable (x).
Calculate the error [deviation] between the mean and their score for the second variable (y).
Multiply these error values.
Add these values and you get the cross-product deviations.
The covariance is the average of the cross-product deviations.
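The steps above can be sketched in R as follows. The scores are toy values, and the denominator N - 1 is an assumption: it is the sample covariance that R's cov() computes, whereas the slide's own formula was not recoverable (dividing by N would give the population version).

```r
# Toy scores for five subjects (illustrative values)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dev_x <- x - mean(x)            # step 1: deviations from the mean of x
dev_y <- y - mean(y)            # step 2: deviations from the mean of y
cross <- dev_x * dev_y          # step 3: multiply the deviations
sum_cross <- sum(cross)         # step 4: sum of cross-product deviations

# step 5: average them, cov(x, y) = sum((x - mean(x)) * (y - mean(y))) / (N - 1)
cov_manual <- sum_cross / (length(x) - 1)

print(cov_manual)
print(cov(x, y))                # built-in function, should agree
```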

Covariance
[Table with columns Age, Income, Education; only the value 2.47 survives.]
Do they VARY the same way relative to their own means?

Limitations of Covariance
It depends upon the units of measurement. E.g. the covariance of two variables measured in miles might be 4.25, but if the same scores are converted to kilometres, the covariance is 11.
One solution: standardize it (normalize the data). Divide by the standard deviations of both variables.
The standardized version of covariance is known as the correlation coefficient. It is relatively unaffected by units of measurement.
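A minimal sketch of the unit problem, assuming toy scores chosen so that their covariance in miles is 4.25, the figure quoted above. Converting both variables to kilometres multiplies the covariance by 1.609344 squared, while the correlation does not change.

```r
# Toy scores, treated here as distances in miles (illustrative values)
miles_x <- c(5, 4, 4, 6, 8)
miles_y <- c(8, 9, 10, 13, 15)

km_per_mile <- 1.609344
km_x <- miles_x * km_per_mile
km_y <- miles_y * km_per_mile

cov_miles <- cov(miles_x, miles_y)   # 4.25
cov_km <- cov(km_x, km_y)            # about 11: scaled by km_per_mile^2

cor_miles <- cor(miles_x, miles_y)   # the correlation is identical
cor_km <- cor(km_x, km_y)            # in both units

print(c(cov_miles, cov_km))
print(c(cor_miles, cor_km))
```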

The Correlation Coefficient
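The formula on this slide did not survive extraction. Assuming the usual sample definitions (deviations from the means, N - 1 in the denominator, and s_x, s_y for the sample standard deviations), the standardized covariance can be written as:

```latex
r = \frac{\operatorname{cov}_{xy}}{s_x \, s_y}
  = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x \, s_y}
```

Because the numerator and the denominator carry the same units, r itself is unit-free.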

Correlation

Need inter-item/variable correlations >.30

Data Structures Framework (Source: Hadley Wickham)
Numeric vector: a <- c(1,2,5.3,6,-2,4)
Character vector: b <- c("one","two","three")
Matrix: y <- matrix(1:20, nrow=5, ncol=4)
Data frame:
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed")
List: w <- list(name="Fred", age=5.3)

Correlation Matrix

Correlation and Covariance

Revisiting the Height Dataset

Galton: Height Dataset
cor(heights)
Error in cor(heights) : 'x' must be numeric
The cor() function does not handle factors. Excel's correl() does not either.
Initial workaround: create a data.frame without the factors:
h2 <- data.frame(h$father,h$mother,h$avgp,h$childNum,h$kids)
Later we will RECODE the variable into a 0, 1.
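An alternative to listing the numeric columns by hand is to select them programmatically. The sketch below uses a small stand-in data frame with hypothetical column names and made-up values, not the actual Galton data.

```r
# Hypothetical stand-in for the heights data (values made up for illustration)
toy <- data.frame(
  father = c(70, 68, 72, 66, 69),
  mother = c(64, 62, 65, 61, 63),
  gender = factor(c("M", "F", "F", "M", "F")),
  height = c(71, 63, 66, 67, 64)
)

# cor(toy) would stop with "'x' must be numeric" because of the factor column
numeric_cols <- sapply(toy, is.numeric)   # TRUE for numeric columns only
toy_num <- toy[, numeric_cols]            # keep father, mother, height
cor_matrix <- cor(toy_num)

print(round(cor_matrix, 2))
```

This keeps working if more numeric columns are added later, at the cost of silently dropping every factor.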

Histogram of Correlation Coefficients

Correlations Matrix: Both Types
library(car)
scatterplotMatrix(heights)
Zoom in on Gender

Correlation Matrix for Continuous Variables
chart.Correlation(num2) (from the PerformanceAnalytics package)

Categorical: Revisit Box Plot
Factors (categorical variables) work with boxplots; however, some functions are not set up to handle factors.
Note there is an equation here: Y = mx + b
Correlation will depend on the spread of the distributions.

Manual Calculation: Note Stdev is Lower
With values of only 0 and 1, the deviations from the mean are small, so the standard deviation is lower, whereas the continuous variable has a lot of variation (spread).

Categorical: Recode!
Gender recoded as a 0/1 variable (0 = Female).
The correlation formula does not work with factor variables; after recoding, the formula now works!
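A minimal sketch of the recode, on made-up data. The level labels "F" and "M" and the variable names are assumptions for illustration; only the 0 = Female coding comes from the slide.

```r
# Made-up example data; the labels "F"/"M" are assumed, not taken from the slides
gender <- factor(c("F", "M", "F", "M", "M", "F"))
height <- c(62, 70, 64, 69, 71, 63)

# cor(gender, height) would fail: a factor is not numeric
gender01 <- ifelse(gender == "F", 0, 1)   # 0 = Female, 1 = the other level

r <- cor(gender01, height)                # now the formula works
ct <- cor.test(gender01, height)          # same estimate, plus a significance test

print(r)
print(ct$estimate)
print(c(sd(gender01), sd(height)))        # the 0/1 variable has the smaller spread
```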

Correlation: Continuous & Discrete
More examples of cor.test()

Correlation  Regression

Summary
[Diagram: plots and models by variable type.]
Predictor variable (x-axis): continuous or categorical (Parents Height, Gender).
Outcome, dependent variable (y-axis): continuous or categorical.
Plots: histogram, scatter, boxplot, bar, pie, mosaic, cross table.
Regression models: linear regression, logistic regression.
Summaries: mean, median, standard deviation; frequency (0, 1), proportions.