Correlation ... beware

Definition

Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X,Y)

The correlation between two random variables is a dimensionless number between −1 and +1:

Corr(X,Y) = Cov(X,Y) / (StdDev(X)·StdDev(Y))
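Both facts are easy to verify numerically. A minimal Python sketch on invented data (the variable names, sample size, and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)     # invented, mildly correlated data

# Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X,Y)
cov_xy = np.cov(x, y, ddof=0)[0, 1]
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)   # equal, up to rounding

# Correlation: covariance rescaled by both standard deviations -> dimensionless
corr = cov_xy / (np.std(x) * np.std(y))
print(corr, np.corrcoef(x, y)[0, 1])    # same number, always between -1 and +1
```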

Interpretation

Correlation measures the strength of the linear relationship between two variables.
- Strength: not the slope.
- Linear: nonlinearities are missed completely.
- Two: a pairwise correlation shows only "shadows" of multidimensional relationships.

A correlation of +1 would arise only if all of the points lined up perfectly along an upward-sloping line. Stretching the diagram horizontally or vertically would change the perceived slope, but not the correlation.
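For instance, here is a quick numerical check (again on invented data) that changing the units or origin of either axis changes the fitted slope but leaves the correlation untouched:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)                   # invented data

# Rescale and shift each axis (e.g. a change of units and origin)
x_stretched = 100.0 * x + 7.0
y_squeezed = 0.01 * y - 3.0

print(np.corrcoef(x, y)[0, 1])
print(np.corrcoef(x_stretched, y_squeezed)[0, 1])    # identical, up to rounding

print(np.polyfit(x, y, 1)[0])                        # slope roughly 2
print(np.polyfit(x_stretched, y_squeezed, 1)[0])     # slope roughly 0.0002
```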

A positive correlation signals that large values of one variable are typically associated with large values of the other. Correlation measures the “tightness” of the clustering about a single line.

A negative correlation signals that large values of one variable are typically associated with small values of the other.

Independent random variables have a correlation of 0.

But a correlation of 0 most certainly does not imply independence. Indeed, correlations can completely miss nonlinear relationships.
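A tiny illustration on made-up data: below, y is completely determined by x, yet the correlation is essentially zero because the relationship is not linear.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=10_000)
y = x ** 2                        # deterministic function of x -- anything but independent

print(np.corrcoef(x, y)[0, 1])    # roughly 0: the (linear) correlation misses it entirely
```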

Correlations Show (only) Two-Dimensional Shadows

In the motor pool case, the correlations between Age and Cost, and between Make and Cost, show precisely what the manager's two-dimensional tables showed:
- There's little linkage directly between Age and Cost.
- Fords had higher average costs than did Hondas.

But each of these facts is due to the confounding effect of Mileage! The pure effect of each variable on its own is revealed only in the most complete model.

             Costs     Mileage    Age       Make
Costs        1.000
Mileage      0.771     1.000
Age          0.023    -0.496      1.000
Make        -0.240    -0.478      0.164     1.000
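The motor pool data themselves are not reproduced here, but a synthetic sketch of the same confounding pattern (every number and coding choice below is invented) shows how the pairwise "shadows" mislead while the full regression recovers each variable's own effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Invented fleet: newer cars and Fords are driven more; cost is driven mainly by
# mileage, with a modest direct effect of age and no direct effect of make.
age = rng.uniform(1, 6, n)                                 # years
make = rng.integers(0, 2, n)                               # 0 = Ford, 1 = Honda (arbitrary coding)
mileage = 18 - 2 * age - 4 * make + rng.normal(0, 2, n)    # thousand miles per year
cost = 100 * mileage + 210 * age + rng.normal(0, 50, n)    # dollars per year

# Two-dimensional "shadows": Age looks unrelated to Cost, Make looks strongly related
print(np.corrcoef(age, cost)[0, 1])     # near 0
print(np.corrcoef(make, cost)[0, 1])    # strongly negative: Hondas look cheaper

# The full regression Cost ~ Mileage + Age + Make reveals the per-variable effects
X = np.column_stack([np.ones(n), mileage, age, make])
coef, *_ = np.linalg.lstsq(X, cost, rcond=None)
print(coef)                             # roughly [0, 100, 210, 0]: age matters, make does not
```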

Tilting at Shadows

(received via email from a former student, all employer references removed)

"One of the pieces of the research is to identify key attributes that drive customers to choose a vendor for buying office products.

"The market research guy that we have hired (he is an MBA/PhD from Wharton) says the following:

"'I can determine the relative importance of various attributes that drive overall satisfaction by running a correlation of each one of them against overall satisfaction score and then ranking them based on the (correlation) coefficient scores.'

"I am not really certain if we can do that. I would tend to think we should run a regression to get relative weightage."

Customer Satisfaction

Consider overall customer satisfaction (on a 100-point scale) with a Web-based provider of customized software as the order leadtime (in days), the product acquisition cost, and the availability of online order-tracking (0 = not available, 1 = available) vary. Here are the correlations:

Correlations with Satisfaction
  leadtime      -0.766
  ol-tracking   -0.242
  cost           0.097

Customers forced to wait are unhappy. Those without access to online order tracking are more satisfied. Those who pay more are somewhat happier. ?????

The Full Regression

Regression: satisfaction

                                 constant    leadtime    cost       ol-track
coefficient                      192.7338     -6.8856    -1.8025     8.5599
std error of coef                 16.1643      0.5535     0.3137     4.0729
t-ratio                           11.9234    -12.4391    -5.7453     2.1017
significance                       0.0000%     0.0000%    0.0000%    4.0092%
beta-weight                                   -1.0879    -0.4571     0.1586

standard error of regression       13.9292
coefficient of determination       75.03%
adjusted coef of determination     73.70%

Customers dislike high cost, and like online order tracking. Why does customer satisfaction vary? Primarily because leadtimes vary; secondarily, because cost varies.
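The beta-weights in this output are simply the coefficients re-expressed in standard-deviation units, so they can be compared across predictors measured on different scales. A minimal sketch of that computation on invented data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Invented predictors on very different scales, plus a response they drive
X = rng.normal(size=(n, 3)) * [5.0, 40.0, 0.5]
y = X @ [2.0, -0.1, 30.0] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])          # add the constant term
b = np.linalg.lstsq(A, y, rcond=None)[0]      # [constant, b1, b2, b3]

# beta_j = b_j * sd(x_j) / sd(y): the coefficient after putting every variable
# in standard-deviation units, making the beta-weights directly comparable
betas = b[1:] * X.std(axis=0) / y.std()
print(b)
print(betas)
```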

Reconciliation

                satisfaction   leadtime    cost      ol-tracking
satisfaction        1.000
leadtime           -0.766        1.000
cost                0.097       -0.543     1.000
ol-tracking        -0.242        0.465    -0.230       1.000

Customers can pay extra for expedited service (shorter leadtime at moderate extra cost), or for express service (shortest leadtime at highest cost).

Those who chose to save money and wait longer ended up (slightly) regretting their choice.

Most customers who chose rapid service weren't given access to order tracking. They didn't need it, and were still happy with their fast deliveries.

Finally …

The correlations between the explanatory variables can help flesh out the "story."

In a "simple" (i.e., one explanatory variable) regression:
- The (meaningless) beta-weight is the correlation between the two variables.
- The square of the correlation is the unadjusted coefficient of determination (r-squared).

If you give me a correlation, I'll interpret it by squaring it and looking at it as a coefficient of determination.
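A quick numerical check of both facts, on made-up one-predictor data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=400)
y = 3.0 * x + rng.normal(size=400)       # invented one-predictor data

r = np.corrcoef(x, y)[0, 1]

slope, intercept = np.polyfit(x, y, 1)   # simple regression y = a + b*x
beta_weight = slope * x.std() / y.std()

resid = y - (intercept + slope * x)
r_squared = 1 - resid.var() / y.var()    # unadjusted coefficient of determination

print(r, beta_weight)                    # equal: the beta-weight is the correlation
print(r ** 2, r_squared)                 # equal: r-squared is the squared correlation
```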

A Pharmaceutical Ad

Diagnostic scores from a sample of patients receiving psychiatric care, showing that patients' depression and anxiety scores are strongly correlated.

"So, if your patients have anxiety problems, consider prescribing our antidepressant!"

Evaluation

At most 49% of the variability in patients' anxiety levels (the squared correlation, read as a coefficient of determination) can potentially be explained by variability in depression levels.

"Potentially" = it might actually be explained by something else that covaries with both.

The regression provides no evidence that changing a patient's depression level will cause a change in their anxiety level.

Association vs. Causality

Polio and Ice Cream: polio incidence and ice-cream sales rise and fall together (both peak in summer), yet neither causes the other. Regression (and correlation) deal only with association.

Example: Greater values for annual mileage are typically associated with higher annual maintenance costs. No matter how "good" the regression statistics look, they will not make the case that greater mileage causes greater costs. If you believe that driving more during the year causes higher costs, then it's fine to use regression to estimate the size of the causal effect.

Evidence supporting causality comes only from controlled experimentation. This is why macroeconomists continue to argue about which aspects of public policy are the key drivers of economic growth. It's also why the cigarette companies won all the lawsuits filed against them for several decades.