Correlation... beware. Definition Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X,Y) The correlation between two random variables is a dimensionless number between.

Presentation transcript:

Correlation... beware

Definition
Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X,Y)
The correlation between two random variables is a dimensionless number between -1 and 1.
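The slide's formula for the correlation itself did not survive in this transcript. A minimal sketch, assuming the standard definition Corr(X,Y) = Cov(X,Y) / (SD(X)·SD(Y)), checks the variance identity above and the [-1, 1] range on simulated data. The sample, the seed, and the helper name `cov` are illustrative assumptions, not part of the original deck.

```python
import numpy as np

# Simulated data, for illustration only (not from the presentation).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)


def cov(a, b):
    """Population covariance: mean product of deviations from the means."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# Var(X+Y) computed directly, and via Var(X) + Var(Y) + 2*Cov(X,Y).
var_sum = cov(x + y, x + y)
identity_rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)

# Correlation: covariance rescaled by both standard deviations,
# which is what makes it dimensionless.
corr = cov(x, y) / np.sqrt(cov(x, x) * cov(y, y))

print(var_sum, identity_rhs)
print(corr, np.corrcoef(x, y)[0, 1])
```

The two printed variances should agree up to rounding, and the hand-computed correlation should match `np.corrcoef`.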

Interpretation
Correlation measures the strength of the linear relationship between two variables.
- Strength: not the slope
- Linear: misses nonlinearities completely
- Two: shows only “shadows” of multidimensional relationships

A correlation of +1 would arise only if all of the points lined up perfectly. Stretching the diagram horizontally or vertically would change the perceived slope, but not the correlation.
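The stretching claim can be checked numerically. The sketch below uses simulated data (an assumption for illustration) and multiplies one variable by 10, which is the same as stretching the diagram vertically: the fitted slope scales by 10 while the correlation is unchanged. The helper name `corr_and_slope` is hypothetical.

```python
import numpy as np

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)


def corr_and_slope(a, b):
    """Return the correlation of (a, b) and the least-squares slope of b on a."""
    r = np.corrcoef(a, b)[0, 1]
    slope = np.polyfit(a, b, 1)[0]
    return r, slope


r_orig, slope_orig = corr_and_slope(x, y)
# "Stretch the diagram vertically" by rescaling y.
r_stretched, slope_stretched = corr_and_slope(x, 10 * y)

print(r_orig, r_stretched)          # same correlation
print(slope_orig, slope_stretched)  # slope is ten times larger
```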

Correlation measures the “tightness” of the clustering about a single line. A positive correlation signals that large values of one variable are typically associated with large values of the other.

A negative correlation signals that large values of one variable are typically associated with small values of the other.

Independent random variables have a correlation of 0.

But a correlation of 0 most certainly does not imply independence. Indeed, correlations can completely miss nonlinear relationships.
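A small sketch of a nonlinear relationship that correlation misses completely: Y is fully determined by X, yet the correlation is zero up to rounding. The choice of Y = X² on a grid symmetric about zero is an assumption made for illustration.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)  # values symmetric about zero
y = x ** 2                       # y is an exact function of x

# By symmetry the covariance of x and x**2 vanishes, so r is ~0
# even though knowing x tells you y exactly.
r = np.corrcoef(x, y)[0, 1]
print(r)
```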

Correlations Show (only) Two-Dimensional Shadows
In the motor pool case, the correlations between Age and Cost, and between Make and Cost, show precisely what the manager’s two-dimensional tables showed:
- There’s little linkage directly between Age and Cost.
- Fords had higher average costs than did Hondas.
But each of these facts is due to the confounding effect of Mileage! The pure effect of each variable on its own is only revealed in the most-complete model.
[Correlation matrix of Costs, Mileage, Age, and Make; values not preserved in this transcript]

Potential for Misuse
(received from a former student, all employer references removed)
“One of the pieces of the research is to identify key attributes that drive customers to choose a vendor for buying office products.
“The market research guy that we have hired (he is an MBA/PhD from Wharton) says the following:
“‘I can determine the relative importance of various attributes that drive overall satisfaction by running a correlation of each one of them against overall satisfaction score and then ranking them based on the (correlation) coefficient scores.’
“I am not really certain if we can do that. I would tend to think we should run a regression to get relative weightage.”

Customer Satisfaction
Consider overall customer satisfaction (on a 100-point scale) with a Web-based provider of customized software as the order leadtime (in days), product acquisition cost, and availability of online order-tracking (0 = not available, 1 = available) vary. Here are the correlations:
Correlations with Satisfaction
leadtime: [value not preserved in this transcript]
ol-tracking: [value not preserved in this transcript]
cost: 0.097
- Customers forced to wait are unhappy.
- Those without access to online order tracking are more satisfied.
- Those who pay more are somewhat happier.
- ?????

The Full Regression
Regression: satisfaction on constant, leadtime, cost, ol-track
[Regression table with rows for coefficient, std error of coef, t-ratio, significance, and beta-weight; most values not preserved in this transcript. Recoverable entries: a significance level of 0.0000%, coefficient of determination 75.03%, adjusted coefficient of determination 73.70%.]
Customers dislike high cost, and like online order tracking.
Why does customer satisfaction vary? Primarily because leadtimes vary; secondarily, because cost varies.

Reconciliation
Customers can pay extra for expedited service (shorter leadtime at moderate extra cost), or for express service (shortest leadtime at highest cost).
- Those who chose to save money and wait longer ended up (slightly) regretting their choice.
Most customers who chose rapid service weren’t given access to order tracking.
- They didn’t need it, and were still happy with their fast deliveries.
[Correlation matrix of satisfaction, leadtime, cost, and ol-tracking; values not preserved in this transcript]
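The sign flip described here can be reproduced on simulated data. The sketch below is not the presentation's dataset: every number in it is invented for illustration, and only leadtime and cost are modeled. Customers who pay more get shorter leadtimes, so cost correlates positively with satisfaction on its own, while the regression that also includes leadtime gives cost a negative coefficient.

```python
import numpy as np

# Invented data-generating process, for illustration only.
rng = np.random.default_rng(2)
n = 2000
leadtime = rng.uniform(1, 10, size=n)                    # days
cost = 200 - 10 * leadtime + rng.normal(0, 10, size=n)   # faster service costs more
satisfaction = 100 - 4 * leadtime - 0.2 * cost + rng.normal(0, 3, size=n)

# Two-variable view: the simple correlation of cost with satisfaction.
r_cost = np.corrcoef(cost, satisfaction)[0, 1]

# Full model: least-squares regression on a constant, leadtime, and cost.
X = np.column_stack([np.ones(n), leadtime, cost])
coef, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
b_const, b_leadtime, b_cost = coef

print(r_cost)             # positive: those who pay more look happier
print(b_leadtime, b_cost) # both negative once leadtime is held fixed
```

Ranking attributes by `r_cost`-style correlations would get the direction of the cost effect wrong here, which is the misuse the former student suspected.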

Finally …
The correlations between the explanatory variables can help flesh out the “story.”
In a “simple” (i.e., one explanatory variable) regression:
- The (meaningless) beta-weight is the correlation between the two variables.
- The square of the correlation is the unadjusted coefficient of determination (r-squared).
If you give me a correlation, I’ll interpret it by squaring it and looking at it as a coefficient of determination.
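Both facts about simple regression can be verified numerically. The sketch below assumes the usual definitions: the beta-weight is the slope rescaled by SD(x)/SD(y), and the coefficient of determination is 1 minus the ratio of residual variance to the variance of y. The data are simulated for illustration.

```python
import numpy as np

# Simulated data, for illustration only.
rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = 1.5 * x + rng.normal(size=300)

r = np.corrcoef(x, y)[0, 1]

# Simple (one explanatory variable) least-squares regression of y on x.
slope, intercept = np.polyfit(x, y, 1)

# Beta-weight: slope in standard-deviation units.
beta_weight = slope * x.std() / y.std()

# Unadjusted coefficient of determination.
residuals = y - (intercept + slope * x)
r_squared = 1 - residuals.var() / y.var()

print(r, beta_weight)      # equal up to rounding
print(r ** 2, r_squared)   # equal up to rounding
```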

A Pharmaceutical Ad
Diagnostic scores from a sample of patients receiving psychiatric care
So, if your patients have anxiety problems, consider prescribing our antidepressant!

Evaluation
At most 49% of the variability in patients’ anxiety levels can potentially be explained by variability in depression levels.
- “potentially” = might actually be explained by something else which covaries with both.
The regression provides no evidence that changing a patient’s depression level will cause a change in their anxiety level.

Association vs. Causality
Polio and Ice Cream
Regression (and correlation) deal only with association.
- Example: Greater values for annual mileage are typically associated with higher annual maintenance costs.
- No matter how “good” the regression statistics look, they will not make the case that greater mileage causes greater costs.
- If you believe that driving more during the year causes higher costs, then it’s fine to use regression to estimate the size of the causal effect.
Evidence supporting causality comes only from controlled experimentation.
- This is why macroeconomists continue to argue about which aspects of public policy are the key drivers of economic growth.
- It’s also why the cigarette companies won all the lawsuits filed against them for several decades.