Part 6: Correlation 6-1/49 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics.

Presentation transcript:

Part 6: Correlation 6-1/49 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics

Part 6: Correlation 6-2/49 Statistics and Data Analysis Part 6 – Correlation

Part 6: Correlation 6-3/49 Correlated Variables

Part 6: Correlation 6-4/49 Correlated Variables

Part 6: Correlation 6-5/49 Correlation Agenda
• Two ‘Related’ Random Variables
  - Dependence and Independence
  - Conditional Distributions
• We’re interested in correlation
  - We have to look at covariance first
  - Regression is correlation
• Correlated Asset Returns

Part 6: Correlation 6-6/49 Probabilities for Two Events, A, B
• Marginal Probability = The probability of an event not considering any other events. P(A)
• Joint Probability = The probability that two events happen at the same time. P(A,B)
• Conditional Probability = The probability that one event happens given that another event has happened. P(A|B)

Part 6: Correlation 6-7/49 Probabilities: Inherited Color Blindness*
• Inherited color blindness has different incidence rates in men and women. Women usually carry the defective gene and men usually inherit it.
• Experiment: pick an individual at random from the population.
  CB = has inherited color blindness
  MALE = gender, Not-Male = FEMALE
• Marginal: P(CB) = 2.75%, P(MALE) = 50.0%
• Joint: P(CB and MALE) = 2.5%, P(CB and FEMALE) = 0.25%
• Conditional: P(CB|MALE) = 5.0% (1 in 20 men), P(CB|FEMALE) = 0.5% (1 in 200 women)
* There are several types of color blindness and large variation in the incidence across different demographic groups. These are broad averages that are roughly in the neighborhood of the true incidence for particular groups.

Part 6: Correlation 6-8/49 Dependent Events
              Color Blind
Gender        No        Yes       Total
Male          .4750     .0250     .5000
Female        .4975     .0025     .5000
Total         .9725     .0275     1.0000
P(Color blind, Male) = .0250
P(Male) = .5000
P(Color blind) = .0275
P(Color blind) × P(Male) = .0275 × .500 = .01375, which is not equal to .025.
Gender and color blindness are not independent.
Random variables X and Y are dependent if P_XY(X,Y) ≠ P_X(X) P_Y(Y).

Part 6: Correlation 6-9/49 Independent Events
              Ace
Heart         Yes=1          No=0      Total
Yes=1         1/52           12/52     13/52 = 1/4
No=0          3/52           36/52     39/52
Total         4/52 = 1/13    48/52     52/52
P(Ace, Heart) = 1/52
P(Ace) = 1/13
P(Heart) = 1/4
P(Ace) × P(Heart) = (1/13)(1/4) = 1/52. Ace and Heart are independent.
Random variables X and Y are independent if P_XY(X,Y) = P_X(X) P_Y(Y).
“The joint probability equals the product of the marginal probabilities.”

Part 6: Correlation 6-10/49 Dependent Random Variables
• Random variables are dependent if the occurrence of one affects the probability distribution of the other.
• If P(Y|X) changes when X changes, then the variables are dependent.
• If P(Y|X) does not change when X changes, then the variables are independent.

Part 6: Correlation 6-11/49 Conditional Probability
Prob(A | B) = P(A,B) / P(B)
Prob(Color Blind | Male) = Prob(Color Blind, Male) / P(Male) = .025 / .50 = .05
              Color Blind
Gender        No        Yes       Total
Male          .4750     .0250     .5000
Female        .4975     .0025     .5000
Total         .9725     .0275     1.0000
What is P(Male | Color Blind)?
A Theorem: For two random variables, P(X,Y) = P(X|Y) P(Y)
P(Color blind, Male) = P(Color blind|Male) P(Male) = .05 × .5 = .025
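
The conditional-probability rule on this slide can be checked numerically. What follows is a minimal Python sketch, not part of the original slides. The helper name `marginal` and the dictionary layout are illustrative assumptions, and the two "not color blind" cells (.4750 and .4975) are derived from the stated marginals rather than quoted. It also evaluates the question the slide poses, P(Male | Color Blind).

```python
# Joint probabilities P(gender, color-blind status) from the color-blindness example.
# The "not_cb" cells are derived as P(gender) minus the matching "cb" cell.
joint = {
    ("male", "cb"): 0.0250,
    ("male", "not_cb"): 0.4750,
    ("female", "cb"): 0.0025,
    ("female", "not_cb"): 0.4975,
}


def marginal(table, position, value):
    """Sum the joint probabilities of all cells whose key has `value` at `position`."""
    return sum(p for key, p in table.items() if key[position] == value)


p_male = marginal(joint, 0, "male")
p_cb = marginal(joint, 1, "cb")

# Conditional probability: P(A | B) = P(A, B) / P(B)
p_cb_given_male = joint[("male", "cb")] / p_male
p_male_given_cb = joint[("male", "cb")] / p_cb

print(f"P(Male)               = {p_male:.4f}")
print(f"P(Color blind)        = {p_cb:.4f}")
print(f"P(Color blind | Male) = {p_cb_given_male:.4f}")
print(f"P(Male | Color blind) = {p_male_given_cb:.4f}")
```

With these numbers P(Male | Color Blind) = .025/.0275, roughly .909, which is very different from P(Color Blind | Male) = .05.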

Part 6: Correlation 6-12/49 Conditional Distributions
• Marginal Distribution of Color Blindness
  Color Blind .0275    Not Color Blind .9725
• Distribution Among Men (Conditioned on Male)
  Color Blind|Male .05    Not Color Blind|Male .95
• Distribution Among Women (Conditioned on Female)
  Color Blind|Female .005    Not Color Blind|Female .995
The distributions for the two genders are different. The variables are dependent.

Part 6: Correlation 6-13/49 Independent Random Variables
One card is drawn randomly from a deck of 52 cards.
              Ace
Heart         Yes=1     No=0      Total
Yes=1         1/52      12/52     13/52
No=0          3/52      36/52     39/52
Total         4/52      48/52     52/52
P(Ace|Heart) = 1/13
P(Ace|Not-Heart) = 3/39 = 1/13
P(Ace) = 4/52 = 1/13
P(Ace) does not depend on whether the card is a heart or not.
P(Heart|Ace) = 1/4
P(Heart|Not-Ace) = 12/48 = 1/4
P(Heart) = 13/52 = 1/4
P(Heart) does not depend on whether the card is an ace or not.
A Theorem: For two independent random variables, P(X,Y) = P(X) P(Y)
P(Ace, Heart) = P(Ace) P(Heart) = 1/13 × 1/4 = 1/52
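
The independence claims on this slide can be verified by enumerating the deck. This is a small Python sketch under the usual assumption of a standard 52-card deck. The names `prob`, `is_ace` and `is_heart` are illustrative, not from the slides. Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability that one randomly drawn card satisfies `event`, as an exact fraction."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


p_ace = prob(is_ace)
p_heart = prob(is_heart)
p_ace_and_heart = prob(lambda card: is_ace(card) and is_heart(card))

# Conditional probabilities P(Ace | Heart) and P(Ace | Not-Heart)
p_ace_given_heart = p_ace_and_heart / p_heart
p_ace_given_not_heart = prob(lambda card: is_ace(card) and not is_heart(card)) / (1 - p_heart)

print("P(Ace) =", p_ace, " P(Heart) =", p_heart, " P(Ace, Heart) =", p_ace_and_heart)
print("P(Ace | Heart) =", p_ace_given_heart, " P(Ace | Not-Heart) =", p_ace_given_not_heart)
print("Joint equals product of marginals:", p_ace_and_heart == p_ace * p_heart)
```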

Part 6: Correlation 6-14/49 Covariation and Expected Value
• Pick 10,325 people at random from the population. Predict how many will be color blind: 10,325 × .0275 ≈ 284
• Pick 10,325 MEN at random from the population. Predict how many will be color blind: 10,325 × .05 ≈ 516
• Pick 10,325 WOMEN at random from the population. Predict how many will be color blind: 10,325 × .005 ≈ 52
• The expected number of color blind people, given gender, depends on gender.
• Color blindness covaries with gender.

Part 6: Correlation 6-15/49 Positive Covariation: The distribution of one variable depends on another variable. Distribution of fuel bills changes (moves upward) as the number of rooms changes (increases). The per capita number of cars varies (positively) with per capita income. The relationship varies by country as well.

Part 6: Correlation 6-16/49 Application – Legal Case Mix: Two Related Random Variables*
Two kinds of cases show up each month, real estate (R) and financial (F) (sometimes together, usually separately).
R = Real estate cases, F = Financial cases
Joint distribution. The last column is the marginal distribution for financial cases, P(F); the last row is the marginal distribution for real estate cases, P(R).
                 Real Estate (R)
Financial (F)    0       1       2       3       P(F)
0                .02     .05     .05     .08     .20
1                .03     .05     .08     .17     .33
2                .04     .06     .09     .28     .47
P(R)             .09     .16     .22     .53     1.00
* Adapted from example 4.16, p. 159 in your text

Part 6: Correlation 6-17/49 Legal Services Case Mix: Joint Probabilities
Joint Discrete Distribution: R = Real estate cases, F = Financial cases
Joint distribution Prob(F=f and R=r):
                 Real Estate (R)
Financial (F)    0       1       2       3       P(F)
0                .02     .05     .05     .08     .20
1                .03     .05     .08     .17     .33
2                .04     .06     .09     .28     .47
P(R)             .09     .16     .22     .53     1.00
Marginal probabilities are obtained by summing across or down.

Part 6: Correlation 6-18/49 Legal Services Case Mix: Conditional Probabilities
Joint Discrete Distribution: R = Real estate cases, F = Financial cases
Conditional distributions: probabilities for R given the value of F. Conditional probabilities are Prob(R=r and F=f)/P(F=f). Read across the rows.
                 Real Estate (R)
Financial (F)    0                1                2                3                Total
0                .02/.20 ≈ .10    .05/.20 ≈ .25    .05/.20 ≈ .25    .08/.20 ≈ .40    1.00
1                .03/.33 ≈ .10    .05/.33 ≈ .15    .08/.33 ≈ .24    .17/.33 ≈ .51    1.00
2                .04/.47 ≈ .09    .06/.47 ≈ .13    .09/.47 ≈ .19    .28/.47 ≈ .59    1.00

Part 6: Correlation 6-19/49 Conditional Distributions
• The probability distribution of Real estate cases (R) given Financial cases (F) varies with the number of Financial cases.
• The probability P(R=3 | F) goes up as F increases from 0 to 2.
• This means that the variables are dependent.

Part 6: Correlation 6-20/49 Covariation in Legal Services
These are the conditional distributions P(R|F):
                 Real Estate Cases
                 0       1       2       3
Financial=0      .10     .25     .25     .40
Financial=1      .10     .15     .24     .51
Financial=2      .09     .13     .19     .59
How many real estate cases should the office expect if it knows (or predicts) the number of financial cases?
E[R if F=0] = 0(.10) + 1(.25) + 2(.25) + 3(.40) = 1.95 (less than 2)
E[R if F=1] = 0(.10) + 1(.15) + 2(.24) + 3(.51) = 2.16 (more than 2)
E[R if F=2] = 0(.09) + 1(.13) + 2(.19) + 3(.59) = 2.28 (more than 2)
This is how R and F covary.
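
The conditional distributions and conditional means can be recomputed from the joint probabilities. The Python sketch below is illustrative only. The list-of-rows layout and the name `conditional_means` are assumptions. It works with unrounded conditional probabilities, so its means for F=1 and F=2 come out near 2.18 and 2.30 rather than the 2.16 and 2.28 obtained from the two-decimal table above.

```python
# joint[f][r] = P(F = f, R = r), taken from the legal services case mix example.
joint = [
    [0.02, 0.05, 0.05, 0.08],  # F = 0
    [0.03, 0.05, 0.08, 0.17],  # F = 1
    [0.04, 0.06, 0.09, 0.28],  # F = 2
]

conditional_means = []
for f, row in enumerate(joint):
    p_f = sum(row)                                   # marginal P(F = f)
    cond = [p / p_f for p in row]                    # P(R = r | F = f), unrounded
    mean_r = sum(r * p for r, p in enumerate(cond))  # E[R | F = f]
    conditional_means.append(mean_r)
    rounded = [round(p, 2) for p in cond]
    print(f"F={f}: P(F)={p_f:.2f}  P(R|F)={rounded}  E[R|F]={mean_r:.2f}")
```

The ordering is the same either way: the expected number of real estate cases rises with the number of financial cases.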

Part 6: Correlation 6-21/49 Covariation and Regression
[Figure: expected number of real estate cases given the number of financial cases, plotted against financial cases]
This is the “regression of R on F.”

Part 6: Correlation 6-22/49 (Linear) Regression of Bills on Rooms

Part 6: Correlation 6-23/49 Measuring How Variables Move Together: Covariance
Cov(X,Y) = Σ_X Σ_Y P(X,Y)(X - μ_X)(Y - μ_Y)
Covariance can be positive or negative. The measure will be positive if it is likely that Y is above its mean when X is above its mean. It is usually denoted σ_XY.

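Part 6: Correlation 6-24/49 Legal Services Case Mix Covariance
                       Real Estate Cases (R)
Financial Cases (F)    0       1       2       3       P(F)
0                      .02     .05     .05     .08     .20
1                      .03     .05     .08     .17     .33
2                      .04     .06     .09     .28     .47
P(R)                   .09     .16     .22     .53     1.00
The two means are
μ_R = 0(.09) + 1(.16) + 2(.22) + 3(.53) = 2.19
μ_F = 0(.20) + 1(.33) + 2(.47) = 1.27
Compute the covariance Σ_F Σ_R P(F,R)(F-1.27)(R-2.19):
(0-1.27)(0-2.19)(.02) = +.055626
(0-1.27)(1-2.19)(.05) = +.075565
(0-1.27)(2-2.19)(.05) = +.012065
(0-1.27)(3-2.19)(.08) = -.082296
(1-1.27)(0-2.19)(.03) = +.017739
(1-1.27)(1-2.19)(.05) = +.016065
(1-1.27)(2-2.19)(.08) = +.004104
(1-1.27)(3-2.19)(.17) = -.037179
(2-1.27)(0-2.19)(.04) = -.063948
(2-1.27)(1-2.19)(.06) = -.052122
(2-1.27)(2-2.19)(.09) = -.012483
(2-1.27)(3-2.19)(.28) = +.165564
Sum = .0987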

Part 6: Correlation 6-25/49 A Shortcut for Covariance
Cov(X,Y) = E[XY] - μ_X μ_Y = [Σ_X Σ_Y XY P(X,Y)] - μ_X μ_Y

Part 6: Correlation 6-26/49 Computing the Covariance Using the Shortcut
Compute the covariance from the definition, Σ_F Σ_R [(F-1.27)(R-2.19) × P(F,R)]:
(0-1.27)(0-2.19)(.02) = +.055626
(0-1.27)(1-2.19)(.05) = +.075565
(0-1.27)(2-2.19)(.05) = +.012065
(0-1.27)(3-2.19)(.08) = -.082296
(1-1.27)(0-2.19)(.03) = +.017739
(1-1.27)(1-2.19)(.05) = +.016065
(1-1.27)(2-2.19)(.08) = +.004104
(1-1.27)(3-2.19)(.17) = -.037179
(2-1.27)(0-2.19)(.04) = -.063948
(2-1.27)(1-2.19)(.06) = -.052122
(2-1.27)(2-2.19)(.09) = -.012483
(2-1.27)(3-2.19)(.28) = +.165564
Sum = .0987
Compute the covariance with the shortcut, [Σ_F Σ_R FR × P(F,R)] - [μ_F μ_R]:
(0)(0)(.02) = 0
(0)(1)(.05) = 0
(0)(2)(.05) = 0
(0)(3)(.08) = 0
(1)(0)(.03) = 0
(1)(1)(.05) = .05
(1)(2)(.08) = .16
(1)(3)(.17) = .51
(2)(0)(.04) = 0
(2)(1)(.06) = .12
(2)(2)(.09) = .36
(2)(3)(.28) = 1.68
Sum = 2.88
2.88 - (1.27)(2.19) = 2.88 - 2.7813 = .0987
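
Both routes to the covariance are easy to reproduce in code. The sketch below is an illustration in Python, not the method used to produce the slides. The variable names (`cells`, `cov_definition`, `cov_shortcut`) are assumptions. It evaluates the definition and the shortcut on the same joint table and confirms that they agree.

```python
# rows[f][r] = P(F = f, R = r) for the legal services case mix
rows = [
    [0.02, 0.05, 0.05, 0.08],  # F = 0
    [0.03, 0.05, 0.08, 0.17],  # F = 1
    [0.04, 0.06, 0.09, 0.28],  # F = 2
]
# Flatten to (f, r, probability) triples so each sum is a single comprehension.
cells = [(f, r, p) for f, row in enumerate(rows) for r, p in enumerate(row)]

mu_f = sum(f * p for f, r, p in cells)  # E[F]
mu_r = sum(r * p for f, r, p in cells)  # E[R]

# Definition: sum of P(F,R) * (F - mu_F) * (R - mu_R)
cov_definition = sum((f - mu_f) * (r - mu_r) * p for f, r, p in cells)

# Shortcut: E[FR] - mu_F * mu_R
e_fr = sum(f * r * p for f, r, p in cells)
cov_shortcut = e_fr - mu_f * mu_r

print(f"mu_F = {mu_f:.2f}, mu_R = {mu_r:.2f}, E[FR] = {e_fr:.2f}")
print(f"Covariance (definition) = {cov_definition:.4f}")
print(f"Covariance (shortcut)   = {cov_shortcut:.4f}")
```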

Part 6: Correlation 6-27/49 Independent Random Variables Have Zero Covariance
One card drawn randomly from a deck of 52 cards. A = Ace, H = Heart.
              Ace
Heart         Yes=1     No=0      Total
Yes=1         1/52      12/52     13/52
No=0          3/52      36/52     39/52
Total         4/52      48/52     52/52
E[H] = 1(13/52) + 0(39/52) = 1/4
E[A] = 1(4/52) + 0(48/52) = 1/13
Covariance = Σ_H Σ_A P(H,A)(H - μ_H)(A - μ_A)
1/52 (1 - 1/4)(1 - 1/13) = +36/52²
3/52 (0 - 1/4)(1 - 1/13) = -36/52²
12/52 (1 - 1/4)(0 - 1/13) = -36/52²
36/52 (0 - 1/4)(0 - 1/13) = +36/52²
SUM = 0 !!
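
The four terms can be checked with exact arithmetic. This short Python sketch is illustrative. The dictionary keyed by `(h, a)` indicator pairs is an assumed layout. It uses `fractions.Fraction` so that the cancellation to zero is exact rather than approximate.

```python
from fractions import Fraction

# joint[(h, a)] = P(Heart = h, Ace = a) for one card drawn from a 52-card deck
joint = {
    (1, 1): Fraction(1, 52),
    (1, 0): Fraction(12, 52),
    (0, 1): Fraction(3, 52),
    (0, 0): Fraction(36, 52),
}

mean_h = sum(h * p for (h, a), p in joint.items())  # E[H]
mean_a = sum(a * p for (h, a), p in joint.items())  # E[A]

# One covariance term per cell: P(H,A) * (H - E[H]) * (A - E[A])
terms = {(h, a): p * (h - mean_h) * (a - mean_a) for (h, a), p in joint.items()}
covariance = sum(terms.values())

for cell, term in terms.items():
    print(cell, term)
print("E[H] =", mean_h, " E[A] =", mean_a, " Covariance =", covariance)
```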

Part 6: Correlation 6-28/49 Covariance and Units of Measurement
• Covariance takes the units of (units of X) times (units of Y).
• Consider Cov($Price of X, $Price of Y). Now, measure both prices in GBP, roughly $1.60 per £. The prices are divided by 1.60, and the covariance is divided by 1.60² = 2.56.
• This is an unattractive result.

Part 6: Correlation 6-29/49 Covariance and Scaling
We computed the covariance Cov(R,F) = .0987. What does the covariance mean?
Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then the number of lawyers is N_R = 2R and N_F = 3F.
                      Real Estate Lawyers (N_R)
Financial Lawyers     0 (was 0)    2 (was 1)    4 (was 2)    6 (was 3)    P(F)
0 (was 0)             .02          .05          .05          .08          .20
3 (was 1)             .03          .05          .08          .17          .33
6 (was 2)             .04          .06          .09          .28          .47
P(R)                  .09          .16          .22          .53          1.00
μ_NR = 0(.09) + 2(.16) + 4(.22) + 6(.53) = 4.38
μ_NF = 0(.20) + 3(.33) + 6(.47) = 3.81
The covariance of N_R and N_F will be 3(2)(.0987) = .5922.
But the “relationship” is the same. We just changed the units of measurement.
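
The scaling effect can be seen by recomputing the covariance with the relabeled outcomes. The following is a hedged Python sketch. The function `covariance` and its argument names are illustrative, and the probabilities are the same joint table used throughout the legal services example.

```python
# rows[i][j] = joint probability of the i-th financial outcome and j-th real estate outcome
rows = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]


def covariance(f_values, r_values):
    """Covariance when row i carries outcome f_values[i] and column j carries r_values[j]."""
    cells = [(f_values[i], r_values[j], p)
             for i, row in enumerate(rows) for j, p in enumerate(row)]
    mean_f = sum(f * p for f, r, p in cells)
    mean_r = sum(r * p for f, r, p in cells)
    return sum((f - mean_f) * (r - mean_r) * p for f, r, p in cells)


cov_cases = covariance([0, 1, 2], [0, 1, 2, 3])    # F and R measured in cases
cov_lawyers = covariance([0, 3, 6], [0, 2, 4, 6])  # N_F = 3F and N_R = 2R in lawyers

print(f"Cov(R, F)     = {cov_cases:.4f}")
print(f"Cov(N_R, N_F) = {cov_lawyers:.4f}")
print(f"Ratio         = {cov_lawyers / cov_cases:.1f}")
```

The ratio is 3 × 2 = 6 even though the probabilities, and therefore the relationship, are unchanged.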

Part 6: Correlation 6-30/49 Correlation is Units Free
ρ_XY = Cov(X,Y) / (σ_X σ_Y) = σ_XY / (σ_X σ_Y)

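Part 6: Correlation 6-31/49 Correlation
                 Real Estate (R)
Financial (F)    0       1       2       3       P(F)
0                .02     .05     .05     .08     .20
1                .03     .05     .08     .17     .33
2                .04     .06     .09     .28     .47
P(R)             .09     .16     .22     .53     1.00
μ_R = 2.19, μ_F = 1.27
Var(F) = 0²(.20) + 1²(.33) + 2²(.47) - 1.27² = .5971, standard deviation = .7727
Var(R) = 0²(.09) + 1²(.16) + 2²(.22) + 3²(.53) - 2.19² = 1.0139, standard deviation = 1.0069
Covariance = .0987
Correlation = .0987 / (.7727 × 1.0069) = .1269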
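
A small Python sketch of the correlation calculation follows. It is an illustration under the same joint table, with the function name `correlation` chosen here for convenience. It also evaluates the correlation after converting cases to lawyers (N_R = 2R, N_F = 3F) to show that the rescaling leaves the result unchanged.

```python
from math import sqrt

# rows[i][j] = joint probability of the i-th financial outcome and j-th real estate outcome
rows = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]


def correlation(f_values, r_values):
    """Correlation when row i carries outcome f_values[i] and column j carries r_values[j]."""
    cells = [(f_values[i], r_values[j], p)
             for i, row in enumerate(rows) for j, p in enumerate(row)]
    mean_f = sum(f * p for f, r, p in cells)
    mean_r = sum(r * p for f, r, p in cells)
    var_f = sum((f - mean_f) ** 2 * p for f, r, p in cells)
    var_r = sum((r - mean_r) ** 2 * p for f, r, p in cells)
    cov = sum((f - mean_f) * (r - mean_r) * p for f, r, p in cells)
    return cov / sqrt(var_f * var_r)


rho_cases = correlation([0, 1, 2], [0, 1, 2, 3])    # measured in cases
rho_lawyers = correlation([0, 3, 6], [0, 2, 4, 6])  # measured in lawyers

print(f"Correlation (cases)   = {rho_cases:.4f}")
print(f"Correlation (lawyers) = {rho_lawyers:.4f}")
```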

Part 6: Correlation 6-32/49 Uncorrelated Variables Independence implies zero correlation. If the variables are independent, then the numerator of the correlation coefficient is zero.

Part 6: Correlation 6-33/49 Sums of Two Random Variables
• Example 1: Total number of cases = F + R
• Example 2: Personnel needed = 3F + 2R
• Find for sums:
  - Expected value
  - Variance and standard deviation
• Application from finance: portfolio

Part 6: Correlation 6-34/49 Math Facts 1 – Mean of a Sum
• Mean of a sum: the mean of X + Y = E[X + Y] = E[X] + E[Y]
• Mean of a weighted sum: the mean of aX + bY = E[aX] + E[bY] = aE[X] + bE[Y]

Part 6: Correlation 6-35/49 Mean of a Sum
μ_R = 2.19, μ_F = 1.27
What is the mean (expected) number of cases each month, R + F?
E[R + F] = E[R] + E[F] = 2.19 + 1.27 = 3.46

Part 6: Correlation 6-36/49 Mean of a Weighted Sum
μ_R = 2.19, μ_F = 1.27
Suppose each Real Estate case requires 2 lawyers and each Financial case requires 3 lawyers. Then N_R = 2R and N_F = 3F, and the mean number of lawyers is the mean of 2R + 3F.
E[2R + 3F] = 2E[R] + 3E[F] = 2(2.19) + 3(1.27) = 8.19 lawyers required.

Part 6: Correlation 6-37/49 Math Facts 2 – Variance of a Sum
• Variance of a sum: Var[x+y] = Var[x] + Var[y] + 2Cov(x,y). The variance of a sum equals the sum of the variances only if the variables are uncorrelated.
• Standard deviation of a sum: the standard deviation of x+y is not, in general, equal to the sum of the standard deviations.

Part 6: Correlation 6-38/49 Variance of a Sum
μ_R = 2.19, σ_R² = 1.0139
μ_F = 1.27, σ_F² = .5971
σ_RF = .0987
What is the variance of the total number of cases that occur each month?
This is the variance of F + R = .5971 + 1.0139 + 2(.0987) = 1.8084
The standard deviation is 1.3448.

Part 6: Correlation 6-39/49 Math Facts 3 – Variance of a Weighted Sum
Var[ax+by] = Var[ax] + Var[by] + 2Cov(ax,by) = a²Var[x] + b²Var[y] + 2ab Cov(x,y).
Also, Cov(x,y) is the numerator in ρ_xy, so Cov(x,y) = ρ_xy σ_x σ_y.

Part 6: Correlation 6-40/49 Variance of a Weighted Sum
Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then N_R = 2R and N_F = 3F.
μ_R = 2.19, σ_R² = 1.0139
μ_F = 1.27, σ_F² = .5971
σ_RF = .0987, ρ_RF = .0987 / (1.0069 × .7727) = .1269
What is the variance of the total number of lawyers needed each month? What is the standard deviation?
This is the variance of 2R + 3F = 2²(1.0139) + 3²(.5971) + 2(2)(3)(.1269)(1.0069)(.7727) = 10.614
The standard deviation is the square root, 3.258.
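
The mean and variance formulas for a weighted sum can be compared against a direct calculation over the joint distribution. The Python sketch below is illustrative. The function names `direct_moments` and `formula_moments` are assumptions, and the weights (1, 1) and (2, 3) correspond to total cases R + F and total lawyers 2R + 3F.

```python
from math import sqrt

# rows[f][r] = P(F = f, R = r) for the legal services case mix
rows = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]
cells = [(f, r, p) for f, row in enumerate(rows) for r, p in enumerate(row)]

mu_f = sum(f * p for f, r, p in cells)
mu_r = sum(r * p for f, r, p in cells)
var_f = sum((f - mu_f) ** 2 * p for f, r, p in cells)
var_r = sum((r - mu_r) ** 2 * p for f, r, p in cells)
cov_rf = sum((f - mu_f) * (r - mu_r) * p for f, r, p in cells)


def direct_moments(a, b):
    """Mean and variance of a*R + b*F computed cell by cell from the joint table."""
    mean = sum((a * r + b * f) * p for f, r, p in cells)
    var = sum((a * r + b * f - mean) ** 2 * p for f, r, p in cells)
    return mean, var


def formula_moments(a, b):
    """Mean and variance of a*R + b*F from the weighted-sum formulas."""
    mean = a * mu_r + b * mu_f
    var = a ** 2 * var_r + b ** 2 * var_f + 2 * a * b * cov_rf
    return mean, var


for a, b in [(1, 1), (2, 3)]:
    mean, var = formula_moments(a, b)
    print(f"{a}R + {b}F: mean = {mean:.2f}, variance = {var:.4f}, sd = {sqrt(var):.4f}")
```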

Part 6: Correlation 6-41/49 Correlated Variables: Returns on Two Stocks* * Averaged yearly return

Part 6: Correlation 6-42/49 The two returns are positively correlated.

Part 6: Correlation 6-43/49

Part 6: Correlation 6-44/49 Application - Portfolio
• You have $1000 to allocate between assets A and B. The yearly returns on the two assets are random variables r_A and r_B.
• The means of the two returns are E[r_A] = μ_A and E[r_B] = μ_B.
• The standard deviations (risks) of the returns are σ_A and σ_B.
• The correlation of the two returns is ρ_AB.

Part 6: Correlation 6-45/49 Portfolio
• You have $1000 to allocate to A and B.
• You will allocate a proportion w of your $1000 to A and (1-w) to B.

Part 6: Correlation 6-46/49 Return and Risk
• Your expected return on each dollar is E[w r_A + (1-w) r_B] = wμ_A + (1-w)μ_B
• The variance of your return on each dollar is Var[w r_A + (1-w) r_B] = w²σ_A² + (1-w)²σ_B² + 2w(1-w)ρ_AB σ_A σ_B
• The standard deviation is the square root.

Part 6: Correlation 6-47/49 Risk and Return: Example
• Suppose you know μ_A, μ_B, ρ_AB, σ_A, and σ_B. (You have watched these stocks for over 6 years.)
• The mean and standard deviation are then just functions of w.
• I will then compute the mean and standard deviation for different values of w.
• For our Microsoft and Walmart example, μ_A = [value not preserved], μ_B = [value not preserved], σ_A = .114, σ_B = .086, ρ_AB = .249
E[return] = wμ_A + (1-w)μ_B
SD[return] = sqrt[w²(.114²) + (1-w)²(.086²) + 2w(1-w)(.249)(.114)(.086)] = sqrt[.013w² + .0074(1-w)² + .0049w(1-w)]
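
The risk formula can be tabulated for a few allocations. This Python sketch is illustrative. It uses only the standard deviations and correlation that appear in the slide's formula (.114, .086, .249) and takes the first standard deviation to be asset A's, following the order of terms there. The expected return is left out because the mean returns are not available in this transcript. The function name `portfolio_sd` is an assumption.

```python
from math import sqrt

SIGMA_A = 0.114  # standard deviation of return on asset A (from the slide's formula)
SIGMA_B = 0.086  # standard deviation of return on asset B (from the slide's formula)
RHO_AB = 0.249   # correlation between the two returns


def portfolio_sd(w):
    """Standard deviation of w*r_A + (1 - w)*r_B for a share w invested in asset A."""
    variance = (w ** 2 * SIGMA_A ** 2
                + (1 - w) ** 2 * SIGMA_B ** 2
                + 2 * w * (1 - w) * RHO_AB * SIGMA_A * SIGMA_B)
    return sqrt(variance)


for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"w = {w:.2f}  SD[return] = {portfolio_sd(w):.4f}")
```

At w = 0.5 the standard deviation is about .0795, below the risk of either asset held alone, which is the effect of combining two imperfectly correlated returns.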

Part 6: Correlation 6-48/49
[Figure: risk and return for different values of w, with the endpoints W=1 and W=0 marked. Horizontal axis: risk = sqrt[.013w² + .0074(1-w)² + .0049w(1-w)]. Vertical axis: return = wμ_A + (1-w)μ_B.]

Part 6: Correlation 6-49/49 Summary
• Random Variables – Dependent and Independent
• Conditional probabilities change with the values of dependent variables.
• Covariation and the covariance as a measure. (The regression)
• Correlation as a units free measure of covariation
• Math results
  - Mean of a weighted sum
  - Variance of a weighted sum
  - Application to a portfolio problem.