Stat 391 – Lecture 13a: Covariance and Correlation (Assaf Oron, May 2008)
Relationships between Two r.v.’s
We have seen some relationships between events: independence, inclusion, and exclusion (= "disjoint").
Translated to relationships between r.v.'s, we get: independence - or determinism (= "degeneracy").
But these are special, extreme cases. We need a quantitative measure of the strength of the relationship between r.v.'s.
This is where covariance and correlation come in.
(We will start mathematically, and concentrate on the continuous case.)
Covariance
Assume X, Y are r.v.'s with expectations μX, μY.
Recall that Var[X] is defined as E[(X − μX)²].
The variance measures how X "co-varies with itself". What if we wanted to see how it co-varies with someone else?
Covariance (2)
The covariance Cov[X, Y] is defined as Cov[X, Y] = E[(X − μX)(Y − μY)].
With the variance, the integrand was guaranteed to be non-negative, always.
With the covariance, it is negative whenever one r.v. ventures above its mean while the other goes below.
Hands-on: prove the shortcut formula Cov[X, Y] = E[XY] − μX·μY.
Covariance (3)
Hands-on, continued:
Use the shortcut formula to show that if X and Y are independent, then Cov[X, Y] = 0 (hint: write out E[XY] as an integral, then use independence for the densities).
Now, show that … (a is some constant).
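These properties are easy to check numerically. Below is a minimal sketch in base R; the particular distributions, seed, and sample size are arbitrary choices for illustration.

  # Checking the shortcut formula and the independence property by simulation
  set.seed(1)
  n <- 1e5
  x <- rnorm(n, mean = 2)           # X with mu_X = 2
  y <- 3 * x + rnorm(n)             # Y depends on X, so Cov[X, Y] = 3 Var[X] = 3
  z <- runif(n)                     # Z simulated independently of X

  cov(x, y)                         # sample covariance, close to 3
  mean(x * y) - mean(x) * mean(y)   # shortcut formula, agrees up to an n/(n-1) factor
  cov(x, z)                         # independent r.v.'s: close to 0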
Covariance and Correlation
It can be shown that Cov[X, Y] is bounded: |Cov[X, Y]| ≤ σX·σY.
This means it can be normalized, yielding the population correlation coefficient: ρ = Cov[X, Y] / (σX·σY).
Determinism: ρ = ±1; independence: ρ = 0.
We got what we wanted: a quantitative measure of dependence.
Covariance and Correlation (2)
Covariance is used more often in modeling and theory; correlation is more common in data analysis.
It is estimated from the data in a straightforward, MME manner:
r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² · Σ(yi − ȳ)²].
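For example, in R (a small sketch; cor() computes exactly this plug-in estimate, and dividing the sample covariance by the two sample standard deviations gives the same number):

  # Sample correlation "by hand" vs. the built-in estimator
  set.seed(2)
  x <- rnorm(50)
  y <- 0.6 * x + rnorm(50)
  r_hand <- sum((x - mean(x)) * (y - mean(y))) /
            sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
  r_hand
  cor(x, y)                      # identical to r_hand
  cov(x, y) / (sd(x) * sd(y))    # normalized covariance: also identical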
Properties of Correlation
Covariance and correlation are symmetric between X and Y.
They are not linear: the covariance is a second moment of sorts; correlation is not even a moment.
However, they quantify the mean linear association between X and Y.
If the association is nonlinear, they may miss it; they may also fare poorly if the data are clustered, or because of outliers (see the sketch below).
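A small simulated illustration of both failure modes (a sketch; the quadratic relationship and the single added outlier are assumed examples):

  # Correlation can miss a purely nonlinear association...
  set.seed(3)
  x <- runif(200, -1, 1)
  y <- x^2 + rnorm(200, sd = 0.05)   # strong but nonlinear dependence
  cor(x, y)                          # close to 0

  # ...and a single outlier can manufacture a large correlation
  x2 <- rnorm(30); y2 <- rnorm(30)   # genuinely unrelated
  cor(x2, y2)                        # close to 0
  cor(c(x2, 10), c(y2, 10))          # one extreme point inflates r dramatically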
Correlation: Resolving Issues
If you suspect a nonlinear association, g(x) ~ h(y), then look at Cor(g(x), h(y)); visually, we transform the data (see the short example below).
Outliers need to be addressed (here's a repeat of how):
determine whether they are errors / "a foreign element" / "part of the story";
then, accordingly, remove / "ignore" / include them (one can also use more robust methods).
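A short sketch of the transformation idea (the log relationship below is just an assumed example):

  # Correlation on a transformed scale
  set.seed(4)
  x <- runif(300, 1, 100)
  y <- log(x) + rnorm(300, sd = 0.2)   # association is linear in log(x), not in x
  cor(x, y)                            # understates the strength of the association
  cor(log(x), y)                       # much closer to 1 on the transformed scale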
Cov/Cor with Several r.v.'s
With several r.v.'s, we have a covariance matrix (see the small example below):
it is always symmetric, and on the diagonal sits each r.v.'s variance.
Similarly, we have a correlation matrix: symmetric, with 1's on the diagonal.
Independent r.v.'s will have zeroes in the off-diagonal entries.
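A small sketch of both objects in R (the three simulated variables are an assumed example):

  # Covariance and correlation matrices for several r.v.'s
  set.seed(5)
  n <- 1000
  a <- rnorm(n)
  b <- 0.5 * a + rnorm(n)   # correlated with a
  w <- rnorm(n)             # independent of both
  dat <- cbind(a, b, w)
  cov(dat)   # symmetric, with each variable's variance on the diagonal
  cor(dat)   # symmetric, 1's on the diagonal, near-0 entries involving w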
Stat 391 – Lecture 13b: Regression, Part A, or "Black-Box Statistics 101" (Assaf Oron, May 2008)
Overview and Disclaimer
We are entering different waters.
Regression is the gateway to pattern recognition (it answers the question "how to draw a line through the data?").
(A subject for countless Ph.D. theses.)
We cannot expect to cover it at the same depth we did probability, MLE's, etc. As said earlier, this requires a separate course.
What I'll do here is help you become an informed and responsible user.
This, too, is a tall task, because using regression is deceptively easy.
Overview and Disclaimer (2)
To use a familiar analogy: up until now in the course we have been walking, running, and riding bicycles.
Doing regression is more like driving a car.
It should be licensed... but, unfortunately, it is not.
So think of the next 3 hours as a crash driver's-permit course.
Here we go, starting from the simplest case.
Simple Linear Regression
We have n data points ((X, Y) pairs), and limit ourselves to a straight line.
(Still a subject for countless Ph.D. theses.)
Unlike correlation, here we typically abandon symmetry: we use X to explain Y, and not vice versa.
(One reason: this makes the math simpler.)
(Another reason: often, this is the practical question we want to answer.)
Simple Linear Regression (2)
If you just ask for "regression", you will get the standard, least-squares solution: a linear formula ŷ = β̂0 + β̂1·x that minimizes the sum Σ(yi − β̂0 − β̂1·xi)².
(What's with the betas and the hats? We'll see next lecture.)
The fitted line minimizes the sum of the squared vertical distances from the data points to the line.
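In R, for example, this is a one-liner (a sketch using the built-in anscombe data set, the same one the printout shown below comes from):

  # Fitting a simple least-squares line
  fit <- lm(y1 ~ x1, data = anscombe)
  coef(fit)   # the two numbers: intercept (beta0-hat) and slope (beta1-hat)

  # The same two numbers, written in terms of sample moments:
  # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
  b1 <- with(anscombe, cov(x1, y1) / var(x1))
  b0 <- with(anscombe, mean(y1) - b1 * mean(x1))
  c(b0, b1)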
Simple Linear Regression (3)
So a regression's main product is just two numbers. But if you run a regression in a standard statistical package, you'll get something like this:

  Call:
  lm(formula = y1 ~ x1, data = anscombe)

  Residuals:
      Min      1Q  Median      3Q     Max

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)                                     *
  x1                                             **
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: ... on 9 degrees of freedom
  Multiple R-squared: ..., Adjusted R-squared: ...
  F-statistic: ... on 1 and 9 DF, p-value: ...

The two numbers are the entries of the "Estimate" column; all the rest are various quality measures.
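In R, this printout is what summary() produces for a fitted lm object:

  fit <- lm(y1 ~ x1, data = anscombe)
  summary(fit)   # Call, residual summary, coefficient table, R-squared, F-statistic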
Simple Linear Regression (4)
Several of the numbers in this printout (the standard errors, the t values, the Pr(>|t|) column, the significance codes, and the F-statistic with its p-value) have to do with hypothesis tests; we'll discuss them next lecture.
Simple Linear Regression (5)
Let's focus on R-squared: the "Multiple R-squared" entry in the printout (usually known as just "R-squared").
R-squared
Recall: our solution minimizes Σ(yi − β̂0 − β̂1·xi)² (under the linearity constraint).
The value of this sum at our solution is known as the Sum of Squared Errors (SSerr.), or the residual sum of squares.
Furthermore, it can be shown that SSy = SSreg. + SSerr., where
SSy = Σ(yi − ȳ)²: overall variability of y (n − 1 times the sample variance);
SSreg. = Σ(ŷi − ȳ)²: amount explained by the regression model;
SSerr. = Σ(yi − ŷi)²: amount left unexplained.
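A sketch of this decomposition in R, using the anscombe fit from before:

  # Sum-of-squares decomposition and R-squared
  fit   <- lm(y1 ~ x1, data = anscombe)
  y     <- anscombe$y1
  SSy   <- sum((y - mean(y))^2)              # overall variability of y
  SSerr <- sum(resid(fit)^2)                 # left unexplained
  SSreg <- sum((fitted(fit) - mean(y))^2)    # explained by the regression
  SSreg + SSerr - SSy                        # essentially 0: the decomposition holds
  SSreg / SSy                                # R-squared...
  summary(fit)$r.squared                     # ...matches the printout value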
R-squared (2)
Let's illustrate this: R-squared is the sum of the squared distances of the fitted values from the mean (SSreg.), divided by the sum of the original squared distances of the observations from the mean (SSy).
R-squared, Regression, Correlation
R-squared is defined as R² = r², with r being the sample correlation coefficient defined last hour.
("Adjusted R-squared" penalizes for the number of parameters used; more detail perhaps next time.)
The relationship between correlation and regression goes further: the fitted slope can be written as β̂1 = r·(sy/sx), where sx and sy are the sample standard deviations of x and y.
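Both identities are easy to verify numerically (a sketch using the same anscombe fit):

  # R-squared equals the squared sample correlation,
  # and the fitted slope equals r * sd(y) / sd(x)
  fit <- lm(y1 ~ x1, data = anscombe)
  r   <- with(anscombe, cor(x1, y1))
  c(r^2, summary(fit)$r.squared)                        # equal
  c(with(anscombe, r * sd(y1) / sd(x1)), coef(fit)[2])  # equal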
Back to the Printout
All that's left in the printout are summary statistics related to the residuals: the Min / 1Q / Median / 3Q / Max line and the residual standard error.
It seems we have everything about the residuals except their mean (which is always 0); but this is a classic example where "one picture is worth a thousand numbers". Let's do the Anscombe examples!
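The Anscombe quartet is built into R, so the example is easy to reproduce (a sketch; all four data sets give nearly identical regression summaries but look completely different when plotted):

  # Four regressions with (nearly) the same numbers, four very different pictures
  fits <- lapply(1:4, function(i)
    lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe))
  sapply(fits, coef)                                # near-identical intercepts and slopes
  sapply(fits, function(f) summary(f)$r.squared)    # near-identical R-squared
  op <- par(mfrow = c(2, 2))
  for (i in 1:4) {
    plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
         xlab = paste0("x", i), ylab = paste0("y", i))
    abline(fits[[i]])
  }
  par(op)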
Diagnostics and Residual Analysis
Regression should always begin with vigorous plotting/tabulation of all variables, alone and vs. each other, AND be followed up by an equally vigorous visual inspection of the residuals.
Failing to do both is just like driving while looking only at the dashboard.
Residuals should:
be symmetrically distributed, with thin tails and no outliers;
show no pattern when plotted vs. the responses, the fitted values, any explanatory variable, or observation order.
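A minimal sketch of such residual checks in R, using the fit from earlier (plot(fit) would also produce R's own standard set of diagnostic plots):

  # Basic residual diagnostics
  fit <- lm(y1 ~ x1, data = anscombe)
  op <- par(mfrow = c(1, 3))
  plot(fitted(fit), resid(fit),
       xlab = "Fitted values", ylab = "Residuals")   # look for patterns or funnels
  abline(h = 0, lty = 2)
  plot(anscombe$x1, resid(fit),
       xlab = "x1", ylab = "Residuals")              # vs. an explanatory variable
  abline(h = 0, lty = 2)
  qqnorm(resid(fit)); qqline(resid(fit))             # symmetry and tail behavior
  par(op)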