Heads Up! Sept 22 – Oct 4: Probability. Many people find it a difficult topic, so get ready ahead of time.
Last Time: Least-Squares Regression (Simple Linear Regression); Correlation
In Least-Squares Regression: Computational Formula
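For reference, here is the standard computational (totals-based) form of the least-squares slope b and intercept a for n pairs (x_i, y_i). The notation is generic and may differ from the slide's own.

```latex
b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
a = \bar{y} - b\,\bar{x}
```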
Can we do this? Totals:
Calculating the Least-Squares Regression Line (cont’d)
Regression Equation: The slope is 1.09 and the intercept is -9 (you can’t see the intercept in this graph): TRIAL = 1.09 × PRACTICE - 9
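The totals-based formula can be sketched in Python as follows. The function names `least_squares` and `predict_trial` are hypothetical. The PRACTICE/TRIAL data set is not reproduced here, so the fit is checked on made-up points that lie exactly on y = 1 + 2x. Only the coefficients 1.09 and -9 come from the slide.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x with the computational (totals-based) formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = mean(y) - b * mean(x)
    return a, b


def predict_trial(practice):
    """Regression equation from the slide: TRIAL = 1.09 * PRACTICE - 9."""
    return 1.09 * practice - 9


# Hypothetical points on the line y = 1 + 2x recover a = 1, b = 2.
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)               # 1.0 2.0
print(predict_trial(20))  # about 12.8
```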
A view from further away…
Look at the residuals: we want a shotgun-blast shape, i.e., a random blob with no visible pattern.
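A residual is observed minus predicted, e_i = y_i - (a + b·x_i). The sketch below computes residuals for a small hypothetical data set. Least-squares residuals always sum to (numerically) zero, so a residual plot is read for its pattern against X, not for the size of the total.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b (deviation form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def residuals(xs, ys):
    """Observed minus predicted values: e_i = y_i - (a + b * x_i)."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
res = residuals(xs, ys)
print([round(e, 2) for e in res])  # plot these against xs to check for a pattern
```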
Look at Residuals & Line Fit (Residual Plot, Line Fit Plot). Problem: the relationship is not linear.
Look at Residuals & Line Fit (Residual Plot). Problem: predictions are very precise for small predicted values but very imprecise for large predicted values, because the spread of the residuals grows with the predicted value. (Not good.)
Look at Residuals (Residual Plot). Problem: lurking (third) variables? Here: a seasonal trend?
Correlation: How strong is the linear relationship between two variables X and Y? The correlation is the slope in the regression of the standardized variables. This slope tells me how much a given change of X (in standardized units) translates into a change of Y (in standardized units).
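The claim that r equals the slope of the standardized regression can be checked numerically. This sketch uses hypothetical data, standardizes both variables to z scores, fits the slope, and compares it with Pearson's r. Whether you divide by n or n - 1 in the SD does not matter, because the factor cancels.

```python
import math


def standardize(values):
    """Convert values to z scores (population SD, dividing by n)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [2, 4, 5, 7, 9]  # hypothetical data
ys = [1, 3, 2, 6, 7]
b_std = slope(standardize(xs), standardize(ys))
r = pearson_r(xs, ys)
print(round(b_std, 6), round(r, 6))  # the two numbers agree
```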
Correlation: How strong is the linear relationship between two variables X and Y? Correlation coefficient, computational formula:
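The standard computational form of Pearson's r, written with the same totals as the least-squares slope, is shown below. The slide's own notation may differ.

```latex
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
```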
Properties of Correlation: It is a symmetric measure (you can exchange X and Y and get the same value). -1 ≤ r ≤ 1, where -1 is “perfect” negative correlation and 1 is “perfect” positive correlation. Linear transformations of X and Y do not change its size (a transformation with a negative slope flips the sign of r). It measures linear relationships only.
Let’s try it out on our data set with X = PRACTICE and Y = TRIAL. Check this calculation at home!
Today: Finish theory on regression; pathologies and traps in linear regression and correlation; relationships between categorical variables
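The listed properties can be checked on a small hypothetical data set. Swapping X and Y leaves r unchanged. So does rescaling either variable with a positive slope. A negative slope flips only the sign.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 1, 4, 3, 6]

r = pearson_r(xs, ys)
r_swapped = pearson_r(ys, xs)                                          # symmetry
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])  # positive linear maps
r_flipped = pearson_r([-x for x in xs], ys)                            # negative slope flips the sign
print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3), round(r_flipped, 3))
```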
Regression on Standardized Variables
What is the variance of the predicted (standardized) Y values?
Variance of predicted Y’s / Variance of observed Y’s = Proportion of the variance of observed Y’s that is accounted for by the regression (Proportion of Variance explained)
Proportion of Variance explained: the proportion of the variance of observed Y’s that is accounted for by the regression. Note: If you exchange X and Y in the regression, you find the same r and the same r squared.
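The answer takes two lines to derive. This assumes the regression on standardized variables has the form ẑ_Y = r·z_X: the slope is r, and the intercept is 0 because both means are 0. Predictions in original units are ŷ = ȳ + s_Y·ẑ_Y, where s_Y is the SD of Y.

```latex
\begin{aligned}
\hat{z}_Y &= r\, z_X \quad\Rightarrow\quad \operatorname{Var}(\hat{z}_Y) = r^2 \operatorname{Var}(z_X) = r^2,\\
\frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)}
  &= \frac{s_Y^2 \operatorname{Var}(\hat{z}_Y)}{s_Y^2 \operatorname{Var}(z_Y)} = \frac{r^2}{1} = r^2 .
\end{aligned}
```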
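A numeric sketch on hypothetical data confirms both statements. Var(predicted Y) / Var(observed Y) equals r². Regressing X on Y instead explains the same proportion of the variance of X.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def variance(values):
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]  # hypothetical data
ys = [1.5, 3.1, 2.9, 5.2, 4.8, 6.9]
r = pearson_r(xs, ys)

a, b = fit_line(xs, ys)  # regress Y on X
explained = variance([a + b * x for x in xs]) / variance(ys)

a2, b2 = fit_line(ys, xs)  # exchange roles: regress X on Y
explained_swapped = variance([a2 + b2 * y for y in ys]) / variance(xs)

print(round(r ** 2, 4), round(explained, 4), round(explained_swapped, 4))
```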
Correlation only measures the strength of linear relationships! It can happen that r = 0 even though X and Y are strongly related to each other! Look at the scatter plot and the residual plot to make sure that you don’t miss an obvious relationship that linear regression overlooks!
How does a linear regression model approximate this curve (for X = 1, 2, …, 15)? For these particular data the regression model finds a = -45 and b = 16. The residuals have a systematic trend!! This linear regression is inappropriate!!
How does a linear regression model approximate this curve (for X = -8, -7, …, 7, 8)? For these particular data the regression model finds a = 24 and b = 0. The residuals have a systematic trend!! This linear regression is inappropriate!!
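The curve itself is not in the text. Assuming it is Y = X², the least-squares fit over X = 1, …, 15 gives b = 16 and a ≈ -45.33, which matches the slide. The residuals then run positive, negative, positive: the systematic U-shaped trend the slide warns about.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


xs = list(range(1, 16))
ys = [x ** 2 for x in xs]  # ASSUMPTION: the plotted curve is Y = X^2
a, b = fit_line(xs, ys)
resid = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(a, 2), round(b, 2))  # -45.33 16.0
# Positive at both ends, negative in the middle: a U-shaped residual pattern.
print(resid[0] > 0, resid[7] < 0, resid[-1] > 0)
```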
How does a linear regression model approximate this curve (for X = -8, -7, …, 7, 8)? For these particular data the regression model finds a = 24, b = 0, and r = 0. The correlation is zero: there is no LINEAR relationship. Is there “no relationship” between X and Y? There is an extremely strong (nonlinear) relationship here!
How does a linear regression model approximate this curve (for X = 1, 2, …, 15)? For these particular data the regression model finds a = .54 and b = .16. The residuals have a systematic trend!! This linear regression is inappropriate!!
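Again assuming the curve is Y = X², now over the symmetric range X = -8, …, 8, the fit gives a = 24, b = 0 and r = 0, as on the slide. Yet Y is an exact function of X. A zero correlation rules out only a linear relationship.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(-8, 9))
ys = [x ** 2 for x in xs]  # ASSUMPTION: the plotted curve is Y = X^2
a, b = fit_line(xs, ys)
r = pearson_r(xs, ys)
print(a, b, r)  # 24.0 0.0 0.0, even though Y is completely determined by X
```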
Correlation is not Causation! The correlation between the size of your big toe and your performance on reading tasks is highly positive! Why?? Lurking third variable: AGE
Correlation is not Causation! Only experimentation allows us to attribute causation to the relationship between independent and dependent variables.
Ecological Correlation: Correlations between group averages tend to be higher than correlations between individuals. [Figure: individual X vs. Y values; X group averages vs. Y group averages]
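A small simulation shows how a lurking variable produces the correlation. It is a sketch under hypothetical assumptions: ages 6 to 12, with toe size and reading score each depending only on age plus independent noise. The overall correlation is strongly positive. Among children of a single age it vanishes.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
ages, toes, reading = [], [], []
for _ in range(70_000):
    age = random.randint(6, 12)                      # hypothetical age range
    ages.append(age)
    toes.append(2.0 * age + random.gauss(0, 1))      # toe size depends on age only
    reading.append(5.0 * age + random.gauss(0, 3))   # reading depends on age only

r_all = pearson_r(toes, reading)  # strongly positive (about 0.93)

# Hold the lurking variable fixed: look only at 9-year-olds.
idx = [i for i, age in enumerate(ages) if age == 9]
r_age9 = pearson_r([toes[i] for i in idx], [reading[i] for i in idx])  # near 0
print(round(r_all, 2), round(r_age9, 2))
```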
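The effect can be simulated under hypothetical assumptions: 10 groups whose means rise together on X and Y, with large individual noise. Averaging within groups removes most of the noise, so the correlation of the group averages is far higher than the correlation between individuals.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(2)
xs, ys, group_x, group_y = [], [], [], []
for g in range(10):  # hypothetical groups with means 0..9 on both variables
    gx = [g + random.gauss(0, 3) for _ in range(500)]
    gy = [g + random.gauss(0, 3) for _ in range(500)]
    xs += gx
    ys += gy
    group_x.append(sum(gx) / len(gx))
    group_y.append(sum(gy) / len(gy))

r_individual = pearson_r(xs, ys)      # about 0.48
r_groups = pearson_r(group_x, group_y)  # close to 1
print(round(r_individual, 2), round(r_groups, 2))
```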
Problem of Restricted Range [Figure: GRE scores vs. success in graduate school. Over the full range: strong linear relationship. Within a restricted range: no linear relationship.]
Extrapolations are Dangerous [Figure: number of passengers by year]
Regression toward the Mean: The term “regression” is associated with Sir Francis Galton (1822–1911). Picture taken from Galton (1885), “Regression towards Mediocrity in Hereditary Stature,” Journal of the Anthropological Institute.
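Restricting the range weakens the correlation, as a sketch under hypothetical assumptions shows. It uses standardized GRE scores, success = GRE + noise, and a cutoff at 1.5 SD above the mean (as if only high scorers were admitted and observed). The same underlying relationship looks much weaker inside the restricted range.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(3)
gre, success = [], []
for _ in range(20_000):
    x = random.gauss(0, 1)                 # hypothetical standardized GRE score
    gre.append(x)
    success.append(x + random.gauss(0, 1))  # hypothetical success measure

r_full = pearson_r(gre, success)  # about 0.71

kept = [(x, y) for x, y in zip(gre, success) if x > 1.5]  # hypothetical cutoff
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])  # about 0.36
print(round(r_full, 2), round(r_restricted, 2), len(kept))
```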
Regression toward the Mean Suppose:
Regression toward Mediocrity?? Predictions are closer to zero (the mean) than the observations!!
r=
r= Among families where the father is approximately 2 standard deviations above the mean, the average son is only about 1.2 standard deviations above the mean.
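The shrinkage follows from the standardized regression equation. Because |r| ≤ 1, a predicted z score is never farther from 0 than the z score it is predicted from. The numbers in this example are consistent with r ≈ 0.6. The value of r itself is not legible in the text.

```latex
\begin{aligned}
\hat{z}_{\text{son}} &= r\, z_{\text{father}}, \qquad
|\hat{z}_{\text{son}}| = |r|\,|z_{\text{father}}| \le |z_{\text{father}}|,\\
1.2 &\approx r \cdot 2 \quad\Rightarrow\quad r \approx 0.6 .
\end{aligned}
```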
Regression toward Mediocrity?? Do the sons just become more similar to each other than their fathers were?
Regression toward Mediocrity?? The variability of the sons’ z scores is the same as that of the fathers’! No slide into mediocrity!!
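A simulation separates predictions from observations. It assumes standardized father and son heights with correlation r = 0.6, chosen to match the numbers above. The predicted sons' z scores shrink toward 0 (SD ≈ 0.6). The observed sons' z scores have the same spread as the fathers' (SD ≈ 1). Sons of fathers near z = 2 average about 1.2.

```python
import math
import random


def sd(values):
    """Population standard deviation."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / n)


random.seed(4)
r = 0.6  # ASSUMED father-son correlation
fathers, sons = [], []
for _ in range(50_000):
    z_father = random.gauss(0, 1)
    # Son's z score: r * father's z plus independent noise, scaled so its SD stays 1.
    z_son = r * z_father + math.sqrt(1 - r ** 2) * random.gauss(0, 1)
    fathers.append(z_father)
    sons.append(z_son)

predicted = [r * z for z in fathers]  # regression predictions for the sons

# Sons of fathers about 2 SD above the mean.
near_two = [s for f, s in zip(fathers, sons) if 1.9 < f < 2.1]
mean_near_two = sum(near_two) / len(near_two)

print(round(sd(fathers), 2), round(sd(sons), 2), round(sd(predicted), 2))
print(round(mean_near_two, 2))
```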
Regression toward the mean: When you have a lucky, exceptionally good performance on an exam, you should expect to do worse next time, because there is no reason to believe that you will be so exceptionally lucky again. When you have a mental block and an exceptionally bad performance on an exam, you should expect to do better next time, because there is no reason to believe that you will be so exceptionally unlucky again. This does not mean that you are becoming more and more average as time progresses. It means that using your average performance as a reasonable predictor of future performance produces exactly this pattern between observed and predicted performance.
Regression toward the mean: Your roommate makes a huge mess in your room. You complain. The next few days are cleaner. Your roommate has cleaned up the room. You praise your roommate. The next few days the room gets dirtier. Does this mean that punishment leads to better performance and reward leads to worse performance? No…
Regression toward the mean: Your roommate makes a huge mess in your room. You do nothing. The next few days are cleaner. Your roommate has cleaned up the room. You do nothing. The next few days the room gets dirtier. Your roommate simply makes messes, cleans them, makes messes, cleans them… Your best guess for the future is an “average” level of messiness.
Implications for Research: It is very risky to study anything based on the selection of extreme groups. From test to retest, the extremes become less extreme, which may look like a treatment effect!
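A test-retest simulation with no treatment at all shows the trap. It assumes a hypothetical model: score = stable ability + independent luck on each occasion. The group selected for extreme first-test scores (top 10%) scores markedly lower on the retest, yet stays above average. An untreated "improvement" or "decline" of this kind is pure regression toward the mean.

```python
import random

random.seed(5)
test, retest = [], []
for _ in range(50_000):
    ability = random.gauss(0, 1)                  # stable true ability (hypothetical)
    test.append(ability + random.gauss(0, 1))     # luck on the first occasion
    retest.append(ability + random.gauss(0, 1))   # fresh luck on the retest; no treatment

cutoff = sorted(test)[int(0.9 * len(test))]       # select the top 10% on the first test
extreme = [i for i, t in enumerate(test) if t >= cutoff]

mean_test = sum(test[i] for i in extreme) / len(extreme)      # about 2.5
mean_retest = sum(retest[i] for i in extreme) / len(extreme)  # about 1.2
print(round(mean_test, 2), round(mean_retest, 2))
```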
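Relationships between Categorical Variables: side on which the baby is held (Left, Right) by dexterity of the mother (Right-Handed Mother, Left-Handed Mother). Baby held Left: 212 with a right-handed mother, 25 with a left-handed mother. Marginal Distributions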
Theory “Mothers tend to hold their babies with the non-dominant hand, so that the dominant hand is available to do stuff.”
Relationships between Categorical Variables: Marginal proportions (percentages). Baby held Left: .826 (82.6%); Right: .174 (17.4%). Right-handed mother: .889 (88.9%); Left-handed mother: .111 (11.1%). The vast majority of babies are held on the left; the vast majority of mothers are right-handed.
Relationships between Categorical Variables: Conditional proportions of the mother’s dexterity, given the side on which the baby is held (each row sums to 100%). Absolute size is not taken into account.
Relationships between Categorical Variables: Conditional proportions of the side on which the baby is held, given the dexterity of the mother (each column sums to 100%). Absolute size is not taken into account.
Relationships between Categorical Variables (each column sums to 100%): For any given dexterity of the mother, there is an overwhelming tendency to hold the baby on the left-hand side. Absolute size is not taken into account.
Segmented Bar Graphs
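The marginal and conditional proportions can be computed from the two-way table. Only the Left row (212, 25) is legible in the text. The Right row (43, 7) is an ASSUMPTION. It is the only integer completion consistent with the marginal proportions .826 and .889 given above. Under that assumption, both dexterity groups hold the baby on the left most of the time (about 83% and 78%).

```python
# Keys: (side the baby is held, dexterity of the mother).
# 212 and 25 are from the slide; 43 and 7 are ASSUMED (chosen to match the marginals).
counts = {
    ("Left", "Right-handed"): 212,
    ("Left", "Left-handed"): 25,
    ("Right", "Right-handed"): 43,
    ("Right", "Left-handed"): 7,
}
sides = ["Left", "Right"]
hands = ["Right-handed", "Left-handed"]
total = sum(counts.values())

# Marginal proportions: row totals and column totals divided by the grand total.
marginal_side = {s: sum(counts[(s, h)] for h in hands) / total for s in sides}
marginal_hand = {h: sum(counts[(s, h)] for s in sides) / total for h in hands}

# Conditional proportions of the side held, given the mother's dexterity (each column sums to 1).
side_given_hand = {
    h: {s: counts[(s, h)] / sum(counts[(t, h)] for t in sides) for s in sides}
    for h in hands
}

print({s: round(p, 3) for s, p in marginal_side.items()})  # {'Left': 0.826, 'Right': 0.174}
print({h: round(p, 3) for h, p in marginal_hand.items()})  # {'Right-handed': 0.889, 'Left-handed': 0.111}
```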
Conclusion?? A lurking third variable? Hearing the mother’s heartbeat may help the baby calm down.
Simpson’s Paradox: Two admission tables (Admit / Deny by Male / Female), one for the Business School and one for the Law School. Business School, Female: 180 admitted, 20 denied. Law School, Male: 10 admitted, 90 denied.
Simpson’s Paradox: Overall conditional proportions per gender: Male .70 admitted, .30 denied. Men privileged!! Gender discrimination!!
Simpson’s Paradox: Conditional proportions per gender within each school: in the Business School, women privileged!?! In the Law School, women privileged!?!
Simpson’s Paradox: However, the male-dominated discipline has the higher admission rate.
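The paradox can be reproduced with HYPOTHETICAL counts. The cells 180/20 (Business, female) and 10/90 (Law, male) are taken from the slides. The other cells are assumed, chosen so that the overall male admission rate comes out at .70 as on the "Overall" slide. Women have the higher rate within each school, yet men have the higher rate overall.

```python
# HYPOTHETICAL (admit, deny) counts; only 180/20 and 10/90 appear on the slides.
tables = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law": {"Male": (10, 90), "Female": (100, 200)},
}


def rate(admit, deny):
    """Admission rate = admitted / applicants."""
    return admit / (admit + deny)


# Within each school, women are admitted at a higher rate.
for school, t in tables.items():
    print(school, {g: round(rate(*t[g]), 2) for g in t})
# Business {'Male': 0.8, 'Female': 0.9}
# Law {'Male': 0.1, 'Female': 0.33}

# Aggregated over schools, the direction reverses.
overall = {
    g: rate(sum(tables[s][g][0] for s in tables), sum(tables[s][g][1] for s in tables))
    for g in ("Male", "Female")
}
print({g: round(p, 2) for g, p in overall.items()})
# {'Male': 0.7, 'Female': 0.56}
```

In these assumed counts, most men apply to the Business School, where admission is easier. That imbalance is what drives the overall gap.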