Published by John Dixon. Modified over 9 years ago.
1
Heads Up! Sept 22 – Oct 4: Probability. Perceived by many as a difficult topic. Get ready ahead of time.
2
Last Time: Least Squares Regression (Simple Linear Regression); Correlation
3
In Least-Squares Regression: Computational Formula
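The formula shown on this slide did not survive extraction. For reference, the standard computational form of the least-squares line Ŷ = a + bX is sketched below, with b the slope and a the intercept (matching the a and b used later in these slides). The exact notation of the original slide is an assumption.

```latex
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^{2} - \left(\sum X\right)^{2}},
\qquad
a = \bar{Y} - b\,\bar{X}
```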
4
Can we do this? Totals:
5
Calculating the Least Squares Regression Line contd.
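The calculation from column totals can be sketched in a few lines of Python. This is only an illustration: the function name and the five data points are hypothetical (they are not the PRACTICE/TRIAL data set), and the formula used is the standard computational form for slope and intercept.

```python
def least_squares_from_totals(xs, ys):
    """Return (intercept a, slope b) computed from the column totals."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_from_totals(xs, ys)
print(f"Y-hat = {a:.2f} + {b:.2f} * X")
```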
6
Regression Equation: TRIAL = 1.09 × PRACTICE − 9. Slope is 1.09 (an increase of 10 in PRACTICE goes with an increase of 10.9 in TRIAL). Intercept is −9 (you can’t see it in this graph).
7
A view from further away….
8
Look at the residuals: We want a shot-gun blast shape, i.e., a random blob
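A residual is the observed Y minus the Y predicted by the regression line. The sketch below computes residuals for a small hypothetical data set (not the data from the slides); the helper name `fit_line` is an assumption made for illustration. Plotting these residuals against X or against the predicted values gives the residual plot discussed here.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Least-squares residuals always sum to zero (up to rounding error),
# so what matters in the plot is their pattern, not their average.
print(residuals, sum(residuals))
```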
9
Look at Residuals & Line Fit (Residual Plot, Line Fit Plot). Problem: Relationship is not linear.
10
Look at Residuals & Line Fit (Residual Plot). Problem: Predictions are very precise for small predicted values, but very imprecise for large predicted values. (Not good)
11
Look at Residuals: Residual Plot (horizontal axis 1 to 12). Problem: Lurking (third) variables (?) Here: Seasonal Trend?
12
Correlation: How strong is the linear relationship between two variables X and Y? It is the slope in the regression of standardized variables. This slope tells me how much a given change (in standardized units) of X translates into a change (in standardized units) of Y.
13
Correlation How strong is the linear relationship between two variables X and Y? Correlation Coefficient Computational Formula:
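The formula on this slide was lost in extraction. The standard computational formula for the correlation coefficient r is given below for reference; the notation of the original slide is an assumption.

```latex
r = \frac{n\sum XY - \sum X \sum Y}
         {\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]
                \left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```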
14
Properties of Correlation: Symmetric measure (you can exchange X and Y and get the same value). -1 ≤ r ≤ 1; -1 is “perfect” negative correlation, 1 is “perfect” positive correlation. Not dependent on linear transformations of X and Y (except that rescaling one variable by a negative factor flips the sign of r). Measures linear relationship only.
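These properties can be checked numerically. The sketch below uses the standard computational formula for r and a small hypothetical data set; the function name `correlation` and the particular rescalings are illustrative assumptions.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)                       # exchange X and Y
r_rescaled = correlation([10 * x + 3 for x in xs],
                         [0.5 * y - 7 for y in ys])  # positive linear transformations
r_flipped = correlation([-x for x in xs], ys)    # negative scale factor

print(r_xy, r_yx, r_rescaled, r_flipped)
```

The first three values agree; the last has the same magnitude with the opposite sign.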
15
Let’s try it out on our X = PRACTICE, Y = TRIAL Data Set. Check this calculation at home!
16
Today: Finish Theory on Regression; Pathologies and Traps in Linear Regression and Correlation; Relationships between Categorical Variables
17
Regression on Standardized Variables
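The slide content here was an image. As a minimal sketch of the idea: if both variables are converted to z-scores before fitting, the least-squares intercept becomes 0 and the slope equals the correlation coefficient r. The data and helper names below are hypothetical, and z-scores are computed with the population standard deviation (an assumption; the result is the same with the sample version as long as it is used for both variables).

```python
import math


def mean(values):
    return sum(values) / len(values)


def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    m = mean(values)
    sd = math.sqrt(mean([(v - m) ** 2 for v in values]))
    return [(v - m) / sd for v in values]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_raw, b_raw = fit_line(xs, ys)                            # raw units
a_std, b_std = fit_line(standardize(xs), standardize(ys))  # standardized units

print(a_raw, b_raw)
print(a_std, b_std)  # intercept is (numerically) 0, slope is r
```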
18
?
19
What is the variance of the predicted values?
20
Proportion of Variance explained: r² = (Variance of predicted Y’s) / (Variance of observed Y’s), i.e., the proportion of variance of observed Y’s that is accounted for by the regression.
21
Proportion of Variance explained: the proportion of variance of observed Y’s that is accounted for by the regression. Note: If you exchange X and Y in the regression, you find the same r and r squared.
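A short sketch of this ratio, under the assumption that "proportion of variance explained" means the variance of the predicted values divided by the variance of the observed values. The data and function names are hypothetical. The code also regresses X on Y to illustrate that the proportion is the same in both directions.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def proportion_explained(xs, ys):
    """Variance of predicted ys divided by variance of observed ys."""
    a, b = fit_line(xs, ys)
    predicted = [a + b * x for x in xs]
    return variance(predicted) / variance(ys)


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2_y_on_x = proportion_explained(xs, ys)
r2_x_on_y = proportion_explained(ys, xs)  # X and Y exchanged
print(r2_y_on_x, r2_x_on_y)
```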
22
Correlation only checks magnitude of Linear Relationships! It can happen that r=0, even though X and Y are highly related to each other! Need to look at Scatter Plot and Residual Plot to make sure that you don’t miss an obvious relationship overlooked by linear regression!
23
How does a Linear Regression Model approximate Y = X² (for X = 1, 2, …, 15)? For these particular data the regression model finds a = -45, b = 16. The residuals have a systematic trend!! This Linear Regression is inappropriate!!
24
How does a Linear Regression Model approximate Y = X² (for X = -8, -7, …, 7, 8)? For these particular data the regression model finds a = 24, b = 0. The residuals have a systematic trend!! This Linear Regression is inappropriate!!
25
How does a Linear Regression Model approximate Y = X² (for X = -8, -7, …, 7, 8)? For these particular data the regression model finds a = 24, b = 0, r = 0. Correlation is Zero: No LINEAR Relationship. Is there “no relationship” between X and Y? There is an extremely strong (nonlinear) relationship here!
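The function being approximated on these slides was an image and is missing from the text. The sketch below assumes it is Y = X², because that choice reproduces the quoted coefficients: a = 24, b = 0, r = 0 for X = -8, …, 8, and b = 16 with a ≈ -45 for X = 1, …, 15. The function name is hypothetical.

```python
import math


def fit_and_correlate(xs, ys):
    """Return (intercept a, slope b, correlation r) from the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    cross = n * sum_xy - sum_x * sum_y
    b = cross / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    r = cross / math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return a, b, r


# Assumed curve: Y = X^2 on a range symmetric around zero
xs_sym = list(range(-8, 9))
a_sym, b_sym, r_sym = fit_and_correlate(xs_sym, [x ** 2 for x in xs_sym])
print(a_sym, b_sym, r_sym)

# Same assumed curve on X = 1, ..., 15
xs_pos = list(range(1, 16))
a_pos, b_pos, r_pos = fit_and_correlate(xs_pos, [x ** 2 for x in xs_pos])
print(a_pos, b_pos, r_pos)
```

On the symmetric range Y is completely determined by X, yet the fitted line is flat and r is zero.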
26
How does a Linear Regression Model approximate Y = ln(X) (for X = 1, 2, …, 15)? For these particular data the regression model finds a = .54, b = .16. The residuals have a systematic trend!! This Linear Regression is inappropriate!!
27
Correlation is not Causation! Correlation between the size of your big toe and your performance on reading tasks is highly positive! ?? Lurking Third Variable: AGE
28
Correlation is not Causation! Only experimentation allows us to attribute causation to the relationship between independent and dependent variables.
29
Ecological Correlation: Correlations between averages are higher than correlations between individuals. [Figure: scatter plot of Y against X for individuals, next to a scatter plot of Y group averages against X group averages]
30
Problem of Restricted Range. [Figure: Success in Graduate School plotted against GRE scores. Over the full range of GRE scores: Strong Linear Relationship. Within a restricted range: No Linear Relationship.]
31
Extrapolations are Dangerous. [Figure: Number of Passengers plotted against Year]
32
Regression toward the Mean: The term “Regression” is associated with Sir Francis Galton (1822 – 1911) (picture taken from http://www.gene.ucl.ac.uk/). Galton (1885), “Regression towards Mediocrity in Hereditary Stature”, Journal of the Anthropological Institute.
33
Regression toward the Mean Suppose:
34
Regression toward Mediocrity?? Predictions are closer to zero (the mean) than the observations!!
35
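[Figure: scatter plot of standardized scores with r = .60; a value of 2.0 on the horizontal axis corresponds to a predicted value of 1.2]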
36
r = .60 (2.0 → 1.2): Among families where the father is approximately 2 standard deviations above the mean, the average son is only about 1.2 standard deviations above the mean.
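The 1.2 follows from the regression on standardized variables, where the slope is r and the intercept is 0. Written out, with subscripts chosen here for illustration:

```latex
\hat{z}_{\text{son}} = r \cdot z_{\text{father}} = 0.60 \times 2.0 = 1.2
```

Because |r| ≤ 1, the predicted z-score is never further from the mean than the z-score it is predicted from.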
37
Regression toward Mediocrity?? Do the sons just become more similar to each other than their fathers were?
38
Regression toward Mediocrity?? Variability of the Z scores is the same! No slide into mediocrity!!
39
Regression toward the mean: When you have a lucky and exceptionally good performance in an exam, you expect to do worse next time, because there is no reason to believe that you will be so exceptionally lucky again. When you have a mental block and an exceptionally bad performance in an exam, you expect to do better next time, because there is no reason to believe that you will be so exceptionally unlucky again. This does not mean that you are becoming more and more average as time progresses. It means that your average performance is a reasonable predictor of future performance, and predicting from the average produces exactly this pattern between observed and predicted performance.
40
Regression toward the mean Your room mate makes a huge mess in your room. You complain. The next few days are cleaner. Your room mate has cleaned up the room. You praise your room mate. The next few days the room gets dirtier. Does this mean that punishment leads to better performance and reward leads to worse performance? No….
41
Regression toward the mean Your room mate makes a huge mess in your room. You do nothing. The next few days are cleaner. Your room mate has cleaned up the room. You do nothing. The next few days the room gets dirtier. Your room mate simply makes messes, cleans them, makes messes, cleans them … Your best guess for the future is an “average” level of messiness
42
Implications for Research: It is very risky to study anything based on selection of extreme groups. From Test to Retest, extremes become less extreme. May look like a treatment effect!
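A small simulation can illustrate the risk. Everything in this sketch is an assumption made for illustration: each person has a stable ability, each test score is that ability plus independent luck, and no treatment is given between test and retest. The top 10% on the first test are selected as the "extreme group".

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

n = 10_000
# Hypothetical model: score = stable ability + independent luck on each occasion
ability = [random.gauss(0, 1) for _ in range(n)]
test = [a + random.gauss(0, 1) for a in ability]
retest = [a + random.gauss(0, 1) for a in ability]

# Select the extreme group: the top 10% on the first test
cutoff = sorted(test)[int(0.9 * n)]
extreme = [i for i in range(n) if test[i] >= cutoff]

mean_test = sum(test[i] for i in extreme) / len(extreme)
mean_retest = sum(retest[i] for i in extreme) / len(extreme)

# The group is still above average on retest, but clearly less extreme,
# even though nothing was done to them in between.
print(mean_test, mean_retest)
```

If the selected group had received a "treatment" between the two tests, the drop would be easy to mistake for a treatment effect.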
43
Relationships between Categorical Variables: Marginal Distributions
Baby Held    Right-Handed Mother    Left-Handed Mother    Total
Left                 212                    25              237
Right                 43                     7               50
Total                255                    32              287
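The marginal distributions are just the row and column totals of the two-way table. A minimal sketch, reading the four cell counts on this slide as 212, 25, 43 and 7 (the dictionary layout and variable names are illustrative assumptions):

```python
# Cell counts: side on which the baby is held, by handedness of the mother
counts = {
    "Left": {"Right-handed mother": 212, "Left-handed mother": 25},
    "Right": {"Right-handed mother": 43, "Left-handed mother": 7},
}

# Row totals: marginal distribution of the side on which the baby is held
row_totals = {side: sum(row.values()) for side, row in counts.items()}

# Column totals: marginal distribution of the mother's handedness
col_totals = {}
for row in counts.values():
    for hand, count in row.items():
        col_totals[hand] = col_totals.get(hand, 0) + count

grand_total = sum(row_totals.values())

# Marginal proportions
row_props = {side: total / grand_total for side, total in row_totals.items()}
col_props = {hand: total / grand_total for hand, total in col_totals.items()}

print(row_totals, col_totals, grand_total)
print(row_props, col_props)
```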
44
Theory “Mothers tend to hold their babies with the non-dominant hand, so that the dominant hand is available to do stuff.”
45
Relationships between Categorical Variables: Marginal Proportions (Percentages)
Baby Held    Right-Handed Mother    Left-Handed Mother    Total
Left                                                      .826 (82.6%)
Right                                                     .174 (17.4%)
Total        .889 (88.9%)           .111 (11.1%)
Vast majority of babies held left. Vast majority of mothers right-handed.
46
Relationships between Categorical Variables: Conditional proportions, given side on which the baby is held
Baby Held    Right-Handed Mother    Left-Handed Mother    Total
Left         .894                   .105                  1 (100%)
Right        .860                   .140                  1 (100%)
Absolute size not taken into account.
47
Relationships between Categorical Variables: Conditional proportions, given dexterity of mother
Baby Held    Right-Handed Mother    Left-Handed Mother
Left         .831                   .781
Right        .169                   .219
Total        1 (100%)               1 (100%)
Absolute size not taken into account.
48
Relationships between Categorical Variables
Baby Held    Right-Handed Mother    Left-Handed Mother
Left         .831                   .781
Right        .169                   .219
Total        1 (100%)               1 (100%)
For any given dexterity of the mother, there is an overwhelming tendency to hold the baby on the left hand side. Absolute size not taken into account.
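Conditional proportions divide each cell count by the total of the row or column being conditioned on. The sketch below reads the cell counts as 212, 25, 43 and 7 and conditions in both directions; the data layout and names are illustrative assumptions.

```python
# Cell counts: (side on which the baby is held, handedness of the mother)
counts = {
    ("Left", "Right-handed mother"): 212,
    ("Left", "Left-handed mother"): 25,
    ("Right", "Right-handed mother"): 43,
    ("Right", "Left-handed mother"): 7,
}
sides = ["Left", "Right"]
hands = ["Right-handed mother", "Left-handed mother"]

# Given the mother's dexterity: divide by the column total (columns sum to 1)
given_hand = {}
for hand in hands:
    column_total = sum(counts[(side, hand)] for side in sides)
    given_hand[hand] = {side: counts[(side, hand)] / column_total for side in sides}

# Given the side on which the baby is held: divide by the row total (rows sum to 1)
given_side = {}
for side in sides:
    row_total = sum(counts[(side, hand)] for hand in hands)
    given_side[side] = {hand: counts[(side, hand)] / row_total for hand in hands}

for hand in hands:
    print(hand, given_hand[hand])
for side in sides:
    print(side, given_side[side])
```

Because each column or row is rescaled to sum to 1, the proportions say nothing about how many mothers are in each group.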
49
Segmented Bargraphs
51
Conclusion?? Lurking Third Variable? Heart beat helps baby calm down
52
Simpson’s Paradox
Business School    Admit    Deny
Male                 480     120
Female               180      20
Law School         Admit    Deny
Male                  10      90
Female               100     200
53
Simpson’s Paradox. Overall:
           Admit    Deny    Total
Male         490     210      700
Female       280     220      500
Overall conditional proportions per gender:
           Admit    Deny
Male         .70     .30
Female       .56     .44
Men Privileged!! Gender Discrimination!!
54
Simpson’s Paradox
Business School    Admit    Deny    Total    Admit (prop.)    Deny (prop.)
Male                 480     120      600         .80              .20
Female               180      20      200         .90              .10
Women Privileged!?!
Law School         Admit    Deny    Total    Admit (prop.)    Deny (prop.)
Male                  10      90      100         .10              .90
Female               100     200      300         .33              .67
Women Privileged!?!
55
Simpson’s Paradox
Business School    Admit    Deny    Total    Admit (prop.)    Deny (prop.)
Male                 480     120      600         .80              .20
Female               180      20      200         .90              .10
Law School         Admit    Deny    Total    Admit (prop.)    Deny (prop.)
Male                  10      90      100         .10              .90
Female               100     200      300         .33              .67
However: Higher admission rate for the male-dominated discipline.
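The reversal can be reproduced directly from the admission counts on these slides. The sketch below is an illustration; the data layout and function name are assumptions. It computes the admission rate per gender within each school and then after pooling the two schools.

```python
# (admitted, denied) counts per school and gender, as given on the slides
admissions = {
    "Business School": {"Male": (480, 120), "Female": (180, 20)},
    "Law School": {"Male": (10, 90), "Female": (100, 200)},
}


def admit_rate(admitted, denied):
    return admitted / (admitted + denied)


# Admission rate per gender within each school
per_school = {
    school: {gender: admit_rate(*cell) for gender, cell in table.items()}
    for school, table in admissions.items()
}

# Admission rate per gender after pooling both schools
overall = {}
for gender in ("Male", "Female"):
    admitted = sum(table[gender][0] for table in admissions.values())
    denied = sum(table[gender][1] for table in admissions.values())
    overall[gender] = admit_rate(admitted, denied)

print(per_school)
print(overall)
```

Within each school the female rate is higher, yet the pooled male rate is higher, because most men applied to the school with the high admission rate.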