Heads Up! Sept 22 – Oct 4: Probability. Perceived by many as a difficult topic. Get ready ahead of time.



Last Time: Least Squares Regression (Simple Linear Regression); Correlation

In Least-Squares Regression: Computational Formula

Can we do this? Totals:

Calculating the Least Squares Regression Line contd.

Slope is 1.09. Intercept is -9 (you can’t see it in this graph). Regression Equation: TRIAL = 1.09 × PRACTICE - 9
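A minimal sketch of how a slope and intercept like these can be computed from column totals, assuming the standard computational formula for least squares. The function name `least_squares_line` and the five data points are hypothetical; they are not the PRACTICE/TRIAL data set from the lecture.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope from the totals, then the intercept from the two means.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical data (an assumption, chosen only for illustration).
practice = [10, 12, 14, 16, 18]
trial = [2, 4, 7, 8, 11]

a, b = least_squares_line(practice, trial)
print(f"TRIAL = {b:.2f} * PRACTICE + ({a:.2f})")
```

The line always passes through the point of means, which is why the intercept can be recovered from the slope and the two averages.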

A view from further away….

Look at the residuals: We want a shot-gun blast shape, i.e., a random blob

Look at Residuals & Line Fit (Residual Plot, Line Fit Plot). Problem: Relationship is not linear

Look at Residuals & Line Fit (Residual Plot). Problem: Predictions are very precise for small predicted values, but very imprecise for large predicted values. (Not good)

Look at Residuals (Residual Plot). Problem: Lurking (third) variables (?) Here: Seasonal Trend?

Correlation: How strong is the linear relationship between two variables X and Y? Correlation is the slope in the regression of standardized variables. This slope tells me how much a given change (in standardized units) of X translates into a change (in standardized units) of Y.
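The idea of correlation as a slope between standardized variables can be sketched as follows. The helper names and the data are hypothetical, and z-scores here use the sample standard deviation (n - 1), which is an assumption about the convention.

```python
import math

def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data (an assumption, for illustration only).
x = [1, 2, 3, 4, 5]
y = [4, 2, 8, 6, 10]

raw_slope = least_squares_slope(x, y)
standardized_slope = least_squares_slope(standardize(x), standardize(y))
print(raw_slope, standardized_slope)
```

The raw slope depends on the units of X and Y, while the slope after standardizing is unit-free and always lies between -1 and 1.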

Correlation How strong is the linear relationship between two variables X and Y? Correlation Coefficient Computational Formula:

Properties of Correlation: Symmetric measure (you can exchange X and Y and get the same value). -1 ≤ r ≤ 1; -1 is “perfect” negative correlation, 1 is “perfect” positive correlation. Not dependent on linear transformations of X and Y (changes of unit; multiplying one variable by a negative number flips the sign of r). Measures linear relationship only.
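These properties can be checked numerically. The sketch below uses a hand-written `pearson_r` helper and made-up data, both of which are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (an assumption, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)  # symmetry: exchange X and Y
# Change of units: rescale and shift both variables (positive scale factors).
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
# A negative scale factor flips the sign but not the magnitude.
r_flipped = pearson_r([-v for v in x], y)
print(r_xy, r_yx, r_rescaled, r_flipped)
```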

Let’s try it out on our X = PRACTICE, Y = TRIAL Data Set Check this calculation at home!

Today: Finish Theory on Regression; Pathologies and Traps in Linear Regression and Correlation; Relationships between Categorical Variables

Regression on Standardized Variables

?

What is the variance of ?

(Variance of predicted Y’s) / (Variance of observed Y’s) = Proportion of Variance of observed Y’s that is accounted for by the regression = Proportion of Variance explained

Proportion of Variance of observed Y’s that is accounted for by the regression Proportion of Variance explained Note: If you exchange X and Y in the regression, you find the same r and r squared
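A small numerical check of this identity, as a sketch: fit the least-squares line, then compare the variance of the predicted Y’s with the variance of the observed Y’s. The data and helper names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def variance(values):
    """Sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))

# Hypothetical data (an assumption, for illustration only).
x = [1, 2, 3, 4, 5]
y = [4, 2, 8, 6, 10]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * v for v in x]
proportion_explained = variance(predicted) / variance(y)
r_squared = pearson_r(x, y) ** 2
r_squared_swapped = pearson_r(y, x) ** 2  # exchanging X and Y gives the same value
print(proportion_explained, r_squared, r_squared_swapped)
```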

Correlation only checks magnitude of Linear Relationships! It can happen that r=0, even though X and Y are highly related to each other! Need to look at Scatter Plot and Residual Plot to make sure that you don’t miss an obvious relationship overlooked by linear regression!

How does a Linear Regression Model approximate (for X=1,2,…,15)? For these particular data the regression model finds a = -45, b = 16. The residuals have a systematic trend!! This Linear Regression is inappropriate!!
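The function being approximated is not preserved in the transcript. The sketch below assumes Y = X², which reproduces the reported coefficients (a ≈ -45, b = 16) for X = 1, …, 15, and shows what a systematic trend in the residuals looks like.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

xs = list(range(1, 16))
ys = [x ** 2 for x in xs]  # ASSUMPTION: Y = X^2, not stated in the transcript

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"a = {a:.2f}, b = {b:.2f}")
# Positive at both ends, negative in the middle: a U-shaped pattern,
# not the random blob a good linear fit would leave behind.
print([round(e, 1) for e in residuals])
```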

How does a Linear Regression Model approximate (for X=-8,-7,…,7,8)? For these particular data the regression model finds a = 24, b = 0. The residuals have a systematic trend!! This Linear Regression is inappropriate!!

How does a Linear Regression Model approximate (for X=-8,-7,…,7,8)? For these particular data the regression model finds a = 24, b = 0, r = 0. Correlation is Zero: No LINEAR Relationship. Is there “no relationship” between X and Y? There is an extremely strong (nonlinear) relationship here!
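A sketch of this situation, again assuming Y = X² (the function itself is missing from the transcript, but this choice reproduces a = 24, b = 0, r = 0 on X = -8, …, 8). Y is completely determined by X, yet the correlation is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(-8, 9))
ys = [x ** 2 for x in xs]  # ASSUMPTION: Y = X^2, not stated in the transcript

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x
r = pearson_r(xs, ys)

print(f"a = {a}, b = {b}, r = {r}")
```

The fitted line is horizontal at the mean of Y, so it predicts the same value for every X even though knowing X pins down Y exactly.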

How does a Linear Regression Model approximate (for X=1,2,…,15)? For these particular data the regression model finds a = .54, b = .16. The residuals have a systematic trend!! This Linear Regression is inappropriate!!

Correlation is not Causation! Correlation between the size of your big toe and your performance on reading tasks is highly positive! ?? Lurking Third Variable: AGE

Correlation is not Causation! Only experimentation allows us to attribute causation to the relationship between independent and dependent variables.

Ecological Correlation: Correlations between averages are typically higher than correlations between individuals. (Plots: Y vs. X for individuals; Y group averages vs. X group averages)
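A toy illustration of the effect, using made-up data: three groups whose averages lie exactly on a line, with scatter inside each group that is unrelated between X and Y. All numbers and names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical individuals: the same within-group scatter, shifted per group.
base = [(0, 1), (2, 1), (1, 0), (1, 2)]
groups = [[(x + shift, y + shift) for x, y in base] for shift in (0, 2, 4)]

individuals = [point for group in groups for point in group]
r_individuals = pearson_r([p[0] for p in individuals],
                          [p[1] for p in individuals])

# Averaging removes the within-group scatter before correlating.
group_means = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
               for g in groups]
r_group_averages = pearson_r([m[0] for m in group_means],
                             [m[1] for m in group_means])

print(r_individuals, r_group_averages)
```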

Problem of Restricted Range. (Plot: Success in Graduate School vs. GRE scores. Full range: Strong Linear Relationship. Restricted range: No Linear Relationship)
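A small simulation sketch of the restricted-range problem. The population correlation of 0.7, the cutoff of 1.5 standard deviations, the sample size, and the seed are all assumptions picked for illustration; they do not come from the lecture.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_r = 0.7             # ASSUMED population correlation
cutoff = 1.5             # ASSUMED admission cutoff, in standard units

# Standardized "GRE" scores and "success" scores with correlation true_r.
gre = [rng.gauss(0, 1) for _ in range(20000)]
success = [true_r * g + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
           for g in gre]

r_full = pearson_r(gre, success)

# Keep only applicants above the cutoff, as if only they were admitted.
admitted = [(g, s) for g, s in zip(gre, success) if g > cutoff]
r_restricted = pearson_r([g for g, _ in admitted], [s for _, s in admitted])

print(f"full range: r = {r_full:.2f}, restricted range: r = {r_restricted:.2f}")
```

The underlying relationship is identical in both computations; only the range of X that is observed differs.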

Extrapolations are Dangerous. (Plot: Number of Passengers vs. Year)

Regression toward the Mean The term “Regression” is associated with Sir Francis Galton (1822 – 1911) Picture taken from Galton (1885) “Regression towards Mediocrity In Hereditary Stature” Journal of the Anthropological Institute

Regression toward the Mean Suppose:

Regression toward Mediocrity?? Predictions are closer to zero (the mean) than the observations!!

r=

r= Among families where the father is approximately 2 standard deviations above the mean, the average son is only about 1.2 standard deviations above the mean.

Regression toward Mediocrity?? Do the sons just become more similar to each other than their fathers were?

Regression toward Mediocrity?? Variability of the Z scores is the same! No slide into mediocrity!!
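Both points can be illustrated with a simulation sketch. The value r = 0.6 is an assumption inferred from the 2 standard deviations versus 1.2 standard deviations example (1.2 / 2 = 0.6); the sample size, seed, and variable names are hypothetical.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
r = 0.6                  # ASSUMED: inferred from 2 SD -> 1.2 SD
n = 200000

# Father and son heights as z-scores with correlation r.
father_z = [rng.gauss(0, 1) for _ in range(n)]
son_z = [r * f + math.sqrt(1 - r * r) * rng.gauss(0, 1) for f in father_z]

def mean(values):
    return sum(values) / len(values)

def sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Sons of fathers who are roughly 2 standard deviations above the mean.
sons_of_tall = [s for f, s in zip(father_z, son_z) if 1.9 <= f <= 2.1]
mean_son_of_tall = mean(sons_of_tall)

sd_fathers = sd(father_z)
sd_sons = sd(son_z)

print(f"average son of a +2 SD father: {mean_son_of_tall:.2f} SD")
print(f"SD of fathers: {sd_fathers:.2f}, SD of sons: {sd_sons:.2f}")
```

The average son of a very tall father is closer to the mean, yet the sons as a whole are just as spread out as the fathers.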

Regression toward the mean When you have a lucky and exceptionally good performance in an exam, you expect to do worse next time, because there is no reason to believe that you will be so exceptionally lucky again. When you have a mental block and exceptionally bad performance in an exam, you expect to do better next time, because there is no reason to believe that you will be so exceptionally unlucky again. This does not mean that you are becoming more and more average as time progresses. It means that your average performance, as a reasonable predictor for future performance, will lead to such a pattern of relationships between observed and predicted performance

Regression toward the mean Your room mate makes a huge mess in your room. You complain. The next few days are cleaner. Your room mate has cleaned up the room. You praise your room mate. The next few days the room gets dirtier. Does this mean that punishment leads to better performance and reward leads to worse performance? No….

Regression toward the mean Your room mate makes a huge mess in your room. You do nothing. The next few days are cleaner. Your room mate has cleaned up the room. You do nothing. The next few days the room gets dirtier. Your room mate simply makes messes, cleans them, makes messes, cleans them … Your best guess for the future is an “average” level of messiness

Implications for Research: It is very risky to study anything based on selection of extreme groups. Test → Retest: Extremes become less extreme. May look like a treatment effect!

Relationships between Categorical Variables. Rows: Baby Held (Left, Right). Columns: Right-Handed Mother, Left-Handed Mother. Baby held Left: 212 (Right-Handed Mother), 25 (Left-Handed Mother). Marginal Distributions

Theory “Mothers tend to hold their babies with the non-dominant hand, so that the dominant hand is available to do stuff.”

Relationships between Categorical Variables. Marginal Proportions (Percentages): Baby held Left .826 (82.6%), Right .174 (17.4%); Right-Handed Mother .889 (88.9%), Left-Handed Mother .111 (11.1%). Vast majority of babies held left. Vast majority of mothers right-handed.

Relationships between Categorical Variables. Baby Held (Left, Right) by Right-Handed Mother / Left-Handed Mother. Conditional proportions, given side on which the baby is held (each row adds up to 1, i.e., 100%). Absolute size not taken into account.

Relationships between Categorical Variables. Baby Held (Left, Right) by Right-Handed Mother / Left-Handed Mother. Conditional proportions, given dexterity of mother (each column adds up to 1, i.e., 100%). Absolute size not taken into account.

Relationships between Categorical Variables. Baby Held (Left, Right) by Right-Handed Mother / Left-Handed Mother. For any given dexterity of the mother, there is an overwhelming tendency to hold the baby on the left-hand side. Absolute size not taken into account.
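The marginal and conditional proportions can be recomputed from the counts, as a sketch. Only the Left row (212 and 25) survives in the transcript; the Right row (43 and 7) is an assumption, inferred as the only whole-number counts consistent with the marginal proportions .826 / .174 and .889 / .111.

```python
# Rows: side on which the baby is held. Columns: handedness of the mother.
counts = {
    "Left": {"Right-Handed Mother": 212, "Left-Handed Mother": 25},
    # ASSUMPTION: inferred from the marginal proportions, not in the transcript.
    "Right": {"Right-Handed Mother": 43, "Left-Handed Mother": 7},
}
mothers = ["Right-Handed Mother", "Left-Handed Mother"]

total = sum(sum(row.values()) for row in counts.values())
row_totals = {side: sum(row.values()) for side, row in counts.items()}
col_totals = {m: sum(counts[side][m] for side in counts) for m in mothers}

# Marginal proportions: each variable on its own.
marginal_held = {side: row_totals[side] / total for side in counts}
marginal_mother = {m: col_totals[m] / total for m in mothers}

# Conditional proportions given the side the baby is held (rows sum to 1).
given_side = {side: {m: counts[side][m] / row_totals[side] for m in mothers}
              for side in counts}

# Conditional proportions given the mother's handedness (columns sum to 1).
given_mother = {m: {side: counts[side][m] / col_totals[m] for side in counts}
                for m in mothers}

for m in mothers:
    print(m, {side: round(p, 3) for side, p in given_mother[m].items()})
```

Conditioning on the mother's handedness is what addresses the theory: both kinds of mothers hold the baby on the left most of the time.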

Segmented Bargraphs

Conclusion?? Lurking Third Variable? Heartbeat helps baby calm down

Simpson’s Paradox. Admit Deny Male Female 180 20. Admit Deny Male 10 90 Female. Business School, Law School

Simpson’s Paradox. Admit Deny Male Female. Admit Deny Male .70 .30 Female. Overall: Overall conditional proportions per gender. Men Privileged!! Gender Discrimination!!

Simpson’s Paradox. Admit Deny Male Female 180 20. Admit Deny Male 10 90 Female. Admit Deny Male Female. Admit Deny Male Female. Women Privileged!?! Women Privileged!?!

Simpson’s Paradox. Admit Deny Male Female 180 20. Admit Deny Male 10 90 Female. Admit Deny Male Female. Admit Deny Male Female. However: Higher admission rate for male-dominated discipline
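A sketch of how the reversal arises. Only part of the tables survives in the transcript (Female 180 / 20 in one school, Male 10 / 90 in the other, and an overall male rate of .70 / .30). The remaining counts below, and the assignment of the tables to Business School and Law School, are assumptions chosen to be consistent with those surviving numbers.

```python
# (admitted, denied) per school and gender.
admissions = {
    "Business School": {
        "Male": (480, 120),    # ASSUMPTION: not in the transcript
        "Female": (180, 20),
    },
    "Law School": {
        "Male": (10, 90),
        "Female": (100, 200),  # ASSUMPTION: not in the transcript
    },
}

def admit_rate(admitted, denied):
    return admitted / (admitted + denied)

# Conditional admission rates within each school.
rates_by_school = {
    school: {gender: admit_rate(*cell) for gender, cell in table.items()}
    for school, table in admissions.items()
}

# Overall rates after pooling the two schools.
overall = {}
for gender in ("Male", "Female"):
    admitted = sum(admissions[school][gender][0] for school in admissions)
    denied = sum(admissions[school][gender][1] for school in admissions)
    overall[gender] = admit_rate(admitted, denied)

for school, rates in rates_by_school.items():
    print(school, {g: round(p, 2) for g, p in rates.items()})
print("Overall", {g: round(p, 2) for g, p in overall.items()})
```

With these counts most men apply to the school with the higher admission rate, so pooling favors men even though women have the higher rate within each school.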