Chapter 5 Regression



Linear Regression
• Objective: to quantify the linear relationship between an explanatory variable (x) and a response variable (y).
• We can then predict the average response for all subjects with a given value of the explanatory variable.

Prediction via Regression Line: Number of New Birds and Percent Returning
Example: predicting the number (y) of new adult birds that join the colony based on the percent (x) of adult birds that return to the colony from the previous year.

Correlation tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to have a numerical description of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description. But which line best describes our data?

Least Squares
• Used to determine the “best” line
• We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)
• Least squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line

The regression line
The least-squares regression line is the unique line such that the sum of the squared vertical (y) distances between the data points and the line is the smallest possible.
Distances between the points and the line are squared so that all are positive values. This is done so that distances can be properly added (Pythagoras).
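The least-squares criterion can be checked numerically. The sketch below is illustrative only: the data are invented, and the helper names (`sum_squared_errors`, `least_squares_line`) are assumptions made for this example. It fits the line with the standard closed-form solution and confirms that nudging the intercept or slope always increases the sum of squared vertical distances.

```python
# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form intercept a and slope b of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares_line(xs, ys)
best = sum_squared_errors(a, b, xs, ys)

# Every nearby line (shifted intercept and/or slope) has a larger sum.
nudged = [
    sum_squared_errors(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
print("least-squares sum:", best)
print("smallest sum among nudged lines:", min(nudged))
```

Only a handful of alternative lines are tried here, so this is a spot check rather than a proof of uniqueness.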

Least Squares Regression Line
• Regression equation: $\hat{y} = a + bx$
  – x is the value of the explanatory variable
  – “y-hat” ($\hat{y}$) is the average value of the response variable (the predicted response for a value of x)
  – note that a and b are just the intercept and slope of a straight line
  – note that r and b are not the same thing, but their signs will agree

Prediction via Regression Line: Number of New Birds and Percent Returning
• The regression equation is $\hat{y} = 31.9343 - 0.3040x$
  – y-hat is the average number of new birds for all colonies with percent x returning
• For all colonies with 60% returning, we predict the average number of new birds to be 13.69: 31.9343 − (0.3040)(60) = 13.69 birds
• Suppose we know that an individual colony has 60% returning. What would we predict the number of new birds to be for just that colony?
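The prediction for 60% returning can be reproduced in a few lines. This is a sketch under stated assumptions: the slope magnitude 0.3040 and the result 13.69 come from the slide, while the intercept 31.9343 and the negative sign on the slope are assumed here because they are the values consistent with that stated result. The function name is hypothetical.

```python
# ASSUMPTION: the intercept and the slope's negative sign are not legible in
# the transcript; they are chosen to reproduce the stated prediction of 13.69.
INTERCEPT = 31.9343
SLOPE = -0.3040


def predict_new_birds(percent_returning):
    """Predicted average number of new birds for a given percent returning."""
    return INTERCEPT + SLOPE * percent_returning


prediction = predict_new_birds(60)
print(round(prediction, 2))
```

The same value is also the prediction for a single colony with 60% returning, although an individual colony will typically deviate from the average more than a group average would.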

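Regression Line Calculation
• Regression equation: $\hat{y} = a + bx$
• Slope: $b = r \dfrac{s_y}{s_x}$
• Intercept: $a = \bar{y} - b\bar{x}$
where $s_x$ and $s_y$ are the standard deviations of the two variables, r is their correlation, and $\bar{x}$ and $\bar{y}$ are their means.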
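The standard textbook calculation takes the slope as b = r·s_y/s_x and the intercept as a = ȳ − b·x̄. A minimal sketch of that calculation follows, using invented data and a hypothetical `correlation` helper; it is not the case-study data from the slides.

```python
from statistics import mean, stdev

# Hypothetical data, invented purely for illustration.
xs = [10, 20, 30, 40, 50]
ys = [30, 27, 21, 18, 14]


def correlation(xs, ys):
    """Correlation r as the average product of standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


r = correlation(xs, ys)
b = r * stdev(ys) / stdev(xs)   # slope from r and the standard deviations
a = mean(ys) - b * mean(xs)     # intercept from the means

print("r =", round(r, 4))
print("slope b =", round(b, 4))
print("intercept a =", round(a, 4))
```

Because both standard deviations are positive, the slope b always carries the same sign as r, which is why the two agree in sign without being the same number.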

Regression Calculation Case Study Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Regression Calculation Case Study
Table columns: Country, Per Capita GDP (x), Life Expectancy (y)
Countries: Austria, Belgium, Finland, France, Germany, Ireland, Italy, Netherlands, Switzerland, United Kingdom

Regression Calculation Case Study
Linear regression equation for life expectancy (y) on per capita GDP (x): $\hat{y} = a + bx$

Facts about least-squares regression
1. The distinction between explanatory and response variables is essential in regression.
2. There is a close connection between correlation and the slope of the least-squares line.
3. The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$.
4. The correlation r describes the strength of a straight-line relationship. The square of the correlation, $r^2$, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

Coefficient of Determination ($R^2$)
• Measures the usefulness of the regression prediction
• $R^2$ (or $r^2$, the square of the correlation) measures what fraction of the variation in the values of the response variable (y) is explained by the regression line
  ▪ r = 1: $R^2$ = 1: the regression line explains all (100%) of the variation in y
  ▪ r = 0.7: $R^2$ = 0.49: the regression line explains almost half (49%) of the variation in y
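The "fraction of variation explained" reading of r² can be verified directly. The sketch below uses invented data and compares the squared correlation with one minus the ratio of unexplained to total variation in y; it also checks that the fitted line passes through the point of means. All variable names are illustrative assumptions.

```python
from math import sqrt

# Hypothetical data, invented purely for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 5, 4, 8, 7, 10]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)   # correlation
b = sxy / sxx               # least-squares slope
a = y_bar - b * x_bar       # least-squares intercept

fitted = [a + b * x for x in xs]
total_variation = syy
unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
fraction_explained = 1 - unexplained / total_variation

print("r squared:         ", round(r ** 2, 6))
print("fraction explained:", round(fraction_explained, 6))
print("line at x_bar:", round(a + b * x_bar, 6), "y_bar:", round(y_bar, 6))
```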

Residuals
• A residual is the difference between an observed value of the response variable and the value predicted by the regression line: residual = $y - \hat{y}$

Residuals
• A residual plot is a scatterplot of the regression residuals against the explanatory variable
  – used to assess the fit of a regression line
  – look for a “random” scatter around zero
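A residual plot is built from (x, residual) pairs. As a rough sketch with invented data and a hypothetical `fit_line` helper, the code below computes those pairs and prints them as text in place of a plot. For a least-squares line with an intercept the residuals always sum to zero, which is why the scatter is judged around zero.

```python
# Hypothetical data, invented purely for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 5, 4, 8, 7, 10]


def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = fit_line(xs, ys)

# residual = observed y minus predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# The (x, residual) pairs are exactly what a residual plot displays.
for x, res in zip(xs, residuals):
    print(f"x = {x}: residual = {res:+.3f}")
print("sum of residuals:", round(sum(residuals), 10))
```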

Case Study Gesell Adaptive Score and Age at First Word

Residual Plot: Case Study Gesell Adaptive Score and Age at First Word

• The x-axis in a residual plot is the same as on the scatterplot.
• The line on both plots is the regression line. Only the y-axis is different.

• Residuals are randomly scattered: good!
• A curved pattern means the relationship you are looking at is not linear.
• A change in variability across the plot is a warning sign. You need to find out why it occurs, and remember that predictions made in areas of larger variability will not be as good.

Outliers and Influential Points
• An outlier is an observation that lies far away from the other observations
  – outliers in the y direction have large residuals
  – outliers in the x direction are often influential for the least-squares regression line, meaning that the removal of such points would markedly change the equation of the line
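The effect of an influential point can be seen by fitting the line with and without it. The sketch below uses invented data (not the Gesell data): five points that follow a clear upward trend plus one point far out in the x direction. The helper name `slope` is an assumption for this example.

```python
# Hypothetical data: five points on an upward trend plus one x-outlier.
xs = [1, 2, 3, 4, 5, 15]
ys = [2.0, 2.9, 4.1, 5.0, 6.1, 3.0]


def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


slope_all = slope(xs, ys)
slope_without = slope(xs[:-1], ys[:-1])  # drop the point at x = 15

print("slope with the x-outlier:   ", round(slope_all, 4))
print("slope without the x-outlier:", round(slope_without, 4))
```

In this invented data set a single point in the x direction is enough to flatten the line almost completely, which is the sense in which such a point is called influential.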

Outliers: Case Study: Gesell Adaptive Score and Age at First Word
• From all the data: $r^2$ = 41%
• After removing child 18: $r^2$ = 11%

Cautions about Correlation and Regression
• only describe linear relationships
• are both affected by outliers
• always plot the data before interpreting
• beware of extrapolation
  – predicting outside of the range of x
• beware of lurking variables
  – have an important effect on the relationship among the variables in a study, but are not included in the study
• association does not imply causation

Caution: Beware of Extrapolation
• Sarah’s height was plotted against her age
• Can you predict her height at age 42 months?
• Can you predict her height at age 30 years (360 months)?

Caution: Beware of Extrapolation
• Regression line: $\hat{y} = 71.95 + 0.383x$ (height in cm, age in months)
• height at age 42 months? y-hat = 88
• height at age 30 years? y-hat = 209.8
  – She is predicted to be 6’ 10.5” at age 30.
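The extrapolation problem can be made concrete with a small check. This sketch rests on loud assumptions: the coefficients 71.95 and 0.383 (height in cm, age in months) are assumed because they reproduce the two predictions stated on the slide, and the observed age range of 36 to 60 months is hypothetical. The function names are illustrative.

```python
# ASSUMPTION: coefficients chosen to match the slide's stated predictions
# (about 88 at 42 months and roughly 6 ft 10.5 in at 360 months).
INTERCEPT_CM = 71.95
SLOPE_CM_PER_MONTH = 0.383
# ASSUMPTION: hypothetical range of ages covered by the data, in months.
OBSERVED_AGE_RANGE = (36, 60)


def predict_height_cm(age_months):
    """Height predicted by the regression line, in centimeters."""
    return INTERCEPT_CM + SLOPE_CM_PER_MONTH * age_months


def is_extrapolation(age_months, observed_range=OBSERVED_AGE_RANGE):
    """True when the age lies outside the range of x used to fit the line."""
    low, high = observed_range
    return not (low <= age_months <= high)


def cm_to_feet_inches(cm):
    """Convert centimeters to (whole feet, remaining inches)."""
    total_inches = cm / 2.54
    feet = int(total_inches // 12)
    return feet, total_inches - 12 * feet


for age in (42, 360):
    height = predict_height_cm(age)
    feet, inches = cm_to_feet_inches(height)
    flag = "EXTRAPOLATION" if is_extrapolation(age) else "within data range"
    print(f"age {age} months: {height:.1f} cm = {feet} ft {inches:.1f} in ({flag})")
```

The arithmetic is equally easy for both ages; the range check is what separates a reasonable prediction from an implausible adult height.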

Caution: Beware of Lurking Variables
A lurking variable is a variable not included in the study design that does have an effect on the variables studied. Lurking variables can falsely suggest a relationship.
What is the lurking variable in these examples? How could you answer if you didn’t know anything about the topic?
• Strong positive association between the number of firefighters at a fire site and the amount of damage a fire does
• Negative association between moderate amounts of wine drinking and death rates from heart disease in developed nations

Caution: Correlation Does Not Imply Causation
Even very strong correlations may not correspond to a real causal relationship (changes in x actually causing changes in y). The correlation may instead be explained by a lurking variable.
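A small simulation shows how a lurking variable can produce a strong correlation with no causal link between x and y. Everything here is an invented setup: a hidden variable z drives both x and y (for instance, fire size driving both firefighter count and damage), while x and y never influence each other. The seed, sample size, and noise level are arbitrary choices.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# z is the lurking variable; x and y each depend on z plus independent noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)

# Removing z's contribution leaves only the independent noise terms.
x_adjusted = [xi - zi for xi, zi in zip(x, z)]
y_adjusted = [yi - zi for yi, zi in zip(y, z)]
r_adjusted = correlation(x_adjusted, y_adjusted)

print("correlation of x and y:         ", round(r_xy, 3))
print("correlation after removing z:   ", round(r_adjusted, 3))
```

In real data the lurking variable is usually unmeasured, so this kind of adjustment is not available; the simulation only illustrates why the caution matters.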

Caution: Correlation Does Not Imply Causation
Social Relationships and Health
• Does lack of social relationships cause people to become ill? (there was a strong correlation)
• Or, are unhealthy people less likely to establish and maintain social relationships? (reversed relationship)
• Or, is there some other factor that predisposes people both to have lower social activity and to become ill?

Evidence of Causation
• A properly conducted experiment establishes the connection (chapter 8)
• Other considerations:
  – The association is strong
  – The association is consistent
    ▪ The connection happens in repeated trials
    ▪ The connection happens under varying conditions
  – Higher doses are associated with stronger responses
  – Alleged cause precedes the effect in time
  – Alleged cause is plausible (reasonable explanation)