Lecture 5, Chapter 4. Relationships: Regression (student version).


Objectives (PSLS Chapter 4): Regression
- The least-squares regression line
- Finding the least-squares regression line
- The coefficient of determination, r²
- Outliers and influential observations
- Making predictions
- Association does not imply causation

The least-squares regression line The least-squares regression line is the unique line such that the sum of the vertical distances between the data points and the line is zero, and the sum of the squared vertical distances is the smallest possible.
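As a quick illustration of this minimizing property, here is a minimal sketch (numpy, with made-up data that is not from the lecture) that fits the least-squares line and checks that any slightly shifted or tilted line has a larger sum of squared vertical distances.

```python
import numpy as np

# Made-up illustrative data (not from the lecture).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, 1)   # least-squares slope b and intercept a

def sum_sq_dist(intercept, slope):
    """Sum of squared vertical distances from the points to a line."""
    return np.sum((y - (intercept + slope * x)) ** 2)

# The least-squares line beats nearby alternative lines.
print(sum_sq_dist(a, b) < sum_sq_dist(a + 0.1, b))   # True
print(sum_sq_dist(a, b) < sum_sq_dist(a, b + 0.1))   # True
```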

Notation. The least-squares regression line is written ŷ = a + bx, where ŷ is the predicted y value on the regression line, b is the slope, and a is the intercept. Not all calculators/software use this convention; other notations include ŷ = b₀ + b₁x.

Interpretation The slope of the regression line describes how much we expect y to change, on average, for every unit change in x. The intercept is a necessary mathematical descriptor of the regression line. It does not describe a specific property of the data.

Finding the least-squares regression line. The slope of the regression line is b = r(s_y / s_x), where r is the correlation coefficient between x and y, s_y is the standard deviation of the response variable y, and s_x is the standard deviation of the explanatory variable x. The intercept is a = ȳ − b·x̄, where x̄ and ȳ are the means of x and y. Key feature: the point (x̄, ȳ) is always on the regression line.
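A minimal sketch of these formulas in Python (numpy), using made-up x and y values rather than any data set from the lecture:

```python
import numpy as np

# Made-up example data: x explanatory, y response.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0.5, 1.1, 1.4, 2.2, 2.4, 3.1, 3.3, 4.0])

r = np.corrcoef(x, y)[0, 1]              # correlation coefficient
b = r * y.std(ddof=1) / x.std(ddof=1)    # slope: b = r * (s_y / s_x)
a = y.mean() - b * x.mean()              # intercept: a = y-bar - b * x-bar

print(f"y-hat = {a:.3f} + {b:.3f} x")
# Key feature: the point (x-bar, y-bar) lies on the line.
print(np.isclose(a + b * x.mean(), y.mean()))   # True
```

The same slope and intercept come out of np.polyfit(x, y, 1), which minimizes the sum of squared residuals directly.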

Plotting the least-squares regression line. Use the regression equation to find the predicted value ŷ for two distinct values of x, and draw the line that goes through those two points. The points used for drawing the regression line are derived from the equation; they are NOT actual points from the data set (except by pure coincidence).
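One way to draw the line this way, sketched with matplotlib; the data are the same made-up values as in the previous sketch, repeated so the block runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same made-up data as in the earlier sketch.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0.5, 1.1, 1.4, 2.2, 2.4, 3.1, 3.3, 4.0])
b, a = np.polyfit(x, y, 1)

# Two distinct x values and their predicted y values define the line.
x_line = np.array([x.min(), x.max()])
y_line = a + b * x_line

plt.scatter(x, y, label="data")
plt.plot(x_line, y_line, color="red", label="least-squares line")
plt.xlabel("x (explanatory)")
plt.ylabel("y (response)")
plt.legend()
plt.show()
```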

Least-squares regression is only for linear associations: ALWAYS PLOT THE RAW DATA. Don’t compute the regression line until you have confirmed that there is a linear relationship between x and y. The data sets shown here all give approximately the same linear regression equation, but don’t report it until you have plotted the data.

The four panels: (1) moderate linear association, regression OK; (2) obvious nonlinear relationship, first-order linear regression inappropriate; (3) one extreme outlier, so the line is a poor descriptor of the association; (4) only two values for x, so the fit involves extrapolation over the domain of x and heavy leverage by a single observation.

The coefficient of determination, r². r² represents the fraction of the variance in y that can be explained by the regression model. r², the coefficient of determination, is the square of the correlation coefficient. Example: r = 0.87, so r² = 0.76; this model explains 76% of the individual variation in BAC.

r = –0.3, r² = 0.09, or 9%: the regression model explains less than 10% of the variation in y. r = –0.7, r² = 0.49, or 49%: the regression model explains nearly half of the variation in y. r = –0.99, r² = 0.9801, or ~98%: the regression model explains almost all of the variation in y.
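The percentages on this slide are simply the squares of the correlations; a one-line check in Python:

```python
# Square each correlation to get the fraction of variation in y explained.
for r in (-0.3, -0.7, -0.99):
    print(f"r = {r:+.2f} -> r^2 = {r**2:.4f} ({r**2:.0%} of the variation in y explained)")
```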

Outliers and influential points. Outlier: an observation that lies outside the overall pattern. Influential observation: an observation that, if removed, would change the regression markedly, much more than removing other points would; this is often an isolated point. In the example, Child 19 is an outlier of the relationship (it is unusually far from the regression line vertically, i.e., it has a large residual). Child 18 is isolated from the rest of the points and might be an influential observation.

Residuals. The vertical distances from each point to the least-squares regression line are called residuals: residual = observed y − predicted ŷ. The residuals of a least-squares line always sum to 0. Outliers have unusually large residuals (in absolute value). Points above the line have a positive residual (the line underestimates them); points below the line have a negative residual (the line overestimates them).
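A short sketch (numpy, made-up data) showing that the residuals of a fitted least-squares line sum to essentially zero, and that a large absolute residual flags a potential outlier:

```python
import numpy as np

# Made-up data with one point that sits far from the overall pattern.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1.1, 2.0, 2.9, 4.2, 9.5, 6.1, 6.8, 8.1])   # the y at x = 5 is unusual

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)            # observed y minus predicted y-hat

print(np.isclose(residuals.sum(), 0.0))   # True: residuals sum to ~0
print(np.argmax(np.abs(residuals)))       # 4: the unusual point has the largest |residual|
```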

Comparing the regression line fitted to all the data with the lines fitted without Child 18 and without Child 19: Child 18 changes the regression line substantially when it is removed, so Child 18 is indeed an influential point. Child 19 is an outlier of the relationship, but it is not influential (the regression line changes very little when it is removed).
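A sketch of the same leave-one-out idea with made-up data (not the Child 18 / Child 19 data): refit the line without the isolated point and compare the slopes.

```python
import numpy as np

# Made-up data: the last point is isolated far to the right (high leverage).
x = np.array([1, 2, 3, 4, 5, 20.0])
y = np.array([1.2, 1.9, 3.2, 3.9, 5.1, 2.0])

slope_all, _ = np.polyfit(x, y, 1)
slope_drop, _ = np.polyfit(x[:-1], y[:-1], 1)

# A large change in slope after removing one point marks that point as influential.
print(f"slope with all points:    {slope_all:.3f}")
print(f"slope without last point: {slope_drop:.3f}")
```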

Making predictions. Use the equation of the least-squares regression line to predict y for any value of x within the range studied. Prediction outside that range is extrapolation. Extrapolation relies on assumptions that are not supported by the data being analyzed. Example: what would we expect for the BAC after drinking 6.5 beers?

Example: manatee deaths per year (y) versus powerboat registrations in thousands (x) in Florida. Using the least-squares regression line for these data, if Florida were to limit the number of powerboat registrations to 500,000, we could expect roughly 21 manatee deaths in a year. Could we use this regression line to predict the number of manatee deaths for a year with 200,000 powerboat registrations?
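A hedged sketch of prediction with an extrapolation guard. The slope, intercept, and x-range below are placeholder values chosen only to be roughly consistent with the slide's "roughly 21 deaths at 500,000 boats" answer; they are not the actual fitted values from the manatee data.

```python
# Placeholder coefficients and x-range (thousands of powerboats), NOT the real fit.
A, B = -41.4, 0.125            # y-hat = A + B * x  (assumed illustrative values)
X_MIN, X_MAX = 450.0, 1000.0   # assumed range of x observed in the data

def predict_deaths(x_thousands: float) -> float:
    """Predict manatee deaths, refusing to extrapolate outside the observed x-range."""
    if not (X_MIN <= x_thousands <= X_MAX):
        raise ValueError(f"x = {x_thousands} is outside [{X_MIN}, {X_MAX}]: extrapolation")
    return A + B * x_thousands

print(round(predict_deaths(500)))   # about 21 deaths for 500,000 registered powerboats
# predict_deaths(200) would raise an error: 200 thousand is far below the observed range.
```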

In each example, what is most likely the lurking variable? Notice that some cases are more obvious than others. (1) Strong positive association between the number of firefighters at a fire site and the amount of damage the fire does. (2) Negative association between moderate amounts of wine-drinking and death rates from heart disease in developed nations. (3) Strong positive association between shoe size and reading skills in young children.

Establishing causation. Evidence of causation from an observed association can be built if: (1) higher doses are associated with stronger responses; (2) the alleged cause precedes the effect; (3) the biological mechanism is understood and is consistent; (4) the association is strong; (5) the association is consistent. Lung cancer is clearly associated with smoking. But what if a genetic mutation (a lurking variable) caused people both to get lung cancer and to become addicted to smoking? It took years of research and accumulated indirect, non-experimental evidence to reach the conclusion that smoking causes lung cancer in humans.