1 Functions and Applications Copyright © Cengage Learning. All rights reserved.

Linear Regression 1.4 Copyright © Cengage Learning. All rights reserved.

Linear Regression To find a linear model given two data points, we find the equation of the line that passes through them. However, we often have more than two data points, and they will rarely all lie on a single straight line, though they may come close to doing so. The problem is then to find the line that comes closest to passing through all of the points.

Linear Regression Suppose, for example, that we are conducting research for a company interested in expanding into Mexico. Of interest to us would be current and projected growth in that country’s economy. The following table shows past and projected per capita gross domestic product (GDP) of Mexico for 2000–2014. [Table: per capita GDP of Mexico, 2000–2014]

Linear Regression A plot of these data suggests roughly linear growth of the GDP (Figure 27(a)): the points suggest a roughly linear relationship between t and y, although they clearly do not all lie on a single straight line. [Figure 27(a)]

Linear Regression Figure 27(b) shows the points together with several lines, some fitting better than others. Can we precisely measure which lines fit better than others? For instance, which of the two lines labeled as “good” fits in Figure 27(b) models the data more accurately? [Figure 27(b)]

Linear Regression We begin by considering, for each value of t, the difference between the actual GDP (the observed value) and the GDP predicted by a linear equation (the predicted value). The difference between the observed value and the predicted value is called the residual. Residual = Observed Value − Predicted Value

Linear Regression On the graph, the residuals measure the vertical distances between the (observed) data points and the line (Figure 28), and they tell us how far the linear model is from predicting the actual GDP. [Figure 28]

Linear Regression The more accurate our model, the smaller the residuals should be. We can combine all the residuals into a single measure of accuracy by adding their squares. (We square the residuals in part to make them all positive.) The sum of the squares of the residuals is called the sum-of-squares error, SSE. Smaller values of SSE indicate more accurate models.

Linear Regression Observed and Predicted Values Suppose we are given a collection of data points $(x_1, y_1), \ldots, (x_n, y_n)$. The $n$ quantities $y_1, y_2, \ldots, y_n$ are called the observed y-values. If we model these data with a linear equation $\hat{y} = mx + b$, then the y-values we get by substituting the given x-values into the equation are called the predicted y-values:

$\hat{y}_1 = mx_1 + b$  (substitute $x_1$ for $x$)
$\hat{y}_2 = mx_2 + b$  (substitute $x_2$ for $x$)
$\quad\vdots$
$\hat{y}_n = mx_n + b$  (substitute $x_n$ for $x$).

$\hat{y}$ stands for “estimated y” or “predicted y.”

Linear Regression Quick Example Consider the three data points (0, 2), (2, 5), and (3, 6). The observed y-values are $y_1 = 2$, $y_2 = 5$, and $y_3 = 6$. If we model these data with the equation $\hat{y} = x + 2.5$, then the predicted values are:

$\hat{y}_1 = x_1 + 2.5 = 0 + 2.5 = 2.5$
$\hat{y}_2 = x_2 + 2.5 = 2 + 2.5 = 4.5$
$\hat{y}_3 = x_3 + 2.5 = 3 + 2.5 = 5.5$.
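The same computation in code: a minimal Python sketch (the helper name predicted_values is our own choice, not from the text) that evaluates $\hat{y} = mx + b$ at each observed x-value.

```python
# Predicted y-values for the Quick Example: data points (0, 2), (2, 5), (3, 6),
# modeled by y_hat = x + 2.5 (slope m = 1, intercept b = 2.5).

def predicted_values(xs, m, b):
    """Return the predicted y-values m*x + b for each observed x-value."""
    return [m * x + b for x in xs]

print(predicted_values([0, 2, 3], m=1, b=2.5))  # [2.5, 4.5, 5.5]
```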

Linear Regression Residuals and Sum-of-Squares Error (SSE) If we model a collection of data $(x_1, y_1), \ldots, (x_n, y_n)$ with a linear equation $\hat{y} = mx + b$, then the residuals are the $n$ quantities (Observed Value − Predicted Value):

$(y_1 - \hat{y}_1), (y_2 - \hat{y}_2), \ldots, (y_n - \hat{y}_n)$.

The sum-of-squares error (SSE) is the sum of the squares of the residuals:

$\mathrm{SSE} = (y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \cdots + (y_n - \hat{y}_n)^2$.

Linear Regression Quick Example For the data and linear approximation given above, the residuals are:

$y_1 - \hat{y}_1 = 2 - 2.5 = -0.5$
$y_2 - \hat{y}_2 = 5 - 4.5 = 0.5$
$y_3 - \hat{y}_3 = 6 - 5.5 = 0.5$,

and so $\mathrm{SSE} = (-0.5)^2 + (0.5)^2 + (0.5)^2 = 0.75$.
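Continuing the sketch, an sse helper (again a name of our own choosing) combines the residuals into the sum-of-squares error and reproduces the 0.75 computed by hand.

```python
# Sum-of-squares error for the same Quick Example.

def sse(xs, ys, m, b):
    """Sum of squared residuals (observed - predicted) for the model y_hat = m*x + b."""
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals)

print(sse([0, 2, 3], [2, 5, 6], m=1, b=2.5))  # 0.75 (residuals -0.5, 0.5, 0.5)
```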

Example 1 – Computing SSE Using the data above on the GDP in Mexico, compute SSE for the linear models y = 0.5t + 8 and y = 0.25t + 9. Which model is the better fit? Solution: We begin by creating a table showing the values of t, the observed (given) values of y, and the values predicted by the first model.

Example 1 – Solution cont’d We now add two new columns for the residuals and their squares. SSE, the sum of the squares of the residuals, is then the sum of the entries in the last column, SSE = 8.

Example 1 – Solution cont’d Repeating the process using the second model, $y = 0.25t + 9$, yields a similar table. This time SSE = 2, and so the second model is a better fit.
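In code, the comparison reuses the sse helper defined after the earlier Quick Example. Since the GDP table itself is not reproduced in this transcript, the data below are hypothetical placeholders, so the printed values will not match the SSE = 8 and SSE = 2 of the text; the sketch only shows the pattern of the comparison.

```python
# Comparing the two candidate models from Example 1 by SSE.
# NOTE: ts and ys are hypothetical placeholder data; the GDP table
# from the text is not reproduced here.

ts = [0, 2, 4, 6, 8]        # t-values (years since 2000), hypothetical
ys = [9, 9, 10, 11, 11]     # observed GDP values, hypothetical

sse_a = sse(ts, ys, m=0.5, b=8)     # model y = 0.5t + 8
sse_b = sse(ts, ys, m=0.25, b=9)    # model y = 0.25t + 9
print(sse_a, sse_b)  # the smaller SSE indicates the better-fitting model
```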

Example 1 – Solution cont’d Figure 29 shows the data points and the two linear models in question. [Figure 29]

Linear Regression Among all possible lines, there ought to be one with the least possible value of SSE—that is, the greatest possible accuracy as a model. The line (and there is only one such line) that minimizes the sum of the squares of the residuals is called the regression line, the least-squares line, or the best-fit line. To find the regression line, we need a way to find values of m and b that give the smallest possible value of SSE.

Linear Regression Regression Line The regression line (least-squares line, best-fit line) associated with the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is the line that gives the minimum value of SSE.

Linear Regression The regression line is $y = mx + b$, where $m$ and $b$ are computed as follows:

$m = \dfrac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{n \sum x^2 - \left(\sum x\right)^2}$

$b = \dfrac{\sum y - m \sum x}{n}$

($n$ = number of data points.) The quantities $m$ and $b$ are called the regression coefficients.

Linear Regression Here, $\sum$ means “the sum of.” Thus, for example,

$\sum x$ = sum of the x-values = $x_1 + x_2 + \cdots + x_n$
$\sum xy$ = sum of the products = $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$
$\sum x^2$ = sum of the squares of the x-values = $x_1^2 + x_2^2 + \cdots + x_n^2$.

On the other hand,

$\left(\sum x\right)^2$ = square of $\sum x$ = square of the sum of the x-values.
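As a sanity check on these formulas, here is a small Python sketch (the name regression_line is our own) that computes m and b from the sums above, applied to the three points of the earlier Quick Example.

```python
# Regression coefficients m and b from the closed-form sum formulas.

def regression_line(xs, ys):
    """Return (m, b) for the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

m, b = regression_line([0, 2, 3], [2, 5, 6])
print(m, b)  # about 1.357 and 2.071: the best-fit line y = 1.357x + 2.071
```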

Coefficient of Correlation

Coefficient of Correlation If the data points do not all lie on one straight line, we would like to be able to measure how closely they can be approximated by a straight line. We know that SSE measures the sum of the squares of the deviations from the regression line; therefore it constitutes a measurement of what is called “goodness of fit.” (For instance, if SSE = 0, then all the points lie on a straight line.) However, SSE depends on the units we use to measure y, and also on the number of data points (the more data points we use, the larger SSE tends to be).

Coefficient of Correlation Thus, while we can (and do) use SSE to compare the goodness of fit of two lines to the same data, we cannot use it to compare the goodness of fit of one line to one set of data with that of another line to a different set of data. To remove this dependency, statisticians have found a related quantity that can be used to compare the goodness of fit of lines to different sets of data. This quantity, called the coefficient of correlation or correlation coefficient and usually denoted r, is always between −1 and 1. The closer r is to −1 or 1, the better the fit.

Coefficient of Correlation For an exact fit, we would have r = −1 (for a line with negative slope) or r = 1 (for a line with positive slope). For a bad fit, we would have r close to 0. Figure 31 shows several collections of data points with their least-squares lines and the corresponding values of r. [Figure 31]

Coefficient of Correlation Correlation Coefficient The coefficient of correlation of the n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is

$r = \dfrac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n \sum x^2 - \left(\sum x\right)^2} \cdot \sqrt{n \sum y^2 - \left(\sum y\right)^2}}$.

It measures how closely the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ fit the regression line. (The value $r^2$ is sometimes called the coefficient of determination.)
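The formula translates directly into code. Below is a minimal Python sketch (the name correlation is our own) applying it to the three points of the earlier Quick Example; since those points are nearly collinear, r should come out close to 1.

```python
from math import sqrt

# Coefficient of correlation r from the sum formula above.

def correlation(xs, ys):
    """Correlation coefficient of the points (xs[i], ys[i])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / (
        sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    )

print(correlation([0, 2, 3], [2, 5, 6]))  # about 0.996, very close to 1
```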

Coefficient of Correlation Interpretation If r is positive, the regression line has positive slope; if r is negative, the regression line has negative slope. If r = 1 or −1, then all the data points lie exactly on the regression line; if r is close to ±1, then the data points are close to the regression line. On the other hand, if r is not close to ±1, then the data points are not close to the regression line, and the fit is not a good one. As a general rule of thumb, a value of |r| less than around 0.8 indicates a poor fit of the data to the regression line.

Example 3 – Computing the Coefficient of Correlation Using the table given earlier of past and projected per capita gross domestic product (GDP) of Mexico for 2000–2014, find the correlation coefficient. Is the regression line a good fit?

Example 3 – Solution The formula for r requires $\sum x$, $\sum x^2$, $\sum xy$, $\sum y$, and $\sum y^2$. Let’s organize our work in the form of a table, where the original data are entered in the first two columns and the bottom row contains the column sums.

Example 3 – Solution cont’d Substituting these values into the formula, we get a value of r close to 1, so the fit is a fairly good one; that is, the original points lie nearly along a straight line.