Least Squares Method: the Meaning of r2

Presentation transcript:

Linear Regression. Least Squares Method: the Meaning of r2.

We are given the following ordered pairs: (1.2,1), (1.3,1.6), (1.7,2.7), (2,2), (3,1.8), (3,3), (3.8,3.3), (4,4.2). They are shown in the scatterplot below:

Now we show the horizontal line y = ȳ. This is the mean of the y values.

The line segments show the deviations from the mean for each data point.

The squares of the deviations are shown geometrically. Squaring has the consequence of making each difference positive. The greater the variation in y, the larger the squares. If the y values are close together, the squares will be small.

This is the geometric representation of the sum of the squares from the previous slide.
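The green squares can be totaled directly from the data. A minimal Python sketch using the ordered pairs given earlier (variable names are illustrative):

```python
# Ordered pairs from the example scatterplot
xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]

y_bar = sum(ys) / len(ys)                # mean of the y values
sst = sum((y - y_bar) ** 2 for y in ys)  # sum of the squared deviations

print(round(y_bar, 2))  # 2.45
print(round(sst, 2))    # 7.6
```

This total, 7.6, is the quantity the green area represents; it measures all of the variation in y about its mean.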

Now the best fit line is shown.

The directed distance from each point to the line is called the residual. For each point this is the difference between the actual y value and the predicted y value. As with deviations, some residuals are positive and some are negative. Together they add to zero.
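The claim that the residuals sum to zero can be checked numerically. A minimal sketch, assuming a least-squares fit computed with the standard closed-form formulas (the formulas themselves are not shown on the slides; variable names are illustrative):

```python
# Ordered pairs from the example scatterplot
xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept (standard closed-form formulas)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(round(sum(residuals), 6))  # 0.0, up to floating-point error
```

The positive and negative residuals cancel exactly for any least-squares line, which is why the method works with their squares instead.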

This graph gives a geometric representation of the squares of the residuals. As with the squares of the deviations, this produces all positive quantities.

This is the geometric representation of the sum of the squares of the residuals. This quantity is minimized in the least squares method of linear regression. We use the line that produces the smallest sum of the squares of the residuals.
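Minimizing this sum is exactly what the closed-form least-squares formulas achieve. A sketch computing the fitted line and the sum of squared residuals for the example data (the formulas are the standard ones, not taken from the slides):

```python
# Ordered pairs from the example scatterplot
xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

slope = sxy / sxx                  # this choice minimizes the sum below
intercept = y_bar - slope * x_bar

# Sum of the squares of the residuals about the fitted line
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
print(round(slope, 3), round(intercept, 3))  # approx. 0.781 0.497
print(round(sse, 2))                         # approx. 2.44
```

Any other slope and intercept would give a larger sum of squared residuals than this fitted line does.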

We now see both the squares of the deviations from the mean (green squares) and the squares of the residuals (red squares).

This geometric representation of the sum of the squares of the residuals (in red) shows that this quantity is a portion of the total sum of the squares of the deviations (the entire green square). All of the variation in y is represented by the larger green square, and the part that is not explainable by the regression equation is in red. The green square is called SST, the total sum of squares about the mean. The red square is called SSE, the sum of the squares of the error about the line.

If we now measure the quantities SSE and SST, we can make a useful calculation, the coefficient of determination, or r2: r2 = (SST - SSE)/SST = 1 - SSE/SST. Recall that SST is the total sum of the squares of the deviations about the mean value of y, and SSE is the sum of the squares of the error (residuals) about the line. N.B. This r2 is the square of the correlation coefficient r.

In our example, SST = 7.6 and SSE = 2.44. Therefore, r2 = 1 - 2.44/7.6 ≈ 0.68. This means that about 68% of the variation in y is explained by the regression line. The meaning of r2 is extremely important in statistics.
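The arithmetic can be verified end to end. This sketch recomputes SST, SSE, and r2 from the example data, and also cross-checks r2 against the square of the correlation coefficient (variable names are illustrative):

```python
import math

# Ordered pairs from the example scatterplot
xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r2 = 1 - sse / sst              # coefficient of determination
r = sxy / math.sqrt(sxx * sst)  # correlation coefficient

print(round(sst, 2), round(sse, 2), round(r2, 2))  # 7.6 2.44 0.68
print(round(r ** 2 - r2, 10))   # 0.0: r2 is the square of r
```

This confirms both numbers from the slide and the note that r2 is the square of the correlation coefficient.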