Least Squares Regression


Least Squares Regression: a view by Henry Mesa

Here is another view that explains the reason for regression. We are interested in a particular variable, which we name Y for convenience: the response variable. We consider only the situation in which Y is normally distributed. The distribution of Y has some mean μ_Y and standard deviation σ_Y.

What we would like to do is introduce another variable, which we call the explanatory variable, X, in the hope that by doing so we can reduce the standard deviation of Y, σ_Y.

Here is a cute site showing a ball drop (a Galton machine) that builds up a normal distribution: http://www.ms.uky.edu/~mai/java/stat/GaltonMachine.html. OK, go to the next slide.
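If the applet is ever unavailable, the idea is easy to simulate. Here is a minimal Python sketch (an illustration of the same idea, not the site's actual code): each ball bounces left or right a fixed number of times, and the pile of final positions is approximately normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each of 5000 balls bounces left (0) or right (1) at 12 pegs;
# the final bin is the number of right bounces.
final_bins = rng.integers(0, 2, size=(5000, 12)).sum(axis=1)

# Tally the pile in each bin: the counts form a rough bell curve
# (binomial, which approximates the normal distribution).
print(np.bincount(final_bins, minlength=13))
```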

Let us look at three scenarios so you can understand what we are doing. (Oh my gosh, cute ball thing!) Let us say that Y is the height of an adult male. Adult male height is normally distributed, and a population standard deviation exists, yet its value is unknown to us. We will introduce another variable, called X, in the hope of reducing the variability below the original standard deviation. [Figure: the distribution of Y shown at a single value x1 on the X axis.]

So, let us consider two different scenarios. What we are hoping for is that each value of the variable X is associated with only a small range of the Y variable, thus decreasing our variation. [Figure: the possible range of values for Y associated with x1, about 2 standard deviations long.]

A poor situation: there is a linear association, but for each value of x the range of possible values of Y associated with that x is too large; in this case r will be a low value. Let us say that X represents the age of an individual adult. That would not be a good predictor of adult height. [Figure: at each of x1 through x5, the range of Y values, about 2 standard deviations long, is wide.]

A great situation: there is a linear association, and for each value of x the range of possible values of Y associated with that x is very small; in this case r will have a high value. Let us say that X represents the length of an adult male foot. The height of an individual and their foot size should be highly correlated. Thus, in this case, the variable X should allow for a better prediction of the variable Y. So, for example, if a footprint is found at a crime scene, we would expect a decent prediction of the height of the person whose print we recovered. [Figure: at each of x1 through x5, the range of Y values is narrow.]
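To make the contrast between the two scenarios concrete, here is a minimal simulation sketch in Python; the ages, slope, intercept, and noise levels are made-up illustration values, not data from these slides. The two samples share the same underlying line and differ only in the spread of Y around it, and that alone moves r.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(20.0, 60.0, size=200)  # hypothetical explanatory values

# Same linear trend, different spread of Y around the line.
y_poor = 100.0 + 1.5 * x + rng.normal(0.0, 30.0, size=x.size)   # wide range per x
y_great = 100.0 + 1.5 * x + rng.normal(0.0, 3.0, size=x.size)   # narrow range per x

print(np.corrcoef(x, y_poor)[0, 1])   # r is noticeably lower
print(np.corrcoef(x, y_great)[0, 1])  # r is close to 1
```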

The Reason For the Title “Least Squares Regression Line”

You will notice that the variable Y has a mean value, μ_Y, but its value is not known to us.

But we can use the y values from the ordered pairs (x_i, y_i) to calculate y-hat, ŷ.

To remind ourselves of what is occurring, here is a picture of the model. The variables X and Y are not perfectly associated, but there is a linear trend. The line consists of joining the infinitely many normal distributions by passing through each of their centers. [Figure: normal curves drawn at x1 and x2, with a line through their means.]

The hope is that the means of the distributions are different (the larger the difference the better) and that the overall standard deviation (the same for each distribution) is small, so as to emphasize the difference between the means.

Furthermore, Y is a different value from all of the individual means, except for one of the means.

Lastly, the equation of the true line is given by

μ_{Y|x} = β0 + β1x

where β1 ("beta-one") is the slope of the line. But in reality none of these values is known to us, so we must rely on the data to approximate the slope. Thus, the statistical formula is

ŷ = b0 + b1x

where b1 is the slope of the line, and it is a statistic, and y-hat, ŷ, is the approximation of the individual mean of each distribution. [Figure: the red line is the calculated least squares regression line, while the green line is the "true" line.]

So ȳ, the mean of the observed y's, is our best estimate of μ_Y, and y-hat, ŷ, is our best estimate of the individual mean of each distribution, μ_{Y|x_i}.

So basically, we know nothing about our population, and unlike Chapter 6, Section 1 of Moore and McCabe, we also do not know σ.

On that note, we introduce the "error," which is not an error in the everyday sense.

Error = observed y − predicted y. [Figure: predicted data points lie on the line; observed data points are scattered around it.]
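As a quick illustration of that definition, here is a minimal Python sketch; the data values and the candidate line are made up for the example, not taken from these slides.

```python
import numpy as np

# Hypothetical observed data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_observed = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Predictions from some candidate line y = b0 + b1 * x.
b0, b1 = 0.2, 1.95
y_predicted = b0 + b1 * x

# Error (residual) = observed y - predicted y, one per data point.
errors = y_observed - y_predicted
print(errors)
```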

Now calculate the error for each observed data value and sum the squared errors. [Figure: vertical gaps between the observed and predicted data at each of x1 through x5.]

I think you would agree that we would like the least possible total error (the strongest association).

The "least squares" regression line. Notice that we are "squaring" each error, and we have established that we would like the sum of squared errors to be as small as possible in order to have a "strong" association. That is what the calculation for the equation is trying to do: minimize the sum of squared errors!
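Here is a minimal sketch of that minimization, using the standard closed-form least squares estimates b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄ (textbook formulas, not something specific to these slides), on the same made-up data as above. Nudging the fitted line in any direction can only increase the sum of squared errors.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def least_squares_line(x, y):
    """Closed-form least squares slope b1 and intercept b0."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_errors(b0, b1):
    return np.sum((y - (b0 + b1 * x)) ** 2)

b0, b1 = least_squares_line(x, y)

# Any nearby line has at least as large a sum of squared errors.
assert sum_squared_errors(b0, b1) <= sum_squared_errors(b0 + 0.1, b1)
assert sum_squared_errors(b0, b1) <= sum_squared_errors(b0, b1 + 0.1)
print(b0, b1, sum_squared_errors(b0, b1))
```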

The Meaning of R-squared

Here is a fun fact. The mean of the observed y's is ȳ. The mean of the predicted values ŷ is also ȳ, and I mean the same number! The average is the same for both sets.
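A quick numerical check of that fact on the same made-up data: it holds because b0 = ȳ − b1x̄, so the fitted line passes through the point (x̄, ȳ).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)  # least squares slope, then intercept
y_hat = b0 + b1 * x

# The fitted values have the same mean as the observed y's
# (equal up to floating-point rounding).
print(y.mean(), y_hat.mean())
```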

So, r-squared is the ratio of the variation of the predicted y's to the variation of the observed y's. Think of it this way: variability is what we are trying to model; we want the variability of the predicted y's (the model) to closely match the variability of the observed y's.

The difference y − ȳ measures variation in the observed y's. Do this for all observed y's, square the differences, and add: Σ(y_i − ȳ)². [Figure: vertical deviations of the observed points from the horizontal line y = ȳ.]

The difference ŷ − ȳ measures variation in the predicted y's. Do this for all predicted y's, square the differences, and add: Σ(ŷ_i − ȳ)². [Figure: vertical deviations of the predicted points, which lie on the regression line, from the line y = ȳ.]

Thus, the ratio

r² = Σ(ŷ_i − ȳ)² / Σ(y_i − ȳ)²

says what fraction of the total variation (our approximation of the actual situation) can be matched using the least squares regression line and the variable X as the model.
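And a minimal check of that ratio on the same made-up data: for a least squares line, the ratio of the two sums of squares equals the square of the correlation coefficient r.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_observed = np.sum((y - y.mean()) ** 2)           # variation in observed y's
ss_predicted = np.sum((y_hat - y_hat.mean()) ** 2)  # variation in predicted y's

r_squared = ss_predicted / ss_observed
r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)  # the two agree
```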
