Published by Walter Alexander. Modified over 9 years ago.
1
Least Squares Regression: y on x © Christine Crisp “Teach A Level Maths” Vol. 2: A2 Core Modules
2
We often want to know whether there is a relationship between one variable and another.
e.g. Does the number of driving accidents increase with the age of the driver?
e.g. Can we predict a student’s mark in a French exam if we know their mark in an English exam?
e.g. Is the weight of a baby at birth related to the height of its father?
You met data sets like these at GCSE, where you drew scatter diagrams and a line of best fit “by eye”. A line of best fit is also called a regression line. In this presentation we will see how to calculate a regression line rather than drawing it by eye.
3
The data I’m going to use are a random sample, taken from the Census at School database, of the heights and foot lengths of 99 children in the UK.
4
This is a scatter diagram of the data. We will find the equation of the line that could be used to predict the foot length of a child whose height is known.
[Scatter diagram: Foot length and height of UK children; axes: Height (cm), Foot length (cm)]
6
This is a scatter diagram of the data.
[Scatter diagram: Foot length and height of UK children; axes: Height (cm), Foot length (cm)]
e.g. The length from a point to the line, measured vertically, is squared and added to the squares of the lengths for all the other points. Points below the line give negative “lengths”, so they would cancel out those of the points above the line if we didn’t square them.
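To see the cancellation concretely, here is a minimal Python sketch. The foot-length data aren't reproduced here, so it uses the small data set from the later slides (x = 1, 2, 3; y = 5, 4, 1) and assumes its least squares line is $y = \frac{22}{3} - 2x$, which you can check with the formulae later on. The signed vertical distances add up to zero, so on their own they can't measure how well the line fits, but their squares can.

```python
from fractions import Fraction

# Small data set from the "Formulae for the regression line" slides.
xs = [1, 2, 3]
ys = [5, 4, 1]

# Assumed least squares line y = a + bx for these points (exact fractions).
a = Fraction(22, 3)
b = Fraction(-2)

# Signed vertical distances: observed y minus the y-value on the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

sum_of_residuals = sum(residuals)                  # positives and negatives cancel
sum_of_squares = sum(r * r for r in residuals)     # squaring stops the cancelling

print("distances:", [str(r) for r in residuals])
print("sum of distances:", sum_of_residuals)
print("sum of squared distances:", sum_of_squares)
```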
7
[Scatter diagram: Foot length and height of UK children; axes: Height (cm), Foot length (cm)]
The line is positioned so that the sum of the squares of the vertical distances of all the points from the line is as small as possible. This makes the line run through the middle of the points.
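One way to convince yourself that the least squares line really does make the sum of squares as small as possible is to nudge it. This sketch uses the small data set x = 1, 2, 3; y = 5, 4, 1 from the later slides, assumes its least squares line is $y = \frac{22}{3} - 2x$, and compares that line with lines whose intercept or gradient is changed by 0.1. The helper name `sum_sq` is just chosen for this illustration.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [5, 4, 1]
A, B = Fraction(22, 3), Fraction(-2)   # assumed least squares intercept and gradient


def sum_sq(a, b):
    """Sum of squared vertical distances from the points to the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


best = sum_sq(A, B)
step = Fraction(1, 10)

# Change the intercept or the gradient by 0.1 either way; each line does worse.
for da, db in [(step, 0), (-step, 0), (0, step), (0, -step)]:
    other = sum_sq(A + da, B + db)
    print(f"a changed by {da}, b changed by {db}: {other} (least squares: {best})")
```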
8
[Scatter diagram: Foot length and height of UK children; axes: Height (cm), Foot length (cm)]
To find the equation of the regression line we need the values of its gradient and its intercept on the y-axis. This line is called the least squares regression line of y on x.
9
SUMMARY
To estimate a value of y for a given value of x, we need the least squares regression line of y on x.
Suppose we have a set of values of 2 variables, x and y. The equation of the line is of the form $y = a + bx$, where b is the gradient and a is the intercept on the y-axis.
To find the values of the gradient and intercept on my calculator I... ( note down here what you need to do )
The gradient, b, is called the regression coefficient. The intercept is a.
The regression line always passes through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ and $\bar{y}$ are the means of the x- and y-values respectively.
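As a rough picture of what a calculator's regression function works out, here is a minimal Python sketch. It assumes raw data given as lists of integers, so that `Fraction` keeps the arithmetic exact; `least_squares_line` is an illustrative name, not a standard function. The last lines check that the line passes through $(\bar{x}, \bar{y})$.

```python
from fractions import Fraction


def least_squares_line(xs, ys):
    """Least squares regression line of y on x, y = a + bx; returns (a, b).

    Written for integer data so that Fraction keeps the arithmetic exact.
    """
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    # b = (sum of products of deviations) / (sum of squared x-deviations)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3]
ys = [5, 4, 1]
a, b = least_squares_line(xs, ys)
print(f"y = {a} + ({b})x")

# The line passes through the point of means (xbar, ybar).
x_bar = Fraction(sum(xs), len(xs))
y_bar = Fraction(sum(ys), len(ys))
print(a + b * x_bar == y_bar)   # True
```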
10
Taking Exams
The problem with using a calculator to find the regression line and then writing down the answer directly is that one small error when entering the data could cost you several marks in an exam. To avoid this, always check the data carefully after entering them.
The formulae are in your formulae booklet, but we’ll now see what the terms in the formulae mean. If you are given summary data instead of raw data, you will need to use the formulae, as it isn’t then possible to use the calculator’s regression function.
11
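Formulae for the regression line
I’ll use the simple data set again to illustrate the method.
x | 1 | 2 | 3
y | 5 | 4 | 1
The gradient of the regression line of y on x is given by
$b = \frac{s_{xy}}{s_x^2}$
where $s_x^2$ is the variance of the x-values and $s_{xy}$ is called the covariance:
$s_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$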
13
As before, we use the 2nd form of the covariance formula, which is easier to work with:
$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}$, and similarly $s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2$.
Then $b = \frac{s_{xy}}{s_x^2}$.
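The two forms of the covariance always give the same value; the 2nd form just needs fewer subtractions. This Python sketch, assuming the simple data set x = 1, 2, 3; y = 5, 4, 1 as printed, computes the covariance both ways, the variance of x by its 2nd form, and then the gradient $b = s_{xy} / s_x^2$.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [5, 4, 1]
n = len(xs)

x_bar = Fraction(sum(xs), n)
y_bar = Fraction(sum(ys), n)

# Covariance, 1st form: mean of the products of the deviations.
cov_def = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
# Covariance, 2nd form: sum(xy)/n - xbar*ybar.
cov_2nd = Fraction(sum(x * y for x, y in zip(xs, ys)), n) - x_bar * y_bar
# Variance of x, 2nd form: sum(x^2)/n - xbar^2.
var_x = Fraction(sum(x * x for x in xs), n) - x_bar ** 2

b = cov_2nd / var_x
print("covariance:", cov_def, "=", cov_2nd)   # both forms agree
print("variance of x:", var_x)
print("gradient b:", b)
```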
15
Formulae for the regression line
The equation of the line is $y = a + bx$.
We now use the fact that the regression line passes through the point $(\bar{x}, \bar{y})$, so these coordinates satisfy the equation:
$\bar{y} = a + b\bar{x}$, where $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$.
So, $a = \bar{y} - b\bar{x}$.
Now enter the data into your calculator and use its regression function to check the result.
16
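Using Summary Data
The equation of the regression line of y on x is $y = a + bx$.
The gradient of the line is called the regression coefficient and is given by
$b = \frac{S_{xy}}{S_{xx}}$, where $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$.
The point $(\bar{x}, \bar{y})$ satisfies the equation, so $\bar{y} = a + b\bar{x}$ and therefore $a = \bar{y} - b\bar{x}$.
(The 2nd formula given in your formulae booklet for b is not in the most convenient form. It’s best to work out $S_{xy}$ and $S_{xx}$ separately and then divide them, as above.)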
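With summary data, only n, $\sum x$, $\sum y$, $\sum x^2$ and $\sum xy$ are needed. Here is a minimal Python sketch of the calculation above; `line_from_summary` is a hypothetical helper name, and it is written for integer (or `Fraction`) inputs so the arithmetic stays exact. As a check, the summary statistics used are those of the simple data set x = 1, 2, 3; y = 5, 4, 1.

```python
from fractions import Fraction


def line_from_summary(n, sum_x, sum_y, sum_x2, sum_xy):
    """Regression line of y on x from summary statistics; returns (a, b) for y = a + bx."""
    s_xy = sum_xy - Fraction(sum_x * sum_y, n)
    s_xx = sum_x2 - Fraction(sum_x ** 2, n)
    b = s_xy / s_xx                                   # regression coefficient
    a = Fraction(sum_y, n) - b * Fraction(sum_x, n)   # a = ybar - b * xbar
    return a, b


# Summary statistics for x = 1, 2, 3; y = 5, 4, 1:
# n = 3, sum x = 6, sum y = 10, sum x^2 = 14, sum xy = 16.
a, b = line_from_summary(3, 6, 10, 14, 16)
print(f"y = {a} + ({b})x")
```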
17
e.g.1 The following results are given for 10 pairs of observations relating 2 variables, x and y:
Find the regression coefficient of y on x and the equation of the regression line of y on x.
Solution: The regression coefficient is b, the gradient of the regression line of y on x.
18
e.g.1 (continued)
Solution: The equation of the regression line of y on x is $y = a + bx$, where $a = \bar{y} - b\bar{x}$.
19
Exercise
Find the regression coefficient of y on x and the equation of the regression line of y on x for each of the following sets of data:
1.
2.
20
Solutions
1. Regression coefficient of y on x
Your answers may be slightly different from mine, as I stored each value as I calculated it and used the unrounded values, rather than rounded ones, in later calculations. This is good practice but not essential at this stage.
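To see why storing unrounded values matters, here is a short Python sketch. The data for this exercise aren't in the extracted text, so it assumes data set 1(a) from the next exercise, as read from its flattened table (x = 1, 3, 4, 6, 8, 9, 11, 14; y = 1, 2, 4, 4, 5, 7, 8, 9). It finds the intercept once with the stored gradient and once with the gradient rounded to 2 d.p. too early.

```python
# Data set 1(a) from the next exercise, as reconstructed from the flattened table.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n

b = s_xy / s_xx                   # stored, unrounded gradient
a_stored = y_bar - b * x_bar      # intercept using the stored gradient

b_rounded = round(b, 2)           # gradient rounded to 2 d.p. too early
a_rounded = y_bar - b_rounded * x_bar

print(f"stored values:  b = {b:.2f}, a = {a_stored:.2f}")
print(f"early rounding: b = {b_rounded:.2f}, a = {a_rounded:.2f}")
```

Both gradients print as 0.64, but the intercepts differ in the second decimal place, which is exactly the kind of small discrepancy described above.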
21
2. Regression coefficient of y on x
22
Exercise
1. Find the equation of the least squares regression line of y on x for each of the following sets of data:
(a)
x | 1 | 3 | 4 | 6 | 8 | 9 | 11 | 14
y | 1 | 2 | 4 | 4 | 5 | 7 | 8 | 9
(b)
x | 20 | 28 | 22 | 15 | 18 | 25 | 19 | 16 | 17 | 23
y | 15 | 3 | 18 | 13 | 17 | 5 | 10 | 18 | 14 | 8
(Give the gradient and intercept to 2 d.p.)
2. Using the answer to 1(b), estimate the values of y for x = 12 and x = 21, giving your answers to 1 d.p. Are these values reliable? If not, why not?
23
Solutions:
1(a)
(b)
2. The 1st answer is not reliable, since x = 12 lies outside the range of x-values used to calculate the regression line. The 2nd gives a reasonable estimate, since x = 21 lies inside that range.
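The reliability check in question 2 can be built into the estimate itself. This Python sketch assumes data set 1(b) as read from its flattened table; in particular, splitting the y-row into 10 values is an assumption. `estimate` is a hypothetical helper that rounds the estimate to 1 d.p. and reports whether x lies inside the range of the data. If the table has been read correctly, the printed line and estimates should match your answers to 1(b) and 2, but the presentation's own answers are not in the extracted text.

```python
# Data set 1(b), as reconstructed from the flattened table (assumed reading).
xs = [20, 28, 22, 15, 18, 25, 19, 16, 17, 23]
ys = [15, 3, 18, 13, 17, 5, 10, 18, 14, 8]
n = len(xs)

s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
b = s_xy / s_xx
a = sum(ys) / n - b * sum(xs) / n     # a = ybar - b * xbar


def estimate(x):
    """Estimate y from the regression line (to 1 d.p.) and say if x is within the data."""
    y = a + b * x
    inside = min(xs) <= x <= max(xs)
    return round(y, 1), inside


print(f"y = {a:.2f} + ({b:.2f})x")
for x in (12, 21):
    y, inside = estimate(x)
    note = "inside the data range, reasonable" if inside else "outside the data range, unreliable"
    print(f"x = {x}: y is about {y} ({note})")
```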
25
The following slides repeat information from earlier slides, shown without colour, so that they can be printed and photocopied. For most purposes the slides can be printed as “Handouts”, with up to 6 slides per sheet.
27
e.g.
x | 1 | 2 | 3
y | 5 | 3 | 1
We can enter the x and y values into the calculator and get $a = 7$ and $b = -2$.
The equation of the y on x regression line is $y = 7 - 2x$.
31
Explanatory and Response Variables
Suppose we have data showing that there is a strong linear relationship between the amount of fertilizer used on some plants and the yield from the plants. The yield clearly depends on the amount of fertilizer, not the other way round: the yield is responding to the fertilizer. In this example, the yield is called the response, or dependent, variable.
The amount of fertilizer used is the explanatory, or independent, variable. It will have been controlled in the trial from which the data have been taken.
In the regression line of y on x, x is the explanatory variable and y is the response variable.