Download presentation
Presentation is loading. Please wait.
Published byAlexina Aileen Cross Modified over 9 years ago
1
Dr J Frost (jfrost@tiffin.kingston.sch.uk) www.drfrostmaths.com
S1: Chapter 7 Regression Dr J Frost Last modified: 22nd January 2016
2
What is regression? Exam mark (π¦) π¦=20+3π₯ Time spent revising (π₯)
I record peopleβs exam marks as well as the time they spent revision. I want to predict how well someone will do based on the time they spent revision. How would I do this? What weβve done here is come up with a model to explain the data, i.e. a line π=π+ππ. Weβve then tried to set π and π such that the resulting π value matches the actual exam marks as close as possible. The βregressionβ bit is the act of setting the parameters of our model (here the gradient and y-intercept of the line of best fit) to best explain the data.
3
What is regression? Rabbit population (π¦) Time (π₯)
In this chapter we only cover linear regression, where our chosen model is a straight line. But in general we could use any model that might best explain the data. Population tends to grow exponentially rather than linearly, so we might make our model π¦=πΓ π π₯ and then try to use regression to work out the best π and π to use.
4
Explanatory and Response Variables
Exam mark (π¦) Time spent revising (π₯) ! An independent (or explanatory) variable is one that is set independently of other variables. It goes on the x-axis. ! A dependent (or response) variable is one whose values are determined by the values of the independent variable. It goes on the y-axis.
5
So how do we numerically find the line of best fit?
π¦ The residuals are the errors between the π¦ value predicted by the model and the y value of each data point. π 1 π 2 π 3 π 4 π 5 π 6 π 7 π₯ We minimise the total of the squares of the residuals. Ξ£ π π 2 Why squared? This is known as a least squares regression line.
6
So how do we numerically find the line of best fit?
Notice that in regression, we write the terms in ascending powers of π₯, contrary to algebraic convention. Hence π is the π¦-intercept, not the gradient. π¦ π 1 π 2 π 3 π 4 π 5 π 6 π 7 π=π+ππ The mean of x and y is on the line, i.e. π¦ =π+π π₯ . Hence this gives us π. To remember the gradient, I think chromosomes of men and women. Men come out top! π₯ It turns out (using differentiation techniques youβll see in C2) that the π and π we use to minimise the total (squared) error is: π= πΊ ππ πΊ ππ π= π βπ π
7
Example Mass, π (kg) 20 40 60 80 100 Length, π (cm) 48 55.1 56.3 61.2 68 Calculate π π₯π₯ and π π¦π¦ (You may use that π΄π₯=300, π΄ π₯ 2 =22 000, π₯ =60, π΄π₯π¦=18 238, π΄ π¦ 2 = , π΄π¦=288.6, π¦ =57.72) π π₯π₯ = π π₯π¦ =922 b) Calculate the regression line of π¦ on π₯. π= π=43.89 ππ π¦= π₯ ? ? ? ? ? π= πΊ ππ πΊ ππ π= π βπ π Broculator Tip: Your calculator will calculate π and π while in STATS mode (under the Reg menu)
8
Test Your Understanding
May 2009 Q5 Note that once finding π and π, you still need to write the equation at the end for the final mark! A common error is to do π π€π π π€π€ . The first row (the explanatory variable) is always the βπ₯β one. For βcomment on reliability of estimateβ questions, always one of: ! Reliable (1) because inside the range of the data/interpolating (1) Unreliable (1) because outside the range of the data/extrapolating (1). Reliable (1) because just outside the range of the data (1). ? ? ?
9
Exercises On provided sheet. Answers on next slides. ? ? ? ? ?
(Note that Q7 and 8 uses βcodingβ. We will cover this next lesson) Help with wordy questions: βExplain why this diagram would support the fitting of a regression line of π¦ onto π₯.β The variables have a linear relationship, i.e. the points are close to the implied straight line of best fit. βInterpret the gradient/slope of the line/interpret πβ As (x) increases by 1, (y) increases/decreases by ___. βInterpret the y-intercept/interpret πβ The value (y) takes when (x) is 0. βWhich is the explanatory variable? Explain your answer.β (x) is the explanatory variable because (x) influences (y) Explain method of least squares. "We minimise the square of the residuals" (draw a diagram) ? ? ? ? ?
10
Exercises ? ? ? ?
11
Exercises ? ? ? ? ? ?
12
Exercises ? ? ? ? ? ?
13
Exercises ? ? ? ? ?
14
Exercises ? ? ? ? ?
15
Exercises ? ? ? ? ? ?
16
Coding Weβve previously considered how coding affects a means, variances and the PMCC. So how do they affect the regression line? Eight samples of carbon steel were produced with different percentages, π of carbon in them. Each sample was heated in a furnace until it melted and the temperature, π in Β°C, at which it melted was recorded. The results were coded such that π₯=10π and π¦= πβ Suppose that we found the regression line of π¦ on π₯ was π¦=36.216β4.048π₯. Then what is the regression line in terms of the original variables π and π? ? Just replace the variables using the substitution and rearrange. Thatβs it! πβ700 5 =36.216β π π=881.08β202.4π
17
More Examples The length π₯ and height π¦ of an Ewok was coded using π=π₯β30 and π=2π¦+11. If the equation of the regression line of π on π is: π=β3+20π what is the equation of the regression line of π¦ on π₯? ? ππ+ππ=ππ πβππ βπ π=βπππ+πππ The maths mark π₯ and English mark π¦ of some stormtroopers is coded using π= π₯ 2 and π=π¦β10. If the equation of the regression line of π on π is: π=4+5π What is the equation of the regression line of π¦ on π₯? ? πβππ=π π π +π π=ππ+π.ππ
18
Exercises (continued)
? ?
19
Exercises ? ? ?
20
Just For Funβ¦
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.