Dr J Frost (jfrost@tiffin.kingston.sch.uk) www.drfrostmaths.com
S1: Chapter 7 Regression


1 S1: Chapter 7 Regression
Dr J Frost. Last modified: 22nd January 2016

2 What is regression?
[Figure: exam mark ($y$) plotted against time spent revising ($x$), with the line $y = 20 + 3x$]
I record people's exam marks as well as the time they spent revising. I want to predict how well someone will do based on the time they spent revising. How would I do this?
What we've done here is come up with a model to explain the data, i.e. a line $y = a + bx$. We've then tried to set $a$ and $b$ so that the resulting $y$ values match the actual exam marks as closely as possible. The 'regression' bit is the act of setting the parameters of our model (here the gradient and $y$-intercept of the line of best fit) to best explain the data.

3 What is regression?
[Figure: rabbit population ($y$) against time ($x$)]
In this chapter we only cover linear regression, where our chosen model is a straight line. But in general we could use any model that might best explain the data. Populations tend to grow exponentially rather than linearly, so we might make our model $y = a \times b^x$ and then use regression to work out the best $a$ and $b$ to use.
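This chapter only fits straight lines, so what follows is outside its scope and is an assumption, not the author's method. One common way to fit $y = a \times b^x$ is to take natural logs, $\ln y = \ln a + (\ln b)x$, which is a straight line in $x$, and then apply the linear least squares formulas to the points $(x, \ln y)$. A minimal Python sketch, using invented rabbit counts:

```python
import math


def fit_exponential_via_logs(xs, ys):
    """Fit y = a * b**x by least squares regression of ln(y) on x.

    Taking logs gives ln y = ln a + (ln b) x, a straight line in x, so the
    linear formulas slope = S_xl / S_xx and intercept = l-bar - slope * x-bar
    apply to the points (x, ln y). Every y must be positive.
    """
    n = len(xs)
    ls = [math.log(y) for y in ys]          # l = ln y
    x_bar = sum(xs) / n
    l_bar = sum(ls) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xl = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, ls))
    slope = s_xl / s_xx                     # estimates ln b
    intercept = l_bar - slope * x_bar       # estimates ln a
    return math.exp(intercept), math.exp(slope)


# Invented rabbit counts that exactly double each time step, i.e. y = 5 * 2**x
a, b = fit_exponential_via_logs([0, 1, 2, 3, 4], [5, 10, 20, 40, 80])
print(round(a, 6), round(b, 6))  # 5.0 2.0
```

Because the fit is done on $\ln y$, it minimises squared errors in $\ln y$ rather than in $y$ itself; fitting $y = a \times b^x$ directly by nonlinear least squares would generally give slightly different $a$ and $b$ for real, noisy data.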

4 Explanatory and Response Variables
[Figure: exam mark ($y$) against time spent revising ($x$)]
An independent (or explanatory) variable is one that is set independently of other variables. It goes on the $x$-axis.
A dependent (or response) variable is one whose values are determined by the values of the independent variable. It goes on the $y$-axis.

5 So how do we numerically find the line of best fit?
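[Figure: data points with the residuals $e_1, \dots, e_7$ drawn as vertical distances from each point to the line]
The residuals are the errors between the $y$ value predicted by the model and the actual $y$ value of each data point. We minimise the total of the squares of the residuals, $\sum e_i^2$.
Why squared? Residuals can be positive or negative, so their plain total can be zero even for a poor line; squaring makes every error count positively.
This is known as a least squares regression line.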
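A minimal Python sketch of the 'Why squared?' point, using the mass/length data from the slide 7 example. The flat line $y = 57.72$ (i.e. $y = \bar{y}$) is a hypothetical comparison line, not from the slides: both lines have residuals that total zero (up to floating-point rounding), but their totals of squared residuals are very different.

```python
def residuals(a, b, xs, ys):
    """Residuals e_i = y_i - (a + b * x_i) for the line y = a + b x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(a, b, xs, ys):
    """The quantity least squares minimises: the sum of e_i squared."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))


xs = [20, 40, 60, 80, 100]        # mass, x (kg)
ys = [48, 55.1, 56.3, 61.2, 68]   # length, y (cm)

candidate_lines = {
    "least squares line y = 43.89 + 0.2305x": (43.89, 0.2305),
    "flat line y = 57.72 (hypothetical)": (57.72, 0.0),
}

for name, (a, b) in candidate_lines.items():
    total = sum(residuals(a, b, xs, ys))            # close to 0 for both lines
    total_sq = sum_squared_residuals(a, b, xs, ys)  # about 8.63 vs about 221.15
    print(f"{name}: sum e_i = {total:.4f}, sum e_i^2 = {total_sq:.3f}")
```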

6 So how do we numerically find the line of best fit?
Notice that in regression we write the terms in ascending powers of $x$, contrary to the usual algebraic convention. Hence $a$ is the $y$-intercept, not the gradient.
[Figure: the residuals $e_1, \dots, e_7$ about the line $y = a + bx$]
It turns out (using differentiation techniques you'll see in C2) that the $a$ and $b$ that minimise the total squared error are:
$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$
The point of means $(\bar{x}, \bar{y})$ is on the line, i.e. $\bar{y} = a + b\bar{x}$. Hence, once we have $b$, this gives us $a$.
To remember the gradient, I think of the chromosomes of men (XY) and women (XX). Men come out on top!
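A minimal Python sketch of these formulas, assuming the standard computational forms $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ and $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$. The test points are hypothetical, chosen to lie exactly on the line $y = 20 + 3x$ from slide 2 so that the answer is known in advance.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line of y on x, y = a + b x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = s_xy / s_xx                 # men (XY) on top of women (XX)
    a = sum_y / n - b * sum_x / n   # a = y-bar minus b times x-bar
    return a, b


# Hypothetical revision hours and exam marks lying exactly on y = 20 + 3x
hours = [1, 2, 3, 4]
marks = [23, 26, 29, 32]
a, b = least_squares_line(hours, marks)
print(a, b)  # 20.0 3.0

# The point of means (x-bar, y-bar) lies on the fitted line
x_bar, y_bar = sum(hours) / len(hours), sum(marks) / len(marks)
assert abs((a + b * x_bar) - y_bar) < 1e-12
```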

7 Example
Mass, $x$ (kg): 20, 40, 60, 80, 100
Length, $y$ (cm): 48, 55.1, 56.3, 61.2, 68
(a) Calculate $S_{xx}$ and $S_{xy}$. (You may use that $\sum x = 300$, $\sum x^2 = 22\,000$, $\bar{x} = 60$, $\sum xy = 18\,238$, $\sum y^2 = 16\,879.14$, $\sum y = 288.6$, $\bar{y} = 57.72$.)
$S_{xx} = 22\,000 - \frac{300^2}{5} = 4000$, $S_{xy} = 18\,238 - \frac{300 \times 288.6}{5} = 922$
(b) Calculate the regression line of $y$ on $x$.
$b = \frac{S_{xy}}{S_{xx}} = \frac{922}{4000} = 0.2305$, $a = \bar{y} - b\bar{x} = 57.72 - 0.2305 \times 60 = 43.89$
So $y = 43.89 + 0.2305x$.
Broculator Tip: Your calculator will calculate $a$ and $b$ while in STATS mode (under the Reg menu).
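The same calculation as a minimal Python sketch, working only from the summary statistics the question provides. The function and argument names are illustrative, not from the slides; the values should reproduce $S_{xy} = 922$ and $a = 43.89$ up to floating-point rounding.

```python
def regression_from_summary(n, sum_x, sum_x2, sum_y, sum_xy):
    """Return S_xx, S_xy, a and b for the regression line of y on x."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx                   # b = S_xy / S_xx
    a = sum_y / n - b * (sum_x / n)   # a = y-bar - b * x-bar
    return s_xx, s_xy, a, b


s_xx, s_xy, a, b = regression_from_summary(
    n=5, sum_x=300, sum_x2=22_000, sum_y=288.6, sum_xy=18_238
)
print(f"S_xx = {s_xx:.0f}, S_xy = {s_xy:.0f}, y = {a:.2f} + {b:.4f}x")
# S_xx = 4000, S_xy = 922, y = 43.89 + 0.2305x
```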

8 Test Your Understanding
May 2009 Q5
Note that once you have found $a$ and $b$, you still need to write down the equation of the line at the end for the final mark! A common error is to calculate $\frac{S_{Ml}}{S_{MM}}$. The first row (the explanatory variable) is always the '$x$' one.
For 'comment on the reliability of your estimate' questions, the answer is always one of the following (each (1) is one mark):
Reliable (1) because it is inside the range of the data, i.e. interpolating (1).
Unreliable (1) because it is outside the range of the data, i.e. extrapolating (1).
Reliable (1) because it is only just outside the range of the data (1).
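A minimal Python sketch of the first two cases, assuming 'inside the range' means between the smallest and largest observed values of the explanatory variable. The 'just outside the range' case is a judgement call, so this hypothetical helper does not try to decide it. The masses are those of the slide 7 example.

```python
def reliability_comment(x_new, xs):
    """Suggest an exam-style comment on an estimate of y made at x_new.

    Only distinguishes interpolation from extrapolation; deciding whether a
    value is 'just outside' the range is left to the reader's judgement.
    """
    lo, hi = min(xs), max(xs)
    if lo <= x_new <= hi:
        return "Reliable, because it is inside the range of the data (interpolating)."
    return "Unreliable, because it is outside the range of the data (extrapolating)."


masses = [20, 40, 60, 80, 100]  # explanatory variable x (kg) from the worked example
print(reliability_comment(50, masses))
print(reliability_comment(150, masses))
```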

9 Exercises
On the provided sheet. Answers are on the next slides.
(Note that Q7 and Q8 use 'coding'. We will cover this next lesson.)
Help with wordy questions:
"Explain why this diagram would support the fitting of a regression line of $y$ onto $x$."
Answer: The variables have a linear relationship, i.e. the points lie close to the implied straight line of best fit.
"Interpret the gradient/slope of the line" or "Interpret $b$."
Answer: As (x) increases by 1, (y) increases/decreases by ___.
"Interpret the $y$-intercept" or "Interpret $a$."
Answer: The value (y) takes when (x) is 0.
"Which is the explanatory variable? Explain your answer."
Answer: (x) is the explanatory variable because (x) influences (y).
"Explain the method of least squares."
Answer: "We minimise the sum of the squares of the residuals" (draw a diagram).
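The two 'interpret' templates can be filled in mechanically. A minimal Python sketch (the helper name and the exact wording it produces are illustrative), using the line $y = 43.89 + 0.2305x$ from the slide 7 example:

```python
def interpret(a, b, x_name, y_name):
    """Fill in the 'interpret b' and 'interpret a' answer templates."""
    direction = "increases" if b >= 0 else "decreases"
    gradient = f"As {x_name} increases by 1, {y_name} {direction} by {abs(b)}."
    intercept = f"The value {y_name} takes when {x_name} is 0 is {a}."
    return gradient, intercept


grad, icpt = interpret(43.89, 0.2305, "mass (kg)", "length (cm)")
print(grad)  # As mass (kg) increases by 1, length (cm) increases by 0.2305.
print(icpt)  # The value length (cm) takes when mass (kg) is 0 is 43.89.
```

Here $x = 0$ lies outside the observed masses (20 to 100 kg), so interpreting $a$ is itself an extrapolation and may not be physically meaningful.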

10 Exercises

11 Exercises

12 Exercises

13 Exercises

14 Exercises

15 Exercises

16 Coding
We've previously considered how coding affects means, variances and the PMCC. So how does it affect the regression line?
Eight samples of carbon steel were produced with different percentages, $c$, of carbon in them. Each sample was heated in a furnace until it melted, and the temperature, $m$ in °C, at which it melted was recorded. The results were coded such that $x = 10c$ and $y = \frac{m - 700}{5}$.
Suppose that we found the regression line of $y$ on $x$ was $y = 36.216 - 4.048x$. Then what is the regression line in terms of the original variables $c$ and $m$?
Just replace the variables using the substitution and rearrange. That's it!
$\frac{m - 700}{5} = 36.216 - 4.048(10c)$
$m - 700 = 181.08 - 202.4c$
$m = 881.08 - 202.4c$
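'Replace and rearrange' can also be done once in general. Assuming both codings are linear, say coded $x$-variable $= s_x \times$ original $+\, t_x$ and coded $y$-variable $= s_y \times$ original $+\, t_y$ (the names $s_x, t_x, s_y, t_y$ are introduced here for illustration), substituting into the coded line $Y = p + qX$ gives original $y$-variable $= \frac{p + q t_x - t_y}{s_y} + \frac{q s_x}{s_y} \times$ original $x$-variable. A minimal Python sketch using exact fractions:

```python
from fractions import Fraction as F


def decode_line(p, q, sx, tx, sy, ty):
    """Turn a coded regression line back into the original variables.

    Assumes linear codings: coded_x = sx * orig_x + tx and
    coded_y = sy * orig_y + ty, with coded line coded_y = p + q * coded_x.
    Substituting and rearranging gives
        orig_y = (p + q*tx - ty) / sy + (q * sx / sy) * orig_x.
    Returns (intercept, gradient) in the original variables.
    """
    intercept = (p + q * tx - ty) / sy
    gradient = q * sx / sy
    return intercept, gradient


# Carbon steel: x = 10c and y = (m - 700)/5 = m/5 - 140; coded line y = 36.216 - 4.048x
intercept, gradient = decode_line(F("36.216"), F("-4.048"), sx=10, tx=0, sy=F(1, 5), ty=-140)
print(float(intercept), float(gradient))  # 881.08 -202.4
```

Exact fractions avoid floating-point noise in the decoded coefficients. The same helper reproduces the Ewok answer on the next slide with `sx=1, tx=-30, sy=2, ty=11` and `p=-3, q=20`.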

17 More Examples
The length $x$ and height $y$ of an Ewok were coded using $q = x - 30$ and $r = 2y + 11$. If the equation of the regression line of $r$ on $q$ is $r = -3 + 20q$, what is the equation of the regression line of $y$ on $x$?
$2y + 11 = 20(x - 30) - 3$
$y = -307 + 10x$
The maths mark $x$ and English mark $y$ of some stormtroopers are coded using $a = \frac{x}{2}$ and $b = y - 10$. If the equation of the regression line of $b$ on $a$ is $b = 4 + 5a$, what is the equation of the regression line of $y$ on $x$?
$y - 10 = 5\left(\frac{x}{2}\right) + 4$
$y = 14 + 2.5x$
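Why is 'just substitute' allowed? Because each coding is linear, the least squares line of the coded data, once decoded, is exactly the least squares line of the raw data. A minimal Python check with exact fractions, using the Ewok coding $q = x - 30$, $r = 2y + 11$ on invented measurements (the data are hypothetical, for illustration only):

```python
from fractions import Fraction as F


def fit(xs, ys):
    """Least squares line y = a + b x, computed with exact fractions."""
    n = len(xs)
    x_bar, y_bar = F(sum(xs), n), F(sum(ys), n)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Hypothetical Ewok measurements (invented for illustration only)
xs = [28, 31, 33, 36, 40]   # length x
ys = [12, 15, 14, 19, 21]   # height y

# Code them as on the slide: q = x - 30, r = 2y + 11
qs = [x - 30 for x in xs]
rs = [2 * y + 11 for y in ys]

p, g = fit(qs, rs)          # coded line r = p + g q
# Substitute back: 2y + 11 = p + g(x - 30)  =>  y = (p - 30g - 11)/2 + (g/2) x
decoded = ((p - 30 * g - 11) / 2, g / 2)

assert decoded == fit(xs, ys)  # identical to fitting the raw data directly
print(decoded)
```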

18 Exercises (continued)

19 Exercises

20 Just For Fun…

