S1: Chapter 7 Regression. Dr J Frost (jfrost@tiffin.kingston.sch.uk), www.drfrostmaths.com

Presentation transcript:

S1: Chapter 7 Regression. Dr J Frost (jfrost@tiffin.kingston.sch.uk), www.drfrostmaths.com. Last modified: 22nd January 2016

What is regression? [Scatter graph: exam mark (𝑦) against time spent revising (𝑥), with the line of best fit 𝑦=20+3𝑥.] I record people’s exam marks as well as the time they spent revising. I want to predict how well someone will do based on the time they spent revising. How would I do this? What we’ve done here is come up with a model to explain the data, i.e. a line 𝒚=𝒂+𝒃𝒙. We’ve then tried to set 𝒂 and 𝒃 such that the resulting 𝒚 values match the actual exam marks as closely as possible. The ‘regression’ bit is the act of setting the parameters of our model (here the gradient and 𝑦-intercept of the line of best fit) to best explain the data.

What is regression? [Graph: rabbit population ($y$) against time ($x$).] In this chapter we only cover linear regression, where our chosen model is a straight line. But in general we could use any model that might best explain the data. Population tends to grow exponentially rather than linearly, so we might make our model $y = a \times b^x$ and then try to use regression to work out the best $a$ and $b$ to use.

Explanatory and Response Variables. [Graph: exam mark (𝑦) against time spent revising (𝑥).] An independent (or explanatory) variable is one that is set independently of other variables. It goes on the 𝑥-axis. A dependent (or response) variable is one whose values are determined by the values of the independent variable. It goes on the 𝑦-axis.

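So how do we numerically find the line of best fit? [Diagram: scatter graph with a line of best fit and the residuals $e_1, e_2, \dots, e_7$ marked.] The residuals are the errors between the $y$ value predicted by the model and the $y$ value of each data point. We minimise the total of the squares of the residuals, $\sum e_i^2$. Why squared? Squaring stops positive and negative residuals from cancelling each other out. The line that results is known as a least squares regression line.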
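
The idea of totalling squared residuals can be sketched in a few lines of Python. The data here is invented purely for illustration (it does not come from the slides), and the function names are hypothetical. The sketch compares the line 𝑦=20+3𝑥 with a flat line through the mean mark: the flat line's plain residuals add up to zero even though it fits badly, which is one reason to square them first.

```python
def residuals(xs, ys, a, b):
    """Residual e_i = actual y_i minus the value a + b*x_i predicted by the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, a, b):
    """Total of the squared residuals for the line y = a + b*x."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]        # time spent revising (x)
marks = [24, 25, 30, 31, 35]   # exam mark (y); the mean mark is 29

# Line y = 20 + 3x
print(residuals(hours, marks, 20, 3))              # [1, -1, 1, -1, 0]
print(sum_squared_residuals(hours, marks, 20, 3))  # 4

# Flat line y = 29 (gradient 0): residuals cancel, squared residuals do not
print(sum(residuals(hours, marks, 29, 0)))         # 0
print(sum_squared_residuals(hours, marks, 29, 0))  # 82
```

The smaller total (4 against 82) marks the first line as the better fit for this made-up data, whereas the plain sum of residuals would not have told the two lines apart in a useful way.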

So how do we numerically find the line of best fit? [Diagram: scatter graph with the residuals $e_1, e_2, \dots, e_7$ and the line $y = a + bx$.] Notice that in regression, we write the terms in ascending powers of $x$, contrary to algebraic convention. Hence $a$ is the $y$-intercept, not the gradient. It turns out (using differentiation techniques you’ll see in C2) that the $a$ and $b$ which minimise the total (squared) error are: $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$. To remember the gradient, I think of the chromosomes of men and women: men (XY) come out on top! The point $(\bar{x}, \bar{y})$ is on the line, i.e. $\bar{y} = a + b\bar{x}$, which gives us $a$.

Example
Mass, $x$ (kg):      20    40     60     80     100
Length, $y$ (cm):    48    55.1   56.3   61.2   68
a) Calculate $S_{xx}$ and $S_{xy}$. (You may use that $\sum x = 300$, $\sum x^2 = 22\,000$, $\bar{x} = 60$, $\sum xy = 18\,238$, $\sum y^2 = 16\,879.14$, $\sum y = 288.6$, $\bar{y} = 57.72$.)
$S_{xx} = 4000$, $S_{xy} = 922$
b) Calculate the regression line of $y$ on $x$.
Using $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$: $b = 0.2305$, $a = 43.89$, so $y = 43.89 + 0.2305x$.
Broculator Tip: Your calculator will calculate $a$ and $b$ while in STATS mode (under the Reg menu).
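
As a check on the worked example, the same calculation can be sketched in Python. This is only an illustration: the function and variable names are made up here, and it assumes the usual S1 summary-statistic definitions $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ and $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$.

```python
def regression_line(xs, ys):
    """Return (S_xx, S_xy, a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = s_xy / s_xx                    # gradient
    a = sum_y / n - b * sum_x / n      # intercept, from a = y-bar - b * x-bar
    return s_xx, s_xy, a, b


# Data from the example above.
mass = [20, 40, 60, 80, 100]          # x, in kg
length = [48, 55.1, 56.3, 61.2, 68]   # y, in cm

s_xx, s_xy, a, b = regression_line(mass, length)
print(f"S_xx = {s_xx:.0f}, S_xy = {s_xy:.0f}")
print(f"y = {a:.2f} + {b:.4f}x")
```

The values are formatted to the precision used in the example; the arithmetic is done in floating point, so the unrounded values may differ from 4000, 922, 43.89 and 0.2305 in the last few binary digits.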

Test Your Understanding: May 2009 Q5. Note that once you have found $a$ and $b$, you still need to write the equation at the end for the final mark! A common error is to do $\frac{S_{wl}}{S_{ww}}$. The first row (the explanatory variable) is always the ‘$x$’ one. For ‘comment on reliability of estimate’ questions, the answer is always one of: Reliable (1) because inside the range of the data/interpolating (1); Unreliable (1) because outside the range of the data/extrapolating (1); Reliable (1) because just outside the range of the data (1).

Exercises: on provided sheet. Answers on next slides. (Note that Q7 and Q8 use ‘coding’. We will cover this next lesson.)
Help with wordy questions:
“Explain why this diagram would support the fitting of a regression line of 𝑦 onto 𝑥.” The variables have a linear relationship, i.e. the points are close to the implied straight line of best fit.
“Interpret the gradient/slope of the line/interpret 𝑏.” As (x) increases by 1, (y) increases/decreases by ___.
“Interpret the y-intercept/interpret 𝑎.” The value (y) takes when (x) is 0.
“Which is the explanatory variable? Explain your answer.” (x) is the explanatory variable because (x) influences (y).
“Explain the method of least squares.” We minimise the sum of the squares of the residuals (draw a diagram).

Coding. We’ve previously considered how coding affects means, variances and the PMCC. So how does it affect the regression line? Eight samples of carbon steel were produced with different percentages, $c$, of carbon in them. Each sample was heated in a furnace until it melted and the temperature, $m$ in °C, at which it melted was recorded. The results were coded such that $x = 10c$ and $y = \frac{m - 700}{5}$. Suppose that we found the regression line of $y$ on $x$ was $y = 36.216 - 4.048x$. Then what is the regression line in terms of the original variables $c$ and $m$? Just replace the variables using the substitution and rearrange. That’s it! $\frac{m - 700}{5} = 36.216 - 4.048(10c)$, so $m - 700 = 181.08 - 202.4c$, giving $m = 881.08 - 202.4c$.

More Examples. The length $x$ and height $y$ of an Ewok were coded using $q = x - 30$ and $r = 2y + 11$. If the equation of the regression line of $r$ on $q$ is $r = -3 + 20q$, what is the equation of the regression line of $y$ on $x$? $2y + 11 = 20(x - 30) - 3$, so $y = -307 + 10x$. The maths mark $x$ and English mark $y$ of some stormtroopers are coded using $a = \frac{x}{2}$ and $b = y - 10$. If the equation of the regression line of $b$ on $a$ is $b = 4 + 5a$, what is the equation of the regression line of $y$ on $x$? $y - 10 = 5\left(\frac{x}{2}\right) + 4$, so $y = 14 + 2.5x$.
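
The substitute-and-rearrange step used in the coding examples can also be sketched as a small Python function. This is an illustrative sketch, not part of the original slides: the name `decode_line` and its parameters are hypothetical, and it assumes both codings are linear, written as $u = s_x x + t_x$ and $v = s_y y + t_y$. Substituting into $v = a + bu$ and rearranging gives $y = \frac{a + b t_x - t_y}{s_y} + \frac{b s_x}{s_y} x$. Exact fractions are used so that the results match the hand calculations without rounding.

```python
from fractions import Fraction


def decode_line(a, b, sx, tx, sy, ty):
    """Convert the coded line v = a + b*u back to the original variables.

    Assumes linear codings u = sx*x + tx and v = sy*y + ty.
    Returns (intercept, gradient) of the line of y on x.
    """
    a, b, sx, tx, sy, ty = (Fraction(v) for v in (a, b, sx, tx, sy, ty))
    gradient = b * sx / sy
    intercept = (a + b * tx - ty) / sy
    return intercept, gradient


# Carbon steel: x = 10c, y = (m - 700)/5 = (1/5)m - 140, line y = 36.216 - 4.048x
steel = decode_line("36.216", "-4.048", 10, 0, Fraction(1, 5), -140)
print([float(v) for v in steel])    # [881.08, -202.4]

# Ewok: q = x - 30, r = 2y + 11, line r = -3 + 20q
ewok = decode_line(-3, 20, 1, -30, 2, 11)
print([float(v) for v in ewok])     # [-307.0, 10.0]

# Stormtroopers: a = x/2, b = y - 10, line b = 4 + 5a
troopers = decode_line(4, 5, Fraction(1, 2), 0, 1, -10)
print([float(v) for v in troopers]) # [14.0, 2.5]
```

Decimal coefficients are passed as strings so that `Fraction` stores them exactly; passing a float such as `36.216` would carry its binary rounding error into the result.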

Just For Fun…