Stats1 Chapter 4 :: Correlation

Slides:



Advertisements
Similar presentations
Lesson 10: Linear Regression and Correlation
Advertisements

Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester Eng. Tamer Eshtawi First Semester
Copyright (c) Bani K. Mallick1 STAT 651 Lecture #18.
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
Business Statistics - QBM117 Least squares regression.
McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 13 Linear Regression and Correlation.
Lecture 3: Bivariate Data & Linear Regression 1.Introduction 2.Bivariate Data 3.Linear Analysis of Data a)Freehand Linear Fit b)Least Squares Fit c)Interpolation/Extrapolation.
Linear Regression and Correlation
Correlation and Regression David Young Department of Statistics and Modelling Science, University of Strathclyde Royal Hospital for Sick Children, Yorkhill.
Covariance and correlation
Chapter 14 – Correlation and Simple Regression Math 22 Introductory Statistics.
Chapter 6 & 7 Linear Regression & Correlation
Quantitative Skills 1: Graphing
Year 8 Scatter Diagrams Dr J Frost Last modified: 24 th November 2013 Objectives: Understand the purpose of a scatter diagram,
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
STATISTICS 12.0 Correlation and Linear Regression “Correlation and Linear Regression -”Causal Forecasting Method.
Chapter 9: Correlation and Regression Analysis. Correlation Correlation is a numerical way to measure the strength and direction of a linear association.
Regression David Young & Louise Kelly Department of Mathematics and Statistics, University of Strathclyde Royal Hospital for Sick Children, Yorkhill NHS.
STATISTICS 12.0 Correlation and Linear Regression “Correlation and Linear Regression -”Causal Forecasting Method.
ContentDetail  Two variable statistics involves discovering if two variables are related or linked to each other in some way. e.g. - Does IQ determine.
Free Powerpoint Templates ROHANA BINTI ABDUL HAMID INSTITUT E FOR ENGINEERING MATHEMATICS (IMK) UNIVERSITI MALAYSIA PERLIS.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 9 l Simple Linear Regression 9.1 Simple Linear Regression 9.2 Scatter Diagram 9.3 Graphical.
Week 2 Normal Distributions, Scatter Plots, Regression and Random.
Chapter 13 Linear Regression and Correlation. Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent.
Copyright © Cengage Learning. All rights reserved.
GS/PPAL Section N Research Methods and Information Systems
Inference for Regression
GCSE: Scatter Diagrams
Regression and Correlation
Predictions 3.9 Bivariate Data.
Lesson 4.5 Topic/ Objective: To use residuals to determine how well lines of fit model data. To use linear regression to find lines of best fit. To distinguish.
Elementary Statistics
Correlation & Regression
Inference for Regression (Chapter 14) A.P. Stats Review Topic #3
Sections Review.
LSRL Least Squares Regression Line
Chapter 5 STATISTICS (PART 4).
Chapter 4 Correlation.
Simple Linear Regression
7.3 Best-Fit Lines and Prediction
Understanding Standards Event Higher Statistics Award
Unit 4 EOCT Review.
Section 13.7 Linear Correlation and Regression
S1 :: Chapter 6 Correlation
CHAPTER 10 Correlation and Regression (Objectives)
Simple Linear Regression
Suppose the maximum number of hours of study among students in your sample is 6. If you used the equation to predict the test score of a student who studied.
CHAPTER 26: Inference for Regression
Chapter 10 Correlation and Regression
Regression.
Lecture Notes The Relation between Two Variables Q Q
7.3 Best-Fit Lines and Prediction
P2 Chapter 8 :: Parametric Equations
Bivariate Data credits.
Objectives (IPS Chapter 2.3)
Correlation and Regression
CHAPTER 12 More About Regression
Product moment correlation
Further Stats 1 Chapter 5 :: Central Limit Theorem
Lesson 2.2 Linear Regression.
Descriptive Statistics Univariate Data
9/27/ A Least-Squares Regression.
Correlation & Regression
Stats1 Chapter 7 :: Hypothesis Testing
BEC 30325: MANAGERIAL ECONOMICS
Honors Statistics Review Chapters 7 & 8
Stats Yr2 Chapter 1 :: Regression, Correlation & Hypothesis Tests
Linear Regression and Correlation
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

Stats1 Chapter 4 :: Correlation jfrost@tiffin.kingston.sch.uk www.drfrostmaths.com @DrFrostMaths Last modified: 3rd March 2018

Use of DrFrostMaths for practice Register for free at: www.drfrostmaths.com/homework Practise questions by chapter, including past paper Edexcel questions and extension questions (e.g. MAT). Teachers: you can create student accounts (or students can register themselves).

Experimental Theoretical Chp2: Measures of Location/Spread i.e. Dealing with collected data. Chp2: Measures of Location/Spread Chp3: Representation of Data Chp1: Data Collection Statistics used to summarise data, including mean, standard deviation, quartiles, percentiles. Use of linear interpolation for estimating medians/quartiles. Producing and interpreting visual representations of data, including box plots and histograms. Methods of sampling, types of data, and populations vs samples. Chp4: Correlation Measuring how related two variables are, and using linear regression to predict values. Theoretical Deal with probabilities and modelling to make inferences about what we ‘expect’ to see or make predictions, often using this to reason about/contrast with experimentally collected data. Chp5: Probability Chp6: Statistical Distributions Chp7: Hypothesis Testing Venn Diagrams, mutually exclusive + independent events, tree diagrams. Common distributions used to easily find probabilities under certain modelling conditions, e.g. binomial distribution. Determining how likely observed data would have happened ‘by chance’, and making subsequent deductions.

This Chapter Overview Previously we have only considered one variable at a time. When we introduce a second variable (e.g. height with age), we might want to consider the relationship between them. This is a short chapter! Hourly pay at 25 “Describe the type of correlation.” 20 15 5 Hourly Pay (£) “The daily mean windspeed, 𝑤 knots, and the daily maximum gust, 𝑔 knots, were recorded. The equation of the regression line of 𝑔 on 𝑤 for these 15 days is 𝑔=7.23+1.82𝑤. Given an interpretation of the value of the gradient of this regression line. Justify the use of a linear regression line in this instance.” 15 20 25 Age (years) Changes since the old ‘S1’ syllabus: This chapter has been scaled back significantly since the S1 ‘Correlation’ and ‘Regression’ chapters. You no longer need to determine the equation of the line of best fit (the regression line), or calculate measures of correlation, but merely have to interpret an equation already given or the limitations of estimates made or comment on the suitability of a linear regression model.

Recap of correlation ? ? ? ? ? ? ? Weak negative correlation Correlation gives the strength of the relationship (and the type of relationship) between two variables. Data with two variables is known as bivariate data. Weak negative correlation ? ? Type of correlation: Weak positive correlation ? ? strength type No correlation ? ? Strong positive correlation ? The vertical-axis variable usually depends on the horizontal-axis value. For this reason distance would be the independent/explanatory variable and cost the dependent/response variable.

Important correlation concepts Important Point 1 To interpret the correlation between two variables is to give a worded description in the context of the problem. State the correlation shown. Describe/interpret the relationship between age and weekly time on the internet. Negative correlation. As age increases, the weekly time on the internet tends to decrease. ? ? Important Point 2 [Textbook] Two variables have a causal relationship if a change in one variable directly causes a change in the other. Just because two variables show correlation it does not necessarily mean that they have a causal relationship. Hourly pay at 25 20 15 5 Hideko was interested to see if there was a relationship between what people earn and the age which they left education or training. She says her data supports the conclusion that more education causes people to earn a lower hourly rate of pay. Give one reason why Hideko’s conclusion might not be valid. Hourly Pay (£) 15 20 25 “Respondents who left education later would have significantly less work experience than those who left education earlier. This could be the cause of the reduced income shown in her results.” ? Age (years)

Exercise 4A Pearson Statistics/Mechanics Year 1/AS Pages 61-62

What is regression? Exam mark (𝑦) 𝑦=20+3𝑥 Time spent revising (𝑥) What we’ve done here is come up with a model to explain the data, in this case, a line 𝑦=𝑎+𝑏𝑥. We’ve then tried to set 𝑎 and 𝑏 such that the resulting 𝑦 value matches the actual exam marks as closely as possible. The ‘regression’ bit is the act of setting the parameters of our model (here the gradient and y-intercept of the line of best fit) to best explain the data. 𝑑 1 𝑑 2 𝑑 3 𝑑 4 𝑑 5 𝑑 6 𝑑 7 Exam mark (𝑦) 𝑦=20+3𝑥 One type of line of best fit is the least squares regression line. This minimises the sum of the square of these ‘errors’, i.e. 𝑑 1 2 + 𝑑 2 2 +…=Σ 𝑑 𝑖 2 Part of the reason we square these errors is so that each distance is treated as a positive value. Unlike in the old S1, you are no longer required to work out the equation of the least squares regression line yourself; you will be given the equation. Time spent revising (𝑥) I record people’s exam marks as well as the time they spent revising. I want to predict how well someone will do based on the time they spent revising. How would I do this?

What is regression? Linear regression line not a good fit for the data. 𝑦=𝑎+𝑏𝑥 Rabbit population (𝑦) Exponential line much better fit. 𝑦=𝑎 𝑏 𝑥 Time (𝑥) In this chapter we only cover linear regression, where our chosen model is a straight line. But in general we could use any model that might best explain the data. Population tends to grow exponentially rather than linearly, so we might make our model 𝑦=𝑎× 𝑏 𝑥 and then try to use regression to work out the best 𝑎 and 𝑏 to use. You will do exponential regression in Chapter 14 of Pure Year 1.

Interpreting 𝑎 and 𝑏. 𝑦=20+3𝑥 ? ? Exam mark (𝑦) Time spent revising in hours (𝑥) How do we interpret the gradient of 3? For each extra hour spent revising, 3 marks are gained. (i.e. the gradient tells you the change in 𝑦 for each unit increase in 𝑥) ? How do we interpret the 𝑦-intercept of 20? 20 marks would be obtained turning up to the exam with no revision. (i.e. the value of 𝑦 we get when 𝑥=0) ?

Example From the large data set, the daily mean windspeed, 𝑤 knots, and the daily maximum gust, 𝑔 knots, were recorded for the first 15 days in May in Camborne in 2015. The data was plotted on a scatter diagram. 𝒘 14 13 9 18 7 15 10 11 8 𝒈 33 37 29 23 43 38 17 30 28 21 20 © Met Office Describe the correlation between daily mean windspeed and daily maximum gust. The equation of the regression line of 𝑔 on 𝑤 for these 15 days is 𝑔=7.23+1.82𝑤 (b) Give an interpretation of the value of the gradient of this regression line. (c) Justify the use of a linear regression line in this instance. 50 40 30 20 10 Daily maximum gust, 𝑔 (knots) 0 5 10 15 20 Daily mean windspeed, 𝑤 (knots) ? There is a strong positive correlation between daily mean windspeed and daily maximum gust. If the daily mean windspeed increases by 10 knots the daily maximum gust increases by approximately 18 knots. The correlation suggests that there is a linear relationship between 𝑔 and 𝑤 so a linear regression line is a suitable model. a b ? The stronger the (linear) correlation, the more suitable a linear regression line is. c ?

Interpolating and Extrapolating You should only use the regression line to make predictions for values of the dependent variable that are within the range of the given data. Estimating a value inside the data range is known as interpolating. Estimating a value outside the data range is known as extrapolating (as per the cartoon on the left!) xkcd.com [Textbook] The head circumference, 𝑦 cm, and gestation period, 𝑥 weeks, for a random sample of eight newborn babies at a clinic are recorded. The scatter graph shows the results. The equation of the regression line of 𝑦 on 𝑥 is 𝑦=8.91+0.624𝑥. The regression equation is used to estimate the head circumference of a baby born at 39 weeks and a baby born at 30 weeks. Comment on the reliability of these estimates. A nurse wants to estimate the gestation period for a baby born with a head circumference of 31.6cm. (b) Explain why the regression equation given above is not suitable for this estimate. 36 34 32 30 28 Head circumference (cm) 30 32 34 36 38 40 42x Gestation period (weeks) The prediction for 39 weeks is within the range of the data so is more likely to be correct. The prediction for 30 weeks is outside the range of the data so is less likely to be accurate. The independent variable in this model is the gestation period, 𝑥. You should not use this model to predict a value of 𝑥 for a given value of 𝑦. a ? b ?

Exercise 4B Pearson Statistics/Mechanics Year 1/AS Pages 65-66 𝒘 14 13 9 18 7 15 10 11 8 𝒈 33 37 29 23 43 38 17 30 28 21 20 “Use of Technology” Monkey says: Dr Frost’s ‘Interactive Guide to the Classwiz’ has a guide to determine the equation of the regression line (even though you are not required to do so for the exam). Try verifying the regression line of 𝑔 on 𝑤 for the data above has equation 𝑔=7.23+1.82𝑤 . http://www.drfrostmaths.com/resources/resource.php?rid=262