Correlation and Regression David Young Department of Statistics and Modelling Science, University of Strathclyde Royal Hospital for Sick Children, Yorkhill.

Slides:



Advertisements
Similar presentations
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Advertisements

Lesson 10: Linear Regression and Correlation
Forecasting Using the Simple Linear Regression Model and Correlation
Inference for Regression
Correlation and regression Dr. Ghada Abo-Zaid
Chapter 4 The Relation between Two Variables
Simple Linear Regression and Correlation
1 Objective Investigate how two variables (x and y) are related (i.e. correlated). That is, how much they depend on each other. Section 10.2 Correlation.
CORRELATON & REGRESSION
Chapter 14 Introduction to Linear Regression and Correlation Analysis
Statistics for Business and Economics
Chapter 13 Introduction to Linear Regression and Correlation Analysis
The Simple Regression Model
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 13 Introduction to Linear Regression and Correlation Analysis.
SIMPLE LINEAR REGRESSION
Linear Regression and Correlation Analysis
REGRESSION AND CORRELATION
SIMPLE LINEAR REGRESSION
Correlation and Regression Analysis
1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they.
Regression and Correlation
Correlation and Linear Regression
McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 13 Linear Regression and Correlation.
Lecture 3-2 Summarizing Relationships among variables ©
SIMPLE LINEAR REGRESSION
Introduction to Linear Regression and Correlation Analysis
Linear Regression and Correlation
Correlation and Linear Regression
September In Chapter 14: 14.1 Data 14.2 Scatterplots 14.3 Correlation 14.4 Regression.
Jon Curwin and Roger Slater, QUANTITATIVE METHODS: A SHORT COURSE ISBN © Thomson Learning 2004 Jon Curwin and Roger Slater, QUANTITATIVE.
Chapter 6 & 7 Linear Regression & Correlation
1 Chapter 10 Correlation and Regression 10.2 Correlation 10.3 Regression.
Introduction to Linear Regression
Chapter 10 Correlation and Regression
Correlation Analysis. A measure of association between two or more numerical variables. For examples height & weight relationship price and demand relationship.
Topic 10 - Linear Regression Least squares principle - pages 301 – – 309 Hypothesis tests/confidence intervals/prediction intervals for regression.
Introduction to Probability and Statistics Thirteenth Edition Chapter 12 Linear Regression and Correlation.
Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
CORRELATION: Correlation analysis Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Lecture 10: Correlation and Regression Model.
Chapter 9: Correlation and Regression Analysis. Correlation Correlation is a numerical way to measure the strength and direction of a linear association.
Correlation – Recap Correlation provides an estimate of how well change in ‘ x ’ causes change in ‘ y ’. The relationship has a magnitude (the r value)
Regression David Young & Louise Kelly Department of Mathematics and Statistics, University of Strathclyde Royal Hospital for Sick Children, Yorkhill NHS.
Linear Regression and Correlation Chapter GOALS 1. Understand and interpret the terms dependent and independent variable. 2. Calculate and interpret.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Go to Table of Content Correlation Go to Table of Content Mr.V.K Malhotra, the marketing manager of SP pickles pvt ltd was wondering about the reasons.
©The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin Linear Regression and Correlation Chapter 13.
11-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Simple Linear Regression and Correlation (Continue..,) Reference: Chapter 17 of Statistics for Management and Economics, 7 th Edition, Gerald Keller. 1.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
Stats Methods at IC Lecture 3: Regression.
Topic 10 - Linear Regression
Correlation and Simple Linear Regression
Chapter 5 STATISTICS (PART 4).
Understanding Standards Event Higher Statistics Award
Correlation and Simple Linear Regression
Correlation and Regression
Chapter 10 Correlation and Regression
Correlation and Simple Linear Regression
Scientific Practice Regression.
SIMPLE LINEAR REGRESSION
Simple Linear Regression and Correlation
SIMPLE LINEAR REGRESSION
Topic 8 Correlation and Regression Analysis
Correlation and Simple Linear Regression
Presentation transcript:

Correlation and Regression David Young Department of Statistics and Modelling Science, University of Strathclyde Royal Hospital for Sick Children, Yorkhill NHS Trust

Relationships between Variables common aim of research is to try to relate a variable of interest to one or more other variables demonstrate an association between variables (not necessarily causal) e.g. dose of drug and resulting systolic blood pressure, advertising expenditure of a company and end of year sales figures, etc. establish a theoretical model to predict the value of one variable from a number of known factors

Dependent and Independent Variables the independent variable is the variable which is under the investigator’s control (denoted x) the dependent variable is the one which the investigator is trying to estimate or predict (denoted y) can a relationship be used to predict what happens to y as x changes (i.e. what happens to the dependent variable as the independent variable changes)?

Exploring Relationships

Systolic Blood Pressure vs Age

Scatter Plots

Correlation correlation is a measure of the degree of linear association between two numerical variables the correlation is said to be positive if ‘large’ values of both variables occur together and negative if ‘large’ values of one variable tend to occur with ‘small’ values of the other the range of possible values is from -1 to +1 the correlation is high if observations lie close to a straight line (i.e. values close to +1 or -1) and low if observations are widely scattered (correlation value close to 0) it does not indicate a causal effect between the variables

Correlation

No Correlation

Correlation the correlation coefficient is computed as –

Interpretation of Correlation there may be no direct connection between highly correlated variables (known as spurious correlation) coincidental correlation – e.g. the price of M&S sandwiches and the size of the hole in the ozone layer over the past 5 years indirect correlation – e.g. high correlation between infant mortality and the extend of overcrowding in a certain town between 1 st and 2 nd WW (depend not on each other but on a third variable i.e. low income level) N.B. low correlation does not necessarily mean a low degree of association (relationship may be non-linear)

Problem 6 A research study has reported a correlation of –0.59 between the eye colour (brown, green, blue) of experimental animals and the amount of nicotine that is fatal to the animal when consumed. This indicates … (a)nicotine is less harmful to one eye colour than others (b) lethal dose decreases as eye colour changes (c) eye colour of animals must always be considered in assessing the effect of nicotine consumption (d) further study required to explain this correlation (e) correlation is not an appropriate measure of association

Problem 7 If the correlation between body weight and annual income were high and positive, we would conclude that … (a)high incomes cause people to eat more food (b)low incomes cause people to eat less food (c)high income people tend to spend a greater proportion of their income on food than low income people, on average (d)high income people tend to be heavier than low income people, on average (e)high incomes cause people to gain weight

Example

Hypothesis Testing the linear relationship between two variables is significant if there is evidence to suggest that  is significantly different from zero for the hypothesis test observed value of r is and associated p-value is Stat >Basic Statistics >Correlation) Conclusion: reject H 0 and conclude that there is evidence to suggest that there is a linear relationship of size on age

Summary correlation (  ) is a measure of the linear relationship between two variables it gives an actual measure of the strength of the linear relationship seen in a scatter plot values range from –1 to +1 the closer  is to ±1, the stronger the linear relationship N.B. high correlation does not indicate a causal effect! correlation is not an appropriate measure for testing the equivalence of two methods

Regression develop an equation to predict the dependent variable from knowledge of predictor variable(s) linear regression fits a straight line to the data general equation of a straight line … where a is the intercept and b is the slope, or gradient, of the line fit this line by eye – subjective method of least squares

Practical Example data from a study of foetal development date of conception (and hence age) of the foetus is known accurately height of the foetus (excluding the legs) is known from ultrasound scan age and length of the foetus are clearly related aim is to model the length and age data and use this to assess whether a foetus of known age is growing at an appropriate rate

Descriptive Statistics Descriptive Statistics: age, length Variable N Mean StDev Minimum Q1 Median Q3 Maximum age length

Graphical Assessment of Data

Linear Regression Model from the plot it would appear that age and length are strongly related, possibly in a linear way a straight line can be expressed mathematically in the form where b is the slope, or gradient of the line, and a is the intercept of the line with the y-axis

Modelling a Straight Line a 1 unit b x y=a+bx y

Fitting a Regression Line if the data lay on a straight line and there was no random variation about that line, it would be simple to draw an approximate straight line on the scatter-plot this is not the case with real data for a given value of the explanatory variable there will be a range of observed values for the response variable different assessors would estimate different regression lines in order to have objective results, it is necessary to define some criteria in order to produce the ‘best fitting straight line’ for a given set of data

Method of Least Squares define the position of the line which, on average, has all the points as close to it as possible the method of least squares finds the straight line which minimises the sum of the squared vertical deviations from the fitted line the best fitting line is called the least squares linear regression line the vertical distances between each point and the fitted line are called the residuals and are used in estimating the variability about the fitted line

Least Squares Line x a y=a+bx y eiei

Parameter Estimates the least squares estimates of a and b are obtained by choosing the values which minimise the sum of squared deviations e i the sum of squared deviations is given by … which is a function of the unknown parameters a and b

Interpretation of Results the regression equation is … this implies that as the age of the foetus increases by one day, the length increases by 0.12mm for a foetus of age 85 days, the estimated length would be a prediction interval gives the range of values between which the value for an individual is likely to lie: (7.01 to 8.08mm)

Model Predictions use the regression model to assess whether a foetus of known age is growing at an appropriate rate for example, consider a foetus of age 85 days does the measured length lie within the normal range i.e. between 7.01 and 8.08mm? if measured length is <7.01mm, there is evidence that the foetus is not growing as it should if measured length if >8.08mm, is the foetus larger than expected? Is the actual age (and due date) wrong?

Uses of Regression Lines the least squares regression line may be used to estimate a value of the dependent variable given a value of the independent variable the value of the independent variable (x) should be within in the range of the given data the predicted value of the dependent variable (y) is only an estimate even though the fit of the regression line is good, it does not prove there is a relationship between the variables outside of the values from the given experiment (use care in predicting)

Coefficient of Determination the R-Sq value of 97.6% in the output is called the coefficient of determination this is a measure of the amount of variability in the data which is explained by the regression line it therefore indicates how well the linear regression model fits the data (and therefore how accurate the predicted values using the model will be) the correlation (r) between foetal size and age is the coefficient of determination is r 2 (i.e ×0.988)

Fitted Line

Exercise Use the calculated least squares linear regression line to estimate the size of a foetus at the following gestation times: (a) 2 days (b) 60 days (c) 100 days (d) 300 days For each of your estimated lengths, state whether or not you believe the estimate to be accurate.

Interpretation why are all the dependent variable observations (size) not the same? depends on the independent variable (age) total variation present is the total variation from the mean line (‘best’ estimate of size if no linear relationship) 97.6% of the variation in size is explained by the linear relationship with the variable age 2.4% of the variation is not explained – could be random (since not every foetus of the same age will be the same size) or caused by another variable (e.g. smoking habits of mother, diet, etc.)