1
Scientific Practice Regression
2
Where We Are/Where We Are Going
We have looked at how correlation shows how two things might be associated, eg arm length and leg length.
No causality is implied, ie leg length is not responsible for arm length!
The correlation coefficient, r, assumes a linear ‘fit’ between the data and reflects how far away from that ‘line of best fit’ the data lie.
Regression takes this one step further: it describes the fit mathematically and generally implies a causal link, eg increased BP causes increased mortality.
3
Independent/Dependent Variables
As a causal link is implied:
the independent variable is the thing doing the influencing;
the dependent variable is the one being influenced.
By convention, if we were to plot this graphically, the x-axis represents the independent variable and the y-axis represents the dependent variable, eg blood pressure on the x-axis and cardiovascular mortality on the y-axis.
4
The Equation of the Straight Line
Any linear relationship can be described as y = mx + c.
y can be calculated (predicted) for any x using:
c, the intercept, where the line crosses the y-axis (the value of y when x = 0);
m, the slope of the line, ie the change in y per unit change in x, which is non-zero if there is a relationship.
5
The Line of Best Fit
For a given set of data, linear regression derives the straight-line equation that best describes the data by minimising the overall distance of the data points from that straight line (in ordinary least squares, the sum of the squared vertical distances).
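As a rough illustration of how a line of best fit can be derived, the sketch below uses the standard least-squares formulas for the slope and intercept. The function name `fit_line` and the five data points are made up for illustration; they are not the class data.

```python
def fit_line(xs, ys):
    """Return (slope m, intercept c) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope: change in y per unit change in x
    c = mean_y - m * mean_x    # intercept: the fitted line passes through the means
    return m, c


# Hypothetical illustrative data (NOT the class results)
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
latencies_ms = [30.0, 33.0, 32.0, 35.0, 36.0]

m, c = fit_line(heights_cm, latencies_ms)
print(f"latency = {c:.2f} + {m:.3f} * height")
```

For these made-up points the fitted slope is 0.14 ms/cm and the intercept is 9.4 ms.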
6
The Power of Linear Regression
Eg, data from a practical class looked at the relationship between the latency of the Achilles tendon stretch reflex (ms) and height (cm).
The taller the person, the longer the nerve pathway mediating the reflex.
7
The Power of Linear Regression
The class results (n=85)
8
The Power of Linear Regression
The line of best fit has the form y = mx + c (the fitted coefficients are not preserved here).
The non-zero slope suggests a relationship.
9
Testing the Line of Best Fit
But even if we used random data, the line of best fit would have a non-zero slope! So how do we know if the slope is significantly different from zero?
The reported slope is the best estimate based on data that do not perfectly fit a straight line.
When we use a collection of data to estimate a single value, that estimate comes with a standard error, eg data give a mean +/- SE of the mean; in our case, data give a slope +/- SE of the slope.
We can calculate the 95% CI of the slope. Does the 95% CI encompass zero? If it does not, the slope is significantly different from zero at the 5% level.
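The SE and 95% CI of a slope can be sketched as below, using the standard formulas for simple linear regression. The function name `slope_ci`, the five data points and the hard-coded critical t value (3.182 for 3 degrees of freedom, taken from t tables) are assumptions for illustration only; the class data would need the critical value for n - 2 = 83 degrees of freedom.

```python
import math


def slope_ci(xs, ys, t_crit):
    """Return (slope, SE of slope, (CI lower, CI upper)) for y = m*x + c.

    t_crit is the two-sided critical t value for n - 2 degrees of freedom,
    supplied by the caller (eg from t tables).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # Residual standard deviation: scatter of the points about the fitted line
    residual_ss = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))
    se_m = s / math.sqrt(sxx)
    return m, se_m, (m - t_crit * se_m, m + t_crit * se_m)


# Hypothetical illustrative data (NOT the class results); n = 5, so df = 3
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
latencies_ms = [30.0, 33.0, 32.0, 35.0, 36.0]

slope, se_slope, (ci_low, ci_high) = slope_ci(heights_cm, latencies_ms, t_crit=3.182)
ci_includes_zero = ci_low <= 0.0 <= ci_high
print(f"slope = {slope:.3f} +/- {se_slope:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

For these made-up points the interval lies wholly above zero, so the slope would be judged significantly different from zero.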
10
The Power of Linear Regression
In reality, Minitab will do the significance calculation for us: a t-test on the slope and its SE. The Null Hypo is that the slope is zero.
Minitab output (the numerical values in the coefficient table are not preserved): the regression equation of latency on height; a Predictor table with columns Coef, SE Coef, T and P and rows Constant and height; R-Sq = 11.1%, R-Sq(adj) = 10.1%.
p < 0.05, so the slope is sig diff from zero: reflex latency increases with height (0.134 ms/cm).
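The t-test on the slope can be written out as follows. This is the standard form for simple linear regression; the symbols are notation introduced here rather than taken from the Minitab output, with n the number of data points.

```latex
t = \frac{\hat{m} - 0}{\mathrm{SE}(\hat{m})}, \qquad \text{degrees of freedom} = n - 2
```

With the class results (n = 85), the test has 83 degrees of freedom.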
11
The Power of Linear Regression
The intercept is also reported as a non-zero value. Is it significantly different from zero (the Null Hypo)? This is tested with a t-test on the intercept and its SE.
In the same Minitab output (as on slide 10), p > 0.05 for the constant, so the intercept is not sig diff from zero.
The predictive equation is therefore latency = 0.134 × height.
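Using the predictive equation is then a single multiplication. In the minimal sketch below, the function name `predict_latency` and the example height of 170 cm are hypothetical; only the slope of 0.134 ms/cm comes from the slide.

```python
SLOPE_MS_PER_CM = 0.134  # slope reported on the slide (ms of latency per cm of height)


def predict_latency(height_cm):
    """Predict reflex latency (ms) from height (cm) using latency = 0.134 * height."""
    return SLOPE_MS_PER_CM * height_cm


# Hypothetical example input: a person 170 cm tall
predicted_ms = predict_latency(170.0)
print(f"Predicted latency: {predicted_ms:.2f} ms")
```

For a height of 170 cm this gives a predicted latency of about 22.8 ms.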
12
The Power of Linear Regression
The analysis also yields r-squared. This is the square of the correlation coefficient: the proportion of variation in the y-axis variable that can be explained by variation in the x-axis variable.
In the same Minitab output (as on slide 10), R-Sq = 11.1% and R-Sq(adj) = 10.1%.
At about 11%, this is very low, but the relationship is still highly significant!
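A worked check shows how a low r-squared can still go with a significant slope. It assumes the standard identity linking r to the t statistic for the slope in simple linear regression, and uses only the reported R-Sq = 11.1% (0.111) and n = 85; the fitted values written as y-hat are notation introduced here.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r^2

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \sqrt{\frac{0.111 \times 83}{0.889}} \approx 3.22
```

A t of about 3.2 on 83 degrees of freedom exceeds the two-sided 5% critical value of roughly 1.99, even though height accounts for only about 11% of the variation in latency.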
13
R-squared and Significance
Just because something has a low r-squared does not mean it is not significant; it means it has low predictive power.
Eg the more clothes worn, the heavier a person weighs: clothes significantly influence weight, but they can only account for a small amount of the variation in weight that we see, ie r-squared is small.
14
Summary
Linear regression extends correlation by reporting the mathematical ‘line of best fit’, y = mx + c.
The slope needs to be tested statistically to ‘prove’ the relationship is ‘real’, eg the Null Hypo is that m = 0.
The intercept should also be tested to see if it is non-zero.
The equation of the ‘line of best fit’ is predictive, ie given a value of x, you can predict y.
The usefulness of this depends on r-squared, the proportion of variation in y ‘explained’ by x (0-100%).