Scientific Practice Regression.


Where We Are/Where We Are Going
We have looked at how correlation shows how two things might be associated, e.g. arm length and leg length. No causality is implied: leg length is not responsible for arm length! The correlation coefficient, r, assumes a linear ‘fit’ between the data and reflects how far the data lie from that ‘line of best fit’. Regression takes this one step further: it describes the fit mathematically and generally implies a causal link, e.g. increased blood pressure (BP) causes increased mortality.
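As a minimal sketch in Python (the function name `pearson_r` and the arm and leg lengths are made up for illustration), r can be computed from the sums of squared deviations about each mean. Because the formula treats x and y identically, swapping the variables gives the same r, which mirrors the point that correlation implies no direction of cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical arm and leg lengths (cm), invented for illustration only
arm = [60.0, 65.0, 70.0, 72.0, 75.0]
leg = [85.0, 90.0, 98.0, 99.0, 104.0]

r = pearson_r(arm, leg)
print(f"r = {r:.3f}")
# Correlation is symmetric: swapping the variables gives the same r
print(math.isclose(r, pearson_r(leg, arm)))
```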

Independent/Dependent Variables
Because a causal link is implied, the independent variable is the one doing the influencing and the dependent variable is the one being influenced. By convention, when the data are plotted, the x-axis represents the independent variable and the y-axis the dependent variable, e.g. blood pressure on the x-axis and cardiovascular mortality on the y-axis.

The Equation of the Straight Line
Any linear relationship can be described by y = mx + c, so y can be calculated (predicted) for any x using c, the intercept (the value of y where the line crosses the y-axis, i.e. when x = 0), and m, the slope of the line (the change in y per unit change in x), which is non-zero if there is a relationship.
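A tiny illustration of using y = mx + c for prediction; the helper `predict` and the values m = 2, c = 1 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(m, c, x):
    """Predicted y for a given x on the straight line y = m*x + c."""
    return m * x + c


# Illustrative line with slope m = 2 and intercept c = 1 (made-up values)
m, c = 2, 1
print(predict(m, c, 0))   # at x = 0 the line crosses the y-axis at c → 1
print(predict(m, c, 3))   # 2*3 + 1 → 7
# The slope is the change in y for each one-unit increase in x → 2
print(predict(m, c, 4) - predict(m, c, 3))
```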

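The Line of Best Fit
For a given set of data, linear regression derives the straight-line equation that best describes the data by minimising the overall distance of the data points from that line; in the usual (least-squares) method, it minimises the sum of the squared vertical distances between each point and the line.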
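For the least-squares criterion the best-fitting slope and intercept have a closed form: m = Sxy / Sxx (the sum of cross-products of deviations from the means divided by the sum of squared x deviations) and c = ȳ - m·x̄. The sketch below assumes that criterion; `fit_line` and the data are our own illustration, not Minitab's code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx                # slope
    c = mean_y - m * mean_x      # the fitted line passes through (mean_x, mean_y)
    return m, c


# Made-up data scattered around a straight line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c = fit_line(xs, ys)
residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
print(f"y = {m:.3f}x + {c:.3f}")
print(f"sum of squared residuals = {sum(r * r for r in residuals):.4f}")
```

The residuals of a least-squares line sum to (essentially) zero, and nudging either m or c away from the fitted values increases the sum of squared residuals.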

The Power of Linear Regression
For example, data from a practical class looked at the relationship between the latency of the Achilles tendon stretch reflex (ms) and height (cm): the taller the person, the longer the nerve pathway mediating the reflex.

The class results (n=85)

The line of best fit has the form y = mx + c, with reflex latency (ms) as y and height (cm) as x. Its non-zero slope (about 0.134 ms/cm) suggests a relationship.

Testing the Line of Best Fit
But even random data would give a line of best fit with a non-zero slope! So how do we know whether the slope is significantly different from zero? The reported slope is the best estimate from data that do not fit a straight line perfectly. Whenever we use a collection of data to estimate a single value, that estimate comes with a standard error (SE): for example, data → mean ± SE of the mean, and in our case data → slope ± SE of the slope. From the slope and its SE we can calculate the 95% confidence interval (CI) of the slope (roughly slope ± 2 SE for a sample of this size). Does the 95% CI encompass zero?
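The standard error of the slope in simple linear regression is SE(m) = √(s² / Sxx), where s² = SSE / (n - 2) is the residual variance about the line, and an approximate 95% CI is m ± t × SE(m), with t the two-sided 5% critical value for n - 2 degrees of freedom. A minimal Python sketch under those textbook assumptions (the function `slope_ci` and the data are hypothetical):

```python
import math


def slope_ci(xs, ys, t_crit=1.96):
    """Slope, its standard error and an approximate 95% CI for y = m*x + c.

    t_crit is the two-sided 5% critical value of t with n - 2 degrees of
    freedom; the default 1.96 is the large-sample (normal) approximation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # Residual variance about the line, with n - 2 degrees of freedom
    sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    se_m = math.sqrt(s2 / sxx)
    return m, se_m, (m - t_crit * se_m, m + t_crit * se_m)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 2.9, 4.4, 4.6, 6.1, 6.4]
# With 6 points there are 4 degrees of freedom; t for a 95% CI is about 2.776
m, se_m, (lo, hi) = slope_ci(xs, ys, t_crit=2.776)
print(f"slope = {m:.3f} +/- {se_m:.3f} (SE)")
print(f"95% CI: {lo:.3f} to {hi:.3f}; includes zero: {lo <= 0 <= hi}")
```

For the class data (n = 85, so 83 degrees of freedom) the critical value is about 1.99, which is why "slope ± 2 SE" is a handy rule of thumb.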

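In practice, Minitab does the significance calculation for us: a t-test on the slope and its SE (t = Coef / SE Coef), where the null hypothesis is that the slope is zero. For the regression equation latency = constant + slope × height, the Minitab output lists, for each predictor (Constant and height), the coefficient (Coef), its standard error (SE Coef), the t statistic (T) and its p-value (P), together with S (the standard deviation of the points about the line), R-Sq = 11.1% and R-Sq(adj) = 10.1%. For height, p < 0.05, so the slope is significantly different from zero: reflex latency increases with height (0.134 ms/cm).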
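Minitab's own implementation is not shown in the slides; the sketch below reproduces the textbook version of the same test, t = slope / SE(slope) with n - 2 degrees of freedom. Python's standard library has no t distribution, so the two-sided p-value is obtained from the regularised incomplete beta function, P(|T| > |t|) = I_x(df/2, 1/2) with x = df / (df + t²), evaluated with the standard continued-fraction method. All function names here are our own.

```python
import math

TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    # Use the continued fraction where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value for a t statistic with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def slope_t_test(xs, ys):
    """t-test of H0: slope = 0 for the least-squares line y = m*x + c.

    Returns (slope, SE of slope, t, two-sided p) with n - 2 degrees of freedom.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    c = mean_y - m * mean_x
    s2 = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se = math.sqrt(s2 / sxx)
    t = m / se
    return m, se, t, t_two_sided_p(t, n - 2)


# Made-up data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 2.9, 4.4, 4.6, 6.1, 6.4]
m, se, t, p = slope_t_test(xs, ys)
print(f"slope = {m:.3f}, SE = {se:.3f}, t = {t:.2f}, p = {p:.4f}")
print("slope significantly different from zero:", p < 0.05)
```

If SciPy is available, `scipy.stats.linregress` reports the slope, its standard error (`stderr`) and the two-sided p-value for the same null hypothesis, and is the more practical choice.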

The intercept is also reported as a non-zero value, but is it significantly different from zero (the null hypothesis)? Minitab applies the same t-test to the intercept (the Constant row) and its SE. Here p > 0.05, so the intercept is not significantly different from zero, and the predictive equation becomes latency = 0.134 × height.
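The intercept test uses SE(c) = √(s² (1/n + x̄² / Sxx)), the standard simple-regression formula, and t = c / SE(c) with n - 2 degrees of freedom. In this sketch the decision is made by comparing |t| with a critical value supplied by hand rather than computing a p-value; `intercept_t` and the data (chosen to lie close to a line through the origin) are hypothetical.

```python
import math


def intercept_t(xs, ys):
    """Intercept c of the least-squares line, its SE and t = c / SE(c)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    c = mean_y - m * mean_x
    s2 = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    # Standard error of the intercept for simple linear regression
    se_c = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return c, se_c, c / se_c


# Made-up data roughly proportional to x (the line passes near the origin)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.9, 2.2, 2.8, 4.1, 5.2, 5.8]
c, se_c, t = intercept_t(xs, ys)
T_CRIT = 2.776  # two-sided 5% critical value of t with n - 2 = 4 df
print(f"c = {c:.3f}, SE = {se_c:.3f}, t = {t:.2f}")
print("intercept significantly different from zero:", abs(t) > T_CRIT)
```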

The analysis also yields r-squared (R-Sq), the square of the correlation coefficient: the proportion of the variation in the y-axis variable that can be explained by variation in the x-axis variable. Here R-Sq = 11.1% (R-Sq(adj) = 10.1%). At about 10% this is very low, but the relationship is still highly significant!
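A short sketch confirming that, for a single predictor, R-squared computed as 1 - SSE/SST equals the square of Pearson's r. It also includes the usual adjusted R-squared, 1 - (1 - R²)(n - 1)/(n - k - 1) for k predictors; the function names and data are our own.

```python
import math


def r_squared(xs, ys):
    """R-squared of a least-squares line (1 - SSE/SST) and Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)       # total variation (SST)
    m = sxy / sxx
    c = mean_y - m * mean_x
    sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))  # unexplained
    r = sxy / math.sqrt(sxx * syy)                 # Pearson correlation
    return 1.0 - sse / syy, r


def adjusted_r_squared(r2, n, k=1):
    """Adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Made-up data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 2.9, 4.4, 4.6, 6.1, 6.4]
r2, r = r_squared(xs, ys)
print(f"R-sq = {r2:.3f}, r = {r:.3f}, r**2 = {r * r:.3f}")
print(f"R-sq(adj) = {adjusted_r_squared(r2, len(xs)):.3f}")
```

With n = 85 and R-Sq ≈ 11.1%, the adjusted formula gives roughly 10%, consistent with the reported R-Sq(adj) = 10.1% once the rounding of R-Sq is allowed for.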

R-squared and Significance
Just because something has a low r-squared does not mean it is not significant; it means it has low predictive power. For example, the more clothes a person wears, the more they weigh: clothes significantly influence weight, but they account for only a small amount of the variation in weight that we see, i.e. r-squared is small.
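To see a low r-squared alongside a clearly significant slope, the contrived data below (hypothetical, with a deterministic ±10 "scatter" standing in for natural variation) have a genuine but small slope of 0.05 over 200 points. The slope's t statistic comes out well beyond the 5% critical value, yet x explains only a small fraction of the variation in y. `slope_stats` is our own helper.

```python
import math


def slope_stats(xs, ys):
    """Least-squares slope, its t statistic (H0: slope = 0) and R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    m = sxy / sxx
    sse = syy - m * sxy                  # residual sum of squares
    se_m = math.sqrt(sse / (n - 2) / sxx)
    return m, m / se_m, 1.0 - sse / syy


# Contrived data: a small real slope (0.05) plus large scatter of +/-10
xs = list(range(200))
ys = [0.05 * x + (10 if x % 2 == 0 else -10) for x in xs]
m, t, r2 = slope_stats(xs, ys)
print(f"slope = {m:.4f}, t = {t:.2f}, R-sq = {100 * r2:.1f}%")
# |t| is well above ~1.97 (the 5% critical value for 198 df): significant,
# even though x explains only a small fraction of the variation in y
```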

Summary
Linear regression extends correlation by reporting the mathematical ‘line of best fit’, y = mx + c. The slope needs to be tested statistically to ‘prove’ that the relationship is ‘real’: the null hypothesis is that m = 0. The intercept should also be tested to see whether it is non-zero. The equation of the ‘line of best fit’ is predictive: given a value of x, you can predict y. How useful this is depends on r-squared, the proportion of the variation in y ‘explained’ by x (0 to 100%).