Published by Evangeline Collins. Modified over 6 years ago.
1
The Right Questions about Statistics: How regression works
Maths Learning Centre, The University of Adelaide
Regression is a method designed to create a FORMULA that uses some information to PREDICT/EXPLAIN an outcome, using DATA. You calculate the formula that most closely matches your data.
2
Regression begins with a research question about a numerical outcome ...
... you’re interested in exactly how one or more things affect that outcome.
[Diagram: NUMERICAL variables on one side, one NUMERICAL variable on the other]
The outcome is called the “OUTCOME VARIABLE”, “RESPONSE VARIABLE”, “DEPENDENT VARIABLE” or “CRITERION VARIABLE”.
The things that affect it are called “PREDICTOR VARIABLES”, “EXPLANATORY VARIABLES” or “INDEPENDENT VARIABLES”.
3
For example, you might be interested in how a person’s body temperature is affected by the number of grams of chilli in a meal.
[Diagram: Chilli (g), NUMERICAL; Temp (°C), NUMERICAL]
4
What the regression will produce is a FORMULA that will let you calculate the outcome based on the explanatories (the formula is also called a MODEL).
Y = β0 + β1X1 + β2X2
[Diagram: X1 and X2 are NUMERICAL explanatory variables; Y is the NUMERICAL outcome]
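As a rough illustration of what "a formula that lets you calculate the outcome" means, here is a minimal Python sketch. The function name `predict` and the numbers passed to it are made up for this example. They are not from the presentation.

```python
def predict(intercept, slopes, xs):
    """Evaluate Y = b0 + b1*X1 + b2*X2 + ... for one observation."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one slope per explanatory variable")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical model: b0 = 10, b1 = 2, b2 = -0.5, applied to X1 = 3, X2 = 4.
y = predict(10.0, [2.0, -0.5], [3.0, 4.0])
print(y)  # 10 + 2*3 - 0.5*4
```

Once regression has found the numbers, using the model is just this kind of arithmetic.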
5
For example, the formula might look like this:
temp = 36.06 + 0.45(chilli)
[Diagram: Chilli (g), NUMERICAL; Temp (°C), NUMERICAL]
The process of regression finds the numbers in this formula so that it gives answers closest to the actual data.
6
What sort of relationship might be there?
A “DESCRIBE” question! The shape will tell you what sort of formula you ought to use. “SCATTERPLOT”
7
LINEAR Y = β0 + β1X
8
EXPONENTIAL Y = αe^(βX)
9
LOGARITHMIC Y = β0 + β1ln(X)
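To compare the three shapes side by side, the formulas can be written as small Python functions. This is only a sketch: the parameter values in the loop are arbitrary and chosen just to show how differently the curves grow.

```python
import math


def linear(x, b0, b1):
    """Y = b0 + b1*X: goes up by the same amount for each unit of X."""
    return b0 + b1 * x


def exponential(x, alpha, beta):
    """Y = alpha * e^(beta*X): multiplies by the same factor for each unit of X."""
    return alpha * math.exp(beta * x)


def logarithmic(x, b0, b1):
    """Y = b0 + b1*ln(X): rises quickly at first, then flattens. Needs X > 0."""
    return b0 + b1 * math.log(x)


# Arbitrary illustrative parameters, not taken from any data.
for x in (1.0, 2.0, 4.0, 8.0):
    print(x,
          round(linear(x, 1.0, 0.5), 3),
          round(exponential(x, 1.0, 0.5), 3),
          round(logarithmic(x, 1.0, 0.5), 3))
```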
10
The easiest one to work with is LINEAR regression because the formula is simplest
Y = β0 + β1X
11
With LINEAR relationships, to DESCRIBE how strong this relationship is, you can calculate the CORRELATION (r).
[Figure: correlation scale from -1.0 to 1.0, marked at r = -1, r = 0 and r = 1]
Correlation ignores how steep the slope is. It just tells you how close to a line the points are.
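The point that r ignores steepness can be checked with a short calculation. The sketch below uses the standard Pearson correlation formula. The data lists are invented for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
shallow = [0.1 * x for x in xs]      # perfect line, gentle slope
steep = [10.0 * x for x in xs]       # perfect line, steep slope
falling = [-2.0 * x for x in xs]     # perfect line, going down
noisy = [1.2, 1.9, 3.4, 3.6, 5.1]    # roughly a line, with scatter

for name, ys in [("shallow", shallow), ("steep", steep),
                 ("falling", falling), ("noisy", noisy)]:
    print(name, round(correlation(xs, ys), 3))
```

The shallow and steep lines both give r = 1 because every point sits exactly on a line. Only the scatter in the noisy data pulls r below 1.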
12
The process so far...
Have a “what’s the formula?” question.
Look at the pattern, usually with scatterplots, to help choose a formula.
[Diagram: one NUMERICAL variable against another NUMERICAL variable]
13
The next step is to find the numbers in the formula itself.
The basic idea is to pick the numbers that will produce a line closest to your data. “LINE OF BEST FIT”
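One common way to make "closest" precise is least squares: choose the intercept and slope that make the sum of squared vertical distances from the points to the line as small as possible. The sketch below assumes that criterion. The slides do not name the method. The five data points are made up, and were chosen so that the answer comes out as temp = 36.06 + 0.45(chilli). They are not the presentation's real data.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Total squared vertical distance between the data and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: grams of chilli and body temperature in degrees C.
chilli = [0.0, 1.0, 2.0, 3.0, 4.0]
temp = [36.1, 36.4, 37.1, 37.3, 37.9]

intercept, slope = fit_line(chilli, temp)
print(round(intercept, 2), round(slope, 2))

# Any other line is further from the data than the fitted one.
print(sum_squared_errors(chilli, temp, intercept, slope))
print(sum_squared_errors(chilli, temp, intercept, slope + 0.1))
```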
14
The next step is to find the numbers in the formula itself.
Y = β0 + β1X1 + β2X2
β0 is the “INTERCEPT” or “CONSTANT”. β1 and β2 are the “COEFFICIENTS” or “SLOPES”.
There are some complicated-looking equations to figure out what these are, based on calculus and matrix algebra... BUT the computer program will do all that for you.
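For the curious, here is a small sketch of the kind of matrix algebra the computer does, assuming the usual least-squares approach of solving the "normal equations" (X'X)b = X'y. It is written in plain Python so nothing needs installing. The function names are made up for this example. The test data are generated without noise from Y = 18.3 + 3.0X1 – 0.24X2 (the multiple-regression example used in this presentation), so the fit should simply recover those three numbers.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting.

    Assumes a is square and non-singular (no perfectly redundant variables).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares [b0, b1, b2, ...] for Y = b0 + b1*X1 + b2*X2 + ..."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical (X1, X2) values; Y is generated exactly from the example formula.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0), (6.0, 4.0)]
ys = [18.3 + 3.0 * x1 - 0.24 * x2 for x1, x2 in rows]

coefficients = fit_multiple(rows, ys)
print([round(b, 2) for b in coefficients])
```

Real statistics packages use more numerically careful methods than this, but the idea is the same.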
15
What the computer will produce in Excel:
temp = 36.06 + 0.45(chilli)
[Figure: Excel screenshot labelled “Original data”, “Regression output” and “Formula numbers”]
16
What the formula means:
temp = 36.06 + 0.45(chilli)
The 0.45 is how much temperature changes on average for a change of 1 g of chilli: 1 g extra of chilli puts your temperature up by 0.45 degrees on average.
Does getting this number in my data mean that chilli really does affect temperature?
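A quick check of this reading, using the fitted formula from the slide. The function name `predicted_temp` is just a label chosen for this sketch.

```python
def predicted_temp(chilli_g):
    """Average temperature (degrees C) predicted by temp = 36.06 + 0.45(chilli)."""
    return 36.06 + 0.45 * chilli_g


# With no chilli the prediction is just the intercept.
print(round(predicted_temp(0), 2))

# Going from 2 g to 3 g, or from 7 g to 8 g, changes the prediction by the slope.
print(round(predicted_temp(3) - predicted_temp(2), 2))
print(round(predicted_temp(8) - predicted_temp(7), 2))
```

Whichever starting amount you pick, one extra gram moves the prediction by the same 0.45 degrees. That is what makes the formula linear.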
17
Is there really a relationship?
temp = 36.06 + 0.45(chilli)
A “DECIDE” question! If there were no relationship, then this number (the 0.45) would be most likely to be zero. Assuming some things, we can calculate a test statistic that comes from a t-distribution, and find a p-value.
P-value = “SIGNIFICANT EFFECT”
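For readers who want to see where such a p-value comes from, here is a rough sketch. It assumes the standard textbook test for a slope: t = slope / standard error, with n – 2 degrees of freedom. The slides do not spell this out. The p-value is approximated by numerically integrating the t density, because Python's standard library has no t-distribution. The data are the same made-up chilli numbers as before, not the presentation's real data.

```python
import math


def t_density(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) with Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * area)


def slope_test(xs, ys):
    """Return (slope, standard error, t statistic, p-value) for a fitted line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    std_error = math.sqrt(sse / df / sxx)
    t_stat = slope / std_error
    return slope, std_error, t_stat, two_sided_p_value(t_stat, df)


# Hypothetical data: grams of chilli and body temperature in degrees C.
chilli = [0.0, 1.0, 2.0, 3.0, 4.0]
temp = [36.1, 36.4, 37.1, 37.3, 37.9]

slope, std_error, t_stat, p_value = slope_test(chilli, temp)
print(round(slope, 2), round(std_error, 3), round(t_stat, 1), p_value)
```

A small p-value says a slope this far from zero would be unusual if there were really no relationship. In practice the computer program reports this number for you.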
18
Is there really a relationship (for multiple regression)?
Y = 18.3 + 3.0X1 – 0.24X2
If there were no relationship at all, then both of these numbers (the 3.0 and the –0.24) would be most likely to be zero. Assuming some things, we can calculate a test statistic that comes from an F distribution, and find a p-value.
P-value = “SIGNIFICANT RELATIONSHIP”
19
Is there really a relationship with X1?
Y = 18.3 + 3.0X1 – 0.24X2
If there were no relationship with X1, then this number (the 3.0) would be most likely to be zero.
P-value = “SIGNIFICANT EFFECT”
20
Is there really a relationship with X2?
Y = 18.3 + 3.0X1 – 0.24X2
If there were no relationship with X2, then this number (the –0.24) would be most likely to be zero.
P-value = 0.26 “NOT SIGNIFICANT EFFECT”
At this stage you would normally remove X2 from the formula and do a new regression.
21
How big could the effect be?
temp = 36.06 + 0.45(chilli)
An “ESTIMATE” question! We are asking what options for this number (the 0.45) are consistent with our data. Assuming some things, we can calculate a confidence interval for this number.
95% CI is from 0.38 to 0.52
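A confidence interval for a slope is usually built as estimate ± (critical value × standard error). The sketch below assumes that standard construction. The slide gives only the estimate (0.45) and the interval (0.38 to 0.52). The standard error of 0.035 and the critical value of 2.0 are assumed values, picked so that the margin comes out at 0.07 and matches the slide.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Return (low, high) for estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


# 0.45 is from the slide; 0.035 and 2.0 are assumed for illustration only.
low, high = confidence_interval(0.45, 0.035, 2.0)
print(round(low, 2), round(high, 2))

# If zero is outside the interval, "no effect" is not consistent with the data.
print(low <= 0.0 <= high)
```

The real critical value depends on how many data points there are, which is another thing the computer program handles for you.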
22
DISCLAIMER: There are a whole lot of things you need to check in order to make sure your regression is acceptable statistically (especially if you are using p-values or confidence intervals). I have not mentioned any of these today. You will need to look them up in a book like Medical Statistics at a Glance or Intro Stats.
23
So this is how you perform regression:
Have a “what’s the formula?” question.
Collect data.
Look at the pattern to choose a formula.
Get a computer to calculate the numbers and p-values.
Check the p-values.
Choose your final formula.
24
And this is what regression means:
It tells you a formula for how to calculate an outcome based on other information. It does NOT tell you if some things CAUSE others, only how to calculate them as accurately as possible. The computer output will tell you p-values and confidence intervals to answer other types of questions.