CIVE Engineering Mathematics 2.2 (20 credits) Statistics and Probability, Lecture 11, Dr Duncan Borman
Linear regression. Techniques to assess how good the regression is: 1) coefficient of determination; 2) examining the residuals; 3) significance testing on the regression parameters. Non-linear regression: transformations. Multiple linear regression.
©Claudio Nunez 2010, sourced from _Building_destroyed_in_Concepci%C3%B3n.jpg?uselang=en-gb Available under creative commons license
Regression analysis. Aim: to predict y from x ('regressing y on x'). Residual variation about the line is $\sim N(0, \sigma^2)$. [Figure: scatter of y against x with fitted line]
Dependent variable (y); independent or control variable (x). What is the "best fit" line? It is the one that minimises the sum of squared residuals.
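How do we calculate the values of a and b (the least squares estimators) for the line $y = a + bx$?
$b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$, where $\bar{x}$ and $\bar{y}$ are just the means of the sample data x and y.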
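The two estimators above translate directly into Python. This is a minimal sketch. The readings are invented for illustration and are not the lecture's idling-time data. `least_squares` is a hypothetical helper name.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n                      # mean of the x data
    y_bar = sum(ys) / n                      # mean of the y data
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx                          # slope estimate
    a = y_bar - b * x_bar                    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical readings: idling time (s) and emissions (ppm), invented for illustration
idle = [10, 20, 30, 40, 50]
ppm = [12.0, 15.0, 21.0, 24.0, 28.0]
a, b = least_squares(idle, ppm)
print(f"y = {a:.2f} + {b:.3f}x")  # → y = 7.70 + 0.410x
```

The intercept formula guarantees the fitted line passes through the point of means $(\bar{x}, \bar{y})$.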
[Figure: emissions (ppm) against idling time (s)]
How good is the regression? How well does our data fit the straight line we have produced?
How good is the regression? There are three techniques to be aware of:
1) Coefficient of determination (R², a value between 0 and 1, often quoted as a percentage)
2) Examining the residuals
3) Significance tests
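This is a minimal sketch of computing R². It assumes the usual definition $R^2 = 1 - SS_{res}/SS_{tot}$, which is the proportion of the total variation in y explained by the fitted line. The data are invented and `r_squared` is a hypothetical helper.

```python
def r_squared(xs, ys):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total variation about the mean
    return 1 - ss_res / ss_tot


idle = [10, 20, 30, 40, 50]             # hypothetical idling times (s)
ppm = [12.0, 15.0, 21.0, 24.0, 28.0]    # hypothetical emissions (ppm)
r2 = r_squared(idle, ppm)
print(f"R^2 = {r2:.3f} ({100 * r2:.1f}%)")  # → R^2 = 0.988 (98.8%)
```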
3) Significance testing on the regression parameters, e.g. how well does x predict y in the model? This means statistical testing on α and β, the population intercept and slope estimated by a and b. Does x explain any of the variability in y? We can perform a hypothesis test to see whether or not the variable x actually explains any of the variability in y. If there were no relationship between x and y, we would expect the slope of the best fit line to be zero. Take β, the gradient of the line (estimated by b). Null hypothesis: $H_0: \beta = 0$.
Null hypothesis $H_0: \beta = 0$. Statistics gives us a tool for deciding what counts as large and what counts as small. Technically this is done by converting the measured slope into a t-statistic: $t = \dfrac{b - 0}{s_b}$, where $s_b$ is the standard error of the slope. It is the equivalent of the standard error we used when comparing sample means.
The formula for $s_b$ is on p90 of the notes. Excel/SPSS can be made to generate these values automatically.
Once we have a t-statistic, we can use t-tables to look up a P-value. Remember to double the value from the tables for a two-tailed test. We then compare the P-value with the significance level (say 5%):
If P > 5%, KEEP $H_0$: the slope of the graph has NOT been shown to be significantly different from 0, so the model is not a good predictor of y.
If P < 5%, REJECT $H_0$: the slope of the graph IS significantly different from 0, so the model is a good predictor of y.
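This sketch runs the whole slope test. It assumes the standard textbook formula $s_b = s/\sqrt{S_{xx}}$ with $s^2 = \sum(\text{residual})^2/(n-2)$; check this against p90 of the notes. The slides read the P-value from t-tables. This version instead computes the two-tailed P-value by integrating the t density numerically with Simpson's rule. That is a swapped-in technique, chosen so the example needs only the Python standard library. The data and all function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value 2*P(T > |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3                 # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)    # by symmetry, 2*P(T > |t|) = 1 - 2*P(0 < T < |t|)


def slope_t_test(xs, ys):
    """Test H0: beta = 0. Returns (b, s_b, t, p) with s_b = s / sqrt(S_xx), s^2 = SS_res / (n - 2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt(ss_res / (n - 2)) / math.sqrt(s_xx)
    t = b / s_b
    return b, s_b, t, two_tailed_p(t, n - 2)


idle = [10, 20, 30, 40, 50]             # hypothetical idling times (s)
ppm = [12.0, 15.0, 21.0, 24.0, 28.0]    # hypothetical emissions (ppm)
b, s_b, t, p = slope_t_test(idle, ppm)
print(f"b = {b:.3f}, s_b = {s_b:.4f}, t = {t:.2f}, P = {p:.5f}")
print("REJECT H0" if p < 0.05 else "KEEP H0")
```

As a sanity check, `two_tailed_p(3.182, 3)` should come out close to 0.05, matching the two-tailed 5% entry in t-tables for 3 degrees of freedom.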
Null hypothesis $H_0: \alpha = 0$. We can do something very similar for α, the intercept (estimated by a). Repeat a similar process (p92 of notes) and again generate P-values to compare with the 1% or 5% significance level.
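The same idea applies to the intercept. This sketch assumes the standard textbook standard error $s_a = s\sqrt{1/n + \bar{x}^2/S_{xx}}$; compare it with p92 of the notes. It follows the slides' t-table approach and compares |t| with a tabulated critical value instead of computing a P-value. The two decisions are equivalent. The data and `intercept_t` are hypothetical. The value 3.182 is the two-tailed 5% point of t on 3 degrees of freedom.

```python
import math


def intercept_t(xs, ys):
    """Test H0: alpha = 0. Returns (a, s_a, t) with s_a = s * sqrt(1/n + x_bar^2 / S_xx)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))          # residual standard deviation
    s_a = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    return a, s_a, a / s_a


idle = [10, 20, 30, 40, 50]             # hypothetical idling times (s)
ppm = [12.0, 15.0, 21.0, 24.0, 28.0]    # hypothetical emissions (ppm)
a, s_a, t = intercept_t(idle, ppm)
T_CRIT = 3.182   # two-tailed 5% point of t with n - 2 = 3 degrees of freedom, from t-tables
print(f"a = {a:.2f}, s_a = {s_a:.3f}, t = {t:.2f}")
print("REJECT H0" if abs(t) > T_CRIT else "KEEP H0")
```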
We can also find confidence intervals for α and β. If we had taken a different sample of data, we would have got a different set of values for a and b. So how good are the ones we have found? Confidence intervals let us find, with 95% confidence, an interval either side of a and b within which we will find α and β (p90 of notes).
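Here is a sketch of the 95% confidence intervals. It assumes the standard form "estimate ± $t_{crit}$ × standard error", with $s_b = s/\sqrt{S_{xx}}$ and $s_a = s\sqrt{1/n + \bar{x}^2/S_{xx}}$, where $s^2 = \sum(\text{residual})^2/(n-2)$. The critical value is passed in as if read from t-tables. The data and `confidence_intervals` are hypothetical.

```python
import math


def confidence_intervals(xs, ys, t_crit):
    """Return ((a_lo, a_hi), (b_lo, b_hi)): a ± t_crit*s_a and b ± t_crit*s_b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))
    s_a = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)   # standard error of the intercept
    s_b = s / math.sqrt(s_xx)                        # standard error of the slope
    return (a - t_crit * s_a, a + t_crit * s_a), (b - t_crit * s_b, b + t_crit * s_b)


idle = [10, 20, 30, 40, 50]             # hypothetical idling times (s)
ppm = [12.0, 15.0, 21.0, 24.0, 28.0]    # hypothetical emissions (ppm)
# 3.182: two-tailed 5% point of t with n - 2 = 3 degrees of freedom, from t-tables
(a_lo, a_hi), (b_lo, b_hi) = confidence_intervals(idle, ppm, t_crit=3.182)
print(f"95% CI for alpha: ({a_lo:.2f}, {a_hi:.2f})")
print(f"95% CI for beta:  ({b_lo:.3f}, {b_hi:.3f})")
```

The interval for β excludes 0, which is consistent with rejecting $H_0: \beta = 0$ at the 5% level.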
Clickers
Regression questions
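Don't be confused: R² and the slope b measure different things. R² describes how tightly the points cluster about the line. b describes how steep the line is. A fit can have a high R² with a low b, or a low R² with a high b. [Figure: panels contrasting high/low R² with high/low b]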
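A small numerical illustration helps here. It uses two invented datasets. The first lies exactly on a shallow line, giving a low b but an R² of 1. The second has a steep underlying trend (slope 10) with large scatter, giving a high b but a low R². `fit_stats` is a hypothetical helper.

```python
def fit_stats(xs, ys):
    """Return (b, r2): the least-squares slope and the coefficient of determination."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return b, 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5, 6]
shallow_tight = [2 + 0.1 * xi for xi in x]     # lies exactly on a shallow line
steep_scattered = [50, -20, 30, 40, 10, 100]   # steep trend (10x) plus large scatter
print("low b, high R^2: b = %.2f, R^2 = %.2f" % fit_stats(x, shallow_tight))
print("high b, low R^2: b = %.2f, R^2 = %.2f" % fit_stats(x, steep_scattered))
```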
Some data… Amplitude of vibrations measured on a bridge support vs the number of cars driving across at any one time (SPSS: Graphs > Interactive > Scatter). [Figure: amplitude of vibration (mm) against number of cars] ©Terraplanner 2007, sourced from Available under creative commons license
Assumptions of regression:
1) The independent (x) variable is measured without error (!)
2) 'Errors' in the dependent (y) variable are normally distributed
3) The variance in the dependent variable is constant
4) The relationship between the variables is linear
Assumption 1 (the independent (x) variable is measured without error): this is seldom true… but it is nearly true when experimental treatments are used. It can (should?) be tested by remeasuring the x variable. It is only a serious problem if the errors in x are nearly as large as those in y. If so, use other techniques.
Assumption 2 ('errors' in the dependent (y) variable are normally distributed): to check, try a scatterplot. [Figure: OK vs BAD examples] More formally, you can plot the residuals, or save them and test them for normality.
Assumption 3 (the variance in the dependent variable is constant): again, try a scatterplot. [Figure: OK (homoscedastic) vs BAD (heteroscedastic) examples] There are tests available, but generally your eye will be more sensitive than most tests!
Assumption 4 (the relationship between the variables is linear): try a scatterplot (again!) or plot the residuals… [Figure: residual plots, OK vs BAD]
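The slides recommend looking at residual plots by eye. As a rough companion to that, this sketch lists (fitted value, residual) pairs ready for plotting. It also computes a crude spread ratio: mean |residual| in the upper half of the fitted values divided by the lower half. That ratio is a hypothetical heuristic invented for illustration, not a formal test of constant variance. The data are invented so that the scatter grows with x.

```python
def residuals(xs, ys):
    """Fit y = a + b*x by least squares; return a list of (fitted value, residual) pairs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    return [(a + b * x, y - (a + b * x)) for x, y in zip(xs, ys)]


def spread_ratio(pairs):
    """Hypothetical heuristic: mean |residual| for the upper half of fitted values divided by
    that for the lower half. Values well above 1 hint at heteroscedasticity (not a formal test)."""
    ordered = sorted(pairs)                       # sort by fitted value
    half = len(ordered) // 2
    low = [abs(r) for _, r in ordered[:half]]
    high = [abs(r) for _, r in ordered[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))


# Hypothetical data whose scatter grows with x (invented for illustration)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5.1, 7.9, 11.1, 13.9, 18.0, 19.0, 24.0, 25.0]
pairs = residuals(x, y)
for fitted, r in pairs:
    print(f"fitted {fitted:6.2f}  residual {r:+.2f}")
print(f"spread ratio = {spread_ratio(pairs):.1f}")
```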
If we suspect that the relationship between x and y is NOT linear, we can try applying transforms to x and/or y to see if we can find a linear relationship.
Testing the assumptions: variance not OK; linearity OK (R² = 0.74). So let's log-transform the y variable… [Figure: amplitude of vibration (mm) against number of cars]
What happens when the log transform is applied to the y variable: R² = 0.54; residuals not OK; variance OK; linearity not OK. [Figure: ln(vibrations (mm)) against number of cars]
Transformation affects linearity… [Figure: before/after panels for log(x), log(y), and both log(x) and log(y)] There are lots of other transforms you can try, e.g. squaring or cubing x or y or both.
You may need to transform the x variable as well… R² = 0.79; residuals OK; variance OK; linearity OK. [Figure: ln(vibrations (mm)) against ln(number of cars)]
$\ln y = \ln 1.88 + 1.47 \ln x$
$\ln y = \ln 1.88 + \ln x^{1.47}$
$\ln y = \ln(1.88\,x^{1.47})$
$y = 1.88\,x^{1.47}$
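The log-log fit and back-transformation above can be sketched in a few lines. Fit a straight line to $(\ln x, \ln y)$, then undo the logs to recover $y = A x^b$ with $A = e^{\text{intercept}}$. As a synthetic check, the points below are generated exactly from the slide's $y = 1.88x^{1.47}$; they are not measured data. `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = A * x**b by least squares on ln(y) = ln(A) + b*ln(x); requires x, y > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    ln_a = my - b * mx                 # intercept of the log-log line is ln(A)
    return math.exp(ln_a), b           # back-transform: A = e^(intercept)


# Synthetic check: points generated exactly from y = 1.88 x^1.47
cars = [1, 2, 4, 6, 8, 10]
vib = [1.88 * c ** 1.47 for c in cars]
A, b = fit_power_law(cars, vib)
print(f"y = {A:.2f} x^{b:.2f}")  # → y = 1.88 x^1.47
```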
Another log-log graph: R² = 0.82; residuals OK; variance OK; linearity OK. [Figure: ln(failure strength (kN)) against ln(sample diameter)]
$\ln y = \ln 2 + 1.5 \ln x$
$\ln y = \ln 2 + \ln x^{1.5}$
$\ln y = \ln(2x^{1.5})$
$y = 2x^{1.5}$
What is the equation for y in terms of x if $\ln y = 3 + 5x$?
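One way to work the closing question is sketched below. Here only y has been logged and x is untransformed, so this is a log-linear rather than a log-log relationship. Back-transforming therefore gives an exponential, not a power law.

```latex
\ln y = 3 + 5x
\;\Rightarrow\; y = e^{3 + 5x} = e^{3}\,e^{5x} \approx 20.1\,e^{5x}
```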
More than one independent ('predictor') variable: multiple regression, e.g. bridge vibrations (z) as a function of the number of cars (x) and wind speed (y). [Figure: vibrations (z) plotted against number of cars and wind speed]
Multiple regression: fit the best plane in (x, y) to explain z, minimising $\sum(\text{residual})^2$ just as in the linear case. This is conceptually identical to the linear model. We can similarly use any number of predictor variables, fitting hyperplanes in multidimensional space… [Figure: fitted plane; z = amplitude of vibrations, x = number of cars, y = wind speed]
Model: $z = a + b_1 x + b_2 y$
Note: the effects of x and y are both linear, and are additive.
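As a sketch of "minimise the sum of squared residuals" with two predictors, this code builds the normal equations $(X^T X)\beta = X^T z$ for $z = a + b_1 x + b_2 y$ and solves them by Gaussian elimination, using only the standard library. In practice Excel/SPSS or a linear-algebra library would do this step. As a check, the data are generated exactly from the hypothetical model $z = 1 + 0.5x + 2y$. The numbers and `fit_plane` are invented for illustration.

```python
def fit_plane(xs, ys, zs):
    """Least-squares fit of z = a + b1*x + b2*y by solving the 3x3 normal equations."""
    rows = [(1.0, x, y) for x, y in zip(xs, ys)]          # design-matrix rows [1, x, y]
    # Augmented matrix [X^T X | X^T z]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * z for r, z in zip(rows, zs))] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= factor * m[col][c]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(coef)   # (a, b1, b2)


cars = [1, 2, 3, 4, 5, 6]                 # hypothetical number of cars (x)
wind = [3, 1, 4, 1, 5, 9]                 # hypothetical wind speed (y)
vibration = [1 + 0.5 * x + 2 * y for x, y in zip(cars, wind)]
a, b1, b2 = fit_plane(cars, wind, vibration)
print(f"z = {a:.2f} + {b1:.2f}x + {b2:.2f}y")  # → z = 1.00 + 0.50x + 2.00y
```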
CIVE Engineering Mathematics 2.2 feedback. TURN ON your clicker.
1) Enter the character in <> brackets shown on the yellow bar. If it asks for a user ID, enter "0"; this makes you anonymous.
2) When asked a question, enter a letter or number and press Enter (the green button).
Use the scale below to answer each question: A Strongly agree, B Agree, C Neither agree nor disagree, D Disagree, E Strongly disagree.
Feedback on resources for the whole Level 2 Engineering Maths module (Stats and other maths)
Question 1: In general I have found the range of online resources for this module useful (e.g. VLE material, Mathlab, support, links to online resources etc).
Question 2: I have found that the Mathlab tasks have helped with my understanding of the module (Eng Maths and Stats).
Question 2b: I have found the weekly Examples classes useful for the Statistics part of the module.
Question 3: I have made (or intend to make) use of the online lecture slides or lecture videos that are posted on the VLE.
Question 4: I have made use of some of the other online maths resources linked from the VLE page.
Question 5: I feel the approaches used in the Engineering Maths module (which include working through examples in lectures and using directed out-of-lecture tasks) have helped to improve my understanding of the maths material.
Module feedback
Question 6: It is difficult to read/follow the text added to the slides using the tablet.
Question 7: I find it easier to follow mathematical material when it is written on the tablet during the lecture.
Question 8: The use of the A, B, C, D cards during a lecture is helpful for feeding back my understanding of the lecture material.
Question 9: The use of the A, B, C, D cards can be useful for helping me to engage with a lecture.
Question 10: The general interactive elements in the lectures help me to engage with the lecture material.
Question 11: I feel I understand the majority of the material covered in this module.
Question 12: I would have liked this module to have covered more Civil Engineering maths examples.
Question 13: I feel confident in my maths ability.
Feedback on resources
Question 4: I have found the weekly Examples classes useful for the Statistics part of the module.
Question 5: For the Limits and Series section of the module, I found the Problem Sheets useful for developing my understanding of the material.
Module feedback
Question 6: Material handwritten using the tablet computer was difficult to read.
Question 7: I find it easier to follow maths material when it is written on the tablet during the lecture.
Question 8: I can see the value of using the A, B, C, D cards during the lecture.
Question 9: Interactive elements in lectures have helped me to engage with the material.
Question 10: I can see the relevance of the mathematical content of this module to my degree course.
Question 11: I feel I understand the majority of the material covered in this module.
Coursework: due in on Tuesday 16th March, 12 sides maximum. You need to submit both online (VLE) and in hardcopy by 4pm; late penalties apply until both are submitted. Major coursework rules apply. Please take care not to plagiarise! It will be taken very seriously. Group work on this major coursework would constitute plagiarism, and plagiarism-detection software is now used as normal practice on all submissions. The final lecture tomorrow is in Computer cluster 504. If you have any questions regarding the coursework etc., I will make time to answer them. (Examples classes continue into next week.)
More than one answer is allowed!
Definition of mutually exclusive events: if event A happens, then event B cannot, and vice versa. For example, "it rained on Tuesday" and "it did not rain on Tuesday" are mutually exclusive events.
Independent events: the outcome of event A has no effect on the outcome of event B. For example, "it rained on Tuesday" and "my chair broke at work".
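Both definitions can be checked exactly by enumerating outcomes. This sketch uses two fair dice, an example chosen for illustration rather than taken from the slides. Independent events satisfy $P(A \cap B) = P(A)P(B)$, while mutually exclusive events satisfy $P(A \cap B) = 0$. The event functions are hypothetical names.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product(range(1, 7), repeat=2))   # two fair dice: 36 equally likely outcomes


def prob(event):
    """Exact probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def first_even(o):
    return o[0] % 2 == 0


def second_above_4(o):
    return o[1] > 4


def first_is_1(o):
    return o[0] == 1


def first_is_2(o):
    return o[0] == 2


# Independent: P(A and B) equals P(A) * P(B)
p_joint = prob(lambda o: first_even(o) and second_above_4(o))
print(p_joint, prob(first_even) * prob(second_above_4))   # → 1/6 1/6

# Mutually exclusive: P(A and B) = 0, so these are NOT independent (1/6 * 1/6 != 0)
print(prob(lambda o: first_is_1(o) and first_is_2(o)))    # → 0
```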
$F = S_x^2 / S_y^2$, where $S_x$ is the larger of the two sample standard deviations (so F ≥ 1).
Is $F_{20,20}$ (from tables) < $F_{\text{calculated}}$? If no, then there is NO significant difference.
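This is a sketch of the variance-ratio test described above. It puts the larger sample variance on top and compares F with a critical value read from F-tables. Two samples of 21 readings each give the (20, 20) degrees of freedom in the slide. The samples and `f_test` are hypothetical. The value 2.12 is the upper 5% point of F(20, 20) from standard tables; a two-tailed test at 5% would use the 2.5% point instead.

```python
import statistics


def f_test(sample_1, sample_2, f_crit):
    """Variance-ratio F test with the larger sample variance on top (so F >= 1).

    Returns (F_calculated, significant); significant is True when F_calculated > f_crit.
    """
    v1 = statistics.variance(sample_1)    # sample variance (n - 1 denominator)
    v2 = statistics.variance(sample_2)
    f_calc = max(v1, v2) / min(v1, v2)
    return f_calc, f_calc > f_crit


# Hypothetical samples of 21 readings each, so the degrees of freedom are (20, 20)
site_a = [float(i) for i in range(21)]
site_b = [1.2 * i for i in range(21)]
F_CRIT = 2.12   # upper 5% point of F(20, 20), from standard F-tables
f_calc, significant = f_test(site_a, site_b, F_CRIT)
print(f"F = {f_calc:.2f}:", "significant difference" if significant else "NO significant difference")
```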
z or t