Published by Landon Fraser. Modified over 11 years ago.
1
Linear Regression
Didier Concordet (d.concordet@envt.fr)
Ecole Nationale Vétérinaire de Toulouse
ECVPT Workshop, April 2011
Can be downloaded at
2
An example
3
About the straight line
$Y = a + b x$, where a = intercept and b = slope
[Figure: straight lines in the (x, Y) plane illustrating b > 0, b < 0, b = 0 and a = 0]
4
Questions
  How to obtain the best straight line?
  Is this straight line the best curve to use?
  How to use this straight line?
5
How to obtain the best straight line?
Proceed in three main steps:
  write a (statistical) model
  estimate the parameters
  graphical inspection of data
6
A statistical model
Write a model
  Mean model: functional relationship
  Variance model: assumptions on the residuals
7
Write a model
$Y_i = a + b x_i + \varepsilon_i$
  Mean model: $a + b x_i$
  Residual (error term): $\varepsilon_i$
8
Assumptions on the residuals
  the $x_i$'s are not random variables: they are known with a high precision
  the $\varepsilon_i$'s have a constant variance: homoscedasticity
  the $\varepsilon_i$'s are independent
  the $\varepsilon_i$'s are normally distributed: normality
9
Homoscedasticity
[Figure: two scatterplots, one labelled homoscedasticity, the other heteroscedasticity]
10
Normality
[Figure: plot with axes x and Y]
11
Estimate the parameters
A criterion is needed to estimate the parameters:
  a statistical model
  a criterion
12
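How to estimate the "best" a and b?
  Intuitive criterion: minimum of $\sum_i (Y_i - a - b x_i)$. It fails because positive and negative deviations compensate.
  Reasonable criterion: minimum of $\sum_i (Y_i - a - b x_i)^2$.
  Linear model + homoscedasticity + normality lead to the least squares criterion (L.S.)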
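A small sketch of the compensation problem, using made-up numbers (not the HPLC data of the slides): every line through the point of means has deviations that sum to zero, however poor the fit, whereas the sum of squares does separate good lines from bad ones. The function names and the three candidate lines are illustrative assumptions.

```python
# Hypothetical data for illustration only.
x = [0, 1, 2, 3, 4]
y = [1, 3, 4, 8, 9]

def deviations(a, b):
    """Deviations Y_i - (a + b * x_i) for a candidate line."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def sum_dev(a, b):
    return sum(deviations(a, b))

def sum_sq(a, b):
    return sum(e ** 2 for e in deviations(a, b))

# Three lines through the point of means (2, 5): a flat line, a line with
# the wrong sign of slope, and the least squares line for these data.
candidates = {
    "flat": (5.0, 0.0),
    "wrong slope": (7.0, -1.0),
    "least squares": (0.8, 2.1),
}

for name, (a, b) in candidates.items():
    # The plain sum is numerically zero for all three lines;
    # only the sum of squares tells them apart.
    print(f"{name:14s} sum = {sum_dev(a, b):6.2f}   sum of squares = {sum_sq(a, b):6.2f}")
```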
13
The least squares criterion
Choose a and b that minimise $SS(a, b) = \sum_{i=1}^{n} (Y_i - a - b x_i)^2$
14
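Result of optimisation
$\hat{b} = \dfrac{\sum_i (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{a} = \bar{Y} - \hat{b}\,\bar{x}$
$\hat{a}$ and $\hat{b}$ change with samples: $\hat{a}$ and $\hat{b}$ are random variables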
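The minimisation has a closed-form solution: the slope is the sum of cross-products of centred x and Y divided by the sum of squares of centred x, and the intercept makes the line pass through the point of means. A minimal sketch follows; `fit_line` and both data sets are made up for illustration and are not the HPLC data. Fitting two different samples taken at the same x values shows that the estimates change with the sample.

```python
def fit_line(x, y):
    """Least squares estimates (a_hat, b_hat) for Y = a + b*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Two hypothetical samples measured at the same x values.
x = [0, 1, 2, 3, 4]
sample_1 = [1, 3, 4, 8, 9]
sample_2 = [0, 4, 5, 6, 10]

a1, b1 = fit_line(x, sample_1)  # about 0.8 and 2.1
a2, b2 = fit_line(x, sample_2)  # about 0.6 and 2.2
print(a1, b1)
print(a2, b2)
```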
15
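Balance sheet
  True mean straight line: $E(Y) = a + b x$
  Estimated straight line: $\hat{Y} = \hat{a} + \hat{b} x$, or $Y_i = \hat{a} + \hat{b} x_i + \hat{\varepsilon}_i$
  Mean predicted value for the ith observation: $\hat{Y}_i = \hat{a} + \hat{b} x_i$
  ith residual: $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$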
16
Example
Dep Var: HPLC   N: 18
Effect     Coefficient   Std Error   t   P(2 Tail)
CONSTANT   (intercept)
CONCENT    (slope)
Estimated straight line: the CONSTANT row gives the intercept and the CONCENT row gives the slope.
17
Example
18
Example
19
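Residual variance
By construction $\sum_i \hat{\varepsilon}_i = 0$, but $\sum_i \hat{\varepsilon}_i^2 > 0$ in general.
The residual variance is defined by $\hat{\sigma}^2 = \dfrac{1}{n-2} \sum_i \hat{\varepsilon}_i^2$; its square root $\hat{\sigma}$ is the standard error of estimate.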
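As a sketch of this computation, assuming the usual definition with n - 2 degrees of freedom (two parameters estimated): the residuals of a least squares line with an intercept sum to zero, and the residual variance is their sum of squares divided by n - 2. The helper names and the data are hypothetical.

```python
import math

def fit_line(x, y):
    """Least squares estimates (a_hat, b_hat) for Y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def residual_summary(x, y):
    """Return (residuals, residual variance, standard error of estimate)."""
    a_hat, b_hat = fit_line(x, y)
    res = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    # Two parameters (a and b) were estimated, hence n - 2 degrees of freedom.
    s2 = sum(e ** 2 for e in res) / (len(x) - 2)
    return res, s2, math.sqrt(s2)

# Hypothetical data, not the HPLC data set.
x = [0, 1, 2, 3, 4]
y = [1, 3, 4, 8, 9]
res, s2, see = residual_summary(x, y)
print(sum(res))  # zero up to rounding error
print(s2, see)
```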
20
Example
Dep Var: HPLC   N: 18
Multiple R:    Squared multiple R: 0.991
Adjusted squared multiple R: 0.991
Standard error of estimate: 8.282
Effect     Coefficient   Std Error   t   P(2 Tail)
CONSTANT
CONCENT
21
Questions
  How to obtain the best straight line?
  Is this straight line the best curve to use?
  How to use this straight line?
22
Is this model the best one to use?
Tools to check the mean model:
  scatterplot of residuals vs fitted values
  test(s)
Tools to check the variance model:
  scatterplot of residuals vs fitted values
  probability plot (Pplot)
23
Checking the mean model
Scatterplot of residuals vs fitted values:
  structure in the residuals: change the mean model
  no structure in the residuals: OK
24
Checking the mean model: tests
Two cases:
  No replication: try a polynomial model (quadratic first)
  Replications: test of lack of fit
25
Without replication: try another mean model and test the improvement
Example: $Y_i = a + b x_i + c x_i^2 + \varepsilon_i$. If the test on c is significant ($c \neq 0$), then keep this model.
Dep Var: HPLC   N: 18
Multiple R:    Squared multiple R: 0.991
Adjusted squared multiple R: 0.991
Standard error of estimate: 8.539
Effect            Coefficient   Std Error   t   P(2 Tail)
CONSTANT
CONCENT
CONCENT*CONCENT
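One way to sketch the test on c without a matrix library is to regress the straight-line residuals on the part of x² that x does not already explain. This gives the same estimate of c and the same t statistic as fitting the full quadratic model. The approach, the function names and the data below are assumptions made for illustration, not the software used for the output above.

```python
import math

def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def residual_on_x(x, v):
    """Residuals of v after a least squares straight-line fit on x."""
    xc, vc = center(x), center(v)
    slope = sum(p * q for p, q in zip(xc, vc)) / sum(p * p for p in xc)
    return [q - slope * p for p, q in zip(xc, vc)]

def quadratic_term_test(x, y):
    """Return (c_hat, t) for the x**2 term in Y = a + b*x + c*x**2 + error."""
    n = len(x)
    e = residual_on_x(x, y)                      # residuals of the straight line
    z = residual_on_x(x, [xi ** 2 for xi in x])  # part of x**2 not explained by x
    szz = sum(zi * zi for zi in z)
    c_hat = sum(zi * ei for zi, ei in zip(z, e)) / szz
    sse_quad = sum((ei - c_hat * zi) ** 2 for zi, ei in zip(z, e))
    s2 = sse_quad / (n - 3)                      # three parameters estimated
    return c_hat, c_hat / math.sqrt(s2 / szz)

# Hypothetical, visibly curved data.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 3.9, 9.2, 15.8, 25.1, 36.2]
c_hat, t_c = quadratic_term_test(x, y)

# 97.5% Student t quantile with n - 3 = 3 degrees of freedom (from tables).
T_CRIT = 3.182
keep_quadratic = abs(t_c) > T_CRIT
print(c_hat, t_c, keep_quadratic)
```

With these made-up data the quadratic term is clearly significant, so the quadratic model would be kept.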
26
With replications
Perform a test of lack of fit.
Principle: compare the departure from linearity to the pure error. If the departure from linearity is larger than the pure error, then change the model.
27
Test of lack of fit: how to do it?
Three steps:
  1) Linear regression
  2) One way ANOVA
  3) If the departure from linearity is significantly larger than the pure error, then change the model
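The three steps can be sketched as follows, assuming the standard lack-of-fit F ratio: the residual sum of squares of the straight line, minus the error sum of squares of the one-way ANOVA (pure error), measures the departure from linearity. Function names and data are hypothetical, and the critical value has to come from an F table.

```python
from collections import defaultdict

def fit_line(x, y):
    """Least squares estimates (a_hat, b_hat) for Y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def lack_of_fit(x, y):
    """Return (ss_lack_of_fit, ss_pure_error, F ratio, (df1, df2))."""
    n = len(x)
    # Step 1: residual sum of squares of the straight line (n - 2 df).
    a_hat, b_hat = fit_line(x, y)
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    # Step 2: one-way ANOVA with x as the factor; its error SS is the pure error.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    k = len(groups)
    ss_pe = sum(
        sum((yi - sum(g) / len(g)) ** 2 for yi in g) for g in groups.values()
    )
    # Step 3: departure from linearity, compared to the pure error.
    ss_lof = sse - ss_pe
    f_ratio = (ss_lof / (k - 2)) / (ss_pe / (n - k))
    return ss_lof, ss_pe, f_ratio, (k - 2, n - k)

# Hypothetical data: 4 concentrations, 2 replicates each.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 3.9, 4.1]
ss_lof, ss_pe, f_ratio, df = lack_of_fit(x, y)
print(ss_lof, ss_pe, f_ratio, df)
```

Here the F ratio is far below the 5% critical value for (2, 4) degrees of freedom (about 6.94 in standard tables), so the straight line would be kept for these made-up data.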
28
Test of lack of fit: example
Three steps:
  1) Linear regression
  2) One way ANOVA
     Dep Var: HPLC   N: 18
     Analysis of Variance
     Source    Sum-of-Squares   df   Mean-Square   F-ratio   P
     CONCENT
     Error
  3) The condition for changing the model is not met: we keep the straight line
29
Checking the variance model: homoscedasticity
Scatterplot of residuals vs fitted values:
  no structure in the residuals but heteroscedasticity: change the model (criterion)
  homoscedasticity: OK
30
What to do with heteroscedasticity?
Scatterplot of residuals vs fitted values: model the dispersion.
The standard deviation of the residuals increases with the fitted value $\hat{Y}$: it increases with x.
31
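What to do with heteroscedasticity?
Estimate the slope and the intercept again, but with weights inversely proportional to the variance: minimise $\sum_i w_i (Y_i - a - b x_i)^2$ with $w_i \propto 1/\mathrm{Var}(\varepsilon_i)$, and check that the weighted residuals $\sqrt{w_i}\,(Y_i - \hat{a} - \hat{b} x_i)$ are homoscedastic.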
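A minimal sketch of weighted least squares, taking the weights as the inverse of the variance (the usual convention). It assumes, for illustration only, that the standard deviation is proportional to x, so that the weights are 1/x². Names and data are hypothetical.

```python
import math

def fit_weighted_line(x, y, w):
    """Minimise sum(w_i * (y_i - a - b*x_i)**2); return (a_hat, b_hat)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def weighted_residuals(x, y, w, a_hat, b_hat):
    """Residuals rescaled by sqrt(w_i); these should look homoscedastic."""
    return [
        math.sqrt(wi) * (yi - a_hat - b_hat * xi) for xi, yi, wi in zip(x, y, w)
    ]

# Hypothetical data whose scatter grows with x.
x = [1, 2, 4, 8, 16]
y = [5.1, 6.8, 11.4, 18.2, 36.5]
w = [1 / xi ** 2 for xi in x]  # assumption: SD proportional to x

a_w, b_w = fit_weighted_line(x, y, w)
wres = weighted_residuals(x, y, w, a_w, b_w)
print(a_w, b_w)
print(wres)  # plot these against the fitted values to check homoscedasticity
```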
32
Checking the variance model: normality
[Figure: two probability plots of the residuals; axis label: expected value for normal distribution]
  No curvature: normality
  Curvature: non normality. Is it so important?
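The coordinates of such a probability plot can be sketched as below: sorted residuals are paired with the values expected under a normal distribution. The plotting positions (i - 0.5)/n are one common choice and are an assumption here, as are the function name and the residuals.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Pair each sorted residual with its expected standard normal value."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

# Hypothetical residuals from a straight-line fit.
points = normal_scores([0.2, 0.1, -1.0, 0.9, -0.2])
for expected, observed in points:
    # Points lying roughly on a straight line suggest normality;
    # a curved pattern suggests non normality.
    print(f"{expected:7.3f}  {observed:6.2f}")
```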
33
What to do with non normality?
Try to model the distribution of the residuals. In general, this is difficult with few observations.
If enough observations are available, non normality does not affect the results much.
34
An interesting index: R²
R² = squared correlation coefficient = % of dispersion of the $Y_i$'s explained by the straight line (the model)
$0 \le R^2 \le 1$
If R² = 1, all the residuals $\hat{\varepsilon}_i$ are 0: the straight line explains all the variation of the $Y_i$'s
If R² = 0, the slope is 0: the straight line does not explain any variation of the $Y_i$'s
35
An interesting index: R²
R² and R (correlation coefficient) are not designed to measure linearity!
Example:
Multiple R: 0.990   Squared multiple R: 0.980   Adjusted squared multiple R: 0.980
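A quick way to see this is to fit a straight line to data that are curved by construction. In the sketch below (made-up data, hypothetical function names) Y is exactly x², yet R² is about 0.95, and the residuals show an obvious structure: positive at both ends, negative in the middle.

```python
def fit_line(x, y):
    """Least squares estimates (a_hat, b_hat) for Y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def r_squared(x, y):
    """Share of the dispersion of y explained by the fitted straight line."""
    a_hat, b_hat = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Deliberately curved, noise-free data: Y = x**2.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

r2 = r_squared(x, y)  # about 0.95 although the relationship is not linear
a_hat, b_hat = fit_line(x, y)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
print(r2)
print(residuals)
```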
36
Questions
  How to obtain the best straight line?
  Is this straight line the best curve to use?
  How to use this straight line?
37
How to use this straight line?
Direct use: for a given x
  predict the mean Y
  construct a confidence interval of the mean Y
  construct a prediction interval of Y
Reverse use, calibration (approximate results): for a given Y
  predict the mean x
  construct a confidence interval of the mean x
  construct a prediction interval of X
38
For a given x predict the mean Y
Example :
39
Confidence interval of the mean Y
There is a probability $1-\alpha$ that the true mean $a + bx$ belongs to this interval
40
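Confidence interval of the mean Y
[Figure: confidence interval of the mean Y, with lower limit L and upper limit U; the value 30 is marked on an axis]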
41
Example
42
Prediction interval of Y
$100(1-\alpha)\%$ of the measurements carried out at this x belong to this interval
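The two intervals of the direct use can be sketched side by side, assuming the standard formulas: both are centred on the predicted mean, with half-widths t·s·sqrt(1/n + (x0 - mean(x))²/Sxx) for the mean Y and t·s·sqrt(1 + 1/n + (x0 - mean(x))²/Sxx) for a single measurement, where s is the standard error of estimate and t the Student quantile with n - 2 degrees of freedom. The data, names and tabulated t value are illustrative assumptions.

```python
import math

def fit_line(x, y):
    """Least squares estimates (a_hat, b_hat) for Y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def intervals(x, y, x0, t_crit):
    """Return (confidence interval of mean Y, prediction interval of Y) at x0."""
    n = len(x)
    a_hat, b_hat = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # standard error of estimate
    y_hat = a_hat + b_hat * x0            # predicted mean Y at x0
    lev = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = t_crit * s * math.sqrt(lev)      # uncertainty on the mean only
    half_pi = t_crit * s * math.sqrt(1 + lev)  # plus one measurement's scatter
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical data, not the HPLC data set.
x = [0, 1, 2, 3, 4]
y = [1, 3, 4, 8, 9]
T_975_3DF = 3.182  # 97.5% Student t quantile, n - 2 = 3 df (from tables)

ci, pred = intervals(x, y, 2.0, T_975_3DF)
print(ci)    # interval for the mean Y at x0 = 2
print(pred)  # wider interval for a single new measurement at x0 = 2
```

The prediction interval is always the wider of the two, because it adds the variability of one measurement to the uncertainty on the mean.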
43
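Prediction interval of Y
[Figure: prediction interval of Y, with lower limit L and upper limit U; the value 30 is marked on an axis]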
44
Example
45
Reverse use : for a given Y=y0 predict the mean X
Example :
46
For a given Y = y0, a confidence interval of the mean X
[Figure: confidence interval of the mean X for Y = y0; the upper limit U is marked]
47
Confidence interval of the mean X
There is a probability $1-\alpha$ that the mean X belongs to [L, U]
L and U are such that
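The defining conditions for L and U did not survive in this transcript. A common construction, assumed here, takes L and U as the x values where the confidence band of the mean Y crosses y0; this reduces to a quadratic equation in x. The point estimate is (y0 - intercept)/slope. The sketch below uses hypothetical names and data.

```python
import math

def fit_line(x, y):
    """Least squares estimates (a_hat, b_hat) for Y = a + b*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def calibrate(x, y, y0, t_crit):
    """Return (x_hat, L, U) for the mean X whose mean response is y0."""
    n = len(x)
    a_hat, b_hat = fit_line(x, y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s2 = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    x_hat = (y0 - a_hat) / b_hat
    # Solve (y0 - a_hat - b_hat*x)**2 = t**2 * s2 * (1/n + (x - x_bar)**2 / sxx).
    k = t_crit ** 2 * s2
    g = b_hat ** 2 - k / sxx
    if g <= 0:
        raise ValueError("slope not significantly different from 0: no finite interval")
    delta = y0 - y_bar
    root = math.sqrt(k * (delta ** 2 / sxx + g / n))
    lower = x_bar + (b_hat * delta - root) / g
    upper = x_bar + (b_hat * delta + root) / g
    return x_hat, lower, upper

# Hypothetical data, not the HPLC data set.
x = [0, 1, 2, 3, 4]
y = [1, 3, 4, 8, 9]
T_975_3DF = 3.182  # 97.5% Student t quantile, n - 2 = 3 df (from tables)

x_hat, L, U = calibrate(x, y, 5.0, T_975_3DF)
print(x_hat, L, U)
```

The interval is only finite when the slope is significantly different from zero, which is why the sketch raises an error otherwise.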
48
Example
49
What you should no longer believe
  One can fit the straight line by inverting x and Y
  If the correlation coefficient is high, the straight line is the best model
  Normality of the $x_i$'s is required to perform a regression
  Normality of the $\varepsilon_i$'s is essential to perform a good regression