Linear regression
J.-F. Pâris, University of Houston
Introduction
Linear regression is a special case of regression analysis.
Regression Analysis
Models the relationship between:
- the values of a dependent variable (also called a response variable), and
- the values of one or more independent variables.
The main outcome is a function y = f(x1, …, xn).
Linear regression
Studies linear dependencies such as y = ax + b
And more: y = ax² + bx + c is linear in the coefficients a, b, and c
Uses the least-squares method
Assumes that departures from the ideal line are due to random noise
Basic Assumptions (I)
- The sample is representative of the whole population.
- The error is a random variable with a mean of zero conditional on the independent variables.
- The independent variables are error-free and linearly independent.
- The errors are uncorrelated.
Basic Assumptions (II)
- The variance of the error is constant across observations.
- For very small samples, the errors must be Gaussian; this requirement does not apply to large samples (n ≥ 30).
General Formulation
n samples of the dependent variable: y1, y2, …, yn
n samples of each of the p independent variables:
x11, x12, …, x1n
x21, x22, …, x2n
…
xp1, xp2, …, xpn
Objective
Finding Y = b0 + b1X1 + b2X2 + … + bpXp
minimizing the sum of squares of the deviations
Σi (yi - b0 - b1x1i - b2x2i - … - bpxpi)²
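The objective above can be sketched directly in Python; a minimal illustration (the function name and sample data are made up for the example):

```python
# Sum of squared deviations for candidate coefficients b = [b0, b1, ..., bp].
def sse(b, ys, xss):
    # xss[i] holds the p independent-variable values for observation i.
    total = 0.0
    for y, xs in zip(ys, xss):
        pred = b[0] + sum(bj * xj for bj, xj in zip(b[1:], xs))
        total += (y - pred) ** 2
    return total

# A perfect fit has zero error: the points lie exactly on y = 1 + 2x.
err = sse([1.0, 2.0], [3.0, 5.0, 7.0], [[1.0], [2.0], [3.0]])  # -> 0.0
```

Regression picks the coefficient vector b that makes this quantity as small as possible.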
Why the sum of squares?
It penalizes big deviations, which are less likely to result from random noise than small ones.
Our objective is to estimate the function linking the dependent variable to the independent variables, assuming that the experimental points represent random variations around it.
Simplest case (I)
One independent variable
We must find Y = a + bX
minimizing the sum of squares of the errors Σi (yi - a - bxi)²
Simplest case (II)
Differentiate the previous expression with respect to the parameters a and b and set the derivatives to zero:
Σi -2(yi - a - bxi) = 0, or na + (Σi xi) b = Σi yi
Σi -2xi(yi - a - bxi) = 0, or (Σi xi) a + (Σi xi²) b = Σi xi yi
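The two normal equations form a 2×2 linear system that can be solved directly. A small Python sketch using Cramer's rule (the function name and sample data are illustrative):

```python
# Solve the two normal equations of simple linear regression:
#   n*a        + (sum x)*b    = sum y
#   (sum x)*a  + (sum x^2)*b  = sum x*y
def solve_normal_equations(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for a
    b = (n * sxy - sx * sy) / det    # Cramer's rule for b
    return a, b

# Points lying exactly on y = 1 + 2x should recover a = 1, b = 2.
a, b = solve_normal_equations([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```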
Simplest case (III)
We obtain
a = (Σi yi Σi xi² - Σi xi Σi xi yi) / (n Σi xi² - (Σi xi)²)
b = (n Σi xi yi - Σi xi Σi yi) / (n Σi xi² - (Σi xi)²)
The second expression can be rewritten.
More notations
x̄ = (1/n) Σi xi   ȳ = (1/n) Σi yi
Sxx = Σi (xi - x̄)²   Syy = Σi (yi - ȳ)²   Sxy = Σi (xi - x̄)(yi - ȳ)
Simplest case (IV)
The solution can be rewritten
b = Sxy / Sxx
a = ȳ - b x̄
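This closed form translates directly into code. A minimal Python sketch (the function name and sample data are made up for the example):

```python
# Closed-form simple linear regression: b = Sxy / Sxx, a = ybar - b * xbar.
def fit_line(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Points lying exactly on y = 2x + 1 should recover a = 1, b = 2.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```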
Coefficient of correlation
r = Sxy / √(Sxx Syy)
r = ±1 would indicate a perfect linear fit; r = 0 would indicate no linear dependency.
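Using the Sxx, Syy, and Sxy notations above, r is a one-liner to compute; a small Python sketch with illustrative data:

```python
import math

# Coefficient of correlation: r = Sxy / sqrt(Sxx * Syy).
def correlation(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

r_pos = correlation([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # exactly linear, r = 1
r_neg = correlation([0.0, 1.0, 2.0], [3.0, 2.0, 1.0])            # exactly linear, negative slope, r = -1
```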
More complex case (I)
Use the matrix formulation Y = Xb + e,
where Y is a column vector of the n observations and X is the n × (p + 1) matrix whose first column is all ones and whose remaining columns hold the samples of the p independent variables.
More complex case (II)
The solution to the problem is b = (XᵀX)⁻¹ Xᵀ y
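A minimal NumPy sketch of this matrix solution, with made-up data generated from y = 1 + 2·x1 + x2 so the exact coefficients are known; solving the normal equations (XᵀX)b = Xᵀy is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

# y = Xb + e; least-squares estimate b = (X^T X)^-1 X^T y.
# First column of X is all ones (intercept); the other two columns
# are samples of two independent variables.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 5.0]])
y = np.array([2.0, 3.0, 8.0, 12.0])  # generated from y = 1 + 2*x1 + x2

# Solve (X^T X) b = X^T y instead of inverting X^T X.
b = np.linalg.solve(X.T @ X, X.T @ y)  # -> approximately [1, 2, 1]
```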
Non-linear dependencies
Can use a polynomial model Y = b0 + b1X + b2X² + … + bpXp
Or do a logarithmic transform:
replace y = K e^(at) by log y = log K + at
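The logarithmic transform reduces an exponential fit to a straight-line fit. A Python sketch on synthetic data (K = 2, a = 0.5 are made up for the example):

```python
import math

# Exponential fit via a log transform: y = K * exp(a*t)  =>  ln y = ln K + a*t.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]  # synthetic data with K = 2, a = 0.5

# Fit a straight line to (t, ln y) using the closed-form least-squares solution.
log_ys = [math.log(y) for y in ys]
n = len(ts)
tbar = sum(ts) / n
lbar = sum(log_ys) / n
a = (sum((t - tbar) * (l - lbar) for t, l in zip(ts, log_ys))
     / sum((t - tbar) ** 2 for t in ts))  # slope -> a
K = math.exp(lbar - a * tbar)             # intercept is ln K
```

Note that least squares on the transformed data minimizes errors in log y, not in y, so the result differs from a direct non-linear fit when the data are noisy.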