Linear regression
J.-F. Pâris, University of Houston
Introduction
Linear regression is a special case of regression analysis.
Regression Analysis
Models the relationship between:
- the values of a dependent variable (also called a response variable), and
- the values of one or more independent variables.
The main outcome is a function y = f(x1, …, xn).
Linear regression
Studies linear dependencies such as y = ax + b
And more: y = ax² + bx + c is linear in the coefficients a, b, and c
Uses the least-squares method
Assumes that departures from the ideal line are due to random noise
Basic Assumptions (I)
- The sample is representative of the whole population.
- The error is a random variable with a mean of zero conditional on the independent variables.
- The independent variables are error-free and linearly independent.
- The errors are uncorrelated.
Basic Assumptions (II)
- The variance of the error is constant across observations.
- For very small samples, the errors must be Gaussian; this requirement does not apply to large samples (n ≥ 30).
General Formulation
n samples of the dependent variable: y1, y2, …, yn
n samples of each of the p independent variables:
x11, x12, …, x1n
x21, x22, …, x2n
…
xp1, xp2, …, xpn
Objective
Finding Y = b0 + b1X1 + b2X2 + … + bpXp
minimizing the sum of squares of the deviations
Σi (yi - b0 - b1x1i - b2x2i - … - bpxpi)²
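The objective above can be sketched directly in Python; a minimal illustration (the function name and sample data are made up for the example):

```python
# Sum of squared deviations for candidate coefficients b = [b0, b1, ..., bp].
def sse(b, ys, xss):
    # xss[i] holds the p independent-variable values for observation i.
    total = 0.0
    for y, xs in zip(ys, xss):
        pred = b[0] + sum(bj * xj for bj, xj in zip(b[1:], xs))
        total += (y - pred) ** 2
    return total

# A perfect fit has zero error: the points lie exactly on y = 1 + 2x.
err = sse([1.0, 2.0], [3.0, 5.0, 7.0], [[1.0], [2.0], [3.0]])  # -> 0.0
```

Regression picks the coefficient vector b that makes this quantity as small as possible.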
Why the sum of squares?
It penalizes big deviations, which are less likely to result from random noise than small ones.
Our objective is to estimate the function linking the dependent variable to the independent variables, assuming that the experimental points represent random variations around it.
Simplest case (I)
One independent variable
We must find Y = a + bX
minimizing the sum of squares of the errors Σi (yi - a - bxi)²
Simplest case (II)
Differentiate the previous expression with respect to the parameters a and b and set the derivatives to zero:
Σi -2(yi - a - bxi) = 0, or na + (Σi xi) b = Σi yi
Σi -2xi(yi - a - bxi) = 0, or (Σi xi) a + (Σi xi²) b = Σi xi yi
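The two normal equations form a 2×2 linear system that can be solved directly. A small Python sketch using Cramer's rule (the function name and sample data are illustrative):

```python
# Solve the two normal equations of simple linear regression:
#   n*a        + (sum x)*b    = sum y
#   (sum x)*a  + (sum x^2)*b  = sum x*y
def solve_normal_equations(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for a
    b = (n * sxy - sx * sy) / det    # Cramer's rule for b
    return a, b

# Points lying exactly on y = 1 + 2x should recover a = 1, b = 2.
a, b = solve_normal_equations([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```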
Simplest case (III)
We obtain
a = (Σi yi Σi xi² - Σi xi Σi xi yi) / (n Σi xi² - (Σi xi)²)
b = (n Σi xi yi - Σi xi Σi yi) / (n Σi xi² - (Σi xi)²)
The second expression can be rewritten.
More notations
x̄ = (1/n) Σi xi   ȳ = (1/n) Σi yi
Sxx = Σi (xi - x̄)²   Syy = Σi (yi - ȳ)²   Sxy = Σi (xi - x̄)(yi - ȳ)
Simplest case (IV)
The solution can be rewritten
b = Sxy / Sxx
a = ȳ - b x̄
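This closed form translates directly into code. A minimal Python sketch (the function name and sample data are made up for the example):

```python
# Closed-form simple linear regression: b = Sxy / Sxx, a = ybar - b * xbar.
def fit_line(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Points lying exactly on y = 2x + 1 should recover a = 1, b = 2.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```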
Coefficient of correlation
r = Sxy / √(Sxx Syy)
r = ±1 would indicate a perfect linear fit; r = 0 would indicate no linear dependency.
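Using the Sxx, Syy, and Sxy notations above, r is a one-liner to compute; a small Python sketch with illustrative data:

```python
import math

# Coefficient of correlation: r = Sxy / sqrt(Sxx * Syy).
def correlation(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

r_pos = correlation([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # exactly linear, r = 1
r_neg = correlation([0.0, 1.0, 2.0], [3.0, 2.0, 1.0])            # exactly linear, negative slope, r = -1
```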
More complex case (I)
Use the matrix formulation Y = Xb + e,
where Y is a column vector of the n observations and X is the n × (p + 1) matrix whose first column is all ones and whose remaining columns hold the samples of the p independent variables.
More complex case (II)
The solution to the problem is b = (XᵀX)⁻¹ Xᵀ y
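A minimal NumPy sketch of this matrix solution, with made-up data generated from y = 1 + 2·x1 + x2 so the exact coefficients are known; solving the normal equations (XᵀX)b = Xᵀy is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

# y = Xb + e; least-squares estimate b = (X^T X)^-1 X^T y.
# First column of X is all ones (intercept); the other two columns
# are samples of two independent variables.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 3.0],
              [1.0, 3.0, 5.0]])
y = np.array([2.0, 3.0, 8.0, 12.0])  # generated from y = 1 + 2*x1 + x2

# Solve (X^T X) b = X^T y instead of inverting X^T X.
b = np.linalg.solve(X.T @ X, X.T @ y)  # -> approximately [1, 2, 1]
```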
Non-linear dependencies
Can use a polynomial model Y = b0 + b1X + b2X² + … + bpXp
Or do a logarithmic transform:
replace y = K e^(at) by log y = log K + at
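The logarithmic transform reduces an exponential fit to a straight-line fit. A Python sketch on synthetic data (K = 2, a = 0.5 are made up for the example):

```python
import math

# Exponential fit via a log transform: y = K * exp(a*t)  =>  ln y = ln K + a*t.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]  # synthetic data with K = 2, a = 0.5

# Fit a straight line to (t, ln y) using the closed-form least-squares solution.
log_ys = [math.log(y) for y in ys]
n = len(ts)
tbar = sum(ts) / n
lbar = sum(log_ys) / n
a = (sum((t - tbar) * (l - lbar) for t, l in zip(ts, log_ys))
     / sum((t - tbar) ** 2 for t in ts))  # slope -> a
K = math.exp(lbar - a * tbar)             # intercept is ln K
```

Note that least squares on the transformed data minimizes errors in log y, not in y, so the result differs from a direct non-linear fit when the data are noisy.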