Download presentation
Presentation is loading. Please wait.
1
Petter Mostad mostad@chalmers.se
Linear regression Petter Mostad
2
Relationships between variables
We want to understand the relationship between x and y!
3
Relationships between variables
We want to understand the relationship between x and y!
4
What to do with a fitted line
Interpolation Extrapolation Interpret parameters of line
5
How to define the ”best fitting” line?
Sum of squares of ”errors”, or residuals, is minimized = Least squares method Note: Other things could possible be optimized instead
6
How to compute the least squares line?
Let (x1, y1), (x2, y2),...,(xn, yn) be the data points. Find a and b such that y=a+bx fits the points, by minimizing Solution: where and all sums are made for i=1,...,n.
7
How do you obtain these formulas?
Differentiate S with respect to a and b, and set results to zero: We get: These are two equations with two unknowns, and the soluton is the answer above.
8
Example Crickets make sound by rubbing their wings together. There is a correlation between the temperature and the number of movements per second, unique for every species. Here are some data for Nemobius fasciatus fasciatus: Movements/sec Temperature 20,0 31,4 16,0 22,0 19,8 34,1 18,4 29,1 15,5 24,0 14,7 21,0 17,1 27,7 15,4 20,7 16,2 28,5 15,0 26,4 17,2 28,1 17,0 28,6 14,4 24,6 If you measure 18 movements per second, what is estimated temperature? Data from Pierce, GW. The Songs of Insects. Cambridge, Mass.: Harvard University Press, 1949, pp
9
Example (cont.) Computations: Answer: Estimated temperature
10
What about the uncertainty in the prediction?
Temperature is not perfectly predicted! We assume: A linear model relates the number of wing movements with a mean predicted temperature The actual temperature has a normal distribution around that mean prediction, with variance σ2 In order to make a prediction with uncertainty, we must: First estimate the parameters of the line from data Find the predicted temperature at the given wing movement number, with uncertainty Add the ”random error”, with the estimated variance
11
More examples A model for prediction of y claims that every unit increase in a variable x increases the expected value of y by 1.4, and that y is normally distributed around this expectation with some fixed variance. How can we test this model? You have a choice between a model relating y with x where either y=ax+b+error, or y=ax+error. How can you choose?
12
Procedure for answering such quesitons:
We set up a linear model for our observations We estimate its parameters, with uncertainty We draw conclusions from these estimates, with uncertainty, or We use the estimates, with uncertainties, to make predictions, with uncertainties We return soon to this, but first something more about the basic estimation…
13
y against x ≠ x against y Linear regression of y against x does not give the same result as the opposite: Regression of y against x This example needs to be carefully explained in order to be understood!! Use the cricket example! Regression of x against y
14
Centering the variables
Assume we subtract the averages fra x- and y-values We get and From definitions of correlation and standard deviation follows: (even in uncentered case) Note: The Residuals sum to 0. Note how subtracting averages corresponds to changing the coordinate system, which should not change the regression line
15
Example: transformed variables
The connection between the variables is not always linear. Example: The natural model may be We still want to find a and b such that approximates the data as well as possible
16
Example (cont) When then
Use standard formulas on the pairs (x1, log(y1)), (x2, log(y2)),..,(xn, log(yn)) We get estimates for log(a) and b, and thus a and b
17
Another example with transformed variables
Another natural model may be We then get Use standard formulas on the pairs (log(x1), log(y1)), (log(x2), log(y2)), ...,(log(xn),log(yn)) Note: In this model, the fitted line always goes through (0,0)
18
Several explanatory variables
Assume our data is of the type (x11, x12, x12, y1), (x21, x22, x23, y2), ... We can try to predict or ”explain” y from the x-values with a model Exactly as before we can deduce formulas for a,b,c,d minimizing the sum of the squares of the ”errors”, or residuals. x1,x2,x3 can be transformations of different variables, or even the same variable. Definer ordet ”forklaringsvariable”.... Selvfölgelig vilkårlig antall forklaringsvariable...(men må väre mindre enn antallet ukjente..)
19
Example: Fitting a polynomial line
Assume data (x1,y1),..., (xn,yn) seem to be following the curve of a third-degree polynomial We use the above theory on (x1, x12, x13, y1), (x2, x22, x23, y2),... We estimate a,b,c,d, and a third-degree polynomial
20
Regression as a linear model
Responses are modelled as a linear function of ”explanatory variables”, with unknown coefficients, plus a random error. The random error is normally distributed with zero mean and a fixed variance for all observations. With these assumptions, values for the unknown coefficients can be estimated using least squares. We can find an estimate for the variance. This can be used to obtain confidence intervals, and confidence regions, for the unknown coefficients.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.