Non-Linear Regression


1 Non-Linear Regression
Introduction: Previously we have fitted, by least squares, the general linear model, which was of the type: Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

2 The non-linear regression model will generally be of the form:
Y = f(X1, X2, ..., Xp | θ1, θ2, ..., θq) + ε     (*)
where the function (expression) f is known except for the q unknown parameters θ1, θ2, ..., θq.
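As a concrete illustration (not taken from these slides), the model function f can be coded directly; the exponential form, the single predictor, and the parameter names below are purely hypothetical assumptions for the sketch.

```python
import numpy as np

# Hypothetical nonlinear model f(X | theta1, theta2) = theta1 * exp(-theta2 * X);
# f is known except for the unknown parameters theta1 and theta2.
def f(x, theta1, theta2):
    return theta1 * np.exp(-theta2 * x)
```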

3 Least Squares in the Nonlinear Case

4 Suppose that we have collected data on the response Y,
(y1, y2, ..., yn), corresponding to n sets of values of the independent variables X1, X2, ..., Xp: (x11, x21, ..., xp1), (x12, x22, ..., xp2), ..., (x1n, x2n, ..., xpn).

5 For a set of possible values θ1, θ2, ..., θq of the parameters, a measure of how well these values fit the model described in equation (*) above is the residual sum of squares function
S(θ1, θ2, ..., θq) = Σ (yi − ŷi)²  (summed over i = 1, ..., n),
where ŷi is the predicted value of the response variable yi obtained from the values of the p independent variables x1i, x2i, ..., xpi using the model in equation (*) and the values of the parameters θ1, θ2, ..., θq.
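A minimal Python sketch of this residual sum of squares function; the model is passed in as a callable, and the function and argument names are illustrative rather than from the slides. Smaller values of S indicate parameter values that fit the data better.

```python
import numpy as np

def residual_sum_of_squares(theta, x, y, model):
    """S(theta) = sum of (y_i - predicted value)^2 over the n observations."""
    y_hat = model(x, *theta)            # predicted values from the model in equation (*)
    return np.sum((y - y_hat) ** 2)     # residual sum of squares S(theta)
```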

6 The least squares estimates of θ1, θ2, ..., θq are the values
which minimize S(θ1, θ2, ..., θq). It can be shown that if the error terms are independent and normally distributed with mean 0 and common variance σ², then the least squares estimates are also the maximum likelihood estimates of θ1, θ2, ..., θq.

7 Iterative Techniques for Estimating the Parameters of a Nonlinear Model
1) Steepest descent, 2) Linearization, and 3) Marquardt's procedure.

8 In each case an iterative procedure is used to find the least squares estimates.
That is, initial estimates for these values are determined. The procedure then finds successively better estimates that, hopefully, converge to the least squares estimates.

9 Steepest Descent The steepest descent method focuses on determining the values of θ1, θ2, ..., θq that minimize the sum of squares function S(θ1, θ2, ..., θq). The basic idea is to determine, from an initial point and the tangent plane to S(θ1, θ2, ..., θq) at this point, the vector along which the function S(θ1, θ2, ..., θq) decreases at the fastest rate. The method of steepest descent then moves from this initial point along the direction of steepest descent until the value of S(θ1, θ2, ..., θq) stops decreasing.

10 It uses this point as the next approximation to the value that minimizes S(θ1, θ2, ..., θq). The procedure then continues until the successive approximations arrive at a point where the sum of squares function S(θ1, θ2, ..., θq) is minimized. At that point, the tangent plane to S(θ1, θ2, ..., θq) will be horizontal and there will be no direction of steepest descent.
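A bare-bones steepest descent sketch, assuming the gradient of S is available as a function; a fixed step length stands in for the move along the direction of steepest descent described above, so this is an illustrative simplification rather than the slides' exact procedure.

```python
import numpy as np

def steepest_descent(s, grad_s, theta0, step=1e-3, tol=1e-10, max_iter=10000):
    """Minimize S(theta) by repeatedly moving against its gradient."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        direction = -grad_s(theta)            # direction of steepest descent at this point
        theta_new = theta + step * direction
        if s(theta_new) >= s(theta) - tol:    # stop once S essentially stops decreasing
            break
        theta = theta_new
    return theta
```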

11 The process may not converge.
Convergence may be slow. The converged solution may not be the global minimum but only a local minimum. The performance of this procedure sometimes depends on the choice of the initial starting values.

12 Slow convergence is particularly likely when the S(θ1, θ2, ..., θq) contours are attenuated and banana-shaped (as they often are in practice); it happens when the path of steepest descent zigzags slowly down a narrow valley, each iteration bringing only a slight reduction in S(θ1, θ2, ..., θq).

13 The steepest descent method is, on the whole, slightly less favored than the linearization method (described later) but will work satisfactorily for many nonlinear problems, especially if modifications are made to the basic technique.

14 Steepest Descent (figure: the steepest descent path from an initial guess)

15 Linearization The linearization (or Taylor series) method uses the results of linear least squares in a succession of stages. Suppose the postulated model is of the form: Y = f(X1, X2, ..., Xp | θ1, θ2, ..., θq) + ε. Let θ1(0), θ2(0), ..., θq(0) be initial values for the parameters θ1, θ2, ..., θq. These initial values may be intelligent guesses or preliminary estimates based on whatever information is available.

16 These initial values will, hopefully, be improved upon in the successive iterations described below. The linearization method approximates f(X1, X2, ..., Xp | θ1, θ2, ..., θq) with a linear function of θ1, θ2, ..., θq, using a Taylor series expansion of f(X1, X2, ..., Xp | θ1, θ2, ..., θq) about the point θ1(0), θ2(0), ..., θq(0) and curtailing the expansion at the first derivatives. The method then uses the results of linear least squares to find values θ1(1), θ2(1), ..., θq(1) that provide the least squares fit of this linear function to the data.

17 The procedure is then repeated until the successive approximations converge, hopefully, to the least squares estimates.
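One way to sketch the linearization (Gauss-Newton) iteration in Python, assuming the partial derivatives of f with respect to the parameters are supplied as a Jacobian function; every name here is illustrative, and real implementations add safeguards (step control, convergence checks on S) that are omitted.

```python
import numpy as np

def gauss_newton(model, jacobian, x, y, theta0, tol=1e-8, max_iter=50):
    """Repeatedly fit the first-order Taylor expansion of f by linear least squares."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - model(x, *theta)          # residuals at the current parameter values
        J = jacobian(x, *theta)           # n x q matrix of derivatives of f w.r.t. each theta
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)   # linear least squares correction
        theta = theta + delta
        if np.max(np.abs(delta)) < tol:   # successive approximations have converged
            break
    return theta
```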

18 Linearization (figure: contours of the RSS for the linear approximation, with the initial guess and the 2nd guess)

19 (figure: contours of the RSS for the linear approximation, with the initial, 2nd, and 3rd guesses)

20 (figure: contours of the RSS for the linear approximation, with the initial, 2nd, 3rd, and 4th guesses)

21 The linearization procedure has the following possible drawbacks:
1. It may converge very slowly; that is, a very large number of iterations may be required before the solution stabilizes, even though the sum of squares S(θ(j)) may decrease consistently as j increases. This sort of behavior is not common but can occur.
2. It may oscillate widely, continually reversing direction, and often increasing as well as decreasing the sum of squares. Nevertheless, the solution may stabilize eventually.
3. It may not converge at all, or even diverge, so that the sum of squares increases iteration after iteration without bound.

22 Levenberg-Marquardt procedure.
Uses a combination of both steepest descent and linearization.
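In practice this fitting is usually delegated to a library. The sketch below uses scipy.optimize.curve_fit, which applies a Levenberg-Marquardt algorithm by default when no bounds are given; the model form, starting values, and simulated data are assumptions made purely for illustration, not the data from these slides.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical nonlinear model: exponential decay with parameters theta1, theta2.
def f(x, theta1, theta2):
    return theta1 * np.exp(-theta2 * x)

x = np.array([7, 14, 21, 28, 42, 56, 70, 84], dtype=float)   # illustrative time points
rng = np.random.default_rng(0)
y = f(x, 2.5, 0.03) + rng.normal(0.0, 0.05, x.size)          # simulated noisy responses

# p0 gives the starting values for the parameters; curve_fit returns the
# least squares estimates and an approximate covariance matrix.
theta_hat, cov = curve_fit(f, x, y, p0=[1.0, 0.01])
```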

23 Example In this example a chemical has been applied to the soil in an agricultural field. The concentration of the chemical (Y) is then measured 7, 14, 21, 28, 42, 56, 70, and 84 days after application. Six measurements of Y are made at each time.

24 The data

25 Graph

26 The Model (the nonlinear expression, involving parameters a and b, appeared as an image on this slide)

27 To perform non-linear regression select Analysis -> Regression -> Nonlinear

28 The following dialog box appears
Select the dependent variable. Specify the parameters and their starting values. Specify the model.

29 The Examination of Residuals

30 Introduction Much can be learned by observing residuals.
This is true not only for linear regression models, but also for nonlinear regression models and analysis of variance models. In fact, this is true for any situation where a model is fitted and measures of unexplained variation (in the form of a set of residuals) are available for examination.

31 Quite often models that are proposed initially for a set of data are incorrect to some extent.
An important part of the modeling process is diagnosing the flaws in these models. Much of this can be done by carefully examining the residuals.

32 The residuals are defined as the n differences:
ei = yi − ŷi,  i = 1, 2, ..., n,
where yi is an observation and ŷi is the corresponding fitted value obtained by use of the fitted model.

33 We can see from this definition that the residuals, ei, are the differences between what is actually observed and what is predicted by the model. That is, they are the amount which the model has not been able to explain.
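A one-line sketch of this definition, assuming the observations and the fitted values from the model are already available as arrays:

```python
import numpy as np

def residuals(y, y_hat):
    """e_i = y_i - y_hat_i: what was observed minus what the fitted model predicts."""
    return np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
```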

34 Many of the statistical procedures used in linear and nonlinear regression analysis are based on certain assumptions about the random departures from the proposed model. Namely, the random departures are assumed i) to have zero mean, ii) to have a constant variance σ², iii) to be independent, and iv) to follow a normal distribution.

35 Thus if the fitted model is correct,
the residuals should exhibit tendencies that confirm the above assumptions, or at least should not contradict them. When examining the residuals one should ask: "Do the residuals make it appear that our assumptions are wrong?"

36 After examination of the residuals we shall be able to conclude either:
(1) the assumptions appear to be violated (in a way that can be specified), or (2) the assumptions do not appear to be violated.

37 Note that (2), in the same spirit as hypothesis testing, does not mean that we are concluding that the assumptions are correct; it means merely that, on the basis of the data we have seen, we have no reason to say that they are incorrect. The methods for examining the residuals are sometimes graphical and sometimes statistical.

38 The principal ways of plotting the residuals ei are:
1. Overall.
2. In time sequence, if the order is known.
3. Against the fitted values.
4. Against the independent variables xij for each value of j.
In addition to these basic plots, the residuals should also be plotted:
5. In any way that is sensible for the particular problem under consideration.

39 The residuals can be plotted in an overall plot in several ways.

40 1. The scatter plot.

41 2. The histogram.

42 3. The box-whisker plot.

43 4. The kernel density plot

44 5. A normal plot or a half-normal plot on standard probability paper.

45 If our model is correct, these residuals should (approximately) resemble observations from a normal distribution with zero mean. Does our overall plot contradict this idea? Does the plot appear abnormal for a sample of n observations from a normal distribution? How can we tell? With a little practice one can develop an excellent "feel" for how abnormal a plot should look before it can be said to contradict the normality assumption.
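A small matplotlib/scipy sketch of two of these overall plots (a histogram and a normal probability plot); the function name, bin count, and figure layout are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from scipy import stats

def overall_residual_plots(e):
    """Histogram and normal probability plot of the residuals e."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.hist(e, bins=10)                          # histogram of the residuals
    ax1.set_title("Histogram of residuals")
    stats.probplot(e, dist="norm", plot=ax2)      # normal probability (Q-Q) plot
    plt.show()
```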

46 The standard statistical tests for testing normality are:
1. The Kolmogorov-Smirnov test.
2. The Chi-square goodness-of-fit test.

47 The Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test uses the empirical cumulative distribution function as a tool for testing the goodness of fit of a distribution. The empirical distribution function is defined, for n random observations, by Fn(x) = the proportion of observations in the sample that are less than or equal to x.

48 Let F0(x) denote the hypothesized cumulative distribution function of the population (a normal population if we are testing normality). If F0(x) truly represents the distribution of observations in the population, then Fn(x) will be close to F0(x) for all values of x.

49 The Kolmogorov-Smirnov test statistic is:
Dn = max over x of |Fn(x) − F0(x)| = the maximum distance between Fn(x) and F0(x).
If F0(x) does not provide a good fit to the distribution of the observations, Dn will be large. Critical values for Dn are given in many texts.
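A sketch of this computation with scipy; because the residual standard deviation is estimated from the same data, the standard K-S critical values are only approximate here, which is an assumption worth keeping in mind.

```python
import numpy as np
from scipy import stats

def ks_normality(e):
    """D_n and p-value comparing the residuals with a normal distribution (mean 0)."""
    e = np.asarray(e, dtype=float)
    d_n, p_value = stats.kstest(e, "norm", args=(0.0, e.std(ddof=1)))
    return d_n, p_value
```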

50 The Chi-square goodness of fit test
The Chi-square test uses the histogram as a tool for testing the goodness of fit of a distribution. Let fi denote the observed frequency in each of the class intervals of the histogram. Let Ei denote the expected number of observations in each class interval, assuming the hypothesized distribution.

51 The hypothesized distribution is rejected if the statistic
χ² = Σ (fi − Ei)² / Ei  (summed over the class intervals)
is large (greater than the critical value from the chi-square distribution with m − 1 degrees of freedom, where m = the number of class intervals used for constructing the histogram).
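A rough sketch of this chi-square computation against a fitted normal; the number of class intervals m and the use of equal-width histogram bins are illustrative choices, not prescriptions from the slides. The returned value would then be compared with the chi-square critical value with m − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

def chi_square_gof(e, m=8):
    """Chi-square statistic from a histogram of the residuals with m class intervals."""
    e = np.asarray(e, dtype=float)
    f_obs, edges = np.histogram(e, bins=m)                      # observed frequencies f_i
    cdf = stats.norm.cdf(edges, loc=0.0, scale=e.std(ddof=1))   # fitted normal CDF at bin edges
    expected = e.size * np.diff(cdf)                            # expected counts E_i per interval
    return np.sum((f_obs - expected) ** 2 / expected)
```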

52 Note: in the above tests it is assumed that the residuals are independent with a common variance σ². This is not completely accurate, for the following reason: although the theoretical random errors εi are all assumed to be independent with the same variance σ², the residuals are not independent and they also do not have the same variance.

53 They will, however, be approximately independent with a common variance if the sample size is large relative to the number of parameters in the model. It is important to keep this in mind when judging residuals in cases where the number of observations is close to the number of parameters in the model.

54 Time Sequence Plot The residuals should exhibit a pattern of independence. If the data were collected over time, there could be a strong possibility that the random departures from the model are autocorrelated.

55 Namely, the random departures for observations that were taken at neighbouring points in time are autocorrelated. This autocorrelation can sometimes be seen in a time sequence plot. The following three graphs show a sequence of residuals that are respectively i) positively autocorrelated, ii) independent, and iii) negatively autocorrelated.

56 i) Positively auto-correlated residuals

57 ii) Independent residuals

58 iii) Negatively auto-correlated residuals

59 There are several statistics and statistical tests that can also pick out autocorrelation amongst the residuals. The most common are: i) the Durbin-Watson statistic, ii) the autocorrelation function, and iii) the runs test.

60 The Durbin-Watson statistic:
The Durbin-Watson statistic, which is used frequently to detect serial correlation, is defined by the following formula:
d = Σ (ei − ei+1)² / Σ ei²  (numerator summed over i = 1, ..., n − 1; denominator over i = 1, ..., n)
If the residuals are positively serially correlated, the differences ei − ei+1 will be stochastically small. Hence a small value of the Durbin-Watson statistic will indicate positive autocorrelation. Large values of the Durbin-Watson statistic, on the other hand, will indicate negative autocorrelation. Critical values for this statistic can be found in many statistical textbooks.
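A direct translation of this formula into numpy (a hand-rolled version, purely to make the definition explicit); small values of d point to positive autocorrelation, large values to negative autocorrelation.

```python
import numpy as np

def durbin_watson(e):
    """d = sum of (e_i - e_{i+1})^2 divided by the sum of e_i^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```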

61 The autocorrelation function:
The autocorrelation function at lag k is defined by:
rk = Σ ei ei+k / Σ ei²  (the numerator summed over the n − k pairs of residuals a distance k apart)

62 This statistic measures the correlation between residuals that occur a distance k apart in time.
One would expect residuals that are close in time to be more correlated than residuals that are separated by a greater distance in time. If the residuals are independent, then rk should be close to zero for all values of k. A plot of rk versus k can be very revealing with respect to the independence of the residuals.
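A numpy sketch of rk in the form reconstructed above; normalizing by the full sum of squared residuals is an assumption, since the slide's exact expression appeared as an image. Plotting residual_autocorrelation(e, k) for k = 1, 2, ... gives the plot of rk versus k discussed here.

```python
import numpy as np

def residual_autocorrelation(e, k):
    """r_k: correlation between residuals a distance k apart in time."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[:-k] * e[k:]) / np.sum(e ** 2)
```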

63 Some typical patterns of the autocorrelation function are given below:
Autocorrelation pattern for independent residuals

64 Various Autocorrelation patterns for serially correlated residuals

65

66 The runs test: This test uses the fact that the residuals will oscillate about zero at a “normal” rate if the random departures are independent. If the residuals oscillate slowly about zero, this is an indication that there is a positive autocorrelation amongst the residuals. If the residuals oscillate at a frequent rate about zero, this is an indication that there is a negative autocorrelation amongst the residuals.

67 In the "runs test", one observes the time sequence of the "sign" of the residuals (+ or −)
and counts the number of runs (i.e. the number of periods over which the residuals keep the same sign). The number of runs should be low if the residuals are positively correlated and high if they are negatively correlated.
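A small sketch of the counting step of the runs test; the comparison of the count with tabulated critical values is not shown, and the handling of zero residuals is an illustrative choice.

```python
import numpy as np

def count_runs(e):
    """Number of runs: maximal stretches over which the residuals keep the same sign."""
    signs = np.sign(np.asarray(e, dtype=float))
    signs = signs[signs != 0]                          # ignore residuals that are exactly zero
    return 1 + int(np.sum(signs[1:] != signs[:-1]))    # a new run starts at every sign change
```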

68 Plots Against the Fitted Values and the Predictor Variables Xij
If we "step back" from such a plot and the residuals behave in a manner consistent with the assumptions of the model, we obtain the impression of a horizontal "band" of residuals, which can be represented by the diagram below.

69 Individual observations lying considerably outside of this band indicate that the observation may be an outlier. An outlier is an observation that does not follow the normal pattern of the other observations. Such an observation can have a considerable effect on the estimation of the parameters of a model. Sometimes the outlier has occurred because of a typographical error. If this is the case and it is detected, then a correction can be made. If the outlier occurs for other (and more natural) reasons, it may be appropriate to construct a model that incorporates the occurrence of outliers.

70 If our "step back" view of the residuals resembled any of those shown below we should conclude that assumptions about the model are incorrect. Each pattern may indicate that a different assumption may have to be made to explain the “abnormal” residual pattern. b) a)

71 Pattern a) indicates that the variance of the random departures is not constant (homogeneous) but increases as the value along the horizontal axis (time, or one of the independent variables) increases. This indicates that a weighted least squares analysis should be used. The second pattern, b), indicates that the mean value of the residuals is not zero. This is usually because the model (linear or nonlinear) has not been correctly specified: linear or quadratic terms that should have been included in the model have been omitted.
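As a hedged illustration of the weighted least squares remedy for pattern a): scipy's curve_fit accepts per-observation sigma values, so observations with a larger assumed variance receive less weight in the fit. The model, simulated data, and variance pattern below are all assumptions for the sketch, not part of the example in these slides.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, theta1, theta2):                 # hypothetical model, as in the earlier sketch
    return theta1 * np.exp(-theta2 * x)

x = np.linspace(7, 84, 48)
rng = np.random.default_rng(1)
y = f(x, 2.5, 0.03) + rng.normal(0.0, 0.01 * x)      # spread of the departures grows with x

# sigma gives the assumed standard deviation of each observation;
# curve_fit then performs a weighted nonlinear least squares fit.
theta_hat, cov = curve_fit(f, x, y, p0=[1.0, 0.01], sigma=0.01 * x)
```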

