ECONOMETRICS
Lecturer: Dr. Veronika Alhanaqtah
Topic 3. Nonlinear regression models
Outline: selection of a functional form for a model and problems of specification
- Examples of linear and non-linear models applied in econometric analysis
- Logarithmic models
- Semi-log models
- Dummy variables in regression analysis
- Selection of a functional form for a model; features of a "good" model
- Types of specification problems
- Detection of specification problems and their elimination; F-test
- Testing hypotheses on several linear restrictions simultaneously
- Testing the hypothesis that the regression as a whole is insignificant
- Ramsey test (RESET)
- Quality criteria of a regression model
1. Application of linear and some nonlinear models
Examples of application:
- analysis of the demand for a good Y as a function of its price X;
- analysis of the price elasticity of demand;
- analysis of costs as a function of output;
- Engel function: analysis of the demand for a good Y as a function of its price X (β < 0) or of consumer income X (β > 0);
- Cobb-Douglas production function: analysis of the relationship between output (Y) and the costs of capital (K) and labor (L);
- models widely applied in banking and financial analysis for growth rates and rates of increase of an economic variable;
- analysis of the change in a dependent variable Y whose rate of increase is constant in time.
Notes about logarithms
When taking logarithms of data, which base should we use: e or 10? (e = 2.718281…, Euler's number.) The standard in econometrics is the natural logarithm (ln); decimal (base-10) logarithms, widely used until the early 1970s, are no longer applied. A useful rule for natural logarithms: for small x, ln(1 + x) ≈ x, which underlies the percentage interpretations of logarithmic models.
2. Logarithmic models

Interpretation: when X increases by 1 %, Y increases by approximately β2 %. The coefficient β2 represents the elasticity of Y with respect to X (the percentage change in Y in response to a 1 % change in X). Here β2 is a constant, so the elasticity is constant; for this reason the double-log (log-log) model is often called a model with constant elasticity. The function is not linear in X and Y, but it is linear in ln X and ln Y, as well as in the β-parameters. If all assumptions of the classical linear model hold, OLS gives the best linear unbiased estimators of the β-coefficients.
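As a numeric sketch (my own illustration, not from the slides): if Y = A·X^β exactly, regressing ln Y on ln X recovers β as the slope, i.e. the constant elasticity. A minimal pure-Python check with noise-free data:

```python
import math

# Noise-free power-law data: Y = 2 * X^0.5, so the elasticity is 0.5.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * x ** 0.5 for x in xs]

# OLS slope of ln(Y) on ln(X): slope = cov(lx, ly) / var(lx).
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
beta = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)

print(round(beta, 6))  # recovers the elasticity 0.5
```

With noisy data the slope would estimate the elasticity rather than reproduce it exactly.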
Logarithmic models: model with multiple variables
Application: this model is very often used to analyze the production process. For example, taking logarithms of the Cobb-Douglas production function gives a model in which α and β are the elasticities of output with respect to the costs of capital and labor. The sum of these coefficients is an important economic indicator, the returns to scale:
- α + β = 1: constant returns to scale (output increases at the same rate as costs increase);
- α + β < 1: decreasing returns to scale (diminishing returns: output increases at a lower rate than costs increase);
- α + β > 1: increasing returns to scale (output increases at a higher rate than costs increase).
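The classification above is mechanical, so it can be expressed as a small helper (the function name and tolerance are my own choices, not from the slides):

```python
def returns_to_scale(alpha, beta, tol=1e-9):
    """Classify returns to scale from Cobb-Douglas elasticities alpha and beta."""
    s = alpha + beta
    if abs(s - 1.0) < tol:
        return "constant"
    return "increasing" if s > 1.0 else "decreasing"

print(returns_to_scale(0.3, 0.7))  # constant
print(returns_to_scale(0.4, 0.7))  # increasing
print(returns_to_scale(0.3, 0.5))  # decreasing
```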
3. Semi-log models

Log-linear model (ln Y = β1 + β2·X). Interpretation: when X increases by 1 unit, Y changes by approximately 100·β2 percent.
Linear-log model (Y = β1 + β2·ln X). Interpretation: when X increases by 1 %, Y changes by approximately β2/100 units.
Application: models of this type are usually used to measure the growth rate or the rate of increase of an economic variable. For example, a bank deposit as a function of the initial contribution, growth of output as a function of the percentage increase in resources used, the budget deficit as a function of the growth rate of GNP, the inflation rate as a function of the money supply, etc.
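A quick numeric check (my own illustration): in a log-linear model, a one-unit increase in X multiplies Y by exp(β2), which is close to a 100·β2-percent change when β2 is small:

```python
import math

beta2 = 0.05  # assumed slope in ln(Y) = beta1 + beta2 * X

# Exact multiplicative effect of a one-unit increase in X, vs the approximation:
exact_pct = (math.exp(beta2) - 1.0) * 100.0
approx_pct = 100.0 * beta2

print(round(exact_pct, 2), round(approx_pct, 2))  # 5.13 vs 5.0: close for small beta2
```

For large β2 the approximation breaks down, and the exact exp(β2) − 1 form should be reported.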
4. Dummy variables in the regression analysis
In econometric analysis we use not only numeric (quantitative) variables but also categorical (factor, qualitative) variables. For example, demand for a good may be determined by its price, the price of a substitute good, consumer income, etc.; but it may also be determined by consumer tastes and expectations, national and religious peculiarities, and so on. Problem: how do we measure the influence of categorical variables on a dependent variable? We plug into the model a dummy variable that represents two opposite values of a categorical variable. Regression models that contain only categorical variables are called ANOVA models (analysis-of-variance models). Regression models that contain both numeric and categorical variables are called ANCOVA models (analysis-of-covariance models).
With the help of dummy variables we can describe different parts of a sample. Basic model (1): wages depend on working experience (years) and education (years); for all workers (males and females) wages are, on average, the same if experience and education are the same. If we add to model (1) a dummy variable male_i ("1" for male), it describes the difference in wages between males and females. Model (2) thereby splits model (1) into two sub-models: one for males (dummy equal to 1) and one for females (dummy equal to 0). Interpretation of β4 (the coefficient on the dummy): it shows by how much the wages of males and females differ, education and experience being equal:
- if β4 < 0, females have higher wages than males (other variables held constant);
- if β4 > 0, males have higher wages than females (other variables held constant).
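As an illustration (the data and numbers are invented): with a single male dummy and no other regressors, the OLS estimate of the dummy coefficient is just the difference between the male and female group means:

```python
# Invented toy data: (wage, male) pairs.
data = [(10.0, 1), (12.0, 1), (11.0, 1), (9.0, 0), (8.0, 0), (10.0, 0)]

male = [w for w, m in data if m == 1]
female = [w for w, m in data if m == 0]

beta1 = sum(female) / len(female)       # intercept: mean female wage
beta4 = sum(male) / len(male) - beta1   # dummy coefficient: male-female wage gap

print(beta1, beta4)  # 9.0 2.0
```

With experience and education included, β4 instead measures the gap after controlling for those regressors.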
Dummy variables in the seasonal analysis: many economic variables are directly influenced by seasonal fluctuations. For example, demand for tourist trips, cold drinks and ice cream is higher in summer than in winter, while demand for warm clothes is higher in winter. How do we include dummy variables in this case?
4. Dummy variables in the seasonal analysis
Example: we choose the base value of the factor variable: winter. Include 3 dummy variables (4 seasons − 1 base season): spring, summer, autumn.

observation  season  spring_i  summer_i  autumn_i
1            winter  0         0         0
2            spring  1         0         0
3            summer  0         1         0
4            autumn  0         0         1
5            …
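The coding in the table can be generated mechanically; a small sketch (the function and variable names are my own):

```python
SEASONS = ("spring", "summer", "autumn")  # winter is the base category

def season_dummies(season):
    """Return (spring, summer, autumn) dummies; winter maps to (0, 0, 0)."""
    return tuple(1 if season == s else 0 for s in SEASONS)

print(season_dummies("winter"))  # (0, 0, 0)
print(season_dummies("summer"))  # (0, 1, 0)
```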
Basic model: demand for ice cream as a function of price plus the three season dummies. Sub-models:
- winter: the base model (all dummies zero);
- spring: the base model plus β3;
- summer: the base model plus β4;
- autumn: the base model plus β5.
Interpretation:
- β3 shows the difference in demand for ice cream between spring and winter; we expect β3 > 0 (by how much demand for ice cream is higher in spring than in winter, keeping the price fixed);
- β4 shows the difference in demand for ice cream between summer and winter;
- β5 shows the difference in demand for ice cream between autumn and winter.
Common mistake (the "dummy-variable trap"): a researcher includes dummy variables for all values of a categorical variable, for example for winter, spring, summer and autumn at once. In this case a strict linear relationship between the regressors appears and, as a consequence, OLS cannot produce unique estimators of the β-coefficients. Rule: if a categorical variable has n values, include n − 1 dummy variables in the model.
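A sketch of why this fails (my own illustration): with all four season dummies plus a constant, the dummy columns sum exactly to the intercept column in every row, so the columns of the regressor matrix are linearly dependent:

```python
seasons = ["winter", "spring", "summer", "autumn", "winter", "summer"]

# The mistaken specification: an intercept plus one dummy per season value.
rows = [[1] + [1 if s == name else 0 for name in ("winter", "spring", "summer", "autumn")]
        for s in seasons]

for row in rows:
    intercept, dummies = row[0], row[1:]
    assert sum(dummies) == intercept  # exact linear dependence in every row

print("dummy columns sum to the intercept column: perfect collinearity")
```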
5. Selection of a functional form for a model
The diversity and complexity of economic processes give rise to a large variety of models used in econometric analysis. For a simple linear regression, model selection is usually based on the layout of the points in a scatter plot. It often happens that the layout of the points approximately matches several functions, and our task is to find the most adequate one. For example, curvilinear relationships can be approximated by polynomial, power, exponential or logarithmic functions. For multiple regression, plotting the statistical data in this way is impossible.
Remember: an ideal model does not exist. Questions:
- What are the features of a "good" model?
- What specification problems might we face, and what are their consequences?
- How do we detect a specification problem?
- How do we eliminate a specification problem and move on to a better model?
5.1. Features of a “good” model
- Simplicity. Of two models that reflect reality about equally well, choose the one with fewer variables.
- Uniqueness. For any set of statistical data, the estimators of the β-coefficients must be unique.
- Goodness of fit. A model is considered better if it explains more of the variance of the dependent variable than competing models; choose the regression with the higher adjusted R-squared.
- Consistency with theory. For example, if the coefficient on price in a demand function turns out to be positive, the model is not considered "good" even if it has a high R2 (≈ 0.7). A model must have a theoretical grounding.
- Predictive quality. A model is of high quality if its predictions are confirmed by real data.
5.2. Types of specification problems
A good specification of a regression means that, on the whole, the regression adequately reflects the relationship between the economic parameters. An incorrect choice of the functional form of a model, or an incorrect choice of the set of independent variables (regressors), is called a specification problem.
(1) Insignificant variables are included. The only property of OLS that is lost is the efficiency of the β-estimators (consequently, Var(β) is larger). The estimators remain unbiased and consistent, and we can still test hypotheses and construct confidence intervals. However, including an insignificant variable makes the confidence intervals for the β-coefficients and the prediction intervals wider.
(2) Omitted variables. This problem is much worse than problem (1). Everything becomes bad: the only property OLS retains is that the β-estimators are linear in Y. The other useful properties are lost: the estimators become biased, inconsistent and inefficient, and we cannot test hypotheses or construct confidence intervals with such a model. Moral: it is better to include an extra variable than to omit an important one.
(3) Model misspecification (choice of an incorrect functional form). This mistake is very serious: the predictive quality of the model becomes very low.
5.3. Detection of specification problems and their elimination. F-test
A single insignificant variable is revealed by a low t-statistic. For several insignificant variables we construct a regression model without them and, using the F-statistic, compare the fit of the unrestricted (initial) model with that of the restricted model (without the insignificant variables).
5.3.1. Testing hypotheses on several linear restrictions simultaneously
Example. A researcher estimates the dependence of apartment prices in Moscow (Russia) on several variables. The initial (UR, unrestricted) model includes the following factors:
- total space;
- living space;
- kitchen space;
- whether the apartment is in a brick building ("1" for brick, "0" for non-brick);
- distance to the metro station;
- whether the metro can be reached on foot ("1" for walking, "0" for driving).
He also considers an alternative (R, restricted) model, in which some of these variables are omitted. He computes RSS(UR) = 62.6 and RSS(R) = 69.3. The number of observations is 2040. Which model is better?
Consideration: RSS(UR) is smaller, so the UR model fits better; but the R model is simpler. Is the difference in RSS caused by random factors, or by the omission of significant variables in the R model?
Hypothesis testing:
H0: the regressors metrdist (distance to the metro) and walk are insignificant (both coefficients are zero);
Ha: at least one of these two regressors is significant.
The null hypothesis is verified with the F-test:

F = ((RSS_R − RSS_UR) / r) / (RSS_UR / (n − k)),

where r is the number of restrictions (in our case r = 2), k is the number of estimated coefficients in the unrestricted model (k = 7), and n is the number of observations (n = 2040). If H0 is true, the F-statistic has the Fisher (F) distribution with r and (n − k) degrees of freedom. If the F-statistic is greater than the critical value, H0 is rejected.
Hypothesis testing (α = 5 %): using an F-table or the corresponding command in R, we find F_cr ≈ 3. The computed F-statistic is far greater than F_cr, so H0 is rejected. This means the UR model predicts much better: in the R model at least one of the omitted coefficients (or both) is non-zero, so omitting these variables is not justified.
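Plugging the example's numbers (RSS(R) = 69.3, RSS(UR) = 62.6, n = 2040, k = 7, r = 2) into the F-formula, a check I computed in Python:

```python
rss_r, rss_ur = 69.3, 62.6  # restricted and unrestricted RSS from the example
n, k, r = 2040, 7, 2        # observations, coefficients in the UR model, restrictions

F = ((rss_r - rss_ur) / r) / (rss_ur / (n - k))
print(round(F, 1))  # about 108.8, far above the critical value of roughly 3
```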
Some mathematical theory. Sum of squares
RSS (residual sum of squares) measures how large the estimated residuals are (how far they lie from zero). TSS (total sum of squares) measures how much each y_i differs from the mean of y. ESS (explained sum of squares) measures how far each predicted value of y_i is from the mean of y. For a regression with an intercept, RSS + ESS = TSS. This gives another formula for R-squared:

R2 = ESS / TSS = 1 − RSS / TSS.
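A pure-Python check (on my own toy data) that the decomposition holds for a simple OLS fit with an intercept:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Closed-form simple OLS: slope = cov(x, y) / var(x), intercept from the means.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
fitted = [a + b * x for x in xs]

tss = sum((y - my) ** 2 for y in ys)
ess = sum((f - my) ** 2 for f in fitted)
rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))

assert abs(tss - (ess + rss)) < 1e-9  # RSS + ESS = TSS holds with an intercept
r2 = 1.0 - rss / tss
print(round(r2, 4))
```

Without an intercept the identity generally fails, which is one reason R2 is reported only for models that include a constant.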
5.3.2. Testing the hypothesis that the regression as a whole is insignificant
When the restricted model contains only the intercept, the F-statistic simplifies to

F = (R2 / (k − 1)) / ((1 − R2) / (n − k)).

Null hypothesis H0: all slope coefficients are zero (all regressors are insignificant). Alternative hypothesis Ha: at least one coefficient is non-zero (significant).
Example: test whether the regression is significant, given R2 = 0.09 and n = 3294. Ha: at least one of the coefficients ≠ 0. Inference: the F-statistic is far above the critical value. Even though R2 is only 0.09, with such a large sample it is too large to be attributed to chance, so H0 is rejected: at least one of the coefficients is significant, i.e. the regression as a whole is significant.
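Using the formula F = (R2/(k−1)) / ((1−R2)/(n−k)) with the example's R2 = 0.09 and n = 3294 (the number of coefficients k is not given in the slides, so k = 5 below is a purely hypothetical choice of mine):

```python
def f_overall(r2, n, k):
    """F-statistic for H0: all slope coefficients are zero."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

F = f_overall(0.09, 3294, 5)  # k = 5 is an assumed, illustrative value
print(round(F, 1))  # large compared with an F-critical of roughly 2-3
```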
Moral: if theory says that y depends on x, include x in the model even if it is statistically insignificant. And if a statistical test shows that a variable is significant, it is better to include it in the model even if theory does not suggest any dependence on it.
Other tests for identification of specification problems
- Ramsey test (RESET, regression specification error test)
- Likelihood ratio test
- Wald test
- Lagrange multiplier test
- Hausman test
- Box-Cox transformation

If we have observations for the candidate variables, we use the F-test to verify whether significant variables were omitted. However, there may be a situation in which variables that should have been included are ones for which we have no observations at all. To verify whether such variables were omitted, we apply the Ramsey test.
6. Ramsey test (RESET)

Null hypothesis H0: there are no omitted variables (the model is correct: we included exactly the variables that had to be included). Alternative hypothesis Ha: there are unknown omitted regressors.

Algorithm of the Ramsey test:
1. Estimate the model and obtain the fitted values ŷ.
2. Estimate an auxiliary regression that adds artificial variables: powers of the fitted values (ŷ2, ŷ3, …) with coefficients γ1, γ2, γ3.
3. Perform an F-test of the hypothesis that all γ-coefficients are zero. According to the Ramsey test, if H0 is true and the residuals have a normal distribution, the statistic follows an F-distribution.

The idea of the Ramsey test: if we omitted a variable, traces of it are likely to be left in the fitted values, so the powers of ŷ act as substitutes for the omitted variable. If in fact there is no omitted variable, the fitted values carry no such information, and γ1, γ2, γ3 are all equal to zero.
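The algorithm above can be sketched in pure Python (my own minimal version: a simple regression, a single artificial regressor ŷ2, and a hand-rolled normal-equations solver; a real analysis would use a statistics package):

```python
import math

def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    n, k = len(X), len(X[0])
    M = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (M[r][k] - sum(M[r][j] * b[j] for j in range(r + 1, k))) / M[r][r]
    return b

def rss(X, y, b):
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(len(b)))) ** 2
               for i in range(len(y)))

def reset_test(x, y):
    """RESET F-statistic with one artificial regressor, yhat squared."""
    X_r = [[1.0, xi] for xi in x]                       # restricted: y ~ 1 + x
    b_r = ols_fit(X_r, y)
    yhat = [b_r[0] + b_r[1] * xi for xi in x]
    X_ur = [[1.0, x[i], yhat[i] ** 2] for i in range(len(x))]
    b_ur = ols_fit(X_ur, y)
    rss_r, rss_ur = rss(X_r, y, b_r), rss(X_ur, y, b_ur)
    n, k, r = len(y), 3, 1
    return ((rss_r - rss_ur) / r) / (rss_ur / (n - k))

# Misspecified case: the true relation is quadratic, the fitted model is linear.
xs = [float(i) for i in range(1, 21)]
ys = [x ** 2 + 0.1 * math.sin(x) for x in xs]
print(reset_test(xs, ys) > 10.0)  # True: RESET flags the omitted curvature
```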
Example: perform the Ramsey test for a regression with R2 = 0.091 in the basic (restricted) regression, R2 = 0.095 in the auxiliary (unrestricted) Ramsey regression, n = 3294, α = 5 %.
Step 1. Estimate the restricted (basic) regression.
H0: there are no omitted regressors.
Ha: there are unknown omitted regressors.
Step 2. Ramsey test: estimate the unrestricted (auxiliary) regression (in R), which adds the predicted values of y to the model.
Ha: at least one of the γ-coefficients ≠ 0.
The F-statistic exceeds the F-critical value (found in R), so H0 is rejected. This means there are omitted variables in the basic model.
7. Quality criteria of a regression model
(1) The coefficient of determination, R-squared, is the simplest quality criterion. It is the square of the sample correlation between y and its fitted values, and it shows how close the predicted values of y are to the real ones. The higher R2, the better the model fits. Its disadvantage: in a UR model R2 is always at least as high as in the corresponding R model, so it is not advisable to choose between R and UR models on this criterion alone.
(2) Adjusted coefficient of determination. In more complicated models (with more β-coefficients, i.e. UR models) RSS is smaller, so econometricians introduced a penalty for the number of coefficients:

R2_adj = 1 − (1 − R2) · (n − 1) / (n − k).

The larger k (the number of estimated coefficients), other things being equal, the smaller R2_adj. With R2_adj we can compare the R and UR models and select the one with the higher value. Comparing R2_adj is equivalent to comparing the estimates of the residual variance: the model with the higher R2_adj has the smaller variance estimate, which is better. Moral: a model is of bad quality if it predicts badly (RSS is high) or if it is overly complicated (many regressors, high k).
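A small check of the penalty (my own illustration): for fixed R2 and n, adding coefficients lowers adjusted R2:

```python
def r2_adjusted(r2, n, k):
    """Adjusted R-squared with a penalty for the number of coefficients k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

print(round(r2_adjusted(0.50, 100, 3), 4))   # mild penalty for few regressors
print(round(r2_adjusted(0.50, 100, 20), 4))  # heavier penalty for many regressors
```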
(3) Information criteria (a penalty for high RSS and high k). The larger k (other things being equal), the larger the penalty. Akaike criterion: AIC = ln(RSS/n) + 2k/n. Schwarz criterion: BIC = ln(RSS/n) + k·ln(n)/n. Choose the model with the smaller value of AIC or BIC.