
Business statistics and econometrics



Presentation on theme: "Business statistics and econometrics"— Presentation transcript:

1 Business statistics and econometrics
Lecturer: Ing. Martina Hanová, PhD.

2 Econometrics
"Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena." (Arthur S. Goldberger)

3 Econometric Theory
Econometrics uses a variety of techniques, including regression analysis, to compare and test two or more variables. Econometrics is a mixture of economic theory, mathematical economics, economic statistics, and mathematical statistics.
(Diagram: econometrics as the overlap of economics, mathematics, and statistics)

4 Methodology of Econometrics
Traditional or classical methodology:
1. Statement of theory or hypothesis
2. Specification of the mathematical model
3. Specification of the statistical, or econometric, model
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

5 1. Theory or hypothesis
A theory should have a prediction – a hypothesis (in statistics and econometrics).
Keynesian theory of consumption: Keynes stated that men are disposed to increase their consumption as their income increases, but not by as much as the increase in their income. The marginal propensity to consume (MPC) is therefore greater than zero but less than 1.

6 2. Mathematical Model
Mathematical equation: Y = β1 + β2X, where β1 is the intercept and β2 the slope coefficient.
THEORY: the return to schooling is positive. Y = wage, X = number of years in school.
Keynesian consumption function: Y = consumption expenditure, X = income; β2 measures the MPC, with 0 < β2 < 1.

7 3. Specification of the Econometric Model
Mathematical model – a deterministic relationship between variables: Y = β1 + β2X
Econometric model – a random or stochastic relationship between variables: Y = β1 + β2X + u (the error term is also written ε)
u (or ε) – the disturbance, error term, or random (stochastic) variable; it represents other non-quantifiable, unknown factors that affect Y, and also represents measurement errors.
EXAMPLE: the relationship between crop yield and rainfall.

8 4. Obtain Data
Sources of data: observational (non-experimental) data, experimental data
Types of data: time series data, cross-section data, pooled data
Scales of measurement: ratio scale, interval scale, ordinal scale, nominal scale

9 5. Estimation of the model
To estimate the parameters of the function, β1 and β2, we use regression analysis – a statistical technique that is the single most important tool at the econometrician's disposal.
Ŷ = b1 + b2X, where Ŷ is an estimate of consumption.
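As an illustration of this estimation step, here is a minimal sketch of fitting such a two-variable model by OLS in Python with the statsmodels library; the income and consumption figures are made up for demonstration and are not data from the lecture.

    # Hypothetical data: estimating a Keynesian consumption function Y = b1 + b2*X by OLS.
    import numpy as np
    import statsmodels.api as sm

    income = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])        # X
    consumption = np.array([70, 85, 98, 112, 125, 140, 152, 168, 180, 195])     # Y

    X = sm.add_constant(income)              # adds the intercept column (b1)
    model = sm.OLS(consumption, X).fit()     # minimizes the sum of squared residuals

    print(model.params)                      # [b1, b2]; b2 is the estimated MPC
    print(model.summary())                   # full regression output

If the Keynesian hypothesis holds, the estimated slope (the MPC) should lie between 0 and 1.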

10 6. Hypothesis Testing
Statistical inference (hypothesis testing): we can use the information in the sample to make inferences about the population. We will have two hypotheses that go together:
H0: the null hypothesis
H1: the alternative hypothesis
The null hypothesis is the statement, or the statistical hypothesis, that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.
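Continuing the hypothetical statsmodels sketch from slide 9, hypotheses about the slope can be tested directly from the fitted results; the hypotheses below (β2 = 0 and β2 = 1) are just examples.

    # Assumes `model` from the consumption-function sketch above (slide 9).
    print(model.t_test("x1 = 0"))   # t-test of H0: the MPC equals zero
    print(model.t_test("x1 = 1"))   # t-test of H0: the MPC equals one, vs. H1: it does not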

11 7. Forecasting
Forecast the variable Y on the basis of known or expected future value(s) of the explanatory, or predictor, variable X.
8. Use for Policy Recommendation

12 TERMINOLOGY AND NOTATION
Dependent variable: explained variable, predictand, regressand, response, endogenous, outcome, controlled variable
Independent variable: explanatory variable, predictor, regressor, stimulus, exogenous, covariate, control variable
Two-variable (simple) regression analysis vs. multiple regression analysis; multivariate regression vs. multiple regression

13 Determining the Regression Coefficients Finding a Line of Best Fit
We can use the general equation for a straight line to get the line that best "fits" the data.

14 Ordinary Least Squares (OLS)
The most common method used to fit a line to the data is known as OLS (ordinary least squares). Actual and Fitted Value

15 The Theory of OLS
E(Yi | Xi) = β0 + β1Xi – the population regression function (PRF)
Ŷi = b0 + b1Xi – the sample regression function (SRF)
OLS chooses b0 and b1 to minimize the sum of squared residuals: min Σei² = e1² + e2² + e3² + ... + en², where ei = Yi − Ŷi.

16 How does OLS get estimates of the coefficients?
Excel: Tools / Data Analysis / Regression
Matrix form
Formula – mathematical function
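A sketch of the matrix form with NumPy only: the OLS estimator is b = (X'X)⁻¹X'y. The small data set is again hypothetical.

    # OLS coefficients from the matrix formula b = (X'X)^(-1) X'y (hypothetical data).
    import numpy as np

    x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
    y = np.array([70, 85, 98, 112, 125, 140, 152, 168, 180, 195], dtype=float)

    X = np.column_stack([np.ones_like(x), x])      # design matrix with an intercept column
    b = np.linalg.solve(X.T @ X, X.T @ y)          # solves the normal equations (X'X) b = X'y
    print(b)                                       # [b0, b1]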

17 RESULTS
Interpretation of the regression output:
R Square – the proportion of the variation in Y explained by the regression
Multiple R – the correlation between the actual and fitted values of Y
Standard Error of the regression – an estimate of the standard deviation of the error term
Intercept – the predicted value of Y when X = 0
Regression Coefficient – the estimated change in Y for a one-unit change in X

18 Assumptions of the Classical Simple Linear Regression Model (SLRM)
How do we evaluate a model? How do we know if the model we are using is good?
The assumptions relate to the (population) prediction errors, which we study through the (sample) estimated errors – the residuals.

19 Assumptions of Ordinary Least Squares
The four conditions of the SLR model – what can be wrong with our model?
Yi is a Linear function of Xi.
The errors εi are Independent.
The errors εi are Normally distributed.
The errors εi have Equal variances (denoted σ²).

20 Why do we have to evaluate any regression model?
All of the estimates, intervals, and hypothesis tests arising in a regression analysis have been developed assuming that the model is correct – all the formulas depend on the model being correct. If the model is incorrect, then the formulas and methods (OLS) we use are at risk of being incorrect.

21 Assumption 1 Model is linear in parameters

22 Linear regression model
The model is linear in parameters: LRM (linear regression model) vs. NRM (nonlinear regression model).

23 ASSUMPTION 1: Zero mean value of the disturbances ui, i.e. E(ui | Xi) = 0.

24 Assumption 2: No autocorrelation between the disturbances. The data are a random sample of the population.

25 Assumption 3: Equal variance of the disturbances ui. Errors with constant variance are "homoskedastic"; errors with non-constant variance are "heteroskedastic".

26 Variance-covariance matrix
Construction of the var-cov matrix: the residual vector ei multiplied by its transpose, ei ei'.

27 Assumption 4: The errors are normally distributed (check: Normal Probability Plot).

28 An alternative way to describe all four assumptions
The errors, εi, are independent normal random variables with mean zero and constant variance, σ².

29 What can be wrong with our model?
The population regression function is not linear. The error terms are not independent. The error terms are not normally distributed. The error terms do not have equal variance.

30 Assumption 5: Zero covariance between ui and Xi.

31 Assumption 6: The number of observations must be greater than or equal to the number of explanatory variables.

32 MLR Model Assumptions
The mean of the response, E(Yi), at each set of values of the predictors (x1i, x2i, ...), is a Linear function of the predictors.
The errors εi are Independent.
The errors εi, at each set of values of the predictors (x1i, x2i, ...), are Normally distributed.
The errors εi, at each set of values of the predictors (x1i, x2i, ...), have Equal variances (denoted σ²).

33 Some of the model conditions are more forgiving than others.
All tests and intervals are very sensitive to even minor departures from independence. All tests and intervals are sensitive to moderate departures from equal variance. The hypothesis tests and confidence intervals for βi are fairly "robust" (that is, forgiving) against departures from normality. Prediction intervals are quite sensitive to departures from normality.

34 Assumptions of ordinary least squares
The Gauss-Markov theorem: when the first 4 assumptions of the simple regression model are satisfied, the parameter estimates are unbiased and have the smallest variance among all linear unbiased estimators. The OLS estimators are therefore called BLUE, for Best Linear Unbiased Estimators.

35 multiple regression model

36 multiple regression model
Predicted values of Y; comparison of real (actual) and predicted values of Y.
RESIDUAL OUTPUT
Observation   Predicted Y   Residuals   Standard Residuals
1             28,852139     -1,052      -0,...
2             31,934008     -2,034      -1,...
3             31,048974     -1,249      -0,...
4             31,858932     -1,059      -0,...
5             33,185693     -1,986      -1,...

37

38 multiple regression model
Residual variance, variances of the parameters, and their standard deviations.
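A sketch of these quantities, reusing the hypothetical `model` fitted in the slide 9 example: statsmodels exposes them directly, and the underlying formulas are the residual variance σ̂² = e'e/(n − k) and Var(b) = σ̂²(X'X)⁻¹.

    # Assumes `model` from the earlier statsmodels sketch.
    import numpy as np
    print(model.mse_resid)                        # residual variance: e'e / (n - k)
    print(model.cov_params())                     # variance-covariance matrix of the estimates
    print(model.bse)                              # standard errors of the parameters
    print(np.sqrt(np.diag(model.cov_params())))   # same values, taken from the matrix diagonal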

39 Regression Statistics
Correlation coefficient, coefficient of determination, adjusted coefficient of determination.

40 Confidence interval for estimated parameters
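With the same hypothetical fitted `model` from slide 9, 95% confidence intervals for the estimated parameters come directly from the results object:

    # Assumes `model` from the earlier statsmodels sketch.
    print(model.conf_int(alpha=0.05))   # 95% confidence intervals for the intercept and the slope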

41 Testing the statistical significance of the model as a whole
Verification of the statistical significance of the model using the F-test.

42 Testing the statistical significance of the estimated regression parameters
Verification of the statistical significance of the parameters using the t-test.
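A minimal sketch of both checks with the hypothetical `model` fitted earlier: the F-statistic tests the significance of the model as a whole, and the t-statistics test the individual parameters.

    # Assumes `model` from the earlier statsmodels sketch.
    print(model.fvalue, model.f_pvalue)    # F-test: overall significance of the regression
    print(model.tvalues, model.pvalues)    # t-tests: significance of each estimated parameter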

43 Calculation of standardized coefficients or beta coefficients
βj adj. = βj * R2 *100 [%]

44 SPECIFICATION OF ECONOMETRIC MODELS
Model specification refers to the determination of which independent variables should be included in or excluded from a regression equation. Specification is the first and most critical stage: our estimates of the parameters of a model, and our interpretation of them, depend on the correct specification of the model.

45 The most frequent problems:
incorrect specification of the functional form of the model
quality or usage of discrete variables, and the possibility of quantifying them
omission of relevant, and inclusion of irrelevant, explanatory variables in the model
measurement error, as a special issue of model specification, and its effect on the properties of the OLS method

46 Verification of model specification
t-test, F-test, the coefficient of determination, adjusted coefficient of determination; various measures of model fit:
Akaike information criterion (AIC)
Schwarz (Bayesian) information criterion (SBC)
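These fit measures are reported by statsmodels as attributes of the fitted results; a one-line sketch using the hypothetical `model` from earlier (lower AIC/BIC values indicate a better trade-off between fit and complexity):

    # Assumes `model` from the earlier statsmodels sketch.
    print(model.rsquared, model.rsquared_adj)   # R^2 and adjusted R^2
    print(model.aic, model.bic)                 # Akaike and Schwarz (Bayesian) information criteria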

47 Ramsey's RESET Test (Regression Specification Error Test)
H0: the model is specified correctly
H1: the model is not specified correctly
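Recent versions of statsmodels include a RESET implementation; a sketch, assuming the hypothetical `model` from earlier (availability and exact behaviour depend on the statsmodels version installed):

    # Assumes `model` from the earlier statsmodels sketch; requires statsmodels >= 0.11.
    from statsmodels.stats.diagnostic import linear_reset
    reset = linear_reset(model, power=2, use_f=True)   # augments the regression with squared fitted values
    print(reset)   # a small p-value suggests misspecification (reject H0)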

48

49 Verification of the model specification
Akaike information criterion (AIC); Bayesian information criterion (BIC), or Schwarz criterion (also SBC, SBIC).

50 Dummy variables – Examples:
dummy for gender – e.g. 1 for female, 0 for male
dummy for years in which there was some unusual circumstance, e.g. war – would equal 1 in war years, 0 otherwise
set of seasonal dummies – e.g. dummies for the four quarters of the year, each equal to 1 in its own quarter, 0 otherwise
set of category dummies – e.g. for different industries: manufacturing dummy = 1 for manufacturing firms, 0 otherwise; retail dummy = 1 for retail firms, 0 otherwise; etc.
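A short sketch of constructing such dummies in Python with pandas; the tiny data set is invented purely for illustration.

    # Hypothetical illustration: creating dummy variables with pandas.
    import pandas as pd

    df = pd.DataFrame({
        "gender": ["female", "male", "female", "male"],
        "quarter": ["Q1", "Q2", "Q3", "Q4"],
    })
    df["female"] = (df["gender"] == "female").astype(int)    # 1 for female, 0 for male
    seasonal = pd.get_dummies(df["quarter"], prefix="q")     # one 0/1 column per quarter
    print(pd.concat([df, seasonal], axis=1))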

51 dummy for gender – e.g. 1 for female, 0 for male
dummy variables for education with three categories (primary, secondary and higher education)

52 Dummy coding for the three education categories (one category serves as the reference):
Education   D1   D2
PE          1    0
SE          0    1
HE          0    0

53 econometric model with dummy and quantitative explanatory variables

54 econometric model with dummy (artificial) explanatory variables, seasonally adjusted

55 Model with Dummy variables
E.g. an individual earnings equation might take the form:
Earnings = β0 + β1 Education + β2 Age + β3 Age² + β4 Gender + u, with Gender = 1 for females and 0 for males.
Separate equations:
M: Earnings = β0 + β1 Education + β2 Age + β3 Age² + u
F: Earnings = (β0 + β4) + β1 Education + β2 Age + β3 Age² + u
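A sketch of fitting such an earnings equation with the statsmodels formula interface; the data frame below is entirely made up, and the variable names (earnings, education, age, female) are illustrative only.

    # Hypothetical data: earnings equation with a gender dummy (female = 1, male = 0).
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "earnings":  [22, 31, 35, 28, 40, 26, 33, 45, 30, 38],
        "education": [10, 12, 14, 11, 16, 10, 13, 18, 12, 15],
        "age":       [25, 30, 35, 28, 45, 24, 38, 50, 29, 41],
        "female":    [1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    })
    model_earn = smf.ols("earnings ~ education + age + I(age**2) + female", data=df).fit()
    print(model_earn.params)   # the coefficient on `female` is the intercept shift beta4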

56 Education vs. income by gender

57 DATA TRANSFORMATION
To introduce the basic ideas behind data transformations, we first consider a simple linear regression model in which:
We transform the predictor (x) values only.
We transform the response (y) values only.
We transform both the predictor (x) values and the response (y) values.

58 Log-transforming Only the Predictor
Lin-log model: Y = β0 + β1 ln X + ε, e.g. Y = β0 + β1 ln(time) + ε
The β1 coefficient represents the estimated change in the dependent variable (β1/100 units of Y) for a 1% change in the independent variable.

59 Log-transforming Only the Response
Log-lin model: ln Y = β0 + β1 X + ε
Log-linear trend models: Y = e^(β0 + β1 t + ε), i.e. ln Y = β0 + β1 t + ε
β1 provides the instantaneous rate of growth of the dependent variable associated with a unit change in the independent variable. The compounded growth rate, e^β1 − 1, is considered a more accurate estimate of the impact of X.
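A small worked example of the two growth rates, with an invented slope estimate b1 = 0.05 from a log-lin trend regression:

    # Hypothetical slope from a log-lin trend model ln(Y) = b0 + b1*t.
    import numpy as np
    b1 = 0.05
    print(b1)                 # instantaneous growth rate: 5.0% per period
    print(np.exp(b1) - 1)     # compounded growth rate: about 0.0513, i.e. 5.13% per period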

60

61 The log-log transformation
Cobb–Douglas production function: Y = A L^α K^β
Y = total production (the monetary value of all goods produced in a year), L = labor input, K = capital input, A = total factor productivity, α and β = output elasticities.

62 Log – log model

63 ELASTICITY
In the log-log form, ln Y = ln A + α ln L + β ln K + ε, the slope coefficients are the elasticities of output with respect to labor and capital. These values are constants determined by the available technology.
Elasticity: the percentage change in Y for a given (small) percentage change in X.
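A sketch of estimating the output elasticities from the log-log form; the production data below are simulated, not real.

    # Hypothetical Cobb-Douglas estimation: regress ln(Y) on ln(L) and ln(K).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    L = rng.uniform(50, 200, size=40)                                   # labor input
    K = rng.uniform(100, 500, size=40)                                  # capital input
    Y = 2.0 * L**0.6 * K**0.3 * np.exp(rng.normal(0, 0.05, size=40))    # simulated output

    X = sm.add_constant(np.column_stack([np.log(L), np.log(K)]))
    res_cd = sm.OLS(np.log(Y), X).fit()
    print(res_cd.params)    # [ln A, alpha, beta]; alpha and beta are the elasticities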

64 Heteroskedasticity
Heteroskedasticity occurs when the variance of the disturbance is not constant. It is often encountered in cross-section data. It does not affect the parameter estimates, but it biases the variance of the estimated parameters, so the t-values of the estimated coefficients cannot be trusted.
Tests: Goldfeld–Quandt test, Breusch–Pagan test (Lagrange multiplier test), White test.

65

66 White test
The White test has become extremely widely used; it is among the most powerful and most respected tests for heteroskedasticity.
Auxiliary regression: ei² = b0 + b1 X1 + b2 X2 + b3 X1² + b4 X2² + b5 X1 X2 + ui
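A sketch of running the White test with statsmodels on a small simulated two-regressor model (the data are artificial); het_white internally fits exactly this kind of auxiliary regression with squares and cross-products.

    # Hypothetical two-regressor model checked for heteroskedasticity with the White test.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    rng = np.random.default_rng(1)
    x1 = rng.uniform(0, 10, size=100)
    x2 = rng.uniform(0, 10, size=100)
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1 + 0.5 * x1, size=100)   # error variance grows with x1

    X = sm.add_constant(np.column_stack([x1, x2]))
    res = sm.OLS(y, X).fit()

    lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, res.model.exog)
    print(lm_pvalue)   # a small p-value indicates heteroskedasticity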

67 Autocorrelation
A violation of the ordinary least squares assumption that the error terms are uncorrelated.
Tests: Durbin–Watson test, Breusch–Godfrey test (Lagrange multiplier test).

68 Durbin-Watson test
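A sketch of computing the Durbin-Watson statistic from any fitted statsmodels results object, e.g. the hypothetical `model` from slide 9; values near 2 indicate no first-order autocorrelation.

    # Assumes a fitted statsmodels results object, e.g. `model` from the earlier sketch.
    from statsmodels.stats.stattools import durbin_watson
    print(durbin_watson(model.resid))   # ~2: no autocorrelation; <2: positive; >2: negative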

69 Multicollinearity
Occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Consequences: increased standard errors of the estimates; often confusing and misleading results.

70 Detecting multicollinearity
Compute correlations between all pairs of predictors – the correlation matrix.
Farrar-Glauber test – based on the calculation of the paired correlation coefficients.
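A sketch of the first check, the correlation matrix of the predictors, with pandas; the predictor values are invented.

    # Hypothetical predictors: pairwise correlations as a quick multicollinearity check.
    import pandas as pd

    predictors = pd.DataFrame({
        "x1": [1, 2, 3, 4, 5, 6, 7, 8],
        "x2": [2, 4, 6, 8, 10, 12, 14, 16],    # perfectly correlated with x1 (extreme case)
        "x3": [5, 3, 8, 1, 9, 2, 7, 4],
    })
    print(predictors.corr())   # correlations close to +/-1 signal multicollinearity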

