Business statistics and econometrics

Business statistics and econometrics Lecturer: Ing. Martina Hanová, PhD.

Econometrics "Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena." (Arthur S. Goldberger)

Econometric Theory Econometrics uses a variety of techniques, including regression analysis, to compare and test two or more variables. Econometrics is a mixture of economic theory, mathematical economics, economic statistics, and mathematical statistics. (Diagram: econometrics at the intersection of economics, mathematics, and statistics.)

Methodology of Econometrics Traditional or classical methodology:
1. Statement of theory or hypothesis
2. Specification of the mathematical model
3. Specification of the statistical, or econometric, model
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

1. Theory or Hypothesis A theory should yield a testable prediction, i.e. a hypothesis (in statistics and econometrics). Keynesian theory of consumption: Keynes stated that men are disposed to increase their consumption as their income increases, but not by as much as the increase in their income. The marginal propensity to consume (MPC) is therefore greater than zero but less than 1.

2. Mathematical Model Mathematical equation: Y = β1 + β2X, where β1 is the intercept and β2 the slope coefficient. THEORY: return to schooling is positive; Y = wage, X = number of years in school. Keynesian consumption function: Y = consumption expenditure, X = income; β2 measures the MPC, 0 < β2 < 1.

3. Specification of the Econometric Model Mathematical model - deterministic relationship between variables: Y = β1 + β2X. Econometric model - random or stochastic relationship between variables: Y = β1 + β2X + u. The term u (or ε) is the disturbance, error term, or random (stochastic) variable; it represents other non-quantifiable, unknown factors that affect Y, and also represents mismeasurement. EXAMPLE: the relationship between crop yield and rainfall.
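The stochastic model above can be illustrated with a short simulation (a hypothetical sketch: the coefficient values, rainfall range, and error scale are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: crop yield (Y) as a stochastic function of
# rainfall (X).  beta1, beta2 and the error scale are made-up values.
beta1, beta2 = 10.0, 0.5
X = rng.uniform(200, 800, size=100)     # rainfall in mm
u = rng.normal(0, 5, size=100)          # disturbance: unobserved factors
Y = beta1 + beta2 * X + u               # econometric (stochastic) model

# The deterministic part alone never reproduces the observed Y exactly:
Y_det = beta1 + beta2 * X
print(np.allclose(Y, Y_det))            # False: u shifts each observation
```

The disturbance u is exactly what separates the econometric model from the purely mathematical one.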

4. Obtain Data Observational (non-experimental) data vs. experimental data. Types of data: time series data, cross-section data, pooled data. Scales of measurement: ratio scale, interval scale, ordinal scale, nominal scale.

5. Estimation of the Model To estimate the parameters of the function, β1 and β2. Regression analysis - a statistical technique, the single most important tool at the econometrician's disposal: Ŷ = −184.08 + 0.7064X, where Ŷ is an estimate of consumption.
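A least-squares fit of this kind takes a few lines; here the consumption/income data are simulated so that the fitted line lands near the slide's example equation (all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical consumption/income data; the "true" intercept and MPC are
# invented so the fit lands near the slide's example Y_hat = -184.08 + 0.7064*X.
income = rng.uniform(2000, 6000, size=50)
consumption = -184.08 + 0.7064 * income + rng.normal(0, 50, size=50)

slope, intercept = np.polyfit(income, consumption, 1)  # least-squares fit
print(f"C_hat = {intercept:.2f} + {slope:.4f} * income")
```

The fitted slope is the estimated MPC and, as the theory predicts, falls between 0 and 1.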

6. Hypothesis Testing Statistical inference (hypothesis testing): we use the information in the sample to make inferences about the population. We have two hypotheses that go together: H0, the null hypothesis, and H1, the alternative hypothesis. The null hypothesis is the statistical hypothesis that is actually being tested; the alternative hypothesis represents the remaining outcomes of interest.

7. Forecasting Forecast the variable Y on the basis of known or expected future value(s) of the explanatory (predictor) variable X. 8. Use the Model for Policy Recommendation

TERMINOLOGY AND NOTATION
Independent variable: explanatory variable, predictor, regressor, stimulus, exogenous, covariate, control variable.
Dependent variable: explained variable, predictand, regressand, response, endogenous, outcome, controlled variable.
Two-variable (simple) regression analysis vs. multiple regression analysis; multivariate regression vs. multiple regression.

Determining the Regression Coefficients Finding a Line of Best Fit We can use the general equation for a straight line, Y = b0 + b1X, to get the line that best "fits" the data.

Ordinary Least Squares (OLS) The most common method used to fit a line to the data is known as OLS (ordinary least squares). Actual and Fitted Value

The Theory of OLS Population regression function (PRF): E(Y|Xi) = β0 + β1Xi. Sample regression function (SRF): Ŷi = b0 + b1Xi. OLS chooses b0 and b1 to minimize the sum of squared residuals: min Σei² = e1² + e2² + e3² + ... + en².

How does OLS get estimates of the coefficients? In Excel: Tools / Data Analysis / Regression. In matrix form: b = (XᵀX)⁻¹Xᵀy. By formula - a mathematical function of the data.
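The matrix route can be sketched directly: with a design matrix X (a column of ones plus the regressor), the OLS estimates solve the normal equations (XᵀX)b = Xᵀy. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Matrix-form OLS: b = (X'X)^(-1) X'y.  Data are simulated for illustration,
# with true intercept 1.5 and slope 2.0.
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])   # design matrix with intercept column
b = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations, solved stably

residuals = y - X @ b
print("b0, b1 =", b)                   # close to the true (1.5, 2.0)
print("sum of residuals:", residuals.sum())  # ~0 when an intercept is included
```

Solving the normal equations with `linalg.solve` is preferred to forming the explicit inverse; both give the same b.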

RESULTS Interpretation of the regression output: R Square; Multiple R; Standard Error of the regression; Intercept; Regression Coefficient.

Assumptions of the Classical SIMPLE Linear Regression Model (SLRM) How do we evaluate a model? How do we know if the model we are using is good? The assumptions relate to the (population) prediction errors; we study them through the (sample) estimated errors, the residuals.

Assumptions of Ordinary Least Squares The four conditions of the SLR model - what can be wrong with our model? Yi is a Linear function of Xi; the errors εi are Independent; the errors εi are Normally distributed; the errors εi have Equal variances (denoted σ²).

Why do we have to evaluate any regression model? All of the estimates, intervals, and hypothesis tests arising in a regression analysis have been developed assuming that the model is correct - all the formulas depend on the model being correct. If the model is incorrect, then the formulas and methods (OLS) we use are at risk of being incorrect.

Assumption 1 Model is linear in parameters

Linear regression model Model is linear in parameters: LRM (linear regression model) vs. NRM (nonlinear regression model).

ASSUMPTION 1 Zero mean value of the disturbances ui: E(ui | Xi) = 0.

ASSUMPTION 2 No autocorrelation between the disturbances. The data are a random sample of the population.

ASSUMPTION 3 Equal variance of the disturbances ui: errors with constant variance are "homoskedastic"; errors with non-constant variance are "heteroskedastic".

Variance-covariance matrix Construction of the var-cov matrix of the residuals: the vector ei multiplied by its transpose, ei·eiᵀ.

Assumption 4 The errors are normally distributed Normal Probability Plot

An alternative way to describe all four assumptions The errors, ϵi, are independent normal random variables with mean zero and constant variance, σ².

What can be wrong with our model? The population regression function is not linear; the error terms are not independent; the error terms are not normally distributed; the error terms do not have equal variance.

ASSUMPTION 5 Zero covariance between ui and Xi.

ASSUMPTION 6 The number of observations must be greater than or equal to the number of explanatory variables.

MLR Model Assumptions The mean of the response, E(Yi), at each set of values of the predictors, (x1i, x2i, …), is a Linear function of the predictors. The errors, εi, are Independent. The errors, εi, at each set of values of the predictors are Normally distributed. The errors, εi, at each set of values of the predictors have Equal variances (denoted σ²).

Some of the model conditions are more forgiving than others. All tests and intervals are very sensitive to even minor departures from independence. All tests and intervals are sensitive to moderate departures from equal variance. The hypothesis tests and confidence intervals for βi are fairly "robust" (that is, forgiving) against departures from normality. Prediction intervals are quite sensitive to departures from normality.

Assumptions of ordinary least squares: the Gauss-Markov theorem. When the first 4 assumptions of the simple regression model are satisfied, the parameter estimates are unbiased and have the smallest variance among all linear unbiased estimators. The OLS estimators are therefore called BLUE: Best Linear Unbiased Estimators.

multiple regression model

multiple regression model Predicted values of Y; comparison of real (actual) and predicted values of Y.
RESIDUAL OUTPUT
Observation | Predicted Y | Residuals | Standard Residuals
1 | 28.852139 | -1.052 | -0.604609351
2 | 31.934008 | -2.034 | -1.16883838
3 | 31.048974 | -1.249 | -0.717720232
4 | 31.858932 | -1.059 | -0.608512645
5 | 33.185693 | -1.986 | -1.141073963

multiple regression model Residual variance; variances of the parameters and their standard deviations.

Regression Statistics Correlation coefficient, coefficient of determination, adjusted coefficient of determination.

Confidence interval for estimated parameters

Test the statistical significance of the model as a whole: verification of the statistical significance of the model by using the F-test.

Test the statistical significance of the estimated regression parameters: verification of the statistical significance of the parameters by using the t-test.

Calculation of standardized coefficients, or beta coefficients: βj adj. = βj · R² · 100 [%]

SPECIFICATION OF ECONOMETRIC MODELS Model specification refers to the determination of which independent variables should be included in or excluded from a regression equation. Specification is the first and most critical stage: our estimates of the parameters of a model, and our interpretation of them, depend on the correct specification of the model.

The most frequent problems: incorrect specification of the functional form of the model; quality and usage of discrete variables, and the possibility of quantifying them; omission of relevant and inclusion of irrelevant explanatory variables in the model; measurement error, as a special issue of model specification, and its effect on the properties of the OLS method.

Verification of model specification t-test, F-test, the coefficient of determination, the adjusted coefficient of determination; various measures of model fit: Akaike information criterion (AIC), Schwarz (Bayesian) information criterion (SBC).

Ramsey's RESET Test (Regression Specification Error Test) H0: the model is specified correctly; H1: the model is not specified correctly.

Verification of the model specification Akaike information criterion Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC)

Dummy variables Examples: dummy for gender - e.g. 1 for female, 0 for male; dummy for years in which there was some unusual circumstance, e.g. war - equal to 1 in war years, 0 otherwise; set of seasonal dummies - e.g. dummies for the four quarters of the year, each equal to 1 in its own quarter, 0 otherwise; set of category dummies - e.g. for different industries: manufacturing dummy = 1 for manufacturing firms, 0 otherwise; retail dummy = 1 for retail firms, 0 otherwise, etc.

Dummy for gender - e.g. 1 for female, 0 for male. Dummy for education with three categories (primary, secondary and higher education).

Education | D1 | D2
PE (primary) | 1 | 0
SE (secondary) | 0 | 1
HE (higher, reference) | 0 | 0

econometric model with dummy and quantitative explanatory variables

Econometric model with seasonal dummy explanatory variables (seasonal adjustment).

Model with Dummy variables E.g. an individual earnings equation might take the form Earnings = β0 + β1Education + β2Age + β3Age2 + β4Gender + u Gender = 1 for females and 0 for males. Separate equations: M: Earnings = β0 + β1Education + β2Age + β3Age2 + u F: Earnings = (β0 + β4) + β1Education + β2Age + β3Age2 + u

Education vs. income by gender.

DATA TRANSFORMATION To introduce basic ideas behind data transformations we first consider a simple linear regression model in which: We transform the predictor (x) values only. We transform the response (y) values only. We transform both the predictor (x) values and response (y) values.

Log-transforming Only the Predictor: the lin-log model Y = β0 + β1·ln X + ε, e.g. Y = β0 + β1·ln(time) + ε. The coefficient β1 represents the estimated change in the dependent variable for a one-unit change in ln X; equivalently, β1/100 is the change in Y associated with a 1% change in the independent variable.

Log-transforming Only the Response: the log-lin model ln Y = β0 + β1·X + ε. Log-linear trend models: Y = e^(β0 + β1·t + ε), i.e. ln Y = β0 + β1·t + ε. β1 gives the instantaneous rate of growth of the dependent variable per unit change in the independent variable. The compounded growth rate, e^β1 − 1, is considered a more accurate estimate of the impact of X.
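The two growth rates can be checked numerically; here a series with roughly 5% compound growth is simulated and the log-lin trend fitted (the growth rate and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Log-lin trend model: ln Y = b0 + b1*t.  The fitted b1 is the instantaneous
# growth rate; exp(b1) - 1 is the compounded (period-over-period) growth rate.
t = np.arange(40)
Y = 100 * 1.05 ** t * np.exp(rng.normal(0, 0.01, size=40))  # ~5% growth

b1, b0 = np.polyfit(t, np.log(Y), 1)
print("instantaneous rate:", b1)            # close to ln(1.05) = 0.0488
print("compounded rate:", np.exp(b1) - 1)   # close to 0.05
```

The instantaneous rate (0.0488) understates the true period growth (5%); exponentiating recovers it, which is why e^β1 − 1 is preferred.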

The log-log transformation Cobb-Douglas production function: Y = A·L^α·K^β, where Y = total production (the monetary value of all goods produced in a year), L = labor input, K = capital input, A = total factor productivity, and α and β are output elasticities.

Log-log model: taking logarithms gives ln Y = ln A + α·ln L + β·ln K + ε, which is linear in the parameters.

ELASTICITY α and β are the elasticities of output with respect to labor and capital; these values are constants determined by the available technology. Elasticity: the percentage change in Y for a given (small) percentage change in X.
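Estimating the elasticities amounts to an OLS fit of the log-log form ln Y = ln A + α·ln L + β·ln K. A sketch on simulated Cobb-Douglas data (the parameter values 0.7 and 0.3 are invented):

```python
import numpy as np

rng = np.random.default_rng(6)

# Cobb-Douglas: Y = A * L^alpha * K^beta.  Taking logs gives a linear model
# whose slope coefficients are the output elasticities.
n = 500
L = rng.uniform(10, 100, size=n)
K = rng.uniform(10, 100, size=n)
Y = 2.0 * L**0.7 * K**0.3 * np.exp(rng.normal(0, 0.05, size=n))

X = np.column_stack([np.ones(n), np.log(L), np.log(K)])   # log-log design
b = np.linalg.solve(X.T @ X, X.T @ np.log(Y))             # OLS on logs
print("alpha =", b[1], "beta =", b[2])   # elasticities, near 0.7 and 0.3
```

Because the model is linear in logs, each slope is directly the percentage change in Y per 1% change in the corresponding input.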

Heteroscedasticity Heteroskedasticity occurs when the variance of the disturbance is not constant; it is often encountered in cross-section data. It does not affect the parameter estimates, but it biases the variance of the estimated parameters, so the t-values of the estimated coefficients cannot be trusted. Tests: Goldfeld-Quandt test; Breusch-Pagan test (or Lagrange multiplier test); White test.

White test The White test has become extremely widely used, among the most powerful and most respected. Auxiliary regression: ei² = b0 + b1·X1 + b2·X2 + b3·X1² + b4·X2² + b5·X1·X2 + vi.

Autocorrelation A violation of the ordinary least squares assumption that the error terms are uncorrelated. Tests: Durbin-Watson test; Breusch-Godfrey test (or Lagrange multiplier test).

Durbin-Watson test d = Σ(et − et−1)² / Σet²; since d ≈ 2(1 − ρ̂), values near 2 indicate no first-order autocorrelation, values toward 0 positive autocorrelation.
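The statistic is easy to compute with `durbin_watson` from statsmodels. A sketch comparing i.i.d. residuals with strongly autocorrelated AR(1) residuals (the AR coefficient 0.8 is an arbitrary choice for illustration):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)

# i.i.d. residuals: Durbin-Watson statistic should be near 2.
e_iid = rng.normal(size=500)

# AR(1) residuals with rho = 0.8: strong positive autocorrelation.
e_ar = np.zeros(500)
for t in range(1, 500):
    e_ar[t] = 0.8 * e_ar[t - 1] + rng.normal()

dw_iid = durbin_watson(e_iid)
dw_ar = durbin_watson(e_ar)
print("iid DW  :", dw_iid)   # near 2
print("AR(1) DW:", dw_ar)    # well below 2
```

The AR(1) case lands near 2(1 − 0.8) = 0.4, matching the approximation d ≈ 2(1 − ρ̂).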

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Consequences: increased standard errors of the estimates; often confusing and misleading results.

Detecting multicollinearity Compute correlations between all pairs of predictors (the correlation matrix). Farrar-Glauber test: calculation of the paired correlation coefficients.
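Both diagnostics can be sketched in a few lines: the pairwise correlation matrix plus variance inflation factors (VIFs) from statsmodels, on two predictors that are nearly collinear by construction:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)

# Two nearly collinear predictors; the correlation and the VIFs both flag it.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.1, size=n)       # almost a copy of x1

print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])   # close to 1

X = np.column_stack([np.ones(n), x1, x2])  # constant + predictors
vifs = [variance_inflation_factor(X, i) for i in range(1, 3)]
print("VIFs:", vifs)                       # far above the common threshold of 10
```

VIF_j = 1/(1 − R²_j), where R²_j comes from regressing predictor j on the others; values above about 10 are a common warning sign.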