Issues in Estimation Data Generating Process:

Presentation transcript:

Issues in Estimation Data Generating Process: What behavior and what sampling process generated the data that you have collected?

Estimation Are you gathering a random sample of all possible participants (e.g. telephone or mail survey of population)? Or, are you sampling on site?

1. Censored Samples If you sample a population of potential participants, you will find that some took trips to the site of interest and some (many?) took no trips. [Figure: plot of trip cost against number of trips for all observations in a hypothetical sample; non-participants are the observations at 0 trips.]

Here’s a hypothetical data set and the actual least squares regression lines. [Figure: least squares line including zeros; least squares line excluding zeros.] Which, if either, is right? Answer: Neither.

Censored Samples: empirical models to analyze them
Tobit model: assumes an underlying latent variable that could be negative.
Count models: recognize that trips are non-negative integers.
Sample selection models: model the participation decision differently from the trips decision.

Tobit Model Underlying model. Latent variable: $z_i^* = \beta_{0i} + \beta_1 c_i + \varepsilon_i$, where $c_i$ is the trip cost. But $z_i = z_i^*$ if $z_i^* > 0$, and $z_i = 0$ if $z_i^* \le 0$. (To cut down on notation, $\beta_{0i}$ stands for the intercept and all other covariates that might be in the model, so it varies over individuals.)

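Estimation by Maximum Likelihood Every observation makes a contribution to the likelihood function. Contribution by non-trip takers: $\Pr(z_i^* \le 0) = \Pr(\varepsilon_i \le -x_i\beta) = F(-x_i\beta)$, where $F$ is the cumulative distribution function for $\varepsilon$; the x’s are the explanatory variables in the model, including cost of access, so $x_i\beta$ is the systematic part of the latent variable equation.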

Contribution by trip takers: $f(z_i - x_i\beta)$, where $f$ is the density function for $\varepsilon$. Note: this is the same expression as for ordinary least squares.

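Tobit: maximize the following likelihood function. Likelihood function equals: $L = \prod_{i \in N} F(-x_i\beta) \prod_{i \in T} f(z_i - x_i\beta)$, where $F$ and $f$ are the cumulative distribution function and density of $\varepsilon$, T is the set of trip takers and N is the set of non-trip takers.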
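The likelihood above can be sketched in a few lines of code. This is only an illustration, assuming normally distributed errors with standard deviation `sigma` and a single cost regressor; the function name `tobit_loglik` and its arguments are hypothetical, not part of LIMDEP or the slides.

```python
import math
from statistics import NormalDist

# Assumption: the error term is normal, so F and f come from the standard normal.
_STD_NORMAL = NormalDist()


def tobit_loglik(beta0, beta1, sigma, costs, trips):
    """Log-likelihood of a Tobit demand model z* = beta0 + beta1 * cost + error."""
    total = 0.0
    for c, z in zip(costs, trips):
        xb = beta0 + beta1 * c  # systematic part of latent demand
        if z > 0:
            # Trip taker: density of the observed number of trips.
            u = (z - xb) / sigma
            total += math.log(_STD_NORMAL.pdf(u) / sigma)
        else:
            # Non-trip taker: probability that latent demand is <= 0.
            total += math.log(_STD_NORMAL.cdf(-xb / sigma))
    return total
```

A numerical optimizer would then search over `beta0`, `beta1` and `sigma` for the values that maximize this function.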

For our simple example:

Parameter     OLS      Tobit
$\beta_0$     8.89     13.81
$\beta_1$     -.28     -.72
[?]2          2.4      2.3

[Figure: fitted OLS and Tobit lines.]

How do we get welfare measures in the Tobit? The Tobit is usually estimated in linear form. The area behind a linear demand function is given by: $CS_i = -z_i^2 / (2\beta_1)$

But how do you evaluate this expression? Use $\hat{\beta}_1$ as the estimate for $\beta_1$. But what do you use for $z_i$? Do you use the individual’s actual number of trips? Or do you use the predicted number of trips using the model? [Figure: estimated demand function, with axes $c_i$ (trip cost) and $z_i$ (trips).]

If you want to use the predicted number of trips... You must calculate the expected value of trips in the Tobit framework – which is a somewhat complicated expression. Fortunately, LIMDEP* will do this for you in a simple command. You should know that expected trips will always be positive in the Tobit. *LIMDEP is a software package by William Greene, Columbia University
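For readers without LIMDEP, the expected value of trips can be sketched directly. This assumes normal errors, in which case the standard result is E[z] = Φ(xβ/σ)·xβ + σ·φ(xβ/σ); the function name `tobit_expected_trips` is hypothetical.

```python
from statistics import NormalDist


def tobit_expected_trips(xb, sigma):
    """Expected observed trips E[max(0, z*)] when z* ~ Normal(xb, sigma^2)."""
    std = NormalDist()  # standard normal (assumed error distribution)
    a = xb / sigma
    # Probability of being a trip taker times xb, plus the truncation correction.
    return std.cdf(a) * xb + sigma * std.pdf(a)
```

Even when the systematic part `xb` is negative, the result is a small positive number, which is why predicted trips in the Tobit are never zero.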

The answers can be quite different… but the choice is not obvious. In our simple example, the difference isn’t great.

                          Using Actual z    Using Predicted z
Ave. trips                2.53              2.55
Ave. consumer surplus     $15.32            $13.93
Total CS for sample       $459.60           $417.90

Difference in average consumer surplus is due to nonlinearity of consumer surplus in trips.

Reasons for using one rather than another… Use the expected value of trips, if you think the dominant source of “error” is from measurement. Use the actual number of trips, if you think the dominant source of “error” is from specification. (Note: in the Tobit, the predicted number of trips is never zero.)

Getting an estimate for the population If your sample is a random sample of the population: total CS = average CS × population size

Count Models The Tobit assumes an underlying latent variable that can take on negative values. Count models explicitly account for the fact that the dependent variable, trips, can only be an integer and can only be non-negative.

Count models specify that the quantity demanded of trips is a non-negative random variable whose mean is a function of the exogenous regressors in the model.

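The Poisson distribution is a common choice: $\Pr(z_i = n) = e^{-\lambda_i} \lambda_i^{n} / n!$, $n = 0, 1, 2, \ldots$, where the mean is $\lambda_i$ and it is usually modeled as: $\lambda_i = \exp(\beta_{0i} + \beta_1 c_i)$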

Intuition? The Poisson model implies that the number of trips a person decides to take is a random variable drawn from a distribution that only allows non-negative integers. The distribution can be centered around different non-negative numbers, however, depending on the exogenous variables the individual faces. E.g. A person with a relatively low access cost will face a distribution with a higher mean number of trips.

An individual’s contribution to the likelihood function in the Poisson is this very complicated looking expression: $e^{-\lambda_i} \lambda_i^{z_i} / z_i!$ (Note: 0! is defined mathematically as 1.) Fortunately, LIMDEP will estimate this for you without any hard work on your part.
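The expression is less intimidating in code. The sketch below assumes the usual semi-log specification of the mean with a single cost regressor; the function names are hypothetical and this is not LIMDEP's implementation.

```python
import math


def poisson_mean(beta0, beta1, cost):
    """Mean number of trips, modeled as exp(beta0 + beta1 * cost)."""
    return math.exp(beta0 + beta1 * cost)


def poisson_loglik_contribution(z, lam):
    """Log of exp(-lam) * lam**z / z! for one individual with z trips."""
    # lgamma(z + 1) equals log(z!), and lgamma(1) = 0 covers the 0! = 1 case.
    return -lam + z * math.log(lam) - math.lgamma(z + 1)
```

Summing `poisson_loglik_contribution` over all individuals gives the log-likelihood to be maximized over the parameters.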

Getting Welfare Measures in the Poisson The expected number of trips for an individual is the mean of the Poisson distribution for that individual. The mean is $\lambda_i$ in the above expression and is usually specified as a semi-log function of the explanatory variables: $\ln \lambda_i = \beta_{0i} + \beta_1 c_i$

We saw earlier that the area under a semi-log demand function is given by: $CS_i = -z_i / \beta_1$. Because CS is linear in trips for a semi-log function, it does not matter whether you use actual or expected trips. The answer is the same.

Welfare measures in our simple hypothetical case

                          Using Actual z    Using Predicted z
Ave. trips                2.53              2.53
Ave. consumer surplus     $14.90            $14.90
Total CS for sample       $447.00           $447.00

The Poisson has the property that the mean of expected trips = mean of actual trips. The formula for consumer surplus in a semi-log function is linear in trips. THEREFORE, it does not matter in this model whether you use expected or actual trips.
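The role of linearity can be checked with a few made-up numbers. The trip counts and the cost coefficient below are hypothetical, not the data behind the table above. With the semi-log formula, average consumer surplus depends only on average trips; with the linear demand formula it does not.

```python
def semilog_cs(trips, beta1):
    """Consumer surplus for a semi-log demand function: -z / beta1."""
    return -trips / beta1


def linear_cs(trips, beta1):
    """Consumer surplus for a linear demand function: -z^2 / (2 * beta1)."""
    return -trips ** 2 / (2.0 * beta1)


def mean(values):
    return sum(values) / len(values)


trips = [0, 1, 2, 3, 5]  # hypothetical trip counts
beta1 = -0.2             # hypothetical cost coefficient

# Average of individual CS minus CS evaluated at average trips.
semilog_gap = mean([semilog_cs(z, beta1) for z in trips]) - semilog_cs(mean(trips), beta1)
linear_gap = mean([linear_cs(z, beta1) for z in trips]) - linear_cs(mean(trips), beta1)
```

Here `semilog_gap` is zero up to rounding, while `linear_gap` is clearly positive, which is the nonlinearity that drives the difference seen in the Tobit case.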

Another Popular Count Model The negative binomial distribution is also used often. It is a more general distribution than the Poisson, in that it does not constrain the mean and the variance to be equal. See LIMDEP if you wish to estimate this model.

Participation vs Demand for Trips In the above models, the same model determines both how many trips a user takes and whether or not he is a user. Suppose different factors affected (1) whether he used the site and (2) how many times he used the site, if he did use it. Two types of models (see LIMDEP):
Combination of probit and truncated models (e.g. Cragg)
Selection models (e.g. Heckman)

2. Truncated Samples Now suppose you have only collected data from people who actually visit the site. There will be no zeros in this dataset. Do you still need to make econometric adjustments?

The answer is “YES”. Ordinary least squares assumes that every observation is drawn from a normal distribution with a given variance.

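Let’s look at data again… Remember the model is: $z_i = \beta_{0i} + \beta_1 c_i + \varepsilon_i$. OLS assumes that $\varepsilon_i \sim N(0, \sigma^2)$. [Figure: trip cost against number of trips for the truncated sample. Labels: result of running OLS regression; relationship you want; distribution is truncated for observations near the axis.]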

OLS applied to truncated data produces biased slope estimates if truncation is “relevant”. The bias will generate a larger negative estimate for the slope of the line in the graph, which is really a smaller negative estimate for $\beta_1$. Since $-\beta_1$ is in the denominator of the consumer surplus formula, the result will be an over-estimate of consumer surplus.

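Contribution to the Likelihood Function in the Truncated Model: $\Pr(\text{trips} = z_i \mid \text{trips} > 0) = f(z_i - x_i\beta) / [1 - F(-x_i\beta)]$, where $f$ and $F$ are the density and cumulative distribution function of the error term and $x_i\beta$ is the systematic part of the model.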
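As a sketch only, and again assuming normal errors with standard deviation `sigma`, the truncated contribution is the ordinary density divided by the probability of taking a positive number of trips. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def truncated_loglik_contribution(z, xb, sigma):
    """Log of f(z - xb) / Pr(z* > 0) for one on-site observation with z > 0."""
    std = NormalDist()  # standard normal (assumed error distribution)
    log_density = math.log(std.pdf((z - xb) / sigma) / sigma)
    log_prob_positive = math.log(1.0 - std.cdf(-xb / sigma))
    return log_density - log_prob_positive
```

Because the denominator is a probability below one, each contribution is larger than the corresponding OLS-style density, and the correction matters most for observations whose predicted trips are close to zero.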

The difference between the OLS and truncated estimated relationships for our simple hypothetical data. [Figure: OLS regression line and truncated regression line.]

Oh no, another problem! The reason you have only non-zero observations for trips is probably because you sampled on site. On-site sampling is often the only practical way to get enough information on users of a site.

But this, too, causes problems! If you randomly sample on-site, you are actually randomly sampling trips instead of trip-takers. This is not a random sample of users of the site. The problem is called “endogenous stratification”.

A simple example: Suppose there are only two types of users: 25 users take 1 trip to the site and 75 users take 2 trips to the site. Total number of trips taken = 25 × 1 + 75 × 2 = 175. Average number of trips taken = 175/100 = 1.75. Now, suppose you randomly sample trips (not users). Prob. of encountering a 1-trip user = 25/175 = .14 (rather than .25). Prob. of encountering a 2-trip user = 150/175 = .86 (rather than .75).
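The arithmetic of this example can be reproduced as follows. The helper `onsite_shares` is a hypothetical name; it simply weights each user type by the number of trips that type contributes.

```python
def onsite_shares(users_by_trips):
    """Probability of intercepting each user type when sampling trips at random.

    users_by_trips maps trips per user -> number of users of that type.
    """
    total_trips = sum(z * n for z, n in users_by_trips.items())
    return {z: z * n / total_trips for z, n in users_by_trips.items()}


population = {1: 25, 2: 75}  # the two user types from the example
shares = onsite_shares(population)
```

The 2-trip users account for 150 of the 175 trips, so they are over-represented on site relative to their 75 percent share of users.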

A solution to endogenous stratification is to weight each observation by 1/trips. Parameter estimates for our little sample: [table of estimates not preserved in the transcript] *Note: for many problems the truncated model does not converge in estimation.
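To see why weighting by 1/trips helps, consider an idealized on-site sample from the two-type example (25 one-trip users, 75 two-trip users) in which one respondent is intercepted per trip taken. That sampling pattern is an assumption made for illustration, and the function name is hypothetical.

```python
def weighted_mean_trips(sampled_trips):
    """Mean trips with each on-site observation weighted by 1 / trips."""
    weights = [1.0 / z for z in sampled_trips]
    weighted_sum = sum(w * z for w, z in zip(weights, sampled_trips))
    return weighted_sum / sum(weights)


# Assumed on-site sample: 25 intercepts of 1-trip users, 150 of 2-trip users.
onsite_sample = [1] * 25 + [2] * 150

naive_mean = sum(onsite_sample) / len(onsite_sample)
corrected_mean = weighted_mean_trips(onsite_sample)
```

The unweighted sample mean is about 1.86 trips, while the weighted mean recovers the true user average of 1.75.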

A Better and Easier Alternative: Poisson Count Model. Easy to estimate with truncation. Easy to estimate with truncation and endogenous stratification. It turns out that you can solve both the truncation and the endogenous stratification problem by estimating the regular Poisson with the value $z_i - 1$ substituted for $z_i$ in estimation.
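One way to see why this works: under on-site sampling, a person with z trips is encountered in proportion to z, so the sampled density is z·f(z)/λ, where f is the Poisson probability and λ its mean. For the Poisson this equals the ordinary Poisson probability evaluated at z − 1. The check below is a numerical sketch of that identity with a hypothetical mean of 2.5; it is not an estimation routine.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def onsite_pmf(z, lam):
    """Probability of z trips among on-site respondents: z * f(z) / E[z]."""
    return z * poisson_pmf(z, lam) / lam


lam = 2.5  # hypothetical mean number of trips
# Largest discrepancy between the on-site density at z and the Poisson at z - 1.
max_gap = max(abs(onsite_pmf(z, lam) - poisson_pmf(z - 1, lam)) for z in range(1, 30))
```

Zero-trip users receive zero weight on site, so the truncation is handled by the same adjustment.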

Poisson Endogenous Stratification Results and Welfare Estimates *Note: Remember that this is basically a semi-log demand function so the parameters are not directly comparable to the parameters in the previous models.

Welfare Calculation Average WTP estimate for elimination of site: $-\bar{z} / \hat{\beta}_1$. Note: $\bar{z}$ must also be adjusted for endogenous stratification. Mean number of trips = $\bar{z} = N / \sum_{n=1}^{N} (1/z_n)$, where N = number of individuals sampled and $z_n$ = number of trips taken by individual n.