
1 Section 3 Probit and Logit Models

2 Dichotomous Data
Suppose the data are discrete but there are only 2 outcomes. Examples:
–Graduate high school or not
–Patient dies or not
–Working or not
–Smoker or not
In the data, y_i = 1 if yes, y_i = 0 if no

3 How to model the data generating process?
There are only two outcomes
Research question: what factors impact whether the event occurs?
To answer it, we will model the probability the outcome occurs:
Pr(y_i = 1) when y_i = 1, or Pr(y_i = 0) = 1 - Pr(y_i = 1) when y_i = 0

4 Think of the problem from a MLE perspective
Likelihood for the i-th observation:
L_i = Pr(y_i = 1)^(y_i) [1 - Pr(y_i = 1)]^(1 - y_i)
When y_i = 1, the only relevant part is Pr(y_i = 1)
When y_i = 0, the only relevant part is [1 - Pr(y_i = 1)]

5 L = Σ_i ln[L_i] = Σ_i { y_i ln[Pr(y_i = 1)] + (1 - y_i) ln[Pr(y_i = 0)] }
Notice that up to this point, the model is generic. The log likelihood function will be determined by the assumptions concerning how we model Pr(y_i = 1)
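This generic log likelihood can be maximized directly with Stata's ml command. A minimal sketch, assuming a logistic form for Pr(y_i = 1); the program name mybinll is hypothetical and the covariate list is shortened for illustration:

* sketch: maximize the binary-outcome log likelihood by hand;
* invlogit(xb) = exp(xb)/(1+exp(xb)) is Pr(y=1) under the logit assumption;
program define mybinll;
    args lnf xb;
    * per-observation log likelihood: y*ln(p) + (1-y)*ln(1-p);
    quietly replace `lnf' = $ML_y1*ln(invlogit(`xb')) + (1 - $ML_y1)*ln(1 - invlogit(`xb'));
end;

ml model lf mybinll (smoker = age incomel male);
ml maximize;

The logit command used later in these slides is just this maximization packaged for you.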

6 Modeling the probability
There is some process (biological, social, decision theoretic, etc.) that determines the outcome y
Some of the variables impacting the outcome are observed, some are not
Requires that we model how these factors impact the probabilities
Model from a 'latent variable' perspective

7 Consider a woman's decision to work
y_i* = the person's net benefit to work
Two components of y_i*:
–Characteristics that we can measure: education, age, income of spouse, prices of child care
–Some we cannot measure: how much you like spending time with your kids, how much you like/hate your job

8 We aggregate these two components into one equation
y_i* = β_0 + x_1i β_1 + x_2i β_2 + … + x_ki β_k + ε_i = x_i β + ε_i
x_i β: measurable characteristics, but with uncertain weights
ε_i: random unmeasured characteristics
Decision rule: the person will work if y_i* > 0 (if net benefits are positive)
y_i = 1 if y_i* > 0
y_i = 0 if y_i* ≤ 0

9 y_i = 1 if y_i* > 0, i.e. y_i* = x_i β + ε_i > 0, only if ε_i > -x_i β
y_i = 0 if y_i* ≤ 0, i.e. y_i* = x_i β + ε_i ≤ 0, only if ε_i ≤ -x_i β

10 Suppose x_i β is 'big':
–High wages
–Low husband's income
–Low cost of child care
We would expect this person to work, UNLESS there is some unmeasured 'variable' that counteracts this

11 Suppose a mom really likes spending time with her kids, or she hates her job. The unmeasured component of working, ε_i, is then a big negative number
If we observe her working, ε_i must not have been too negative, since y_i = 1 only if ε_i > -x_i β

12 Consider the opposite. Suppose we observe someone NOT working. Then ε_i must not have been big, since y_i = 0 only if ε_i ≤ -x_i β

13 Logit
Recall y_i = 1 if ε_i > -x_i β
If ε_i follows a logistic distribution with CDF F, then Pr(ε_i > -x_i β) = 1 - F(-x_i β)
The logistic is also a symmetric distribution, so 1 - F(-x_i β) = F(x_i β) = exp(x_i β)/(1 + exp(x_i β))

14 When ε_i follows a logistic distribution:
Pr(y_i = 1) = exp(x_i β)/(1 + exp(x_i β))
Pr(y_i = 0) = 1/(1 + exp(x_i β))
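Stata's built-in invlogit() function is exactly this transformation. A quick check that the two expressions agree, using an arbitrary index value of x_i β = 0.5:

display invlogit(0.5);
display exp(0.5)/(1 + exp(0.5));
* both print .62245933;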

15 Example: Workplace smoking bans
Smoking supplements to the 1991 and 1993 National Health Interview Survey
Asked all respondents whether they currently smoke
Asked workers about workplace tobacco policies
Sample: workers
Key variables: current smoking and whether they faced a workplace ban

16 Data: workplace1.dta
Sample program: workplace1.doc
Results: workplace1.log

17 Description of variables in data

. desc;

variable name   storage type  display format  variable label
-------------------------------------------------------------
smoker          byte          %9.0g           is current smoking
worka           byte          %9.0g           has workplace smoking bans
age             byte          %9.0g           age in years
male            byte          %9.0g           male
black           byte          %9.0g           black
hispanic        byte          %9.0g           hispanic
incomel         float         %9.0g           log income
hsgrad          byte          %9.0g           is hs graduate
somecol         byte          %9.0g           has some college
college         float         %9.0g

18 Summary statistics

. sum;

(table of Obs, Mean, Std. Dev., Min, and Max for smoker, worka, age, male, black, hispanic, incomel, hsgrad, somecol, and college; the numeric entries did not survive the transcript)

19 Running a probit
probit smoker age incomel male black hispanic hsgrad somecol college worka;
The first variable after 'probit' is the discrete outcome; the rest of the variables are the independent variables
A constant is included by default

20 Running a logit
logit smoker age incomel male black hispanic hsgrad somecol college worka;
Same as probit, just change the first word

21 Running the linear probability model
reg smoker age incomel male black hispanic hsgrad somecol college worka, robust;
Simple regression. The default standard errors are incorrect (the error is heteroskedastic by construction)
The robust option produces standard errors that allow an arbitrary form of heteroskedasticity

22 Probit Results

Probit estimates              Number of obs = …
                              LR chi2(9)    = …
                              Prob > chi2   = …
Log likelihood = …            Pseudo R2     = …

smoker | Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
(rows for age, incomel, male, black, hispanic, hsgrad, somecol, college, worka, and _cons; the numeric estimates did not survive the transcript)

23 How to measure fit?
Regression (OLS):
–minimize the sum of squared errors
–or, equivalently, maximize R2
–The model is designed to maximize predictive capacity
Not the case with probit/logit:
–MLE models pick distribution parameters so as to best describe the data generating process
–May or may not 'predict' the outcome well

24 Pseudo R2
LL_k: log likelihood with all variables
LL_1: log likelihood with only a constant
0 > LL_k > LL_1, so |LL_k| < |LL_1|
Pseudo R2 = 1 - |LL_k/LL_1|
Bounded between 0 and 1
Not anything like an R2 from a regression
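A sketch of this calculation by hand, using the results Stata stores after estimation (e(ll) is the full log likelihood, e(ll_0) the constant-only one):

probit smoker age incomel male black hispanic hsgrad somecol college worka;
* McFadden pseudo R2 = 1 - LL_k/LL_1;
display 1 - e(ll)/e(ll_0);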

25 Predicting Y
Let b be the estimated value of β
For any candidate vector of x_i, we can predict probabilities P_i:
P_i = Φ(x_i b)
Once you have P_i, pick a threshold value T, so that you predict
Y_p = 1 if P_i > T
Y_p = 0 if P_i ≤ T
Then compare predictions to outcomes: the fraction correctly predicted

26 Question: what value to pick for T?
Can pick 0.5:
–Intuitive: predict the activity when it is more likely than not
–However, when ρ (the sample mean of y) is small, this criterion does a poor job of predicting y_i = 1
–When ρ is close to 1, this criterion does a poor job of predicting y_i = 0

27
* predict probability of smoking;
predict pred_prob_smoke;
* get detailed descriptive data about predicted prob;
sum pred_prob_smoke, detail;
* predict binary outcome with 50% cutoff;
gen pred_smoke1 = pred_prob_smoke >= .5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";
* compare actual and predicted values;
tab smoker pred_smoke1, row col cell;

28
. sum pred_prob_smoke, detail;

(detailed summary of Pr(smoker): percentiles 1% through 99%, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness, and Kurtosis; the numeric entries did not survive the transcript)

29 Notice a few things:
–The sample mean of the predicted probabilities is close to the sample mean outcome
–99% of the predicted probabilities are less than .5
–So we should predict few smokers if we use a 50% cutoff

30
           | predicted smoking, 50% cutoff
is current |
smoking    |      0        1 |   Total
-----------+-----------------+--------
         0 |  12,…         … |  12,167
         1 |   4,…         … |   4,091
-----------+-----------------+--------
     Total |  16,…         … |  16,258

(cell counts and row/column/cell percentages were truncated in the transcript; only the row totals survived intact)

31 Check the on-diagonal elements. The last number in each cell of the 2x2 table is the fraction of observations in that cell
The model correctly predicts 74.90% of the obs
It only predicts a small fraction of smokers

32 Do not be amazed by the 75% correct prediction
If you said everyone has a ρ chance of smoking (a case of no covariates), you would be correct max[ρ, (1 - ρ)] percent of the time

33 In this case, 25.16% smoke. If everyone had the same chance of smoking, we would assign everyone Pr(y = 1) = .2516
We would be correct for the 12,167/16,258 = 74.84% of people who do not smoke

34 Key points about prediction
MLE models are not designed to maximize prediction
Should not be surprised when they do not predict well
In this case, prediction rates are not particularly good measures of the model's quality

35 Translating coefficients in probit: continuous covariates
Pr(y_i = 1) = Φ[β_0 + x_1i β_1 + x_2i β_2 + … + x_ki β_k]
Suppose that x_1i is a continuous variable
d Pr(y_i = 1)/d x_1i = ?
What is the change in the probability of an event given a change in x_1i?

36 Marginal Effect
d Pr(y_i = 1)/d x_1i = β_1 φ[β_0 + x_1i β_1 + x_2i β_2 + … + x_ki β_k]
Notice two things: the sign is determined by β_1, and the magnitude of the marginal effect is a function of the other parameters and the values of x
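A sketch of this calculation by hand in Stata, after the probit above. Note this averages the observation-level effects (an average marginal effect), whereas mfx, covered below, evaluates at the sample means; xbhat and me_incomel are hypothetical variable names:

probit smoker age incomel male black hispanic hsgrad somecol college worka;
* linear index x_i*b for each observation;
predict xbhat, xb;
* marginal effect of incomel: coefficient times the normal density at the index;
* (normalden() in modern Stata; older versions call it normden());
gen me_incomel = _b[incomel]*normalden(xbhat);
sum me_incomel;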

37 Translating coefficients: discrete covariates
Pr(y_i = 1) = Φ[β_0 + x_1i β_1 + x_2i β_2 + … + x_ki β_k]
Suppose that x_2i is a dummy variable (1 if yes, 0 if no)
A derivative makes no sense: you cannot change x_2i by a little amount, it is either 1 or 0
Redefine the effect of interest: compare outcomes with and without x_2i

38 y_1 = Pr(y_i = 1 | x_2i = 1) = Φ[β_0 + x_1i β_1 + β_2 + x_3i β_3 + …]
y_0 = Pr(y_i = 1 | x_2i = 0) = Φ[β_0 + x_1i β_1 + x_3i β_3 + …]
Marginal effect = y_1 - y_0: the difference in probabilities with and without x_2i
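A sketch of the same comparison by hand for the dummy worka, again averaging over the sample rather than evaluating at the means; p1, p0, and dworka are hypothetical names:

probit smoker age incomel male black hispanic hsgrad somecol college worka;
preserve;
* predicted probability for everyone as if worka = 1, then as if worka = 0;
replace worka = 1;
predict p1;
replace worka = 0;
predict p0;
gen dworka = p1 - p0;
sum dworka;
restore;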

39 In STATA
Marginal effects for continuous variables: STATA evaluates at the sample means of the X's
Changes in probabilities for dichotomous covariates: STATA also evaluates at the sample means of the X's

40 STATA command for Marginal Effects
mfx compute;
Must come after the estimation command, while the estimates are still active in the program
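A side note not in the original slides: in Stata 11 and later, mfx has been superseded by the margins command, which does the same job:

* marginal effects at the sample means, after probit or logit;
margins, dydx(*) atmeans;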

41 Marginal effects after probit

y = Pr(smoker) (predict) = …

variable | dy/dx  Std. Err.  z  P>|z|  [95% C.I.]  X
(rows for age, incomel, male*, black*, hispanic*, hsgrad*, somecol*, college*, and worka*; the numeric estimates did not survive the transcript)
(*) dy/dx is for discrete change of dummy variable from 0 to 1

42 Interpret results
A 10% increase in income will reduce smoking by 2.9 percentage points
A 10 year increase in age will decrease smoking rates by 0.4 percentage points
Those with a college degree are 21.5 percentage points less likely to smoke
Those that face a workplace smoking ban have a 6.7 percentage point lower probability of smoking

43 Do not confuse percentage point and percent differences
–A 6.7 percentage point drop is 29% of the sample mean of 24 percent
–Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of the sample mean

44 Comparing Marginal Effects

Variable  |  LP  |  Probit  |  Logit
----------+------+----------+-------
age       |      |          |
incomel   |      |          |
male      |      |          |
black     |      |          |
hispanic  |      |          |
hsgrad    |      |          |
college   |      |          |
worka     |      |          |

(the numeric entries did not survive the transcript)

45 When will results differ?
The normal and logistic CDFs look:
–similar at the midpoint of the distribution
–different in the tails
You obtain more observations in the tails of the distribution when:
–sample sizes are large
–ρ approaches 1 or 0
These situations will produce more differences in estimates

46 Some nice properties of the logit
Outcome: y = 1 or 0
Treatment: x = 1 or 0
Other covariates: z
Context:
–y = whether a baby is born with a low weight birth
–x = whether the mom smoked or not during pregnancy

47 Risk ratio
RR = Prob(y = 1 | x = 1)/Prob(y = 1 | x = 0)
The ratio of the probabilities of the event when x is and is not present
How much does smoking elevate the chance your child will be a low weight birth?

48 Let Y_yx be the probability y = 1 or 0 given x = 1 or 0
Think of the risk ratio the following way:
Y_11 is the probability Y = 1 when X = 1
Y_10 is the probability Y = 1 when X = 0
Y_11 = RR × Y_10

49 Odds Ratio
OR = A/B = [Y_11/Y_01]/[Y_10/Y_00]
A = [Pr(Y = 1 | X = 1)/Pr(Y = 0 | X = 1)] = the odds of Y occurring if you are a smoker
B = [Pr(Y = 1 | X = 0)/Pr(Y = 0 | X = 0)] = the odds of Y happening if you are not a smoker
What are the relative odds of Y happening if you do or do not experience X?

50 Suppose Pr(Y_i = 1) = F(β_0 + β_1 X_i + β_2 Z) and F is the logistic function
Can show that OR = exp(β_1) = e^(β_1)
This number is typically reported by most statistical packages
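A minimal sketch of recovering this by hand in Stata, assuming hypothetical variables y, x, and z:

logit y x z;
* the odds ratio for x is just the exponentiated coefficient;
display exp(_b[x]);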

51 Details
Y_11 = exp(β_0 + β_1 + β_2 Z)/(1 + exp(β_0 + β_1 + β_2 Z))
Y_10 = exp(β_0 + β_2 Z)/(1 + exp(β_0 + β_2 Z))
Y_01 = 1/(1 + exp(β_0 + β_1 + β_2 Z))
Y_00 = 1/(1 + exp(β_0 + β_2 Z))
[Y_11/Y_01] = exp(β_0 + β_1 + β_2 Z)
[Y_10/Y_00] = exp(β_0 + β_2 Z)
OR = A/B = [Y_11/Y_01]/[Y_10/Y_00] = exp(β_0 + β_1 + β_2 Z)/exp(β_0 + β_2 Z) = exp(β_1)

52 Suppose Y is rare, with ρ close to 0
–Then Pr(Y = 0 | X = 1) and Pr(Y = 0 | X = 0) are both close to 1, so Y_01 and Y_00 roughly cancel
Therefore, when ρ is close to 0:
–Odds Ratio ≈ Risk Ratio
Why is this nice?

53 Population attributable risk
Let ρ now be the fraction exposed to X (e.g., the fraction of moms who smoke). The average outcome in the population is
ȳ = (1 - ρ) Y_10 + ρ Y_11 = (1 - ρ) Y_10 + ρ (RR) Y_10
Average outcomes are a weighted average of outcomes for X = 0 and X = 1
What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0)?
Y_a = Y_10

54 Population Attributable Risk
PAR: the fraction of the outcome attributed to X
The difference between the current rate and the rate that would exist without X, divided by the current rate:
PAR = (ȳ - Y_a)/ȳ = ρ(RR - 1)/[(1 - ρ) + ρ RR]
(substitute ȳ = [(1 - ρ) + ρ RR] Y_10 and Y_a = Y_10, and Y_10 cancels)

55 Example: Maternal Smoking and Low Weight Births
6% of births are low weight:
–< 2500 grams (about 5.5 lbs)
–The average birth is about 3300 grams
Maternal smoking during pregnancy has been identified as a key cofactor:
–13% of mothers smoke
–This number was falling about 1 percentage point per year during the 1980s/90s
–It doubles the chance of a low weight birth

56 Natality detail data
Census of all births (4 million/year)
Annual files starting in the 60s
Information about:
–the baby (birth weight, length, date, sex, plurality, birth injuries)
–demographics (age, race, marital status, education of mom)
–the birth (who delivered, method of delivery)
–health of mom (smoked/drank during pregnancy, weight gain)

57 Smoking is not available from CA or NY
~3 million usable observations
I pulled a 0.5% random sample from 1995
About 12,500 obs
Variables: birth weight (grams), smoked, married, 4-level race, 5-level education, mother's age at birth

58 Description of variables

variable name   storage type  display format  variable label
-------------------------------------------------------------
birthw          int           %9.0g           birth weight in grams
smoked          byte          %9.0g           =1 if mom smoked during pregnancy
age             byte          %9.0g           moms age at birth
married         byte          %9.0g           =1 if married
race4           byte          %9.0g           1=white, 2=black, 3=asian, 4=other
educ5           byte          %9.0g           1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+
visits          byte          %9.0g           prenatal visits

59
                     | =1 if mom smoked
dummy, =1 if BW<2500 | during pregnancy
                     |      0        1 |   Total
---------------------+-----------------+--------
                   0 | 11,626    1,745 |  13,371
                   1 |    659      200 |     859
---------------------+-----------------+--------
               Total | 12,285    1,945 |  14,230

(859/14,230 = 6.04% of births are low weight; the interior cells of the low-weight row are recovered from the surviving row and column totals)

60 Notice a few things:
–13.7% of women smoke (1,945/14,230)
–6% have a low weight birth (859/14,230)
Pr(LBW | Smoke) = 200/1,945 = 10.28%
Pr(LBW | ~Smoke) = 659/12,285 = 5.36%
RR = Pr(LBW | Smoke)/Pr(LBW | ~Smoke) = 0.1028/0.0536 = 1.92
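As an aside, Stata's epidemiological table command cs reports this risk ratio (and an attributable fraction) directly from the raw data; a sketch, assuming lowbw is the =1 if BW<2500 dummy:

* cs takes the case variable first, then the exposure variable;
cs lowbw smoked;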

61 Logit results

Log likelihood = …            Pseudo R2 = …

lowbw | Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
(rows for smoked, age, married, _Ieduc5_2 through _Ieduc5_5, _Irace4_2 through _Irace4_4, and _cons; the numeric estimates did not survive the transcript)

62 Odds Ratios
Smoked:
–exp(0.674) = 1.96
–Smokers are about twice as likely to have a low weight birth
_Irace4_2 (Blacks):
–exp(0.707) = 2.02
–Blacks are about twice as likely to have a low weight birth

63 Asking for odds ratios
logistic y x1 x2;
In this case:
xi: logistic lowbw smoked age married i.educ5 i.race4;

64
Log likelihood = …            Pseudo R2 = …

lowbw | Odds Ratio  Std. Err.  z  P>|z|  [95% Conf. Interval]
(rows for smoked, age, married, _Ieduc5_2 through _Ieduc5_5, and _Irace4_2 through _Irace4_4; the numeric estimates did not survive the transcript)

65 PAR
PAR = ρ(RR - 1)/[(1 - ρ) + ρ RR]
ρ = .137, RR = 1.96
PAR = (.137)(0.96)/[(.863) + (.137)(1.96)] ≈ 0.116
About 11.6% of low weight births are attributed to maternal smoking
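The arithmetic as a one-line check in Stata, using ρ = .137 and RR = 1.96 from the slides above:

display (.137*(1.96 - 1))/((1 - .137) + .137*1.96);
* prints roughly .1162;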

66 Hypothesis Testing in MLE models
MLE estimates are asymptotically normally distributed (one of the properties of MLE)
Therefore, standard t-tests of hypotheses will work as long as samples are 'large'
What 'large' means is open to question
What to do when samples are 'small' we will table for a moment

67 Testing a linear combination of parameters
Suppose you have a probit model: Φ[β_0 + x_1i β_1 + x_2i β_2 + x_3i β_3 + …]
Test a linear combination of parameters
Simplest example: test whether a subset is zero, β_1 = β_2 = β_3 = β_4 = 0
To fix the discussion:
N observations
K parameters
J restrictions (count the equals signs; here J = 4)

68 Wald Test
Based on the fact that the parameters are distributed asymptotically normal
Probability theory review:
–Suppose you have m draws from a standard normal distribution (z_i)
–M = z_1^2 + z_2^2 + … + z_m^2
–M is distributed as a chi-square with m degrees of freedom

69 The Wald test constructs a 'quadratic form' suggested by the test you want to perform
This combination, because it is a sum of squares of asymptotically normal estimates, should, if the hypothesis is true, be distributed as a chi-square with J degrees of freedom
If the test statistic is 'large', relative to the degrees of freedom of the test, we reject, because there is a low probability we would have drawn that value at random from the distribution
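For reference, a step the slide leaves implicit: writing the null hypothesis as H_0: Rβ = r, with J rows in R (one per restriction), b the estimated parameters, and V the estimated covariance matrix of b, the quadratic form is

W = (Rb - r)' [R V R']^(-1) (Rb - r)

and W is distributed chi-square with J degrees of freedom under H_0.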

70 Reading values from a table
All stats books will report the 'percentiles' of a chi-square:
–vertical axis: degrees of freedom
–horizontal axis: percentiles
–the entry is the value below which that percentile of the distribution falls

71 Example: suppose 4 restrictions
95% of a chi-square distribution with 4 degrees of freedom falls below 9.49
So there is only a 5% chance a number drawn at random will exceed 9.49
If your test statistic is below 9.49, you cannot reject the null
If your test statistic is above 9.49, reject the null

72 Chi-square (slide shows a table of chi-square critical values, not reproduced in the transcript)

73 Wald test in STATA
The default test in MLE models
Easy to do; see the program:
test hsgrad somecol college;
Does not estimate the 'restricted' model
'Lower power' than other tests, i.e., a higher chance of a false negative

74 -2 Log likelihood test
* how to run the same tests with a -2 log likelihood test;
* estimate the unrestricted model and save the estimates in urmodel;
probit smoker age incomel male black hispanic hsgrad somecol college worka;
estimates store urmodel;
* estimate the restricted model. save results in rmodel;
probit smoker age incomel male black hispanic worka;
estimates store rmodel;
lrtest urmodel rmodel;
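For reference, the statistic lrtest reports is LR = -2(LL_r - LL_ur), where LL_r and LL_ur are the restricted and unrestricted log likelihoods; under the null it is distributed chi-square with J degrees of freedom (here J = 3, for hsgrad, somecol, and college).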

75 I prefer the -2 log likelihood test:
–It estimates both the restricted and unrestricted models
–Therefore, it has more power than a Wald test
In most cases, they give the same 'decision' (reject/not reject)