Count Models 2
Sociology 8811 Lecture 13
Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission

Announcements
- Paper #1 deadline coming up: March 8
  - You should have a dataset by now
  - You should have some simple models by now
  - If not, you need to do something right away!
- Class schedule
  - Today: Talk a bit about papers; wrap up count models
  - Thursday: New topic: Event History Analysis

Review: Count Models
- Many dependent variables are counts: non-negative integers
  - OLS is inappropriate: linearity and normality assumptions are violated
- Solution: Poisson & negative binomial models
- Coefficient interpretation is similar to logit
  - Exponentiated coefficients show the multiplicative effect on the rate
- Poisson assumes there is no overdispersion
  - Skewed variables may lead to overdispersion
  - If overdispersion is identified, use the negative binomial model
  - The negative binomial model offers a chi-square test to identify overdispersion

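A quick, informal screen for overdispersion is to compare the sample variance of a count variable with its mean, because a Poisson variable has variance equal to its mean. The sketch below is only an illustration: the `hours` list is made-up data, `dispersion_ratio` is a hypothetical helper name, and a raw ratio ignores covariates, so it is no substitute for the test the negative binomial model provides.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical weekly web-use hours (illustrative only, not the lecture data)
hours = [0, 0, 1, 2, 2, 3, 5, 8, 15, 40]

ratio = dispersion_ratio(hours)
print(f"variance / mean = {ratio:.2f}")
```

A ratio far above 1 suggests the counts are more spread out than a Poisson distribution with that mean would allow.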
Negative Binomial Example: Web Use
- Note: Info on overdispersion is provided

Negative binomial regression                      Number of obs   =       1552
                                                  LR chi2(5)      =      57.80
                                                  Prob > chi2     =     0.0000
Log likelihood = -4368.6846                       Pseudo R2       =     0.0066
------------------------------------------------------------------------------
       wwwhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3617049   .0634391     5.70   0.000     .2373666    .4860433
         age |  -.0109788   .0024167    -4.54   0.000    -.0157155    -.006242
        educ |   .0171875   .0120853     1.42   0.155    -.0064992    .0408742
   lowincome |  -.0916297   .0724074    -1.27   0.206    -.2335457    .0502862
      babies |  -.1238295   .0624742    -1.98   0.047    -.2462767   -.0013824
       _cons |   1.881168   .1966654     9.57   0.000     1.495711    2.266625
    /lnalpha |   .2979718   .0408267                       .217953    .3779907
       alpha |   1.347124   .0549986                      1.243529    1.459349
Likelihood-ratio test of alpha=0:  chibar2(01) = 8459.61 Prob>=chibar2 = 0.000

- Alpha is clearly > 0! Overdispersion is evident; LR test p<.05
- You should not use Poisson Regression in this case

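To make the output above more concrete, the following sketch turns the reported coefficients into predicted counts, exp(xb). The two respondent profiles are hypothetical, and `predicted_count` is an illustrative helper, not a Stata feature. It also checks that alpha is simply the exponential of /lnalpha.

```python
import math

# Coefficients copied from the negative binomial output above
coef = {"male": 0.3617049, "age": -0.0109788, "educ": 0.0171875,
        "lowincome": -0.0916297, "babies": -0.1238295, "_cons": 1.881168}


def predicted_count(x):
    """Expected count exp(xb) for a dict mapping covariate names to values."""
    xb = coef["_cons"] + sum(coef[name] * value for name, value in x.items())
    return math.exp(xb)


# Hypothetical respondents who are identical except for sex
man = {"male": 1, "age": 40, "educ": 16, "lowincome": 0, "babies": 0}
woman = dict(man, male=0)

print("predicted hours, man:  ", round(predicted_count(man), 2))
print("predicted hours, woman:", round(predicted_count(woman), 2))

# The ratio of the two predictions equals exp(b_male), the multiplicative effect
print("rate ratio for male:   ", round(predicted_count(man) / predicted_count(woman), 3))

# alpha is the exponential of the /lnalpha estimate
print("alpha from /lnalpha:   ", round(math.exp(0.2979718), 4))
```

Because the model is log-linear, the male/female ratio is the same whatever values the other covariates take.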
General Remarks
- It is often useful to try both Poisson and negative binomial models
  - The latter allows you to test for overdispersion
  - Use the LR test on alpha (α) to guide model choice
- If you don’t suspect overdispersion and alpha appears to be zero, use Poisson regression
  - It makes fewer assumptions; for example, it does not assume gamma-distributed error

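The "chibar2(01)" label on the test of alpha = 0 refers to a 50:50 mixture of a chi-square with 0 degrees of freedom and a chi-square with 1 degree of freedom, which is used because alpha cannot be negative. A minimal sketch of that p-value calculation, assuming the statistic has already been computed as twice the difference in log likelihoods (the function name `chibar2_01_pvalue` is hypothetical):

```python
import math


def chibar2_01_pvalue(stat):
    """P-value for a 50:50 mixture of chi-square(0) and chi-square(1)."""
    if stat <= 0:
        # Every draw from the mixture is >= 0, so the p-value is 1
        return 1.0
    # Upper tail of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


# Statistic reported in the web-use negative binomial example
print(chibar2_01_pvalue(8459.61))

# A statistic of about 2.71 sits right at the 0.05 threshold
print(round(chibar2_01_pvalue(2.705543), 4))
```

The halving means the 5% critical value is about 2.71 rather than the familiar 3.84 for an ordinary chi-square test with 1 degree of freedom.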
Example: Labor Militancy
- Isaac & Christiansen 2002
- Note: Results are presented as % change

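Presenting results as % change relies on the usual conversion for count models: a coefficient b implies a change of 100 × (exp(b) − 1) percent in the expected count per one-unit increase in x. The sketch below applies it to two coefficients borrowed from the web-use negative binomial example, since the labor militancy estimates are not reproduced here; `percent_change` is an illustrative helper name.

```python
import math


def percent_change(b):
    """Percent change in the expected count for a one-unit increase in x."""
    return 100.0 * (math.exp(b) - 1.0)


# Coefficients borrowed from the web-use negative binomial example
print(f"male:   {percent_change(0.3617049):+.1f}%")
print(f"babies: {percent_change(-0.1238295):+.1f}%")
```

These work out to roughly +43.6% for men and -11.6% for respondents with babies. For small coefficients the percent change is close to 100 × b, but the approximation degrades quickly as |b| grows.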
Zero-Inflated Poisson & NB Reg
- If the outcome variable has many zero values, it tends to be highly skewed
  - Under those circumstances, NBREG works better than ordinary Poisson due to overdispersion
- But sometimes you have LOTS of zeros, and even nbreg isn’t sufficient
  - Model under-predicts zeros, doesn’t fit well
- Examples:
  - # of violent crimes committed by a person in a year
  - # of wars a country fights per year
  - # of foreign subsidiaries of firms

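One rough way to see under-prediction of zeros is to compare the observed share of zeros with the share a Poisson distribution with the same mean would produce, exp(-mean). This sketch uses invented counts and ignores covariates, so it only illustrates the idea; `zero_shares` is a hypothetical helper.

```python
import math


def zero_shares(counts):
    """Return (observed share of zeros, share implied by a Poisson with the same mean)."""
    n = len(counts)
    observed = sum(1 for c in counts if c == 0) / n
    mu = sum(counts) / n
    return observed, math.exp(-mu)


# Hypothetical sample: 12 zeros plus a handful of positive counts
counts = [0] * 12 + [1, 2, 3, 4, 6, 8, 10, 14]

observed, predicted = zero_shares(counts)
print(f"observed zeros:          {observed:.1%}")
print(f"Poisson-predicted zeros: {predicted:.1%}")
```

A large gap between the two shares is the symptom that motivates zero-inflated models.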
Zero-Inflated Poisson & NB Reg
- Logic of zero-inflated models: assume two types of groups in your sample
  - Type A: Always zero; no probability of a non-zero value
  - Type ~A: Non-zero chance of a positive count value
    - Probability is variable, but not zero
- 1. Use logit to model group membership
- 2. Use poisson or nbreg to model counts for those in group ~A
- 3. Compute probabilities based on those results

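The three steps above can be sketched for the Poisson case as a mixture: a zero arises either from the always-zero group (probability pi) or from an ordinary Poisson draw in the other group. The values of `pi` and `mu` below are hypothetical; in a fitted model they would come from the logit and count equations for each case.

```python
import math


def zip_pmf(k, pi, mu):
    """P(Y = k) under a zero-inflated Poisson.

    pi: probability of belonging to the always-zero group (Type A)
    mu: Poisson mean for everyone else (Type ~A)
    """
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    if k == 0:
        # Zeros come from both groups
        return pi + (1.0 - pi) * poisson
    # Positive counts can only come from Type ~A
    return (1.0 - pi) * poisson


pi, mu = 0.4, 3.0  # hypothetical values for illustration

print("P(0), plain Poisson: ", round(math.exp(-mu), 4))
print("P(0), zero-inflated: ", round(zip_pmf(0, pi, mu), 4))
print("probabilities sum to:", round(sum(zip_pmf(k, pi, mu) for k in range(50)), 6))
```

The zero-inflated negative binomial follows the same mixture logic, with the negative binomial probability in place of the Poisson one.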
Zero-Inflated Poisson & NB Reg
- Example: Web usage at work
  - More skewed than overall web usage. Why?
  - Many people don’t have computers at work!
  - So, web usage is zero for many

Zero-Inflated Poisson & NB Reg
- Zero-inflated models in Stata
  - “zip” = Poisson, “zinb” = negative binomial
- Commands accept two separate variable lists
  - Variables that affect counts
    - For those with non-zero counts
    - Modeled with Poisson or NB regression
  - Variables that predict membership in “zero” group
    - Modeled with logit
- Ex: zinb webatwork male age educ lowincome babies, inflate(male age educ lowincome babies)

ZINB Example: Web Hrs at Work
- “Inflate” output = logit for group membership

Zero-inflated negative binomial regression        Number of obs   =       1135
                                                  Nonzero obs     =        562
                                                  Zero obs        =        573
Inflation model = logit                           LR chi2(5)      =      13.25
Log likelihood  = -2239.23                        Prob > chi2     =     0.0212
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
webatwork    |
        male |   .2348353   .1298324     1.81   0.070    -.0196315    .4893021
         age |  -.0152071   .0053766    -2.83   0.005    -.0257451   -.0046692
        educ |   .0126503   .0265321     0.48   0.634    -.0393517    .0646523
   lowincome |  -.4183108   .2164324    -1.93   0.053    -.8425105    .0058889
      babies |   .0588977   .1385245     0.43   0.671    -.2126053    .3304008
       _cons |   1.703158   .4538886     3.75   0.000     .8135524    2.592763
inflate      |
        male |   .2630493    .340892     0.77   0.440    -.4050866    .9311853
         age |  -.0197401   .0195075    -1.01   0.312     -.057974    .0184939
        educ |  -.3601863    .071167    -5.06   0.000    -.4996711   -.2207015
   lowincome |    .844378   .4013074     2.10   0.035     .0578299    1.630926
      babies |   .4504404   .2502363     1.80   0.072    -.0400138    .9408947
       _cons |   4.137417   1.172503     3.53   0.000     1.839354     6.43548

- The “inflate” equation is the model predicting the zero group
- Education reduces odds of zero value
  - But doesn’t have an effect on count for those that are non-zero

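Since the inflate equation is a logit, its coefficients can be read as odds ratios for membership in the always-zero group and combined into a predicted probability. The sketch below uses the inflate coefficients from the output above; the respondent profiles are hypothetical and `prob_always_zero` is an illustrative helper name.

```python
import math

# Coefficients copied from the "inflate" equation of the zinb output above
inflate = {"male": 0.2630493, "age": -0.0197401, "educ": -0.3601863,
           "lowincome": 0.844378, "babies": 0.4504404, "_cons": 4.137417}


def prob_always_zero(x):
    """Logit-predicted probability of belonging to the always-zero group."""
    xb = inflate["_cons"] + sum(inflate[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-xb))


# Odds ratio for one additional year of education
odds_ratio_educ = math.exp(inflate["educ"])
print("odds ratio, educ:", round(odds_ratio_educ, 4))

# Hypothetical respondents who differ only in years of education
college = {"male": 1, "age": 40, "educ": 16, "lowincome": 0, "babies": 0}
high_school = dict(college, educ=12)

print("P(always zero), 16 years:", round(prob_always_zero(college), 3))
print("P(always zero), 12 years:", round(prob_always_zero(high_school), 3))
```

An odds ratio of about 0.70 means each extra year of education cuts the odds of being in the always-zero group by roughly 30%, which matches the reading that education reduces the odds of a zero value.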
Zero-Inflated Poisson & NB Reg
- Remarks
  - ZINB produces an estimate of alpha
    - Helps choose between zip & zinb
  - Long and Freese (2006) have a helpful tool to compare the fit of count models: countfit
    - See textbook
- Zero-inflated models seem very useful
  - Count variables often have many zeros
  - It is often reasonable to assume an “always zero” group
- But they are fairly new
  - Not many examples in the literature
  - Haven’t been widely scrutinized

Zero-truncated Poisson & NB reg
- Truncation: the absence of information about cases in some range of a variable
  - Example: Suppose we study income based on data from tax returns
    - Cases with income below a certain value are not required to submit a tax return, so their data are missing
  - Example: Data on # of crimes committed, taken from legal records
    - Individuals with zero crimes are not evident in the data
  - Example: An on-line survey of web use
    - Individuals with zero web use are not in the data
- Poisson & NB have been adapted to address truncated data:
  - Zero-truncated Poisson & zero-truncated NB reg

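For the Poisson case, the adaptation amounts to rescaling the ordinary probabilities by the chance of observing a non-zero count: P(Y = k | Y > 0) = P(Y = k) / (1 − exp(−mu)). A minimal sketch with a hypothetical mean (`ztp_pmf` and `ztp_mean` are illustrative names):

```python
import math


def ztp_pmf(k, mu):
    """P(Y = k | Y > 0) for a Poisson with mean mu; zero is impossible."""
    if k < 1:
        return 0.0
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    return poisson / (1.0 - math.exp(-mu))


def ztp_mean(mu):
    """Mean of the zero-truncated distribution; always larger than mu."""
    return mu / (1.0 - math.exp(-mu))


mu = 1.5  # hypothetical Poisson mean before truncation

print("P(1 | Y > 0):        ", round(ztp_pmf(1, mu), 4))
print("probabilities sum to:", round(sum(ztp_pmf(k, mu) for k in range(1, 50)), 6))
print("truncated mean:      ", round(ztp_mean(mu), 4))
```

Ignoring truncation and fitting an ordinary Poisson would treat the inflated observed mean as if it were mu, which is why the dedicated model is needed.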
Example: Zero-truncated NB Reg
- Web use (zeros removed)

Zero-truncated negative binomial regression       Number of obs   =       1304
                                                  LR chi2(5)      =      34.87
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -3653.162                        Pseudo R2       =     0.0047
------------------------------------------------------------------------------
       wwwhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3744582   .0874595     4.28   0.000     .2030407    .5458758
         age |  -.0114399   .0033817    -3.38   0.001    -.0180679   -.0048119
        educ |   .0081191    .016731     0.49   0.627     -.024673    .0409112
   lowincome |   .1899431   .1111248     1.71   0.087    -.0278574    .4077437
      babies |  -.1375942   .0860954    -1.60   0.110     -.306338    .0311496
       _cons |   1.533013   .2907837     5.27   0.000     .9630872    2.102938
    /lnalpha |   1.099164   .1385789                      .8275543    1.370774
       alpha |   3.001656   .4159661                      2.287717    3.938396
Likelihood-ratio test of alpha=0:  chibar2(01) = 6857.67 Prob>=chibar2 = 0.000

- Coefficient interpretation works just like ordinary Poisson or NB regression

Empirical Example 2
- Example: Haynie, Dana L. 2001. “Delinquent Peers Revisited: Does Network Structure Matter?” American Journal of Sociology 106(4): 1013-1057.