Count Models 2 Sociology 8811 Lecture 13


Count Models 2 Sociology 8811 Lecture 13 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission

Announcements
- Paper #1 deadline coming up: March 8
  - You should have a dataset by now
  - You should have some simple models by now
  - If not, you need to do something right away!
- Class schedule
  - Today: Talk a bit about papers; wrap up count models
  - Thursday: New topic: Event History Analysis

Review: Count Models
- Many dependent variables are counts: non-negative integers
  - OLS is inappropriate: linearity and normality assumptions are violated
- Solution: Poisson & negative binomial models
- Coefficient interpretation is similar to logit
  - Exponentiated coefficients show the multiplicative effect on the rate
- Poisson assumes there is no overdispersion
  - Skewed variables may lead to overdispersion
  - If overdispersion is identified, use the negative binomial model
  - The negative binomial model offers a chi-square test to identify overdispersion!
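To make "overdispersion" concrete, here is a minimal Python sketch comparing the variance each model implies for a given mean. It assumes the mean-dispersion (NB2) form of the negative binomial, Var(y) = mu * (1 + alpha * mu); the function names and the example values of mu and alpha are illustrative and do not come from the lecture.

```python
def poisson_variance(mu):
    """Poisson: the variance equals the mean (no overdispersion)."""
    return mu


def negbin_variance(mu, alpha):
    """Negative binomial, mean-dispersion form (assumed): mu * (1 + alpha * mu)."""
    return mu * (1.0 + alpha * mu)


if __name__ == "__main__":
    alpha = 1.0  # hypothetical dispersion parameter, for illustration only
    for mu in (1.0, 5.0, 10.0):
        print(mu, poisson_variance(mu), negbin_variance(mu, alpha))
```

When alpha is zero the two variances coincide, which is why a test of alpha = 0 is a test for overdispersion.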

Negative Binomial Example: Web Use
Note: Info on overdispersion is provided

Negative binomial regression                    Number of obs   =       1552
                                                LR chi2(5)      =      57.80
                                                Prob > chi2     =     0.0000
Log likelihood = -4368.6846                     Pseudo R2       =     0.0066

------------------------------------------------------------------------------
       wwwhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3617049   .0634391     5.70   0.000     .2373666    .4860433
         age |  -.0109788   .0024167    -4.54   0.000    -.0157155    -.006242
        educ |   .0171875   .0120853     1.42   0.155    -.0064992    .0408742
   lowincome |  -.0916297   .0724074    -1.27   0.206    -.2335457    .0502862
      babies |  -.1238295   .0624742    -1.98   0.047    -.2462767   -.0013824
       _cons |   1.881168   .1966654     9.57   0.000     1.495711    2.266625
-------------+----------------------------------------------------------------
    /lnalpha |   .2979718   .0408267                       .217953    .3779907
       alpha |   1.347124   .0549986                      1.243529    1.459349
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 8459.61 Prob>=chibar2 = 0.000

- Alpha is clearly > 0! Overdispersion is evident; LR test p<.05
- You should not use Poisson regression in this case
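The coefficients in the web use output above are on the log scale, so they need to be exponentiated before they can be read as multiplicative effects. The following Python sketch does that arithmetic for the reported coefficients; the dictionary and the helper names rate_ratio and percent_change are illustrative assumptions, not Stata output.

```python
import math

# Coefficients copied from the negative binomial output for wwwhr.
COEFS = {
    "male": 0.3617049,
    "age": -0.0109788,
    "educ": 0.0171875,
    "lowincome": -0.0916297,
    "babies": -0.1238295,
}


def rate_ratio(b):
    """Multiplicative effect on the expected count of a one-unit increase in x."""
    return math.exp(b)


def percent_change(b):
    """The same effect expressed as a percent change in the expected count."""
    return 100.0 * (math.exp(b) - 1.0)


if __name__ == "__main__":
    for name, b in COEFS.items():
        print(f"{name:10s} ratio={rate_ratio(b):.3f} pct={percent_change(b):+.1f}%")
```

For example, the coefficient for male corresponds to a rate ratio of about 1.44: holding the other variables constant, men are predicted to spend roughly 44% more hours on the web than women.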

General Remarks
- It is often useful to try both Poisson and negative binomial models
  - The latter allows you to test for overdispersion
  - Use the LR test on alpha (α) to guide model choice
- If you don't suspect overdispersion and alpha appears to be zero, use Poisson regression
  - It makes fewer assumptions
  - The negative binomial model additionally assumes, for example, gamma-distributed error
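As a rough illustration of how the LR test on alpha produces its p-value, the Python sketch below treats the statistic as a chibar2(01) variable: a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom, which is the label Stata prints next to the test. The function names are hypothetical, and the statistic is taken as given (twice the gap between the negative binomial and Poisson log likelihoods).

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chibar2_01_pvalue(lr_stat):
    """P-value for the boundary test of alpha = 0 (half the chi2(1) tail)."""
    if lr_stat <= 0.0:
        return 1.0  # no improvement over Poisson at all
    return 0.5 * chi2_sf_1df(lr_stat)


if __name__ == "__main__":
    # 8459.61 is the LR statistic reported for the web use example.
    print(chibar2_01_pvalue(8459.61))
    # A borderline statistic, for comparison (hypothetical value).
    print(chibar2_01_pvalue(3.0))
```

The halving reflects that alpha cannot be negative, so the null value sits on the boundary of the parameter space.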

Example: Labor Militancy
Isaac & Christiansen 2002
Note: Results are presented as % change

Zero-Inflated Poisson & NB Reg
- If the outcome variable has many zero values, it tends to be highly skewed
  - Under those circumstances, nbreg works better than ordinary Poisson due to overdispersion
- But sometimes you have LOTS of zeros, and even nbreg isn't sufficient
  - The model under-predicts zeros and doesn't fit well
- Examples:
  - # of violent crimes committed by a person in a year
  - # of wars a country fights per year
  - # of foreign subsidiaries of firms

Zero-Inflated Poisson & NB Reg
- Logic of zero-inflated models: assume two types of groups in your sample
  - Type A: Always zero, with no probability of a non-zero value
  - Type ~A: Non-zero chance of a positive count value
    - Probability is variable, but not zero
- 1. Use logit to model group membership
- 2. Use poisson or nbreg to model counts for those in group ~A
- 3. Compute probabilities based on those results
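The three steps above can be sketched in Python for the Poisson case. Here pi_zero stands for the probability of belonging to the always-zero group (the logit part) and mu for the expected count among everyone else (the Poisson part); both names and the example values are assumptions made for illustration.

```python
import math


def poisson_pmf(k, mu):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, mu, pi_zero):
    """Zero-inflated Poisson probability of observing count k.

    A zero can come from the always-zero group (probability pi_zero)
    or from the count process happening to produce a zero.
    """
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * math.exp(-mu)
    return (1.0 - pi_zero) * poisson_pmf(k, mu)


if __name__ == "__main__":
    mu, pi_zero = 2.0, 0.5  # hypothetical values
    print("P(0), plain Poisson:", poisson_pmf(0, mu))
    print("P(0), zero-inflated:", zip_pmf(0, mu, pi_zero))
```

With these made-up values the zero-inflated model puts far more probability on zero than the plain Poisson does, which is the under-prediction of zeros that the zero-inflated models are meant to fix.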

Zero-Inflated Poisson & NB Reg
- Example: Web usage at work
  - More skewed than overall web usage. Why?
  - Many people don't have computers at work!
  - So, web usage is zero for many

Zero-Inflated Poisson & NB Reg
- Zero-inflated models in Stata
  - zip = Poisson, zinb = negative binomial
- Commands accept two separate variable lists
  - Variables that affect counts
    - For those not in the "always zero" group
    - Modeled with Poisson or NB regression
  - Variables that predict membership in the "zero" group
    - Modeled with logit
- Ex: zinb webatwork male age educ lowincome babies, inflate(male age educ lowincome babies)

ZINB Example: Web Hrs at Work

Zero-inflated negative binomial regression      Number of obs   =       1135
                                                Nonzero obs     =        562
                                                Zero obs        =        573
Inflation model = logit                         LR chi2(5)      =      13.25
Log likelihood  = -2239.23                      Prob > chi2     =     0.0212

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   webatwork |
        male |   .2348353   .1298324     1.81   0.070    -.0196315    .4893021
         age |  -.0152071   .0053766    -2.83   0.005    -.0257451   -.0046692
        educ |   .0126503   .0265321     0.48   0.634    -.0393517    .0646523
   lowincome |  -.4183108   .2164324    -1.93   0.053    -.8425105    .0058889
      babies |   .0588977   .1385245     0.43   0.671    -.2126053    .3304008
       _cons |   1.703158   .4538886     3.75   0.000     .8135524    2.592763
-------------+----------------------------------------------------------------
     inflate |
        male |   .2630493    .340892     0.77   0.440    -.4050866    .9311853
         age |  -.0197401   .0195075    -1.01   0.312     -.057974    .0184939
        educ |  -.3601863    .071167    -5.06   0.000    -.4996711   -.2207015
   lowincome |    .844378   .4013074     2.10   0.035     .0578299    1.630926
      babies |   .4504404   .2502363     1.80   0.072    -.0400138    .9408947
       _cons |   4.137417   1.172503     3.53   0.000     1.839354     6.43548
------------------------------------------------------------------------------

- "Inflate" output = logit for group membership (the model predicting the zero group)
- Education reduces the odds of a zero value
  - But it doesn't have an effect on the count for those that are non-zero
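Since the inflate equation is a logit, its coefficients can be turned into a predicted probability of belonging to the always-zero group. The Python sketch below does this for two hypothetical people who differ only in years of education; the covariate values are invented for illustration, and only the coefficients come from the inflate output.

```python
import math

# Coefficients copied from the "inflate" (logit) part of the ZINB output.
INFLATE = {
    "male": 0.2630493,
    "age": -0.0197401,
    "educ": -0.3601863,
    "lowincome": 0.844378,
    "babies": 0.4504404,
    "_cons": 4.137417,
}


def prob_always_zero(person, coefs=INFLATE):
    """Predicted probability of membership in the always-zero group."""
    xb = coefs["_cons"] + sum(coefs[name] * value for name, value in person.items())
    return 1.0 / (1.0 + math.exp(-xb))


if __name__ == "__main__":
    # Hypothetical respondents: identical except for years of education.
    hs_grad = {"male": 1, "age": 40, "educ": 12, "lowincome": 0, "babies": 0}
    college = {"male": 1, "age": 40, "educ": 16, "lowincome": 0, "babies": 0}
    print("educ=12:", round(prob_always_zero(hs_grad), 3))
    print("educ=16:", round(prob_always_zero(college), 3))
```

The person with more education has a noticeably lower predicted probability of being in the zero group, which matches the negative and significant educ coefficient in the inflate equation.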

Zero-Inflated Poisson & NB Reg: Remarks
- ZINB produces an estimate of alpha
  - Helps choose between zip & zinb
- Long and Freese (2006) have a helpful tool to compare the fit of count models: countfit
  - See textbook
- Zero-inflated models seem very useful
  - Count variables often have many zeros
  - It is often reasonable to assume an "always zero" group
- But they are fairly new
  - Not many examples in the literature
  - Haven't been widely scrutinized

Zero-truncated Poisson & NB reg
- Truncation: the absence of information about cases in some range of a variable
- Example: Suppose we study income based on data from tax returns
  - Cases with income below a certain value are not required to submit a tax return, so their data are missing
- Example: Data on # of crimes committed, taken from legal records
  - Individuals with zero crimes are not evident in the data
- Example: An on-line survey of web use
  - Individuals with zero web use are not in the data
- Poisson & NB have been adapted to address truncated data: zero-truncated Poisson & zero-truncated NB reg
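A minimal Python sketch of the adjustment, using the Poisson case for simplicity: when zeros cannot be observed, each remaining probability is rescaled by the chance of a non-zero count, 1 - exp(-mu). The function names and the value of mu are illustrative assumptions.

```python
import math


def poisson_pmf(k, mu):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zt_poisson_pmf(k, mu):
    """Zero-truncated Poisson: P(y = k | y > 0)."""
    if k < 1:
        return 0.0  # zeros are never observed in truncated data
    return poisson_pmf(k, mu) / -math.expm1(-mu)  # divide by 1 - exp(-mu)


def zt_poisson_mean(mu):
    """Expected count among the observed (non-zero) cases."""
    return mu / -math.expm1(-mu)


if __name__ == "__main__":
    mu = 2.0  # hypothetical Poisson mean
    print("P(1), ordinary:", poisson_pmf(1, mu))
    print("P(1), truncated:", zt_poisson_pmf(1, mu))
    print("mean of observed counts:", zt_poisson_mean(mu))
```

The mean of the observed counts is always larger than mu, so fitting an ordinary Poisson to zero-truncated data would overstate the underlying rate.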

Example: Zero-truncated NB Reg
Web use (zeros removed)

Zero-truncated negative binomial regression     Number of obs   =       1304
                                                LR chi2(5)      =      34.87
Dispersion     = mean                           Prob > chi2     =     0.0000
Log likelihood = -3653.162                      Pseudo R2       =     0.0047

------------------------------------------------------------------------------
       wwwhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3744582   .0874595     4.28   0.000     .2030407    .5458758
         age |  -.0114399   .0033817    -3.38   0.001    -.0180679   -.0048119
        educ |   .0081191    .016731     0.49   0.627     -.024673    .0409112
   lowincome |   .1899431   .1111248     1.71   0.087    -.0278574    .4077437
      babies |  -.1375942   .0860954    -1.60   0.110     -.306338    .0311496
       _cons |   1.533013   .2907837     5.27   0.000     .9630872    2.102938
-------------+----------------------------------------------------------------
    /lnalpha |   1.099164   .1385789                      .8275543    1.370774
       alpha |   3.001656   .4159661                      2.287717    3.938396
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 6857.67 Prob>=chibar2 = 0.000

- Coefficient interpretation works just like ordinary Poisson or NB regression

Empirical Example 2 Example: Haynie, Dana L. 2001. “Delinquent Peers Revisited: Does Network Structure Matter?” American Journal of Sociology, 106, 4:1013-1057.