Copyright © 2013, SAS Institute Inc. All rights reserved. GENERALIZED LINEAR MODELS.



GENERALIZED LINEAR MODELS: OVERVIEW

OVERVIEW: GENERAL LINEAR MODELS
General linear models are the models fitted by PROC GLM. The response is modeled directly as $E(y) = X\beta$, with normally distributed errors of constant variance.

OVERVIEW: GENERALIZED LINEAR MODELS
- The distribution of the observations can come from the exponential family of distributions.
- The variance of the response variable is a specified function of its mean.
- $X\beta$ is fit to a function of $E(y)$, called the link function, that is suggested by the distribution of the observations:
  $g(E(y)) = g(\mu) = X\beta$, where $g$ is the link function.
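To make $g(\mu) = X\beta$ concrete, here is a minimal sketch of three common links and their inverses. It is written in Python for illustration only; the slides themselves use SAS. The `LINKS` table, the helper names, and the coefficient values are hypothetical.

```python
import math

# Each link g maps the mean mu to the linear predictor eta = X*beta;
# its inverse g^{-1} maps eta back to the mean.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),  # counts: mu > 0
    "logit": (lambda p: math.log(p / (1.0 - p)),  # binary response: 0 < p < 1
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}


def linear_predictor(beta, x):
    """eta = beta0 + beta1*x1 + ... + betak*xk for one observation."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def fitted_mean(link, beta, x):
    """Mean implied by g(mu) = eta, i.e. mu = g^{-1}(eta)."""
    _, inverse = LINKS[link]
    return inverse(linear_predictor(beta, x))


if __name__ == "__main__":
    beta = [0.5, 0.2]  # hypothetical coefficients
    for name in LINKS:
        print(name, round(fitted_mean(name, beta, [1.0]), 4))
```

For any value of $X\beta$, the log and logit links keep the fitted mean in its valid range: positive for counts, and between 0 and 1 for probabilities. The identity link does not.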

OVERVIEW: LOGIT LINK FUNCTION FOR BINARY RESPONSE
$\mathrm{logit}(p_i) = \log\big(p_i / (1 - p_i)\big)$
[Figure: $p_i$ plotted against the predictor; after the logit transform, $\mathrm{logit}(p_i)$ plotted against the predictor]

OVERVIEW: LOG LINK FUNCTION FOR COUNT DATA
[Figure: count plotted against the predictor; after the log transform, log(count) plotted against the predictor]

OVERVIEW: EXAMPLES OF GENERALIZED LINEAR MODELS
* These models often use the log link in practice.

[Diagram relating the models in this presentation]
- OLS regression → nonconstant variance → gamma regression
- Poisson regression [for count data] → overdispersion → negative binomial distribution
- Poisson regression [for rate data]: events over time or area …
- Many zeros → ZIP, ZINB

POISSON REGRESSION

POISSON REGRESSION: PROPERTIES AND EXAMPLES
Poisson regression
- is one type of generalized linear model
- assumes that the response variable follows a Poisson distribution conditional on the values of the predictor variables
- can be used to model the number of occurrences of an event of interest, or the rate of occurrence of an event of interest, as a function of some predictor variables
- is most appropriate for rare events.
Examples include
- number of ear infections in infants
- number of equipment failures
- colony counts for bacteria or viruses
- counts of a rare disease in a population
- number of fatal crashes at an intersection
- homicide rates in a given state
- rate of insurance claims
- number of infected areas per unit volume of a tree
- response rates to a marketing campaign.
The response distribution should have a small mean (below 10, or even below 5, and ideally about 1). If it does not, a gamma or lognormal model could be a better choice.

POISSON REGRESSION: POISSON VERSUS NORMAL DISTRIBUTION
The Poisson distribution
- is skewed to the right for rare events
- is for nonnegative integer values
- has only one parameter (the mean)
- has a variance that is equal to the mean.
The normal distribution
- is symmetric
- can take negative as well as positive real values
- has two unrelated parameters (mean and variance).
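These properties can be checked numerically by summing the Poisson probability function. Two results follow: the variance equals the mean, and the right skew (skewness $1/\sqrt{\lambda}$) is strongest when the mean is small, that is, for rare events. The sketch below is in Python for illustration; the function names and $\lambda$ values are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam > 0 (computed on the log scale)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def poisson_moments(lam, y_max=150):
    """Mean, variance, and skewness obtained by summing the pmf over 0..y_max."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    skew = sum((y - mean) ** 3 * p for y, p in enumerate(probs)) / var ** 1.5
    return mean, var, skew


if __name__ == "__main__":
    for lam in (0.5, 1.0, 4.0, 10.0):
        mean, var, skew = poisson_moments(lam)
        print(f"lambda={lam:>4}: mean={mean:.3f} variance={var:.3f} skewness={skew:.3f}")
```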

POISSON REGRESSION: MODEL
$\log(\mu_i) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$, where $\mu_i$ is the mean count for observation $i$.

POISSON REGRESSION: PARAMETER ESTIMATES
$e^{\beta_j}$ is the multiplicative effect on the mean $\mu$ for a one-unit change in $X_j$.
- Example 1: if $e^{\beta_1} = 1.20$, then a one-unit increase in $X_1$ yields a 20% increase in the estimated mean.
- Example 2: if $e^{\beta_2} = 0.80$, then a one-unit increase in $X_2$ yields a 20% decrease in the estimated mean.
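The two examples can be reproduced directly from the log link, since $\mu_{\text{new}} / \mu_{\text{old}} = e^{\beta \Delta x}$. Here is a small Python sketch for illustration; the function name is hypothetical.

```python
import math


def percent_change_in_mean(beta, delta_x=1.0):
    """Percent change in the estimated mean when X changes by delta_x,
    under a log link: mu_new / mu_old = exp(beta * delta_x)."""
    return (math.exp(beta * delta_x) - 1.0) * 100.0


if __name__ == "__main__":
    for ratio in (1.20, 0.80):
        beta = math.log(ratio)  # the coefficient whose exponential equals ratio
        print(f"exp(beta) = {ratio:.2f}: one-unit change -> "
              f"{percent_change_in_mean(beta):+.1f}%, two-unit change -> "
              f"{percent_change_in_mean(beta, 2.0):+.1f}%")
```

Because the effect is multiplicative, a two-unit increase with $e^{\beta_1} = 1.20$ gives $1.20^2 = 1.44$. That is a 44% increase, not 40%.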

POISSON REGRESSION: ESTIMATE STATEMENT

POISSON REGRESSION EXAMPLE: DATA
Variables: gender, number of self-diagnosed ear infections, age in years, frequent or occasional ocean swimmer, typical swimming location.

POISSON REGRESSION: CATEGORICAL PREDICTORS
- Frequent or occasional ocean swimmer: Occasional, Freq
- Typical swimming location: Beach, nonBeach
- Gender: Male, Female

POISSON REGRESSION: INTERVAL PREDICTOR
- Age in years

POISSON REGRESSION EXAMPLE

    proc genmod data=sasuser.earinfection;
       /* Reference-cell coding; the ref= level is the baseline */
       class Swimmer  (param=ref ref='Freq')
             Location (param=ref ref='Beach')
             Gender   (param=ref ref='Male');
       /* Poisson distribution, log link, quadratic in age, Type 3 tests */
       model infections = swimmer location gender age age*age
             / dist=poisson link=log type3;
    run;
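The MODEL statement above estimates the coefficients by maximum likelihood. The sketch below shows what that means for a log-link Poisson model with an intercept and one predictor, using plain Newton-Raphson in Python. It is an illustration under simplifying assumptions, not PROC GENMOD's implementation. The function name and the toy data are hypothetical.

```python
import math


def fit_poisson_1x(x, y, max_iter=25, tol=1e-10):
    """Maximum likelihood for log(mu_i) = b0 + b1 * x_i by Newton-Raphson.
    For the canonical log link this coincides with Fisher scoring / IRLS.
    Assumes sum(y) > 0 and a non-constant x (otherwise the information is singular)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = sum (y_i - mu_i) * (1, x_i)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix I = sum mu_i * (1, x_i)(1, x_i)^T
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: (d0, d1) = I^{-1} U, with the 2x2 inverse written out
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data: x is a 0/1 indicator (e.g. a swimmer group)
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    y = [1, 2, 0, 1, 3, 4, 2, 3]
    b0, b1 = fit_poisson_1x(x, y)
    print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

With a single 0/1 predictor, the maximum likelihood fit reproduces the two group means exactly, which is a convenient check. For the toy data, $e^{b_0} = 1$ and $e^{b_0 + b_1} = 3$.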

POISSON REGRESSION EXAMPLE: PROC GENMOD OUTPUT
Scale = 1 (for the Poisson distribution, the scale parameter is held fixed at 1)

POISSON REGRESSION: OVERDISPERSION
Poisson regression models assume that the variance is equal to the mean, but count data often exhibit variability exceeding the mean (overdispersion).
- Overdispersion leads to underestimated standard errors of the parameter estimates.
- As a result, test statistics are overestimated and p-values are too liberal.
Causes of overdispersion:
- subject heterogeneity due to an under-specified model
- outliers in the data
- positive correlation between the responses in clustered data.
What to do:
- Use the negative binomial distribution (covered next).
- Apply a multiplicative adjustment factor with the PSCALE or DSCALE option in the MODEL statement (homework).
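One common overdispersion diagnostic is the Pearson chi-square divided by its degrees of freedom. Values well above 1 point to overdispersion. The Python sketch below computes it and applies the usual quasi-likelihood correction, which multiplies standard errors by the square root of that ratio. We assume this is what a Pearson-based adjustment such as PSCALE does; check the PROC GENMOD documentation for the exact definition. The data and function names are hypothetical.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / (n - p) for a Poisson fit with fitted means mu."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def adjusted_se(se, dispersion):
    """Inflate a model-based standard error by sqrt(dispersion)."""
    return se * math.sqrt(dispersion)


if __name__ == "__main__":
    y = [0, 5, 1, 6]           # hypothetical counts
    mu = [3.0, 3.0, 3.0, 3.0]  # fitted means from an intercept-only model
    phi = pearson_dispersion(y, mu, n_params=1)
    print(f"dispersion = {phi:.3f}, SE 0.10 becomes {adjusted_se(0.10, phi):.3f}")
```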

NEGATIVE BINOMIAL REGRESSION

NEGATIVE BINOMIAL REGRESSION: DISTRIBUTION AND MODEL
The negative binomial distribution
- is a distribution for count data that permits the variance to exceed the mean
- gives the model more flexibility than the Poisson model in modeling the relationship between the mean and the variance of the response variable.

Response variable | Distribution | Link function | Variance function
Count | Negative binomial | Natural log | $\mu + k\mu^2$

NEGATIVE BINOMIAL REGRESSION: DISPERSION PARAMETER k
- The dispersion parameter k is not allowed to vary across observations.
- The limiting case k = 0 corresponds to a Poisson regression model.
- When k > 0, the data are overdispersed relative to the Poisson model, and the standard errors increase.
- The fitted values are similar to those of the Poisson model, but the larger standard errors reflect the overdispersion that the Poisson model does not capture.
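In the mean/dispersion form, the negative binomial has mean $\mu$ and variance $\mu + k\mu^2$. The Python sketch below checks both moments numerically and shows the Poisson limit as $k \to 0$. The values of $\mu$ and $k$ and the function names are hypothetical.

```python
import math


def negbin_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and Var = mu + k*mu^2 (k > 0)."""
    r = 1.0 / k
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + y * math.log(k * mu / (1.0 + k * mu))
             - r * math.log(1.0 + k * mu))
    return math.exp(log_p)


def negbin_moments(mu, k, y_max=400):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [negbin_pmf(y, mu, k) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    for k in (1.0, 0.5, 0.1):
        mean, var = negbin_moments(3.0, k)
        print(f"k={k}: mean={mean:.3f}, variance={var:.3f} (Poisson variance would be 3)")
```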

NEGATIVE BINOMIAL REGRESSION EXAMPLE

    proc genmod data=sasuser.earinfection;
       class Swimmer  (param=ref ref='Freq')
             Location (param=ref ref='Beach')
             Gender   (param=ref ref='Male');
       /* Same model as the Poisson fit, with a negative binomial distribution */
       model infections = swimmer location gender age age*age
             / dist=negbin link=log type3;
    run;

NEGATIVE BINOMIAL REGRESSION EXAMPLE: PROC GENMOD OUTPUT

POISSON REGRESSION FOR RATES

POISSON REGRESSION FOR RATES: DEFINITION AND EXAMPLES
When events occur over time, space, or some other index of exposure, it is more relevant to model the rate at which they occur than the number of events. Rates provide the standardization needed to make the outcomes comparable. In PROC GENMOD, you use the OFFSET= option in the MODEL statement.
Examples:
- how crime rates are related to a city's unemployment rate
- how melanoma incidence rates are related to demographic variables
- how the rate of loan defaults is related to region of the country
- how response rates to marketing campaigns relate to known characteristics of the recipients

POISSON REGRESSION FOR RATES: OFFSET
If $T$ is the index of exposure, the model for the rate $\mu / T$ is
$\log(\mu / T) = X\beta$, which is equivalent to $\log(\mu) = \log(T) + X\beta$.
$\log(T)$ is called the offset variable, and its coefficient is fixed at 1. The offset makes the fitted mean count proportional to the index of exposure. For example, using the log of the population as an offset is the same as modeling the mean number of events as proportional to population size.
In the MODEL statement: OFFSET=variable, where the variable contains $\log(T)$.
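In code form, the offset adds $\log(T)$ to the linear predictor with its coefficient fixed at 1. The expected count therefore scales with exposure, while the rate does not. Here is a minimal Python sketch; the coefficients, population sizes, and function names are hypothetical.

```python
import math


def linear_predictor(beta, x):
    """eta = b0 + b1*x1 + ... (without the offset)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def expected_count(exposure, beta, x):
    """log(mu) = log(T) + eta, with the offset log(T) entering with coefficient 1."""
    return math.exp(math.log(exposure) + linear_predictor(beta, x))


def expected_rate(beta, x):
    """Events per unit of exposure: mu / T = exp(eta)."""
    return math.exp(linear_predictor(beta, x))


if __name__ == "__main__":
    beta = [-9.0, 0.8]  # hypothetical intercept and effect of a 0/1 indicator
    for population in (100_000, 200_000):
        print(population, round(expected_count(population, beta, [1]), 2))
```

Doubling the exposure doubles the expected count but leaves the rate unchanged. This is why the offset variable must hold $\log(T)$ rather than $T$ itself.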

POISSON REGRESSION FOR RATES: SKIN CANCER IN TEXAS AND MINNESOTA
Data: incidence of nonmelanoma skin cancer by city (Minneapolis-St. Paul, Dallas-Fort Worth) and age group.

POISSON REGRESSION FOR RATES EXAMPLE

    proc genmod data=sasuser.skin;
       class City (param=ref ref='MSP')
             Age  (param=ref ref='85+');
       /* log_pop is the offset: it should hold the log of the population at risk */
       model cases = city age / offset=log_pop dist=poisson link=log type3;
    run;

ZERO-INFLATED POISSON MODEL

ZIP: PURPOSE
In some settings, zero counts occur much more often than the Poisson distribution predicts. Poisson regression models fit to data with an excess number of zeros exhibit overdispersion. Zero-inflated Poisson (ZIP) models might fit such data better.

ZIP: MODEL
The population modeled with the zero-inflated Poisson distribution is considered to consist of two types of responses:
- The first type gives Poisson-distributed counts, which can be zero or positive.
- The second type always gives a zero count.
The relevant distribution is therefore a mixture of a Poisson distribution and a distribution that is constant at zero.
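Writing the mixture out makes the excess zeros explicit. With probability $\pi$ a response is a structural zero; otherwise it is Poisson($\lambda$). So $P(Y=0) = \pi + (1-\pi)e^{-\lambda}$ and the mean is $(1-\pi)\lambda$. The variance is $(1-\pi)\lambda(1+\pi\lambda)$, which exceeds the mean whenever $\pi > 0$. Here is a Python sketch with hypothetical $\lambda$ and $\pi$ and hypothetical function names.

```python
import math


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count (lam > 0)."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson


def zip_moments(lam, pi, y_max=150):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [zip_pmf(y, lam, pi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    lam, pi = 4.0, 0.3  # hypothetical values
    mean, var = zip_moments(lam, pi)
    print(f"P(Y=0): ZIP {zip_pmf(0, lam, pi):.3f} vs Poisson with the same mean {math.exp(-mean):.3f}")
    print(f"mean {mean:.3f}, variance {var:.3f}")
```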

ZIP: COMPONENTS
The MODEL statement specifies the Poisson (count) component; the ZEROMODEL statement specifies the zero-inflation component.

    proc genmod data=sasuser.roots;
       class bap photoperiod;
       /* Count component: full factorial of photoperiod and bap */
       model roots = photoperiod | bap / dist=zip link=log type3;
       /* Zero-inflation component */
       zeromodel photoperiod;
    run;

ZIP EXAMPLE: DATA
Variables: photoperiod (hours), concentration (μM), number of roots.
[Figure: number of roots, 16-hour photoperiod]

ZIP EXAMPLE
[Figure: number of roots; panels: 16 hours, 8 hours]

ZIP EXAMPLE: RESULTS
Alternative: dist=zinb (zero-inflated negative binomial).

GAMMA REGRESSION

GAMMA DISTRIBUTION
The gamma distribution
- is a skewed distribution for positive values
- has a variance that is proportional to the squared mean: $\mathrm{Var}(y) \propto [E(y)]^2$
- has lighter tails than a lognormal distribution.

DISTRIBUTIONS COMPARISON
Distribution | Variance
Normal (truncated) | constant*
Poisson | $\propto E(Y)$
Gamma | $\propto (E(Y))^2$
Lognormal | $\propto (E(Y))^2$
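The comparison can be checked against the standard moment formulas. The Python sketch below encodes each variance function up to a constant factor and shows how much the variance grows when the mean increases tenfold. It also verifies that the variance is a constant multiple of the squared mean in two cases: a gamma with fixed shape, and a lognormal with fixed log-scale standard deviation. The parameter values and names are hypothetical.

```python
import math

# Variance as a function of the mean mu, up to a constant dispersion factor.
VARIANCE_FUNCTION = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "lognormal": lambda mu: mu ** 2,
}


def variance_ratio(dist, mu_low, mu_high):
    """Factor by which the variance grows when the mean moves from mu_low to mu_high."""
    v = VARIANCE_FUNCTION[dist]
    return v(mu_high) / v(mu_low)


def gamma_moments(shape, scale):
    """Gamma(shape, scale): mean = shape*scale, variance = shape*scale^2 = mean^2/shape."""
    return shape * scale, shape * scale ** 2


def lognormal_moments(m, s):
    """Lognormal with log-scale mean m and sd s: variance = (exp(s^2) - 1) * mean^2."""
    mean = math.exp(m + s * s / 2.0)
    var = (math.exp(s * s) - 1.0) * math.exp(2.0 * m + s * s)
    return mean, var


if __name__ == "__main__":
    for dist in VARIANCE_FUNCTION:
        print(f"{dist}: mean x10 -> variance x{variance_ratio(dist, 2.0, 20.0):g}")
```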

GAMMA REGRESSION EXAMPLE

    proc univariate data=car;
       var price;
       /* Histogram of price with a fitted gamma density:
          shape (alpha), scale (sigma), and threshold (theta) all estimated */
       histogram / gamma(alpha=est sigma=est theta=est color=blue w=2)
                   vaxis=0 to 14 by 2
                   midpoints=8 to 50 by 2;
    run;

GAMMA REGRESSION: REG AND GENMOD RESULTS
[Figure: residual plots for PROC REG, PROC GENMOD with link=log, and PROC GENMOD with link=identity]

    proc genmod data=car;
       /* OBSTATS prints per-observation statistics, including residuals;
          ID=model labels them with the values of the variable model */
       model price = hwympg hwympg2 horsepower
             / dist=gamma link=log /* or link=identity */ obstats id=model;
    run;

SUMMARY
Problem: nonconstant variance, which violates the constant-variance assumption of OLS.
Approaches:
- Transform the dependent variable Price (log).
- Fit a gamma regression model with the log link function.
- Fit a gamma regression model with the identity link function.

INSURANCE CASE STUDY

GENMOD: INSURANCE
- Frequency: how often claims are made.
- Severity: the claim amount. A typical way to model severity is with a gamma distribution and a log link function.
- Pure premium: the portion of the company's expected cost that is attributed "purely" to loss; it does not include the general expense of doing business. Pure premium can be modeled with a Tweedie distribution.
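The link between frequency, severity, and pure premium can be sketched numerically. Assume claim counts are Poisson($\lambda$) and claim amounts are gamma with shape $\alpha$ and scale $\theta$. The total loss per exposure is then compound Poisson-gamma. That is a Tweedie distribution with power $p = (\alpha+2)/(\alpha+1)$, which lies between 1 and 2, and variance $\phi\mu^p$. The Python sketch below uses hypothetical values and function names. It computes the pure premium as frequency times mean severity and checks that the Tweedie variance matches the compound-distribution variance.

```python
def pure_premium(frequency, mean_severity):
    """Expected loss per exposure unit = E[claims per unit] * E[claim amount]."""
    return frequency * mean_severity


def compound_poisson_gamma(lam, shape, scale):
    """Mean and variance of S = X_1 + ... + X_N, N ~ Poisson(lam), X_i ~ Gamma(shape, scale)."""
    mean = lam * shape * scale
    var = lam * shape * (1.0 + shape) * scale ** 2  # lam * E[X^2]
    return mean, var


def tweedie_parameters(lam, shape, scale):
    """Tweedie mean mu, power p (1 < p < 2), and dispersion phi for the same distribution."""
    mu = lam * shape * scale
    p = (shape + 2.0) / (shape + 1.0)
    phi = lam ** (1.0 - p) * (shape * scale) ** (2.0 - p) / (2.0 - p)
    return mu, p, phi


if __name__ == "__main__":
    lam, shape, scale = 0.1, 2.0, 500.0  # hypothetical: 0.1 claims per unit, mean claim 1000
    mu, p, phi = tweedie_parameters(lam, shape, scale)
    print(f"pure premium {pure_premium(lam, shape * scale):.1f}, Tweedie p = {p:.3f}, phi = {phi:.2f}")
```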

GLM INSURANCE: FREQUENCY AND PURE PREMIUM
- Frequency: ZIP
- Pure premium: Tweedie distribution (PROC SEVERITY, SAS/ETS)

sas.com
THANK YOU!