The log-rate model: Statistical analysis of occurrence-exposure rates


The log-rate model: Statistical analysis of occurrence-exposure rates (16 January 2019)

References
Laird, N. and D. Olivier (1981) Covariance analysis of censored survival data using log-linear analysis techniques. Journal of the American Statistical Association, 76(374):231-240.
Holford, T.R. (1980) The analysis of rates and survivorship using log-linear models. Biometrics, 36:299-305.
Yamaguchi, K. (1991) Event history analysis. Sage, Newbury Park. Chapter 4: 'Log-rate models for piecewise constant rates'.

Data: leaving parental home

The log-rate model: the occurrence matrix and the exposure matrix
Occurrences: number leaving home by age and sex, 1961 birth cohort: $n_{ij}$
Exposures: number of months living at home (includes censored observations): $PM_{ij}$

The log-rate model
$\lambda_{ij} = E[N_{ij}]$ is the expected number of occurrences; the exposure $PM_{ij}$ is fixed and enters the model as an offset.
Multiplicative form: $\lambda_{ij} = PM_{ij} \exp(\eta_{ij})$
Additive form: $\ln \lambda_{ij} = \ln PM_{ij} + \eta_{ij}$
$\ln PM_{ij}$: offset; $\eta_{ij}$: linear predictor
The log-rate model is a log-linear model with OFFSET (constant term)

The log-rate model in two steps
Step 1: Use the model to predict the counts from the marginal distribution of occurrences and from the exposures, using IPF (iterative proportional fitting).
Step 2: Estimate the parameters of the log-rate model from the predicted values using conventional log-linear modeling.
The model:


The log-rate model in SPSS: unsaturated model
Model and Design Information: unsaturated model
Model: Poisson
Design: Constant + SEX + TIMING
Parameter Estimates (asymptotic 95% CI)
Parameter        Estimate   SE      Lower   Upper   Note
1 Constant       -3.9818    .0694   -4.12   -3.85   ln(170/9114) (ref. cat.)
2 [SEX = 1]        .5070    .0878     .33     .68   ln(151/4876) + 3.9818
3 [SEX = 2]        .0000    .         .       .     ref. cat.
4 [TIMING = 1]   -1.3044    .0897   -1.48   -1.13   ln(82/16202) + 3.9818
5 [TIMING = 2]     .0000    .         .       .     ref. cat.
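
The annotations next to the estimates suggest that each parameter can be recovered from the fitted cell rates. A minimal Python sketch of that arithmetic, assuming the fitted counts (170, 151, 82) and exposures quoted in those annotations; the variable names are illustrative only:

```python
import math

# Fitted counts and exposures (person-months) as quoted in the slide annotations.
ref_count, ref_exposure = 170, 9114          # reference cell
sex_count, sex_exposure = 151, 4876          # cell that differs from the reference in SEX
timing_count, timing_exposure = 82, 16202    # cell that differs from the reference in TIMING

# The constant is the log rate of the reference cell; each effect is a
# difference in log rates relative to that cell.
constant = math.log(ref_count / ref_exposure)
sex_effect = math.log(sex_count / sex_exposure) - constant
timing_effect = math.log(timing_count / timing_exposure) - constant

print(round(constant, 4), round(sex_effect, 4), round(timing_effect, 4))
```

The printed values can be compared with parameters 1, 2 and 4 of the SPSS output.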

The log-rate model in SPSS: unsaturated model
Fitted count = PM * exp[linear predictor]; RATE = exp[linear predictor]
PM * exp[ ]                           = Count   RATE
 9114 * exp[-3.982]                   = 170.0   0.01865
16202 * exp[-3.982 - 1.304]           =  82.0   0.00506
15113 * exp[-3.982 - 1.304 + 0.507]   = 127.0   0.00840
 4876 * exp[-3.982 + 0.507]           = 151.0   0.03096
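
The same calculation can be sketched in Python, going from the rounded estimates to fitted counts and rates. The coding of the cells (sex 2 and timing 2 as reference categories) is inferred from the exposures used on this slide, and the names are illustrative:

```python
import math

# Rounded estimates as used on the slide.
CONSTANT, SEX_1, TIMING_1 = -3.982, 0.507, -1.304

# (sex, timing) -> exposure in person-months.
exposure = {(1, 1): 15113, (2, 1): 16202, (1, 2): 4876, (2, 2): 9114}

def rate(sex, timing):
    """Monthly rate implied by the main-effects model (reference: sex 2, timing 2)."""
    eta = CONSTANT
    if sex == 1:
        eta += SEX_1
    if timing == 1:
        eta += TIMING_1
    return math.exp(eta)

# Fitted count = exposure * rate.
fitted = {cell: pm * rate(*cell) for cell, pm in exposure.items()}

for cell in sorted(fitted):
    print(cell, round(fitted[cell], 1), round(rate(*cell), 5))
```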

The log-rate model in SPSS: unsaturated model
Data:
SEX  TIMING  NUMBER  EXPOSURE
1    1       135     15113
2    1        74     16202
1    2       143      4876
2    2       178      9114
Syntax:
GENLOG timing sex
  /CSTRUCTURE=exposure
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /CRITERIA=CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN sex timing
  /SAVE PRED.

The log-rate model in GLIM: unsaturated model
Occ = Exp * exp[overall + sex]
DATA: occurrence matrix and exposure matrix (2*2)
[i] $fit +sex$
[o] scaled deviance = 218.48 (change = -14.80) at cycle 4
[o] d.f. = 2 (change = -1 )
[o]
[i] $d e$
[o]      estimate    s.e.      parameter
[o] 1    -4.275      0.05997   1
[o] 2    -0.3344     0.08697   SEX(2)
[o] scale parameter taken as 1.000
Females: 278 = 19989 * exp[-4.275]; RATE = exp[-4.275] = 0.0139
Males: 252 = 25316 * exp[-4.275 - 0.3344]; RATE = exp[-4.275 - 0.3344] = 0.0100
[i] $d r$
[o] unit  observed  fitted   residual
[o] 1     135       210.19   -5.186
[o] 2      74       161.28   -6.873
[o] 3     143        67.81    9.130
[o] 4     178        90.72    9.163
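
For a model with SEX as the only factor, the estimates have a closed form: the fitted rate for each sex is total occurrences divided by total exposure. The following Python sketch, using the four observed cells from the data slide, illustrates this; it is a hand calculation under that assumption, not a reproduction of GLIM's fitting algorithm, and the names are illustrative:

```python
import math

# Observed data: (sex, timing) -> (occurrences, exposure in months).
data = {(1, 1): (135, 15113), (2, 1): (74, 16202),
        (1, 2): (143, 4876), (2, 2): (178, 9114)}

# Totals by sex, summed over timing.
events = {s: sum(n for (sex, _t), (n, _pm) in data.items() if sex == s) for s in (1, 2)}
months = {s: sum(pm for (sex, _t), (_n, pm) in data.items() if sex == s) for s in (1, 2)}
rates = {s: events[s] / months[s] for s in (1, 2)}

overall = math.log(rates[1])            # corresponds to GLIM parameter 1
sex2 = math.log(rates[2]) - overall     # corresponds to GLIM parameter SEX(2)

# Fitted count = exposure * rate of the cell's sex.
fitted = {cell: pm * rates[cell[0]] for cell, (_n, pm) in data.items()}

# Poisson deviance: 2 * sum(obs * ln(obs / fit) - (obs - fit)).
deviance = 2 * sum(n * math.log(n / fitted[cell]) - (n - fitted[cell])
                   for cell, (n, _pm) in data.items())

print(round(overall, 3), round(sex2, 4), round(deviance, 2))
```

The fitted counts and the deviance can be checked against the `$d r$` listing and the scaled deviance reported by GLIM.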

The log-rate model in GLIM: unsaturated model
Occ = Exp * exp[overall + sex + timing]


The log-rate model in TDA
The basic exponential model with time-constant covariates (Blossfeld and Rohwer, pp. 87ff)
Occ = Exp * exp[overall + sex]
SN  Org  Des  Episodes  Weighted  Duration  TS Min  TF Max  Excl
----------------------------------------------------------------
 1    0    0        53     53.00    128.47    0.00  144.00     -
 1    0    1       530    530.00     72.63    0.00  140.00     -
Sum                583    583.00
Number of episodes: 583
Successfully created new episode data.
Idx  SN  Org  Des  MT  Variable   Coeff    Error   C/Error   Signif
-------------------------------------------------------------------
  1   1    0    1   A  Constant  -4.6098   0.0630  -73.1777  1.0000
  2   1    0    1   A  SEX1       0.3344   0.0870    3.8451  0.9999
Log likelihood (starting values): -2887.5967
Log likelihood (final estimates): -2880.1982
command file: ehd21.cf
data file: test.dat (micro data)

LOG-RATE MODEL IN TDA: PROGRAMME
# ehd2.cf  Basic exponential model with covariate SEX
nvar(
  dfile  = test.dat,    # data file
  ID     = c1,          # identification number
  SN     = c2,          # spell number
  TF     = c3,          # time of leaving home (= ending time), measured from age 0!
  TF15   = TF-180,      # ending time measured from age 15
  SEX    = c4,          # sex
  REASON = c5,          # reason
  SEX1   = SEX[1],      # see book p. 61; SEX1 = 1 for females and 0 for males (males are the ref. cat.)
  SEX2   = SEX[2],      # = 1 for males
  DES    = if eq(REASON,4) then 0 else 1,   # destination
  TFP    = TF15,        # Blossfeld: TF+1 !
);
edef(                   # define single episode data
  ts  = 0,              # starting time
  tf  = TFP,            # ending time
  org = 0,              # origin state
  des = DES,            # destination state
);
# BASIC exponential model (Blossfeld-Rohwer p. 90-91)
rate(
  xa (0,1) = SEX1,
  pres = ehd21.res,
) = 2;

Related models
Parameters of these models are related.
Poisson distribution: counts have a Poisson distribution (total number not fixed)
  Poisson regression
  Log-linear model: model of count data (log of counts)
Binomial and multinomial distributions: counts follow a multinomial distribution (total number is fixed)
  Logit model: model of proportions [and odds (log of odds)]
  Logistic regression
Log-rate model: log-linear model with OFFSET (constant term)

The unsaturated model: similarity with the log-rate model

The unsaturated log-linear model
Assume: two-way classification; counts unknown but marginal totals given.
Predict the expected counts (cell entries).
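
A short Python sketch of predicting cell entries from marginal totals alone under the unsaturated (independence) model. The margins are taken from the observed leaving-home table (sex totals 278 and 252, timing totals 209 and 321); whether the original slide used exactly these margins is an assumption:

```python
# Marginal totals of the observed leaving-home table (assumed to be the
# margins meant on the slide): rows = sex, columns = timing.
row_totals = [278, 252]
col_totals = [209, 321]
total = sum(row_totals)

# Independence model: expected count = row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Under this model the odds ratio of the predicted table is 1 by construction.
odds_ratio = (expected[0][0] * expected[1][1]) / (expected[0][1] * expected[1][0])

print([[round(x, 2) for x in row] for row in expected], odds_ratio)
```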


The unsaturated log-linear model as a log-rate model
Odds ratio = 1

With $PM_{ij} = 1$

Updating a table
Similarity with the log-rate model
Illustration: migration analysis with incomplete data
Migration is a realisation of a Poisson process
Literature: "Indirect estimation of migration", special issue of Mathematical Population Studies, A. Rogers (ed.), Vol. 7, no. 3 (1999)

Updating a table: the log-rate model in two steps
Odds ratio = 2.270837


Log-rate model: rate = events/exposure
Gravity / spatial interaction model
The factors indexed by i and j are balancing factors.

IPF and biproportional adjustment
Log-likelihood function:

Biproportional adjustment method
RAS method (Richard A. Stone: Input-output models, 1962)
DSF procedure (DSF = Deming, Stephan, Furness) (Sen and Smith, 1995, p. 374)
See e.g. Willekens (1983) Log-linear analysis of spatial interaction

Biproportional adjustment
Step 0: s (step) = 0
Step 1
Step 2
Step 3: go to Step 1 unless the convergence criterion is reached. The stopping criterion is reached when the change in the adjustment factors is less than 10^-6 for all i and j.
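
Since the formulas for Steps 1 and 2 did not survive in this transcript, the following Python sketch shows one common way to write biproportional adjustment (IPF): alternately rescale row and column factors until they change by less than the tolerance. It is applied to the leaving-home example, with the exposures as the initial table and the observed marginal occurrences as targets. The function name and argument names are hypothetical, and the exact update order used on the slide is an assumption:

```python
def ipf(seed, row_targets, col_targets, tol=1e-6, max_iter=1000):
    """Biproportional adjustment of `seed` to the given marginal totals."""
    n_rows, n_cols = len(seed), len(seed[0])
    a = [1.0] * n_rows   # row adjustment factors (Step 0)
    b = [1.0] * n_cols   # column adjustment factors (Step 0)
    for _ in range(max_iter):
        # Update row factors so that row sums match the row targets.
        a_new = [row_targets[i] / sum(seed[i][j] * b[j] for j in range(n_cols))
                 for i in range(n_rows)]
        # Update column factors so that column sums match the column targets.
        b_new = [col_targets[j] / sum(seed[i][j] * a_new[i] for i in range(n_rows))
                 for j in range(n_cols)]
        # Stop when no adjustment factor changes by more than `tol`.
        change = max(max(abs(x - y) for x, y in zip(a_new, a)),
                     max(abs(x - y) for x, y in zip(b_new, b)))
        a, b = a_new, b_new
        if change < tol:
            break
    return [[a[i] * seed[i][j] * b[j] for j in range(n_cols)] for i in range(n_rows)]

# Leaving-home example: rows = sex 1, 2; columns = timing 1, 2.
exposure = [[15113, 4876], [16202, 9114]]
fitted = ipf(exposure, row_targets=[278, 252], col_targets=[209, 321])
print([[round(x, 1) for x in row] for row in fitted])
```

The result can be compared with the fitted counts of the unsaturated SPSS model for the same data.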

Likelihood equations may be written as:
Marginal totals are sufficient statistics
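
The equations themselves are not preserved in this transcript. As a hedged sketch, assuming independent Poisson counts $n_{ij}$ and a main-effects log-rate model (the parameter symbols $u$, $u^A_i$, $u^B_j$ are illustrative and not taken from the slides), the log-likelihood and the resulting likelihood equations can be written as:

```latex
\ell = \sum_{i,j} \left( n_{ij} \ln \lambda_{ij} - \lambda_{ij} - \ln n_{ij}! \right),
\qquad
\lambda_{ij} = PM_{ij} \exp\!\left( u + u^A_i + u^B_j \right)

\frac{\partial \ell}{\partial u^A_i} = \sum_{j} \left( n_{ij} - \lambda_{ij} \right) = 0
\;\Rightarrow\; \hat{\lambda}_{i+} = n_{i+},
\qquad
\frac{\partial \ell}{\partial u^B_j} = \sum_{i} \left( n_{ij} - \lambda_{ij} \right) = 0
\;\Rightarrow\; \hat{\lambda}_{+j} = n_{+j}
```

Under these assumptions the fitted margins equal the observed margins, which is why the marginal totals act as sufficient statistics and why IPF, which matches margins, yields the maximum-likelihood fit.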

A different way of writing the spatial interaction model:
Link Poisson - Multinomial

The gravity model is a log-linear model
The entropy model is a log-linear model
The RAS model is a log-linear (log-rate) model

Parameter estimation
Maximise the (log-)likelihood function: the probability that the model predicts the data.
Expectation: predict $E[N_{rs}] = \lambda_{rs}$ given the model and initial parameter estimates.
Maximisation: maximise the 'complete-data' log-likelihood.

The log-rate model: piecewise constant hazard model
Kidney Transplant Histocompatibility Study
The data describe the survival of the kidney graft (organ) following kidney transplant operations. The risk factor 'donor relationship' has two categories: cadaveric nonrelated donor (CAD) and living related donor (LRD). The sample in this follow-up study consists of 1975 transplant operations.
Laird, N. and D. Olivier (1981) Covariance analysis of censored survival data using log-linear analysis techniques. Journal of the American Statistical Association, 76(374):231-240. The authors claim that they go beyond Holford (1980) 'The analysis of rates and survivorship using log-linear models', Biometrics, 36:299-305.
d:\s\data\laird\kidney\laird.doc

Kidney Transplant Study: life-table data on graft survival
Exposure (Exp), in days, is calculated as follows:
Exp = [E - 0.5(W + D)] * #
where
E = number entered
W = number withdrawn
D = number died
# = width of the interval in days (the last, open interval was taken as having 180 days)
For example: 608*90 + 30*45
d:\s\data\laird\laird_lt.xls
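
A minimal Python sketch of the exposure formula. The function name and the split of the 30 exits into withdrawals and deaths are hypothetical; only the formula and the figures 608, 30, 90 and 45 come from the slide:

```python
def exposure_days(entered, withdrawn, died, width_days):
    """Person-days at risk in one life-table interval: [E - 0.5(W + D)] * width."""
    return (entered - 0.5 * (withdrawn + died)) * width_days

# Hypothetical 90-day interval: 638 grafts enter and 30 exit
# (the 10/20 split between withdrawals and deaths is made up).
exp_days = exposure_days(entered=638, withdrawn=10, died=20, width_days=90)

# Same quantity written as on the slide: those who stay for the whole interval
# contribute 90 days each, those who exit contribute half the interval.
alternative = 608 * 90 + 30 * 45

print(exp_days, alternative)
```

Both expressions give the same exposure because each exit is assumed to occur, on average, halfway through the interval.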

Kidney Transplant Study: death rates (* 1000; per day)

Kidney Transplant Study CAD LRD

Kidney Transplant Study: SPSS
Model: Poisson
Design: Constant + TIME
Parameter           Estimate   SE
 1   Constant       -8.7857    .5774
 2   [TIME = 1]      3.6281    .5883
 3   [TIME = 2]      3.5743    .5879
 4   [TIME = 3]      3.6502    .5909
 5   [TIME = 4]      3.4168    .5893
 6   [TIME = 5]      3.1263    .5825
 7   [TIME = 6]      2.7212    .5858
 8   [TIME = 7]      2.0913    .5831
 9   [TIME = 8]      1.0350    .5951
10   [TIME = 9]      1.0087    .5963
11   [TIME = 10]      .1819    .5997
12   [TIME = 11]      .2822    .5986
13   [TIME = 12]      .1147    .6065
14   [TIME = 13]      .0197    .6191
15   [TIME = 14]      .0742    .6325
16   [TIME = 15]     -.7640    .7638
17 x [TIME = 16]      .0000    .
Deaths 9-12 m: 107325 * exp[-8.7857 + 1.0087] = 107325 * 0.0004193 = 45
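
In a piecewise constant hazard model, each TIME parameter shifts the log of the daily rate for its interval. A short Python sketch that turns the estimates above into interval-specific rates and reproduces the expected number of deaths for TIME = 9; the variable names are illustrative:

```python
import math

CONSTANT = -8.7857
# Estimates for [TIME = 1] ... [TIME = 16]; TIME = 16 is the reference category.
TIME_EFFECTS = [3.6281, 3.5743, 3.6502, 3.4168, 3.1263, 2.7212, 2.0913, 1.0350,
                1.0087, 0.1819, 0.2822, 0.1147, 0.0197, 0.0742, -0.7640, 0.0]

# Piecewise constant daily rate in each interval.
rates = [math.exp(CONSTANT + effect) for effect in TIME_EFFECTS]

for k, r in enumerate(rates, start=1):
    print(f"TIME = {k:2d}: {1000 * r:.4f} per 1000 per day")

# Expected deaths = exposure (days) * rate, here for TIME = 9 with 107325 days.
expected_deaths_time9 = 107325 * rates[8]
print(round(expected_deaths_time9, 1))
```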

Kidney Transplant Study: SPSS
Model: Poisson
Design: Constant + DONOR TYPE + TIME (unsaturated model)
Parameter           Estimate   SE
 1   Constant       -9.2184    .5791
 2   [CAD = 1.00]     .8573    .0730
 3 x [LRD = 2.00]     .0000    .
 4   [P1 = 1]        3.4734    .5885
 5   [P1 = 2]        3.4260    .5880
 6   [P1 = 3]        3.5097    .5910
 7   [P1 = 4]        3.2837    .5893
 8   [P1 = 5]        3.0026    .5826
 9   [P1 = 6]        2.6089    .5858
10   [P1 = 7]        1.9928    .5832
11   [P1 = 8]         .9476    .5952
12   [P1 = 9]         .9258    .5963
13   [P1 = 10]        .1046    .5997
14   [P1 = 11]        .2116    .5986
15   [P1 = 12]        .0555    .6065
16   [P1 = 13]       -.0258    .6191
17   [P1 = 14]        .0349    .6325
18   [P1 = 15]       -.7890    .7638
19 x [P1 = 16]
Deaths 9-12 m (CAD): 53370 * exp[-9.2184 + 0.8573 + 0.9258] = 53370 * 0.000590 = 31.49
Observed: 30
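
The worked figure can be checked with a few lines of Python. The sketch also shows the rate ratio between CAD and LRD implied by the main-effects model, which is the same in every interval; the variable names are illustrative, and 53370 is read as the CAD exposure in days for this interval:

```python
import math

CONSTANT, CAD, P1_9 = -9.2184, 0.8573, 0.9258   # estimates from the output above
cad_exposure_days = 53370                        # exposure used in the slide's example

cad_rate = math.exp(CONSTANT + CAD + P1_9)       # daily rate, CAD, interval P1 = 9
lrd_rate = math.exp(CONSTANT + P1_9)             # same interval, LRD (reference category)

expected_cad_deaths = cad_exposure_days * cad_rate
rate_ratio = cad_rate / lrd_rate                 # equals exp(CAD) in every interval

print(round(expected_cad_deaths, 2), round(rate_ratio, 3))
```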