Log-linear analysis Summary

The approach
– Focus on data analysis
– Focus on underlying process
– Focus on model specification
– Focus on likelihood approach
– Focus on ‘complete-data likelihood’
– Focus on prediction
– Focus on interaction/association
– Link with risk analysis
– Unified perspective on different models

Risk measures
– Count: number of events during a given period (observation window).
– Probability: probability of an outcome, i.e. the proportion of the risk set experiencing a given outcome (event) at least once. Risk set = all persons at risk at a given point in time.
– Rate: number of events per time unit of exposure (or per unit of any measure of size, e.g. time, space, miles travelled).

Risk measures
– Difference of probabilities: $p_1 - p_2$
– Relative risk: ratio of probabilities (focus: risk factor), i.e. the probability of the event in the presence of the risk factor divided by the probability of the event in its absence (control group; reference category): $p_1 / p_2$
– Odds: the odds on an outcome are the ratio of favourable to unfavourable outcomes, i.e. the chance of one outcome rather than another: $p_1 / (1 - p_1)$. The odds are what matter when placing a bet on a given outcome, i.e. when something is at stake. Odds reflect the degree of belief in a given outcome.
– Relation between odds and relative risk: Agresti, 1996, p. 25

Risk measures: odds ratio
– Odds ratio: ratio of odds (focus: risk indicator, covariate), i.e. odds in target group / odds in control group [reference category]. It is the ratio of favourable to unfavourable outcomes in the target group divided by the same ratio in the control group.
– The odds ratio compares the ‘belief’ in a given outcome in two different populations or under two different conditions.
– If the odds ratio is one, the odds are the same in the two populations or under the two conditions.
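
The risk measures above can be illustrated with a short Python sketch. The probabilities `p1` and `p2` are hypothetical values chosen for illustration, not data from the slides.

```python
def odds(p):
    """Odds on an outcome with probability p: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical probabilities of the event (assumption, for illustration only):
p1 = 0.2  # risk factor present (target group)
p2 = 0.1  # risk factor absent (control group, reference category)

risk_difference = p1 - p2           # difference of probabilities
relative_risk = p1 / p2             # ratio of probabilities
odds_ratio = odds(p1) / odds(p2)    # ratio of odds

print(risk_difference, relative_risk, odds_ratio)
```

With these values the relative risk is 2 while the odds ratio is 2.25. The two are linked by odds ratio = relative risk × (1 − p2) / (1 − p1), so they are close only when both probabilities are small.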

Risk analysis
Probability models:
– Counts → Poisson r.v. → Poisson distribution → Poisson regression / log-linear model
– Probabilities → binomial and multinomial r.v. → binomial and multinomial distribution → logistic regression / logit model (the parameter p, the probability of occurrence, is also called risk; e.g. Clayton and Hills, 1993, p. 7)
– Rates → occurrences/exposure → Poisson r.v. → log-rate model

Analysis of count data Introduction to log-linear models

The Poisson probability model
Let N be a random variable representing the number of events during a unit interval, and let n be a realisation of N (the count). N is a Poisson r.v. following a Poisson distribution with parameter $\lambda$:
$\Pr\{N = n\} = \frac{\lambda^n e^{-\lambda}}{n!}, \quad n = 0, 1, 2, \ldots$
The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$

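Likelihood function
Probability mass function: $\Pr\{N = n\} = \frac{\lambda^n e^{-\lambda}}{n!}$
Log-likelihood function: $l(\lambda; n) = n \ln \lambda - \lambda - \ln n!$
→ Likelihood equation to determine the ‘best’ value of $\lambda$: $\frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} - 1 = 0$, so $\hat{\lambda} = n$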
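
As a minimal sketch of the likelihood approach, the Python code below evaluates the Poisson log-likelihood for several independent counts and uses the standard result that the maximum likelihood estimate is the sample mean. The `counts` list is hypothetical.

```python
import math

def poisson_pmf(n, lam):
    """Pr{N = n} for a Poisson r.v. with parameter lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def poisson_loglik(counts, lam):
    """Log-likelihood of independent Poisson counts: sum of n ln(lam) - lam - ln(n!)."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in counts)

def poisson_mle(counts):
    """Solving the likelihood equation gives the sample mean."""
    return sum(counts) / len(counts)

counts = [2, 0, 3, 1, 4]  # hypothetical counts per unit interval
lam_hat = poisson_mle(counts)

# The log-likelihood at lam_hat is at least as large as at nearby values.
print(lam_hat, poisson_loglik(counts, lam_hat))
```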

The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

Log-linear models for two-way tables
Saturated log-linear model: $\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$
– $u$: overall effect (level)
– $u_i^A$, $u_j^B$: main effects (marginal freq.)
– $u_{ij}^{AB}$: interaction effect
In the case of a 2 x 2 table: 4 observations, 9 parameters → normalisation constraints

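Relation between the log-linear model and the Poisson regression model
$\ln \lambda_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
where $x_1$ and $x_2$ are dummy variables ($x_1 = 0$ if $i = 1$ and $x_1 = 1$ if $i = 2$; $x_2 = 0$ if $j = 1$ and $x_2 = 1$ if $j = 2$) and the interaction variable is $x_1 x_2$.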
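
A small Python sketch of this dummy-variable formulation for a 2 x 2 table follows. Because the saturated model has as many free parameters as cells, the coefficients can be read off the log counts directly, with no iterative fitting. The counts and the names `beta0` to `beta3` are assumptions made for illustration.

```python
import math

# Hypothetical 2 x 2 table of counts, keyed by (row i, column j).
counts = {(1, 1): 40, (1, 2): 60, (2, 1): 30, (2, 2): 70}

def dummy(level):
    """Dummy coding: 0 for category 1 (reference), 1 for category 2."""
    return 0 if level == 1 else 1

def saturated_coefficients(n):
    """Coefficients of ln(lambda_ij) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    beta0 = math.log(n[(1, 1)])                        # reference cell
    beta1 = math.log(n[(2, 1)]) - beta0                # row effect
    beta2 = math.log(n[(1, 2)]) - beta0                # column effect
    beta3 = math.log(n[(2, 2)]) - beta0 - beta1 - beta2  # interaction
    return beta0, beta1, beta2, beta3

def fitted_count(i, j, betas):
    b0, b1, b2, b3 = betas
    x1, x2 = dummy(i), dummy(j)
    return math.exp(b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2)

betas = saturated_coefficients(counts)
print(betas, [fitted_count(i, j, betas) for (i, j) in counts])
```

In this coding the interaction coefficient is the log of the cross-product (odds) ratio n11 n22 / (n12 n21), and the fitted counts reproduce the observed counts exactly.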

Design matrix: unsaturated log-linear model
– Number of parameters exceeds number of equations → need for additional equations
– X'X is singular, so $(X'X)^{-1}$ does not exist → identify linear dependencies
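
The singularity can be checked numerically. The Python sketch below builds the design matrix of a 2 x 2 table with all 9 parameters and no normalisation constraints, then computes the rank of X and X'X by exact Gaussian elimination. The column ordering and the helper names `design_row` and `rank` are assumptions for illustration.

```python
from fractions import Fraction

def design_row(i, j):
    """Row for cell (i, j): overall effect, 2 row effects, 2 column effects, 4 interactions."""
    row = [1]
    row += [1 if i == a else 0 for a in (1, 2)]
    row += [1 if j == b else 0 for b in (1, 2)]
    row += [1 if (i, j) == (a, b) else 0 for a in (1, 2) for b in (1, 2)]
    return row

def rank(matrix):
    """Rank by Gauss-Jordan elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

X = [design_row(i, j) for i in (1, 2) for j in (1, 2)]          # 4 x 9
XtX = [[sum(X[r][a] * X[r][b] for r in range(len(X)))
        for b in range(9)] for a in range(9)]                   # 9 x 9

print(rank(X), rank(XtX))
```

Both ranks equal 4, the number of cells, so 5 of the 9 parameters are linearly dependent on the others and must be fixed by constraints.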

Hybrid log-linear models
Hybrid log-linear models contain unconventional effect parameters: the interaction effects are restricted in a certain way, i.e. restrictions are imposed on the interaction parameters.

Examples of hybrid log-linear models
Diagonals parameter model 1: (main) diagonal effect
The effect parameter is $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal).
Off-diagonal elements are independent and diagonal elements are changed by a common factor.

Diagonals parameter model 2: each diagonal element has a separate effect parameter
The effect parameter is $c_k = 1$ for $i \neq j$ and $c_k = c_i$ for $i = j$ (diagonal).
Diagonal elements are predicted perfectly by the model.

Diagonals parameter model 3: the main diagonal and each minor diagonal have a unique effect parameter
Here k indicates the diagonal: $k = R + i - j$, where R is the number of rows (or columns). There are $2R - 1$ values of $c_k$.
Application: APC models
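
The diagonal index used in model 3 can be made concrete with a few lines of Python. The table size `R = 3` is an arbitrary choice for illustration.

```python
def diagonal_index(i, j, R):
    """Index k of the diagonal containing cell (i, j) in an R x R table."""
    return R + i - j

R = 3  # hypothetical number of rows (= number of columns)
index_table = [[diagonal_index(i, j, R) for j in range(1, R + 1)]
               for i in range(1, R + 1)]
distinct_k = sorted({k for row in index_table for k in row})

for row in index_table:
    print(row)
print(distinct_k)
```

Cells on the main diagonal all receive k = R, and the indices run from 1 to 2R − 1, giving the 2R − 1 values of c_k.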

The log-rate model Statistical analysis of occurrence-exposure rates

The log-rate model: the occurrence matrix and the exposure matrix
– Occurrences: number leaving home by age and sex, 1961 birth cohort: $n_{ij}$
– Exposures: number of months living at home (includes censored observations): $PM_{ij}$

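The log-rate model: offset
The log-rate model is a log-linear model with an OFFSET (constant term): the log of the exposure, $\ln PM_{ij}$, enters the model for $\ln \lambda_{ij}$ as a fixed term, so that the rate $\lambda_{ij} / PM_{ij}$ is modelled.
$\lambda_{ij} = E[N_{ij}]$; $PM_{ij}$ is fixed.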
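
The role of the offset can be sketched in Python as follows. The occurrence and exposure figures are hypothetical, not the leaving-home data referred to in the slides.

```python
import math

# Hypothetical occurrences n_ij (events) and exposures PM_ij (person-months).
occurrences = [[30, 45], [20, 60]]
exposures = [[1200.0, 900.0], [1000.0, 1500.0]]

def occurrence_exposure_rates(n, pm):
    """Rate = events / exposure, cell by cell."""
    return [[n_ij / pm_ij for n_ij, pm_ij in zip(n_row, pm_row)]
            for n_row, pm_row in zip(n, pm)]

def expected_count(log_rate, pm_ij):
    """ln(lambda_ij) = ln(PM_ij) + ln(rate_ij); ln(PM_ij) is the fixed offset."""
    return math.exp(math.log(pm_ij) + log_rate)

rates = occurrence_exposure_rates(occurrences, exposures)
print(rates)
print(expected_count(math.log(rates[1][1]), exposures[1][1]))
```

Adding the offset to the log rate recovers the expected count, which is why a model for counts with a fixed log-exposure term is a model for rates.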

Log-rate model: rate = events/exposure
Gravity model, with $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal)

Logit model and log-linear model A comparison

From the log-linear model to the logit model
Starting from the log-linear model, select one variable as the dependent (response) variable; this leads to the logit model.
Example: does voting behaviour differ by sex? Are females more likely to vote conservative than males?

Are females more likely to vote conservative than males?
– Odds of males voting conservative rather than labour
– Odds of females voting conservative rather than labour
– Log-odds = logit
Effect coding (1): A = Party; B = Sex

Are women more conservative than men? Do women vote more conservative than men?
The odds ratio: if the odds ratio is greater than one (i.e. the log odds ratio is positive), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote more conservative than men.
Logit model: $\operatorname{logit}(p) = a + b x$, with x = 0, 1, where
– a = log odds of the reference category (males)
– b = log odds ratio (odds females / odds males)
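
As a rough illustration of the logit model with a single 0/1 covariate, the Python sketch below derives a and b from a 2 x 2 table of votes by sex. The vote counts are invented for the example, and x = 0 is taken to denote males (the reference category) and x = 1 females.

```python
import math

# Hypothetical vote counts (assumption, not data from the slides).
votes = {
    "male": {"conservative": 40, "labour": 60},
    "female": {"conservative": 50, "labour": 50},
}

def odds_conservative(group):
    """Odds of voting conservative rather than labour."""
    return votes[group]["conservative"] / votes[group]["labour"]

a = math.log(odds_conservative("male"))   # log odds, reference category
b = math.log(odds_conservative("female") / odds_conservative("male"))  # log odds ratio

def prob_conservative(x):
    """Invert the logit: p = 1 / (1 + exp(-(a + b x))), x = 0 (male) or 1 (female)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

print(math.exp(b), prob_conservative(0), prob_conservative(1))
```

Here the odds ratio exp(b) is 1.5, which is greater than one, so in this invented table women vote more conservative than men.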