1
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
3
Log-Linear Model
4
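Two-way table with variables X (rows) and Y (columns): $\log m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}$, where $m_{ij}$ is the expected frequency in cell $(i,j)$ and $\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0$. Note: X and Y are independent if $u_{12(ij)} = 0$ for all $i$ and $j$.
In this case the log-linear model becomes $\log m_{ij} = u + u_{1(i)} + u_{2(j)}$.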
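As a quick numerical check of the independence case, here is a minimal Python sketch (standard library only; the table is hypothetical, not from these slides). It estimates the sum-to-zero u-terms by averaging $\log m_{ij}$ (grand mean, row means, column means) and confirms that when $m_{ij} = n\,p_i\,q_j$ every interaction term $u_{12(ij)}$ vanishes. The helper name `u_terms_two_way` is illustrative only.

```python
import math

def u_terms_two_way(m):
    """Sum-to-zero u-terms of log m_ij = u + u1[i] + u2[j] + u12[i][j]."""
    I, J = len(m), len(m[0])
    logm = [[math.log(v) for v in row] for row in m]
    u = sum(map(sum, logm)) / (I * J)                  # grand mean of log m
    row_mean = [sum(row) / J for row in logm]          # average over j
    col_mean = [sum(logm[i][j] for i in range(I)) / I for j in range(J)]
    u1 = [r - u for r in row_mean]
    u2 = [c - u for c in col_mean]
    u12 = [[logm[i][j] - row_mean[i] - col_mean[j] + u for j in range(J)]
           for i in range(I)]
    return u, u1, u2, u12

# Hypothetical expected frequencies that satisfy independence exactly:
# m_ij = n * p_i * q_j with n = 200, p = (0.3, 0.7), q = (0.2, 0.5, 0.3).
n, p, q = 200, [0.3, 0.7], [0.2, 0.5, 0.3]
m = [[n * pi * qj for qj in q] for pi in p]
u, u1, u2, u12 = u_terms_two_way(m)
print(max(abs(v) for row in u12 for v in row) < 1e-12)  # True
```

Changing any single cell of `m` would make some $u_{12(ij)}$ nonzero, which is exactly the departure from independence the interaction term measures.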
5
Three-way Frequency Tables
6
Log-Linear model for three-way tables
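Let $m_{ijk}$ denote the expected frequency in cell $(i,j,k)$ of the table. Then in general $\log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}$, where each u-term sums to zero over each of its indices (e.g. $\sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0$).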
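The u-terms of the saturated three-way model can be recovered from the expected frequencies by inclusion-exclusion over averages of $\ell = \log m$, e.g. $u_{12(ij)} = \bar\ell_{ij\cdot} - \bar\ell_{i\cdot\cdot} - \bar\ell_{\cdot j\cdot} + \bar\ell_{\cdot\cdot\cdot}$, where a dot means averaging over that index. A sketch in pure Python, assuming strictly positive frequencies and sum-to-zero constraints; the table and the function name `u_terms_three_way` are hypothetical. The final loop checks that the eight terms add back up to $\log m_{ijk}$.

```python
import itertools
import math

def u_terms_three_way(m):
    """Sum-to-zero u-terms of the saturated model for a table m[i][j][k] > 0.

    Returns e.g. {"u": {(): ...}, "u1": {(0,): ...}, ..., "u123": {(0, 0, 0): ...}},
    keyed by the indices of the variables that appear in each term.
    """
    dims = (len(m), len(m[0]), len(m[0][0]))
    cells = list(itertools.product(*(range(d) for d in dims)))
    logm = {c: math.log(m[c[0]][c[1]][c[2]]) for c in cells}

    def mean_log(axes, idx):
        # Average of log m over the cells whose coordinates on `axes` equal idx.
        vals = [logm[c] for c in cells
                if all(c[a] == v for a, v in zip(axes, idx))]
        return sum(vals) / len(vals)

    terms = {}
    for size in range(4):
        for S in itertools.combinations(range(3), size):
            name = "u" + "".join(str(a + 1) for a in S)
            terms[name] = {}
            for idx in itertools.product(*(range(dims[a]) for a in S)):
                # Inclusion-exclusion over the subsets T of S.
                value = 0.0
                for t in range(size + 1):
                    for T in itertools.combinations(range(size), t):
                        axes = tuple(S[p] for p in T)
                        sub = tuple(idx[p] for p in T)
                        value += (-1) ** (size - t) * mean_log(axes, sub)
                terms[name][idx] = value
    return terms

# Hypothetical 2 x 2 x 2 table of expected frequencies.
m = [[[10, 20], [30, 40]], [[15, 25], [35, 45]]]
terms = u_terms_three_way(m)

# Adding the eight terms back together recovers log m_ijk in every cell.
for cell in itertools.product(range(2), repeat=3):
    total = sum(vals[tuple(cell[int(d) - 1] for d in name[1:])]
                for name, vals in terms.items())
    assert abs(total - math.log(m[cell[0]][cell[1]][cell[2]])) < 1e-12
```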
7
Hierarchical Log-Linear Models for Categorical Data
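For three-way tables, the hierarchical principle applies: if an interaction is in the model, also keep all lower-order interactions and main effects associated with that interaction. For example, a model containing the interaction between variables 1 and 2 must also contain the main effects of variables 1 and 2.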
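The hierarchical principle is mechanical enough to code. A small sketch, assuming a model is written by its generating class (e.g. [12][13] as `[[1, 2], [1, 3]]`), that lists every u-term the model must then contain; the overall mean $u$ is always present and is not listed. The function name is illustrative.

```python
from itertools import combinations

def hierarchical_terms(generators):
    """All u-terms implied by a generating class such as [[1, 2], [1, 3]].

    Each generator brings in every non-empty subset of its variables:
    the interaction itself, its lower-order interactions and main effects.
    """
    terms = set()
    for gen in generators:
        gen = tuple(sorted(gen))
        for r in range(1, len(gen) + 1):
            terms.update(combinations(gen, r))
    # Order by interaction order, then lexicographically.
    return sorted(terms, key=lambda t: (len(t), t))

print(hierarchical_terms([[1, 2], [1, 3]]))
# [(1,), (2,), (3,), (1, 2), (1, 3)]
print(len(hierarchical_terms([[1, 2, 3]])))   # 7 terms in the saturated model
```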
8
Hierarchical Log-Linear Models for Three-Way Tables
Model: Description
[1][2][3]: Mutual independence of all three variables.
[1][23]: Variable 1 is independent of variables 2 and 3 (jointly).
[2][13]: Variable 2 is independent of variables 1 and 3 (jointly).
[3][12]: Variable 3 is independent of variables 1 and 2 (jointly).
[12][13]: Variables 2 and 3 are conditionally independent given variable 1.
[12][23]: Variables 1 and 3 are conditionally independent given variable 2.
[13][23]: Variables 1 and 2 are conditionally independent given variable 3.
[12][13][23]: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]: The saturated model (all main effects and interactions, including the three-factor interaction).
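To compare the models listed, it helps to count their degrees of freedom: the number of cells minus the number of free parameters, where a u-term on a set of variables has $\prod(\text{levels} - 1)$ free parameters under sum-to-zero constraints and the overall mean adds one. A sketch for a hypothetical 2×3×4 table (dimensions chosen only so the counts differ); the helper names are illustrative.

```python
from itertools import combinations
from math import prod

def hierarchical_terms(generators):
    """Every u-term implied by a generating class (hierarchical principle)."""
    terms = set()
    for gen in generators:
        gen = tuple(sorted(gen))
        for r in range(1, len(gen) + 1):
            terms.update(combinations(gen, r))
    return terms

def model_df(generators, dims):
    """Residual d.f. = number of cells - number of free parameters.

    dims maps each variable (1, 2, 3) to its number of levels.
    """
    n_cells = prod(dims.values())
    n_params = 1 + sum(prod(dims[v] - 1 for v in term)
                       for term in hierarchical_terms(generators))
    return n_cells - n_params

dims = {1: 2, 2: 3, 3: 4}
models = {
    "[1][2][3]": [[1], [2], [3]],
    "[1][23]": [[1], [2, 3]],
    "[12][13]": [[1, 2], [1, 3]],
    "[12][13][23]": [[1, 2], [1, 3], [2, 3]],
    "[123]": [[1, 2, 3]],
}
for name, gens in models.items():
    print(name, model_df(gens, dims))
# [1][2][3] 17, [1][23] 11, [12][13] 12, [12][13][23] 6, [123] 0
```

These agree with the familiar closed forms, e.g. $I(J-1)(K-1)$ for [12][13] and $(I-1)(J-1)(K-1)$ for [12][13][23].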
9
Maximum Likelihood Estimation
Log-Linear Model
10
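For any model it is possible to determine the maximum likelihood estimators of the parameters.
Example: two-way table, independence, multinomial model: $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$, or $m_{ij} = n\,\pi_{i\cdot}\,\pi_{\cdot j}$, where $\pi_{ij}$ is the probability that an observation falls in cell $(i,j)$ and $n$ is the total sample size.
11
Log-likelihood: $l = \sum_i \sum_j x_{ij} \log \pi_{ij}$ (plus a constant), where $x_{ij}$ is the observed frequency in cell $(i,j)$, $x_{i\cdot} = \sum_j x_{ij}$ and $x_{\cdot j} = \sum_i x_{ij}$. With the model of independence, $l = \sum_i x_{i\cdot} \log \pi_{i\cdot} + \sum_j x_{\cdot j} \log \pi_{\cdot j}$.
12
This is maximized subject to $\sum_i \pi_{i\cdot} = 1$ and also $\sum_j \pi_{\cdot j} = 1$.
13
Let $L = l + \lambda\left(1 - \sum_i \pi_{i\cdot}\right) + \mu\left(1 - \sum_j \pi_{\cdot j}\right)$, where $\lambda$ and $\mu$ are Lagrange multipliers. Now $\frac{\partial L}{\partial \pi_{i\cdot}} = \frac{x_{i\cdot}}{\pi_{i\cdot}} - \lambda = 0$, so $\lambda \pi_{i\cdot} = x_{i\cdot}$.
14
Since $\sum_i \pi_{i\cdot} = 1$, summing $\lambda \pi_{i\cdot} = x_{i\cdot}$ over $i$ gives $\lambda = \sum_i x_{i\cdot} = n$.
15
Now $\pi_{i\cdot} = x_{i\cdot}/\lambda$, or $\hat\pi_{i\cdot} = x_{i\cdot}/n$.
16
Hence the MLE of $\pi_{i\cdot}$ is the observed row proportion and, similarly, $\hat\pi_{\cdot j} = x_{\cdot j}/n$. Finally $\hat m_{ij} = n\,\hat\pi_{i\cdot}\,\hat\pi_{\cdot j} = \frac{x_{i\cdot}\,x_{\cdot j}}{n}$.
17
Hence $\log \hat m_{ij} = \log x_{i\cdot} + \log x_{\cdot j} - \log n$. Now, for a table with $I$ rows and $J$ columns, the estimated u-terms of the log-linear model are $\hat u = \frac{1}{IJ}\sum_i\sum_j \log \hat m_{ij}$ and $\hat u_{1(i)} = \frac{1}{J}\sum_j \log \hat m_{ij} - \hat u$.
18
Hence $\hat u_{1(i)} = \log x_{i\cdot} - \frac{1}{I}\sum_{i'} \log x_{i'\cdot}$ and, in the same way, $\hat u_{2(j)} = \log x_{\cdot j} - \frac{1}{J}\sum_{j'} \log x_{\cdot j'}$. Note that $\hat u_{1(i)}$ depends only on the row totals, or equivalently $\hat u_{1(i)} = \log \hat\pi_{i\cdot} - \frac{1}{I}\sum_{i'} \log \hat\pi_{i'\cdot}$.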
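The closed-form estimates for the independence model can be checked numerically. A minimal sketch with a hypothetical 2×3 table: fitted values $\hat m_{ij} = x_{i\cdot}x_{\cdot j}/n$ and the u-terms computed from the logs of the row and column totals; the function name `independence_mle` is illustrative.

```python
import math

def independence_mle(x):
    """Closed-form MLEs for the independence model in a two-way table x[i][j]."""
    I, J = len(x), len(x[0])
    n = sum(map(sum, x))
    row = [sum(r) for r in x]                 # x_{i.}
    col = [sum(c) for c in zip(*x)]           # x_{.j}
    m_hat = [[row[i] * col[j] / n for j in range(J)] for i in range(I)]
    mean_log_row = sum(math.log(r) for r in row) / I
    mean_log_col = sum(math.log(c) for c in col) / J
    u = mean_log_row + mean_log_col - math.log(n)
    u1 = [math.log(r) - mean_log_row for r in row]
    u2 = [math.log(c) - mean_log_col for c in col]
    return m_hat, u, u1, u2

x = [[20, 30, 50], [30, 20, 50]]    # hypothetical counts, n = 200
m_hat, u, u1, u2 = independence_mle(x)
print(m_hat)   # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
```

The fitted table reproduces the observed row and column totals, and $\hat u + \hat u_{1(i)} + \hat u_{2(j)}$ equals $\log \hat m_{ij}$ in every cell.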
19
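Comments: Maximum likelihood estimates can be computed for any hierarchical log-linear model, including models with more than two variables. In certain situations (for example the model [12][13][23] for a three-way table) the likelihood equations have no closed-form solution and must be solved numerically. For the saturated model (all interactions and main effects), the fitted frequencies equal the observed frequencies: $\hat m_{ijk} = x_{ijk}$.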
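When no closed form exists, one widely used numerical method (not necessarily the one used for the results in these slides) is iterative proportional fitting: starting from a table of ones, the fitted table is rescaled in turn to match each observed margin in the model's generating class until all margins agree. A sketch for [12][13][23] in pure Python, assuming every observed two-way margin is positive; the counts and the name `ipf_no_three_factor` are hypothetical.

```python
import itertools

def ipf_no_three_factor(x, tol=1e-10, max_iter=1000):
    """Fit [12][13][23] to counts x[i][j][k] by iterative proportional fitting.

    Assumes every observed two-way margin is positive.
    """
    dims = (len(x), len(x[0]), len(x[0][0]))
    cells = list(itertools.product(*(range(d) for d in dims)))
    obs = {c: float(x[c[0]][c[1]][c[2]]) for c in cells}
    fit = {c: 1.0 for c in cells}                  # start from a table of ones

    def margin(table, axes):
        # Sum a {cell: value} table over the axes not listed in `axes`.
        out = {}
        for c, v in table.items():
            key = tuple(c[a] for a in axes)
            out[key] = out.get(key, 0.0) + v
        return out

    pairs = [(0, 1), (0, 2), (1, 2)]               # the margins [12], [13], [23]
    targets = {axes: margin(obs, axes) for axes in pairs}
    for _ in range(max_iter):
        for axes in pairs:
            current = margin(fit, axes)
            for c in cells:
                key = tuple(c[a] for a in axes)
                fit[c] *= targets[axes][key] / current[key]
        # Stop once every fitted two-way margin matches the observed one.
        gap = max(abs(value - targets[axes][key])
                  for axes in pairs
                  for key, value in margin(fit, axes).items())
        if gap < tol:
            break
    return [[[fit[(i, j, k)] for k in range(dims[2])]
             for j in range(dims[1])] for i in range(dims[0])]

x = [[[10, 20], [30, 40]], [[25, 15], [35, 45]]]   # hypothetical counts
m_hat = ipf_no_three_factor(x)

# Under [12][13][23] the 1-2 odds ratio is the same at each level of variable 3.
odds = [m_hat[0][0][k] * m_hat[1][1][k] / (m_hat[0][1][k] * m_hat[1][0][k])
        for k in range(2)]
print(abs(odds[0] - odds[1]) < 1e-9)   # True
```

Because each rescaling factor depends on only two indices, the fitted table never acquires a three-factor interaction, which is why the conditional odds ratios agree.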
20
Goodness of Fit Statistics
These statistics can be used to check whether a log-linear model fits the observed frequency table.
21
Goodness of Fit Statistics
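The chi-squared statistic: $\chi^2 = \sum_{i,j,k} \frac{(x_{ijk} - \hat m_{ijk})^2}{\hat m_{ijk}}$. The likelihood-ratio statistic: $G^2 = 2\sum_{i,j,k} x_{ijk} \log\frac{x_{ijk}}{\hat m_{ijk}}$. Here $x_{ijk}$ and $\hat m_{ijk}$ are the observed and fitted frequencies, and d.f. = # cells - # parameters fitted. We reject the model if $\chi^2$ or $G^2$ is greater than $\chi^2_{\alpha}$, the upper $\alpha$ critical value of the chi-squared distribution with these d.f.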
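A minimal sketch of both statistics in Python, applied to a hypothetical 2×3 table fitted under independence ($\hat m_{ij} = x_{i\cdot}x_{\cdot j}/n$, which uses 1 + 1 + 2 = 4 parameters, leaving 6 - 4 = 2 d.f.). The cell lists are flattened row by row; the function names are illustrative.

```python
import math

def chi_squared(observed, fitted):
    """Pearson chi-squared: sum of (x - m)^2 / m over all cells."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, fitted))

def likelihood_ratio(observed, fitted):
    """G^2 = 2 * sum x * log(x / m); cells with x = 0 contribute 0."""
    return 2 * sum(x * math.log(x / m) for x, m in zip(observed, fitted) if x > 0)

# Hypothetical 2 x 3 table and its fitted values under independence.
observed = [20, 30, 50, 30, 20, 50]
fitted = [25.0, 25.0, 50.0, 25.0, 25.0, 50.0]
df = 6 - (1 + 1 + 2)   # cells minus parameters: u, u1 (1 free), u2 (2 free)
print(chi_squared(observed, fitted), round(likelihood_ratio(observed, fitted), 3), df)
# 4.0 4.027 2
```

Both values are below 5.991, the upper 5% point of chi-squared with 2 d.f., so this hypothetical independence model would not be rejected at the 5% level.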
22
Example: Variables: Systolic Blood Pressure (B), Serum Cholesterol (C), Coronary Heart Disease (H)
23
Goodness-of-Fit Testing of Models
Models fitted (columns: MODEL, DF, LIKELIHOOD-RATIO CHISQ, PROB, PEARSON CHISQ, PROB; n.s. = lack of fit not significant):
B,C,H
B,CH
C,BH
H,BC
BC,BH
BH,CH (n.s.)
CH,BC
BC,BH,CH (n.s.)
Possible models:
1. [BH][CH]: B and C independent given H
2. [BC][BH][CH]: all two-factor interaction model
24
Model 1: [BH][CH] Log-linear parameters
Heart Disease – Blood Pressure Interaction
25
Multiplicative effect
Log-Linear Model
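Exponentiating the log-linear model gives the multiplicative form $m_{ij} = e^{u} e^{u_{1(i)}} e^{u_{2(j)}} e^{u_{12(ij)}}$, so $e^{u_{12(ij)}}$ is the factor by which the interaction multiplies the expected frequency of cell $(i,j)$. A sketch with a hypothetical 2×2 table of fitted frequencies (not the blood pressure or heart disease data); for a 2×2 table with sum-to-zero constraints, $(e^{u_{12(11)}})^4$ equals the odds ratio. The helper name is illustrative.

```python
import math

def interaction_terms(t):
    """Sum-to-zero interaction terms u12(ij) of a two-way table t of frequencies."""
    I, J = len(t), len(t[0])
    logt = [[math.log(v) for v in row] for row in t]
    grand = sum(map(sum, logt)) / (I * J)
    row_mean = [sum(r) / J for r in logt]
    col_mean = [sum(c) / I for c in zip(*logt)]
    return [[logt[i][j] - row_mean[i] - col_mean[j] + grand for j in range(J)]
            for i in range(I)]

t = [[40, 10], [20, 30]]          # hypothetical 2 x 2 fitted frequencies
u12 = interaction_terms(t)
effect = [[math.exp(v) for v in row] for row in u12]   # multiplicative effects
print(round(effect[0][0] ** 4, 6))   # 6.0, the odds ratio 40*30/(10*20)
```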
26
Heart Disease – Cholesterol Interaction
27
Multiplicative effect
28
Model 2: [BC][BH][CH] Log-linear parameters
Blood Pressure – Cholesterol Interaction
29
Multiplicative effect
30
Heart Disease – Blood Pressure Interaction
31
Multiplicative effect
32
Heart Disease – Cholesterol Interaction
33
Multiplicative effect
34
Another Example: In this study the following variables were determined for N = 4353 males:
Occupation category, Educational level, Academic aptitude
35
Occupation categories: Self-employed, Business; Teacher/Education; Self-employed, Professional; Salaried Employed
Education levels: Low, Low/Med, Med, High/Med, High
36
Academic Aptitude: Low, Low/Med, High/Med, High
37
Observed frequencies, Education (Low, LMed, Med, HMed, High, Total) by Aptitude (Low, LMed, HMed, High, Total), tabulated separately for each occupation category: Self-employed, Business; Teacher; Self-employed, Professional; Salaried Employed.
39
It is common to handle a multiway table by testing for independence in all two-way tables. This is similar to looking at all the bivariate correlations. In this example we learn that:
Education is related to Aptitude.
Education is related to Occupational category.
Can we do better than this?
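Testing independence in every two-way margin can be sketched as follows (pure Python; the 2×2×2 counts and helper names are hypothetical). Each pair of variables is collapsed over the third and tested with $G^2$ on $(r-1)(c-1)$ d.f. These tests see only the collapsed marginal tables, never the joint three-way structure.

```python
import itertools
import math

def two_way_margin(x, a, b):
    """Collapse a three-way table x[i][j][k] onto axes a < b (0, 1 or 2)."""
    dims = (len(x), len(x[0]), len(x[0][0]))
    t = [[0] * dims[b] for _ in range(dims[a])]
    for c in itertools.product(*(range(d) for d in dims)):
        t[c[a]][c[b]] += x[c[0]][c[1]][c[2]]
    return t

def independence_g2(t):
    """G^2 and d.f. for the independence model in a two-way table t."""
    n = sum(map(sum, t))
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    g2 = 2 * sum(t[i][j] * math.log(t[i][j] * n / (rows[i] * cols[j]))
                 for i in range(len(rows)) for j in range(len(cols))
                 if t[i][j] > 0)
    return g2, (len(rows) - 1) * (len(cols) - 1)

x = [[[10, 5], [20, 5]], [[30, 15], [60, 15]]]   # hypothetical counts
for a, b in itertools.combinations(range(3), 2):
    g2, df = independence_g2(two_way_margin(x, a, b))
    print(a + 1, b + 1, round(g2, 3), df)
# 1 2 0.0 1
# 1 3 0.0 1
# 2 3 3.485 1
```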
40
Fitting various log-linear models
The simplest model that fits is [Apt,Ed][Occ,Ed]. This model implies conditional independence between Aptitude and Occupation given Education.
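For a conditional-independence model such as [Apt,Ed][Occ,Ed] the likelihood equations have a closed-form solution, $\hat m_{ijk} = x_{i\cdot k}\,x_{\cdot jk}/x_{\cdot\cdot k}$, on $K(I-1)(J-1)$ d.f., where (an indexing assumed here for illustration) $i$ indexes Aptitude, $j$ Occupation and $k$ Education. A minimal sketch in Python using a small hypothetical 2×2×2 table rather than the study's counts; the function names are illustrative only.

```python
import math

def fit_conditional_independence(x):
    """Fitted values for [13][23]: variables 1 and 2 independent given variable 3.

    Closed form m_ijk = x_{i.k} * x_{.jk} / x_{..k}; x[i][j][k] are counts
    with positive margins.
    """
    I, J, K = len(x), len(x[0]), len(x[0][0])
    x_ik = [[sum(x[i][j][k] for j in range(J)) for k in range(K)] for i in range(I)]
    x_jk = [[sum(x[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    x_k = [sum(x_ik[i][k] for i in range(I)) for k in range(K)]
    return [[[x_ik[i][k] * x_jk[j][k] / x_k[k] for k in range(K)]
             for j in range(J)] for i in range(I)]

def g_squared(x, m):
    """G^2 = 2 * sum x log(x / m) over all cells with x > 0."""
    return 2 * sum(xv * math.log(xv / mv)
                   for xi, mi in zip(x, m) for xj, mj in zip(xi, mi)
                   for xv, mv in zip(xj, mj) if xv > 0)

def df_conditional_independence(I, J, K):
    return K * (I - 1) * (J - 1)

# Hypothetical counts in which conditional independence holds exactly.
x = [[[10, 5], [20, 5]], [[30, 15], [60, 15]]]
m_hat = fit_conditional_independence(x)
print(g_squared(x, m_hat), df_conditional_independence(2, 2, 2))   # 0.0 2
print(df_conditional_independence(4, 4, 5))   # 45 for Aptitude x Occupation x Education
```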
41
Log-linear Parameters
Aptitude – Education Interaction
42
Aptitude – Education Interaction (Multiplicative)
43
Occupation – Education Interaction
44
Occupation – Education Interaction (Multiplicative)