Discrete Multivariate Analysis

1 Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data

2 References
Fienberg, S. (1980), Analysis of Cross-Classified Data, MIT Press, Cambridge, Mass.
Fingleton, B. (1984), Models of Category Counts, Cambridge University Press.
Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.

3 Log Linear Model

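4 Two-way table
The log of each expected cell frequency is written as the sum of an overall mean, a row effect, a column effect and a row-by-column interaction.
Note: X and Y are independent if every interaction term is zero.
In this case the log-linear model becomes additive: an overall mean plus a row effect plus a column effect.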
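For reference, the two-way model can be written out in the u-term notation of Fienberg (1980). The symbols $m_{ij}$, $u$, $u_{1(i)}$, $u_{2(j)}$ and $u_{12(ij)}$, and the sum-to-zero constraints, are assumptions made here for illustration; the slide's own notation may differ.

```latex
% General (saturated) log-linear model for an I x J table
\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)},
\qquad
\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0 .

% Independence of X and Y: every interaction term vanishes
u_{12(ij)} = 0 \ \text{for all } i, j
\quad\Longleftrightarrow\quad
\ln m_{ij} = u + u_{1(i)} + u_{2(j)} .
```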

5 Three-way Frequency Tables

6 Log-Linear model for three-way tables
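Let $m_{ijk}$ denote the expected frequency in cell (i,j,k) of the table. Then in general the log of $m_{ijk}$ is the sum of an overall mean, a main effect for each of the three variables, an interaction for each of the three pairs of variables, and a three-factor interaction.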
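Written out, the general three-way model has eight groups of terms. The u-term symbols below follow the style of Fienberg (1980) and are assumed for illustration, with every term summing to zero over each of its indices.

```latex
\ln m_{ijk} = u
  + u_{1(i)} + u_{2(j)} + u_{3(k)}
  + u_{12(ij)} + u_{13(ik)} + u_{23(jk)}
  + u_{123(ijk)}
```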

7 Hierarchical Log-linear models for categorical Data
For three-way tables.
The hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.

8 Hierarchical Log-linear models for 3 way table
Model: Description
[1][2][3]: Mutual independence between all three variables.
[1][23]: Independence of Variable 1 with variables 2 and 3.
[2][13]: Independence of Variable 2 with variables 1 and 3.
[3][12]: Independence of Variable 3 with variables 1 and 2.
[12][13]: Conditional independence between variables 2 and 3 given variable 1.
[12][23]: Conditional independence between variables 1 and 3 given variable 2.
[13][23]: Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]
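The bracket notation lists only the highest-order terms; the hierarchical principle supplies the rest. A minimal Python sketch of that expansion follows. The function name `implied_terms` and the string encoding of a model (one character per variable) are illustrative choices, not part of the slides.

```python
from itertools import combinations

def implied_terms(generators):
    """List every effect implied by a hierarchical model.

    `generators` holds the bracketed terms as strings, e.g. ["12", "13"]
    for the model [12][13]; each character names one variable.
    """
    terms = set()
    for gen in generators:
        # Every non-empty subset of a bracketed term must also be in the model.
        for size in range(1, len(gen) + 1):
            for combo in combinations(sorted(gen), size):
                terms.add("".join(combo))
    # Main effects first, then two-factor terms, and so on.
    return sorted(terms, key=lambda t: (len(t), t))

print(implied_terms(["12", "13"]))  # ['1', '2', '3', '12', '13']
```

For [12][13] the 23 term is absent, which is what makes variables 2 and 3 conditionally independent given variable 1.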

9 Maximum Likelihood Estimation
Log-Linear Model

10 For any model it is possible to determine the maximum likelihood estimators of the parameters
Example: two-way table, model of independence, multinomial sampling

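11 Log-likelihood
With the model of independence, each cell probability is the product of its row probability and its column probability. The multinomial log-likelihood is maximized subject to the row probabilities summing to one and the column probabilities summing to one.

12-18 Setting the derivatives of the constrained log-likelihood to zero gives the estimated row probability as the row total divided by N and, similarly, the estimated column probability as the column total divided by N. Hence the estimated expected frequency in cell (i,j) is the row total times the column total, divided by N.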
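The steps on slides 11 to 18 can be written out as follows. This is the standard Lagrange-multiplier argument, not a transcription of the slides; $x_{ij}$ (observed count), $\pi_{ij}$ (cell probability), the totals $x_{i+}$ and $x_{+j}$, and the multipliers $\lambda$ and $\mu$ are notation assumed here.

```latex
% Multinomial log-likelihood, up to a constant free of the parameters
\ell = \sum_i \sum_j x_{ij} \ln \pi_{ij},
\qquad \pi_{ij} = \pi_{i+}\,\pi_{+j} \ \text{under independence}

% Substituting the independence model
\ell = \sum_i x_{i+} \ln \pi_{i+} + \sum_j x_{+j} \ln \pi_{+j},
\qquad \sum_i \pi_{i+} = 1, \quad \sum_j \pi_{+j} = 1

% Lagrangian and first-order condition for the row probabilities
g = \ell - \lambda \Bigl( \sum_i \pi_{i+} - 1 \Bigr) - \mu \Bigl( \sum_j \pi_{+j} - 1 \Bigr),
\qquad
\frac{\partial g}{\partial \pi_{i+}} = \frac{x_{i+}}{\pi_{i+}} - \lambda = 0

% Summing x_{i+} = \lambda \pi_{i+} over i gives \lambda = N; similarly \mu = N
\hat{\pi}_{i+} = \frac{x_{i+}}{N}, \qquad
\hat{\pi}_{+j} = \frac{x_{+j}}{N}, \qquad
\hat{m}_{ij} = N \hat{\pi}_{i+} \hat{\pi}_{+j} = \frac{x_{i+}\, x_{+j}}{N}
```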

19 Comments
Maximum likelihood estimates can be computed for any hierarchical log-linear model, including models for more than 2 variables.
In certain situations the equations need to be solved numerically.
For the saturated model (all interactions and main effects), the fitted frequencies equal the observed frequencies.
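One case with no closed-form solution is the three-way model [12][13][23]. A common numerical method is iterative proportional fitting; the slides do not say which method was used, so the sketch below is only one possibility. The function name `ipf_all_two_way` and the counts are made up for illustration, and the code assumes every two-way margin is positive.

```python
def ipf_all_two_way(x, iterations=100):
    """Fit [12][13][23] to observed counts x[i][j][k] by iterative
    proportional fitting; assumes every two-way margin is positive."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]  # flat starting table
    for _ in range(iterations):
        # Rescale so the fitted (1,2) margin equals the observed one.
        for i in range(I):
            for j in range(J):
                ratio = sum(x[i][j]) / sum(m[i][j])
                for k in range(K):
                    m[i][j][k] *= ratio
        # Same for the (1,3) margin.
        for i in range(I):
            for k in range(K):
                ratio = (sum(x[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= ratio
        # Same for the (2,3) margin.
        for j in range(J):
            for k in range(K):
                ratio = (sum(x[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= ratio
    return m

counts = [[[10, 20], [30, 40]], [[15, 25], [35, 60]]]  # made-up 2x2x2 table
fitted = ipf_all_two_way(counts)
for i in range(2):
    for j in range(2):
        # Fitted and observed (1,2) margins should agree after convergence.
        print(i, j, round(sum(fitted[i][j]), 4), sum(counts[i][j]))
```

A fixed iteration count keeps the sketch short; a practical version would stop once the largest change between cycles falls below a tolerance.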

20 Goodness of Fit Statistics
These statistics can be used to check if a log-linear model will fit the observed frequency table

21 Goodness of Fit Statistics
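The Chi-squared statistic: χ² = sum over all cells of (observed − expected)² / expected
The Likelihood Ratio statistic: G² = 2 × sum over all cells of observed × ln(observed / expected)
d.f. = # cells - # parameters fitted
We reject the model if χ² or G² is greater than the upper critical value of the chi-squared distribution with these d.f.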
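As a small worked example, both statistics can be computed for the independence model in a two-way table. The counts and function names below are invented for illustration; 3.841 is the 5% upper critical value of the chi-squared distribution with 1 d.f.

```python
import math

def independence_fit(table):
    """Expected counts under independence: row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_g2(observed, expected):
    # Cells with a zero observed count contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(observed, expected)
                   for o, e in zip(obs_row, exp_row) if o > 0)

observed = [[20, 30], [30, 20]]  # made-up counts
expected = independence_fit(observed)
chi2 = pearson_chi2(observed, expected)
g2 = likelihood_ratio_g2(observed, expected)

# d.f. = # cells - # parameters fitted (overall mean, row effects, column effects)
n_rows, n_cols = len(observed), len(observed[0])
df = n_rows * n_cols - (1 + (n_rows - 1) + (n_cols - 1))

CRITICAL_5PCT_1DF = 3.841
print(chi2, round(g2, 4), df, chi2 > CRITICAL_5PCT_1DF)
```

Here every expected count is 25, so χ² is 4.0 and G² is about 4.03 on 1 d.f.; both exceed 3.841, so independence would be rejected at the 5% level for these invented counts.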

22 Example: Variables
Systolic Blood Pressure (B)
Serum Cholesterol (C)
Coronary Heart Disease (H)

23 Goodness of fit testing of Models
MODEL  DF  LIKELIHOOD-RATIO CHISQ  PROB  PEARSON CHISQ  PROB
B,C,H
B,CH
C,BH
H,BC
BC,BH
BH,CH  n.s.
CH,BC
BC,BH,CH  n.s.
Possible Models:
1. [BH][CH]: B and C independent given H
2. [BC][BH][CH]: all two-factor interaction model

24 Model 1: [BH][CH] Log-linear parameters
Heart Disease - Blood Pressure Interaction

25 Multiplicative effect
Log-Linear Model
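The multiplicative effects are presumably the exponentials of the log-linear parameters. As a sketch for Model 1, [BH][CH], in assumed u-term notation with $m_{bch}$ as the expected frequency:

```latex
\ln m_{bch} = u + u_{B(b)} + u_{C(c)} + u_{H(h)} + u_{BH(bh)} + u_{CH(ch)}
\quad\Longleftrightarrow\quad
m_{bch} = e^{u}\, e^{u_{B(b)}}\, e^{u_{C(c)}}\, e^{u_{H(h)}}\, e^{u_{BH(bh)}}\, e^{u_{CH(ch)}}
```

On this reading, a log-linear parameter of 0 corresponds to a multiplicative effect of 1 (no effect), a positive parameter to a factor above 1, and a negative parameter to a factor below 1.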

26 Heart Disease - Cholesterol Interaction

27 Multiplicative effect

28 Model 2: [BC][BH][CH] Log-linear parameters
Blood pressure-Cholesterol interaction:

29 Multiplicative effect

30 Heart Disease - Blood Pressure Interaction

31 Multiplicative effect

32 Heart Disease - Cholesterol Interaction

33 Multiplicative effect

34 Another Example
In this study the following were determined for N = 4353 males:
Occupation category
Educational level
Academic aptitude

35 Occupation categories
Self-employed, Business
Teacher/Education
Self-employed, Professional
Salaried Employed
Education levels
Low, Low/Med, Med, High/Med, High

36 Academic Aptitude
Low, Low/Med, High/Med, High

37 Education by Aptitude table for each occupation category
Tables: Self-employed, Business; Teacher; Self-employed, Professional; Salaried Employed.
Rows (Education): Low, LMed, Med, HMed, High, Total.
Columns (Aptitude): Low, LMed, HMed, High, Total.

39 It is common to handle a multiway table by testing for independence in all two-way tables.
This is similar to looking at all the bivariate correlations.
In this example we learn that:
Education is related to Aptitude
Education is related to Occupational category
Can we do better than this?
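Testing all two-way tables means collapsing the multiway table over the remaining variable, which discards any structure that exists only conditionally on that variable. A minimal sketch of the collapsing step follows; the function name `two_way_margin` and the counts are hypothetical and are not the study's data.

```python
def two_way_margin(x, drop):
    """Collapse a three-way table x[i][j][k] by summing over axis `drop`
    (0, 1 or 2), returning the two-way table of the other two variables."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    if drop == 0:
        return [[sum(x[i][j][k] for i in range(I)) for k in range(K)]
                for j in range(J)]
    if drop == 1:
        return [[sum(x[i][j][k] for j in range(J)) for k in range(K)]
                for i in range(I)]
    if drop == 2:
        return [[sum(x[i][j][k] for k in range(K)) for j in range(J)]
                for i in range(I)]
    raise ValueError("drop must be 0, 1 or 2")

toy = [[[10, 20], [30, 40]], [[15, 25], [35, 60]]]  # made-up 2x2x2 counts
print(two_way_margin(toy, 2))  # [[30, 70], [40, 95]]
print(two_way_margin(toy, 1))  # [[40, 60], [50, 85]]
print(two_way_margin(toy, 0))  # [[25, 45], [65, 100]]
```

Each collapsed table can then be tested for independence on its own, which is the bivariate approach the slide describes.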

40 Fitting various log-linear models
The simplest model that fits is [Apt,Ed][Occ,Ed]. This model implies conditional independence between Aptitude and Occupation given Education.
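Conditional independence models of this form have closed-form fitted values: the Aptitude-by-Education total times the Education-by-Occupation total, divided by the Education total. The sketch below uses that standard formula on made-up counts, not the study's data; the function name and the index order (aptitude, education, occupation) are assumptions for illustration.

```python
def fit_conditional_independence(x):
    """Fitted counts for [Apt,Ed][Occ,Ed], with x[a][e][o] indexed by
    aptitude a, education e and occupation o. Assumes every education
    level has a positive total."""
    A, E, O = len(x), len(x[0]), len(x[0][0])
    # Aptitude-by-Education margin (summed over occupation).
    apt_ed = [[sum(x[a][e]) for e in range(E)] for a in range(A)]
    # Education-by-Occupation margin (summed over aptitude).
    ed_occ = [[sum(x[a][e][o] for a in range(A)) for o in range(O)]
              for e in range(E)]
    # Education totals.
    ed = [sum(ed_occ[e]) for e in range(E)]
    return [[[apt_ed[a][e] * ed_occ[e][o] / ed[e] for o in range(O)]
             for e in range(E)]
            for a in range(A)]

toy_counts = [[[10, 20], [30, 40]], [[15, 25], [35, 60]]]  # made-up counts
fit = fit_conditional_independence(toy_counts)
print(fit[0][0])  # fitted counts for aptitude 0, education 0, by occupation
```

The fitted table reproduces both two-way margins named in the model, which is a quick check that the formula has been applied correctly.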

41 Log-linear Parameters
Aptitude – Education Interaction

42 Aptitude – Education Interaction (Multiplicative)

43 Occupation – Education Interaction

44 Occupation – Education Interaction (Multiplicative)

