Discrete Multivariate Analysis


Discrete Multivariate Analysis Analysis of Multivariate Categorical Data

References
Fienberg, S. (1980), The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, Mass.
Fingleton, B. (1984), Models of Category Counts, Cambridge University Press.
Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.

Log Linear Model

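Two-way table

$$\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$$

where $m_{ij}$ is the expected frequency in cell $(i,j)$ and $\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0$.

Note: X and Y are independent if $m_{ij} = m_{i+} m_{+j} / n$ for all $i, j$. In this case the log-linear model becomes

$$\ln m_{ij} = u + u_{1(i)} + u_{2(j)}$$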
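The decomposition of a two-way table into log-linear terms can be sketched in Python. This is a minimal illustration, assuming the common parameterization $\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$ with every set of terms summing to zero; the slide's own formula is not preserved in the transcript, so the notation is an assumption. The function name `loglinear_terms` and both tables are hypothetical.

```python
import math

def loglinear_terms(m):
    """Split ln(m[i][j]) into u, u1, u2, u12 under sum-to-zero constraints."""
    rows, cols = len(m), len(m[0])
    logs = [[math.log(v) for v in row] for row in m]
    u = sum(sum(r) for r in logs) / (rows * cols)            # grand mean of the logs
    u1 = [sum(logs[i]) / cols - u for i in range(rows)]      # row effects
    u2 = [sum(logs[i][j] for i in range(rows)) / rows - u    # column effects
          for j in range(cols)]
    u12 = [[logs[i][j] - u - u1[i] - u2[j] for j in range(cols)]
           for i in range(rows)]                             # interaction terms
    return u, u1, u2, u12

# Hypothetical expected frequencies that satisfy independence exactly
independent = [[12.0, 18.0], [28.0, 42.0]]
# Hypothetical expected frequencies with an association between rows and columns
dependent = [[10.0, 20.0], [30.0, 40.0]]

_, _, _, u12_ind = loglinear_terms(independent)
_, _, _, u12_dep = loglinear_terms(dependent)
print(u12_ind)  # all entries are zero up to rounding error
print(u12_dep)  # non-zero entries signal departure from independence
```

When the table satisfies independence, every interaction term vanishes, which is why the model reduces to the main-effects form.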

Three-way Frequency Tables

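Log-Linear model for three-way tables

Let $m_{ijk}$ denote the expected frequency in cell $(i,j,k)$ of the table; then in general

$$\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

where each set of u-terms sums to zero over each of its indices.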

Hierarchical Log-linear models for categorical data: three-way tables

The hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.

Hierarchical Log-linear models for 3 way table

Model           Description
[1][2][3]       Mutual independence between all three variables.
[1][23]         Independence of variable 1 from variables 2 and 3.
[2][13]         Independence of variable 2 from variables 1 and 3.
[3][12]         Independence of variable 3 from variables 1 and 2.
[12][13]        Conditional independence between variables 2 and 3 given variable 1.
[12][23]        Conditional independence between variables 1 and 3 given variable 2.
[13][23]        Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]    Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]           Saturated model: all interactions and main effects.
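The bracket notation and the hierarchical principle can be made concrete with a short Python sketch. It expands a generating class such as [12][23] into every term the model contains and counts degrees of freedom as cells minus fitted parameters. The function names and the numbers of levels (4, 4 and 2) are hypothetical choices for illustration.

```python
from itertools import combinations
from math import prod

def hierarchical_terms(generators):
    """Expand generators such as ["12", "23"] into all implied model terms."""
    terms = set()
    for gen in generators:
        factors = sorted(gen)
        # Every non-empty subset of an included interaction must also be in the model
        for size in range(1, len(factors) + 1):
            for combo in combinations(factors, size):
                terms.add("".join(combo))
    return sorted(terms, key=lambda t: (len(t), t))

def degrees_of_freedom(generators, levels):
    """Cells minus parameters: a term contributes the product of (levels - 1)."""
    cells = prod(levels.values())
    params = 1 + sum(prod(levels[f] - 1 for f in term)
                     for term in hierarchical_terms(generators))
    return cells - params

levels = {"1": 4, "2": 4, "3": 2}   # hypothetical numbers of categories
print(hierarchical_terms(["12", "23"]))                 # ['1', '2', '3', '12', '23']
print(degrees_of_freedom(["1", "2", "3"], levels))      # 24
print(degrees_of_freedom(["12", "13", "23"], levels))   # 9
print(degrees_of_freedom(["123"], levels))              # 0
```

The saturated model [123] has as many parameters as cells, so it leaves no degrees of freedom for testing fit.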

Maximum Likelihood Estimation Log-Linear Model

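For any model it is possible to determine the maximum likelihood estimators of the parameters.

Example: two-way table, independence, multinomial model. With observed counts $x_{ij}$, total $n = \sum_{i,j} x_{ij}$ and cell probabilities $\pi_{ij}$,

$$L = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j} \pi_{ij}^{x_{ij}}$$

or, in terms of the expected frequencies $m_{ij} = n\pi_{ij}$,

$$L = \frac{n!}{\prod_{i,j} x_{ij}!} \prod_{i,j} \left(\frac{m_{ij}}{n}\right)^{x_{ij}}$$

Log-likelihood:

$$l = \ln L = K + \sum_{i,j} x_{ij} \ln m_{ij}$$

where $K = \ln n! - \sum_{i,j} \ln x_{ij}! - n \ln n$ does not depend on the parameters. With the model of independence, $\ln m_{ij} = u + u_{1(i)} + u_{2(j)}$

and

$$l = K + n u + \sum_i x_{i+} u_{1(i)} + \sum_j x_{+j} u_{2(j)}$$

with $\sum_i u_{1(i)} = 0$ and $\sum_j u_{2(j)} = 0$; also $\sum_{i,j} m_{ij} = n$.

The sum-to-zero constraints only identify the u-terms and do not restrict the $m_{ij}$, so the maximization needs a Lagrange multiplier for the total alone. Let

$$g = l - \lambda \Big( \sum_{i,j} m_{ij} - n \Big)$$

Now

$$\frac{\partial g}{\partial u} = n - \lambda \sum_{i,j} m_{ij} = 0$$

Since $\sum_{i,j} m_{ij} = n$, it follows that $\lambda = 1$.

Now

$$\frac{\partial g}{\partial u_{1(i)}} = x_{i+} - \lambda m_{i+} = 0$$

or $m_{i+} = x_{i+}$.

Hence $\hat{m}_{i+} = x_{i+}$ and, similarly, $\hat{m}_{+j} = x_{+j}$. Finally, under independence $m_{ij} = m_{i+} m_{+j} / n$.

Hence

$$\hat{m}_{ij} = \frac{x_{i+} x_{+j}}{n}$$

Now $\ln \hat{m}_{ij} = \ln x_{i+} + \ln x_{+j} - \ln n$, and the sum-to-zero constraints give

$$\hat{u} = \frac{1}{I} \sum_i \ln x_{i+} + \frac{1}{J} \sum_j \ln x_{+j} - \ln n$$

$$\hat{u}_{1(i)} = \ln x_{i+} - \frac{1}{I} \sum_{i'} \ln x_{i'+}, \qquad \hat{u}_{2(j)} = \ln x_{+j} - \frac{1}{J} \sum_{j'} \ln x_{+j'}$$

Hence $\hat{u} + \hat{u}_{1(i)} + \hat{u}_{2(j)} = \ln \hat{m}_{ij}$. Note: $\hat{m}_{ij} = x_{i+} x_{+j} / n$, or $\hat{\pi}_{ij} = (x_{i+}/n)(x_{+j}/n)$.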
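For the independence model, the maximum likelihood estimate of each expected frequency has the familiar closed form: row total times column total, divided by the grand total. A minimal Python sketch of that result follows; the counts and the function name `independence_fit` are hypothetical.

```python
def independence_fit(x):
    """Fitted frequencies under independence: row total * column total / n."""
    n = sum(sum(row) for row in x)
    row_tot = [sum(row) for row in x]
    col_tot = [sum(col) for col in zip(*x)]
    return [[r * c / n for c in col_tot] for r in row_tot]

observed = [[10, 20], [30, 40]]   # hypothetical counts
fitted = independence_fit(observed)
print(fitted)  # [[12.0, 18.0], [28.0, 42.0]]
```

The fitted table reproduces the observed row and column totals, which is exactly what the likelihood equations require.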

Comments
Maximum likelihood estimates can be computed for any hierarchical log-linear model, including models with more than two variables.
In certain situations the equations need to be solved numerically.
For the saturated model (all interactions and main effects), the fitted frequencies equal the observed frequencies: $\hat{m}_{ijk} = x_{ijk}$.
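One case that needs a numerical solution is the model [12][13][23], which has no closed-form estimates. The sketch below uses iterative proportional fitting, a standard method for this purpose; the transcript does not say which method the slides had in mind, so this choice is an assumption. The counts, the function name and the fixed number of iterations are hypothetical.

```python
def ipf_no_three_factor(x, iterations=50):
    """Fit [12][13][23] to a three-way table x[i][j][k] by iterative proportional fitting."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]   # start from a flat table
    for _ in range(iterations):
        # Scale so the fitted (1,2) margin matches the observed one
        for i in range(I):
            for j in range(J):
                ratio = sum(x[i][j]) / sum(m[i][j])
                for k in range(K):
                    m[i][j][k] *= ratio
        # Scale so the fitted (1,3) margin matches the observed one
        for i in range(I):
            for k in range(K):
                ratio = (sum(x[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= ratio
        # Scale so the fitted (2,3) margin matches the observed one
        for j in range(J):
            for k in range(K):
                ratio = (sum(x[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= ratio
    return m

observed = [[[10, 20], [30, 40]], [[25, 15], [35, 45]]]   # hypothetical 2x2x2 counts
fitted = ipf_no_three_factor(observed)

# The odds ratio between variables 1 and 2 is the same at each level of variable 3
odds = [fitted[0][0][k] * fitted[1][1][k] / (fitted[0][1][k] * fitted[1][0][k])
        for k in range(2)]
print(odds)
```

Each scaling step multiplies the table by factors that depend on only two of the three indices, so the fitted table never acquires a three-factor interaction. A production version would stop on a convergence tolerance rather than a fixed iteration count.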

Goodness of Fit Statistics These statistics can be used to check if a log-linear model will fit the observed frequency table

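Goodness of Fit Statistics

The Chi-squared statistic:

$$\chi^2 = \sum \frac{(x - \hat{m})^2}{\hat{m}}$$

The Likelihood Ratio statistic:

$$G^2 = 2 \sum x \ln \frac{x}{\hat{m}}$$

Here $x$ is an observed frequency, $\hat{m}$ is the corresponding fitted frequency, and the sums run over all cells of the table.

d.f. = # cells - # parameters fitted

We reject the model if $\chi^2$ or $G^2$ is greater than the critical value $\chi^2_{\alpha}$ of the chi-squared distribution with these d.f.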
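Both statistics are short sums over the cells of the table. The Python sketch below computes them for a hypothetical 2x2 table whose fitted values come from the independence model; the function names and the data are illustrative only.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio statistic: 2 * sum of observed * ln(observed / expected)."""
    # Cells with a zero count contribute 0, the limiting value of o * ln(o / e)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

observed = [10, 20, 30, 40]            # hypothetical 2x2 counts, flattened row by row
expected = [12.0, 18.0, 28.0, 42.0]    # fitted values under independence

x2 = pearson_chi2(observed, expected)
g2 = likelihood_ratio_g2(observed, expected)
df = 4 - 3   # 4 cells minus 3 fitted parameters (constant, one row term, one column term)
print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, d.f. = {df}")
```

The two statistics are usually close when the fitted frequencies are not small, which is why the tables of results report both side by side.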

Example: Variables
Systolic Blood Pressure (B)
Serum Cholesterol (C)
Coronary Heart Disease (H)

Goodness of fit testing of Models

MODEL       DF   LIKELIHOOD-RATIO CHISQ   PROB.    PEARSON CHISQ   PROB.
B,C,H.      24          83.15             0.0000      102.00       0.0000
B,CH.       21          51.23             0.0002       56.89       0.0000
C,BH.       21          59.59             0.0000       60.43       0.0000
H,BC.       15          58.73             0.0000       64.78       0.0000
BC,BH.      12          35.16             0.0004       33.76       0.0007
BH,CH.      18          27.67             0.0673       26.58       0.0872   n.s.
CH,BC.      12          26.80             0.0082       33.18       0.0009
BC,BH,CH.    9           8.08             0.5265        6.56       0.6824   n.s.

Possible Models:
1. [BH][CH]: B and C independent given H.
2. [BC][BH][CH]: all two-factor interaction model.
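The PROB. column is the upper-tail probability of a chi-squared distribution with the listed d.f. The Python sketch below recomputes it from the likelihood-ratio values in the table using closed-form series for integer d.f. The helper `chi2_sf` is a hypothetical stand-in written for this illustration (a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used), and the 0.05 cut-off is an assumption about what "n.s." means here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer d.f."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum over r < df/2 of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd d.f.: two-sided normal tail plus a finite series in sqrt(x)
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += term
    return tail + 2 * phi * total

# (model, d.f., likelihood-ratio chi-square) copied from the table above
models = [("B,C,H", 24, 83.15), ("B,CH", 21, 51.23), ("C,BH", 21, 59.59),
          ("H,BC", 15, 58.73), ("BC,BH", 12, 35.16), ("BH,CH", 18, 27.67),
          ("CH,BC", 12, 26.80), ("BC,BH,CH", 9, 8.08)]

not_rejected = []
for name, df, g2 in models:
    p = chi2_sf(g2, df)
    if p > 0.05:   # assumed significance level
        not_rejected.append(name)
    print(f"{name:10s} d.f. = {df:2d}  G2 = {g2:6.2f}  p = {p:.4f}")
print(not_rejected)  # ['BH,CH', 'BC,BH,CH']
```

The printed probabilities can be compared against the PROB. column; the two models that are not rejected are the ones marked n.s. in the table.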

Model 1: [BH][CH]. Log-linear parameters: Heart Disease - Blood Pressure Interaction

Multiplicative effect Log-Linear Model

Heart Disease - Cholesterol Interaction

Multiplicative effect

Model 2: [BC][BH][CH]. Log-linear parameters: Blood Pressure - Cholesterol Interaction

Multiplicative effect

Heart Disease - Blood Pressure Interaction

Multiplicative effect

Heart Disease - Cholesterol Interaction

Multiplicative effect

Another Example

In this study the following were determined for N = 4353 males:
Occupation category
Educational Level
Academic Aptitude

Occupation categories: Self-employed, Business; Teacher/Education; Self-employed, Professional; Salaried Employed

Education levels: Low, Low/Med, Med, High/Med, High

Academic Aptitude: Low, Low/Med, High/Med, High

Self-employed, Business
                  Education
Aptitude    Low   LMed   HMed   High   Total
Low          42     55     22      3     122
LMed         72     82     60     12     226
Med          90    106     85     25     306
HMed         27     48     47      8     130
High          8     18     19      5      50
Total       239    309    233     53     834

Teacher/Education
                  Education
Aptitude    Low   LMed   HMed   High   Total
Low           0      0      1     19      20
LMed          0      3      3     60      66
Med           1      4      5     86      96
HMed          0      0      2     36      38
High          0      0      1     14      15
Total         1      7     12    215     235

Self-employed, Professional
                  Education
Aptitude    Low   LMed   HMed   High   Total
Low           1      2      8     19      30
LMed          1      2     15     33      51
Med           2      5     25     83     115
HMed          2      2     10     45      59
High          0      0     12     19      31
Total         6     11     70    199     286

Salaried Employed
                  Education
Aptitude    Low   LMed   HMed   High   Total
Low         172    151    107     42     472
LMed        208    198    206     92     704
Med         279    271    331    191    1072
HMed         99    126    179     97     501
High         36     35     99     79     249
Total       794    781    922    501    2998

It is common to handle a multiway table by testing for independence in all two-way tables. This is similar to looking at all the bivariate correlations.

In this example we learn that:
Education is related to Aptitude
Education is related to Occupational category

Can we do better than this?
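The two-way approach can be sketched in Python by collapsing the three-way table over one variable at a time and testing independence in each margin. The counts are copied from the four occupation tables above. The axis labelling is an assumption: the sketch reads the five-row variable as Aptitude and the four-column variable as Education, following the table header, and the axes should be swapped if the other reading is intended. The function name is hypothetical.

```python
# counts[occupation][row][column]; rows read as Aptitude, columns as Education (assumed)
counts = [
    # Self-employed, Business
    [[42, 55, 22, 3], [72, 82, 60, 12], [90, 106, 85, 25], [27, 48, 47, 8], [8, 18, 19, 5]],
    # Teacher/Education
    [[0, 0, 1, 19], [0, 3, 3, 60], [1, 4, 5, 86], [0, 0, 2, 36], [0, 0, 1, 14]],
    # Self-employed, Professional
    [[1, 2, 8, 19], [1, 2, 15, 33], [2, 5, 25, 83], [2, 2, 10, 45], [0, 0, 12, 19]],
    # Salaried Employed
    [[172, 151, 107, 42], [208, 198, 206, 92], [279, 271, 331, 191],
     [99, 126, 179, 97], [36, 35, 99, 79]],
]
n_occ, n_apt, n_edu = len(counts), len(counts[0]), len(counts[0][0])

def pearson_independence(table):
    """Pearson chi-squared statistic and d.f. for independence in a two-way table."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(row_tot):
        for j, c in enumerate(col_tot):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)

# Collapse over occupation: Aptitude x Education
apt_edu = [[sum(counts[o][a][e] for o in range(n_occ)) for e in range(n_edu)]
           for a in range(n_apt)]
# Collapse over aptitude: Occupation x Education
occ_edu = [[sum(counts[o][a][e] for a in range(n_apt)) for e in range(n_edu)]
           for o in range(n_occ)]

for name, table in [("Aptitude x Education", apt_edu),
                    ("Occupation x Education", occ_edu)]:
    stat, df = pearson_independence(table)
    print(f"{name}: X2 = {stat:.2f}, d.f. = {df}")
```

Collapsing discards the third variable, so these tests cannot say whether two variables remain related once the third is held fixed; that is the question a log-linear model can answer.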

Fitting various log-linear models

Simplest model that fits is: [Apt,Ed][Occ,Ed]

This model implies conditional independence between Aptitude and Occupation given Education.
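A conditional independence model of this kind has closed-form fitted values: within each Education level, the fitted count is the product of the two margins that involve Education, divided by the Education total. The Python sketch below fits the model to the occupation data and computes the likelihood ratio statistic. As an assumption taken from the table header, the five-row variable is read as Aptitude and the four-column variable as Education; under the other reading the conditioning axis and the d.f. change. The sketch does not claim a particular value for the statistic.

```python
import math

# counts[occupation][row][column]; rows read as Aptitude, columns as Education (assumed)
counts = [
    [[42, 55, 22, 3], [72, 82, 60, 12], [90, 106, 85, 25], [27, 48, 47, 8], [8, 18, 19, 5]],
    [[0, 0, 1, 19], [0, 3, 3, 60], [1, 4, 5, 86], [0, 0, 2, 36], [0, 0, 1, 14]],
    [[1, 2, 8, 19], [1, 2, 15, 33], [2, 5, 25, 83], [2, 2, 10, 45], [0, 0, 12, 19]],
    [[172, 151, 107, 42], [208, 198, 206, 92], [279, 271, 331, 191],
     [99, 126, 179, 97], [36, 35, 99, 79]],
]
n_occ, n_apt, n_edu = len(counts), len(counts[0]), len(counts[0][0])

# Margins that involve Education
occ_edu = [[sum(counts[o][a][e] for a in range(n_apt)) for e in range(n_edu)]
           for o in range(n_occ)]
apt_edu = [[sum(counts[o][a][e] for o in range(n_occ)) for e in range(n_edu)]
           for a in range(n_apt)]
edu_tot = [sum(occ_edu[o][e] for o in range(n_occ)) for e in range(n_edu)]

# Fitted values for [Apt,Ed][Occ,Ed]: product of the two margins over the Education total
fitted = [[[occ_edu[o][e] * apt_edu[a][e] / edu_tot[e] for e in range(n_edu)]
           for a in range(n_apt)] for o in range(n_occ)]

# Likelihood ratio statistic; zero counts contribute nothing to the sum
g2 = 2 * sum(counts[o][a][e] * math.log(counts[o][a][e] / fitted[o][a][e])
             for o in range(n_occ) for a in range(n_apt) for e in range(n_edu)
             if counts[o][a][e] > 0)

# One (Occupation x Aptitude) independence test per Education level
df = n_edu * (n_occ - 1) * (n_apt - 1)
print(f"G2 = {g2:.2f} on {df} d.f.")
```

The statistic is compared with a chi-squared distribution on the printed d.f.; a value below the critical value means the conditional independence model is not rejected.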

Log-linear Parameters Aptitude – Education Interaction

Aptitude – Education Interaction (Multiplicative)

Occupation – Education Interaction

Occupation – Education Interaction (Multiplicative)