The Multigraph for Loglinear Models Harry Khamis Statistical Consulting Center Wright State University Dayton, Ohio, USA.

Presentation transcript:


OUTLINE
1. LOGLINEAR MODEL (LLM)
   - two-way table
   - three-way table
   - examples
2. MULTIGRAPH
   - construction
   - maximum spanning tree
   - conditional independencies
   - collapsibility
3. EXAMPLES

Loglinear Model: Goal
Identify the structure of associations among a set of categorical variables.

LLM: two variables

                          Y
X        1       2       3      ...     J       Total
1        n_11    n_12    n_13   ...     n_1J    n_1+
2        n_21    n_22    n_23   ...     n_2J    n_2+
...
I        n_I1    n_I2    n_I3   ...     n_IJ    n_I+
Total    n_+1    n_+2    n_+3   ...     n_+J    n

LLM: two variables
Example: Survey of High School Seniors in Dayton, Ohio
Collaboration: WSU Boonshoft School of Medicine and United Health Services of Dayton

                    Marijuana Use?
Cigarette Use?      Yes      No       Total
Yes
No
Total

LLM: two variables
Two discrete variables, X and Y
Model of independence: generating class is [X][Y]

LLM: two variables
LLM of independence:
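The formula for this slide is not preserved in the transcript. As a hedged reconstruction in the usual textbook parameterization (the symbols below are an assumption, with m_ij the expected count in cell (i, j) and the λ terms subject to the usual identifiability constraints), the independence model with generating class [X][Y] is commonly written as:

```latex
% Assumed notation: m_{ij} = expected cell count; \lambda terms are model parameters
\log m_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y},
\qquad i = 1, \dots, I, \quad j = 1, \dots, J
```

This is equivalent to the probabilistic statement p_ij = p_i+ p_+j: taking logs of a product of a row term and a column term gives an additive model with no interaction term.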

LLM: two variables
Saturated LLM, generating class [XY]:
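The formula for this slide is also missing from the transcript. Under the same assumed textbook notation (m_ij for the expected cell count, which may differ from the symbols on the original slide), the saturated model adds an interaction term to the main effects:

```latex
% Assumed notation: m_{ij} = expected cell count
\log m_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}
```

Setting every \lambda_{ij}^{XY} to zero removes the X-Y association and leaves the model of independence, [X][Y].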

LLM: two variables

Interpretation         Generating Class    Probabilistic Model
X and Y independent    [X][Y]              p_ij = p_i+ p_+j
X and Y dependent      [XY]                p_ij

LLM: three variables
Example: Dayton High School Data

Alcohol    Cigarette    Marijuana Use
Use        Use          Yes      No
Yes        Yes
           No
No         Yes          3        43
           No

LLM: three variables
Saturated LLM, [XYZ]:
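The slide's formula is not preserved here. A hedged reconstruction in the usual textbook form (the notation is an assumption, with m_ijk the expected count in cell (i, j, k)) contains every main effect, every two-factor term, and the three-factor term:

```latex
% Assumed notation: m_{ijk} = expected cell count
\log m_{ijk} = \lambda
  + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z}
  + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}
  + \lambda_{ijk}^{XYZ}
```

Each smaller generating class corresponds to dropping terms: for example, [XY][XZ] omits \lambda_{jk}^{YZ} and \lambda_{ijk}^{XYZ}, and [X][Y][Z] keeps only the main effects.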

LLM: three variables

Interpretation               Generating Class    Probabilistic Model
mutual independence          [X][Y][Z]           p_ijk = p_i++ p_+j+ p_++k
joint independence           [XZ][Y]             p_ijk = p_i+k p_+j+
conditional independence     [XY][XZ]            p_ijk = p_ij+ p_i+k / p_i++
homogeneous association *    [XY][XZ][YZ]        *
saturated model              [XYZ]               p_ijk

* nondecomposable model

Decomposable LLMs
- closed-form expression for MLEs
- closed-form expression for asymptotic variances (Lee, 1977)
- conditional G² statistic simplifies
- allow for causal interpretations
- easier to interpret the LLM
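As one illustration of the closed-form MLE property, consider the decomposable conditional independence model [XY][XZ] with p_ijk = p_ij+ p_i+k / p_i++. The worked step below is a sketch under assumed notation (n_ijk for observed counts, n for the grand total, and a hat for estimates); it substitutes the sample proportions for each marginal probability:

```latex
% Assumed notation: n_{ijk} observed counts, n = grand total, \hat{m}_{ijk} fitted counts
\hat{p}_{ijk}
  = \frac{\hat{p}_{ij+}\,\hat{p}_{i+k}}{\hat{p}_{i++}}
  = \frac{(n_{ij+}/n)\,(n_{i+k}/n)}{n_{i++}/n}
  = \frac{n_{ij+}\,n_{i+k}}{n\,n_{i++}},
\qquad
\hat{m}_{ijk} = n\,\hat{p}_{ijk} = \frac{n_{ij+}\,n_{i+k}}{n_{i++}}
```

No iterative fitting is needed for this model. The nondecomposable model [XY][XZ][YZ] has no such direct expression and is fitted iteratively.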


3 Categorical Variables: X, Y, and Z
If [X ⊗ Y] and [Y ⊗ Z] then [X ⊗ Z]
FALSE!

LLM: three variables

Interpretation              Generating Class    Probabilistic Model
mutual independence         [X][Y][Z]           p_ijk = p_i++ p_+j+ p_++k
joint independence          [XZ][Y]             p_ijk = p_i+k p_+j+
conditional independence    [XY][XZ]            p_ijk = p_ij+ p_i+k / p_i++
homogeneous association     [XY][XZ][YZ]        p_ijk = ψ_ij φ_ik ω_jk
saturated model             [XYZ]               p_ijk

3 Categorical Variables: X, Y, and Z
If [Y ⊗ Z] at every level X = 1, 2, … (that is, [Y ⊗ Z|X]), then [Y ⊗ Z] marginally
FALSE!


3 Categorical Variables: X, Y, and Z
If [Y ⊗ Z] marginally, then [Y ⊗ Z] at every level X = 1, 2, 3, … (that is, [Y ⊗ Z|X])
FALSE!

Which Treatment is Better?

               TRIAL 1: CURED?               TRIAL 2: CURED?
TREATMENT      Yes          No     Total     Yes        No     Total
A              40 (.20)                      (.85)
B              30 (.15)                      (.75)

Combine TRIALS 1 and 2:

               CURED?
TREATMENT      Yes          No     Total
A              125 (.42)
B              330 (.55)

“Ask Marilyn”, PARADE section, DDN, pages 6-7, April 28

Florida Homicide Convictions Resulting in Death Penalty
ML Radelet and GL Pierce, Florida Law Review 43: 1-34, 1991

                     Death Penalty
Defendant's Race     Yes          No
White                53 (0.11)    430
Black                15 (0.08)    176

                     White Victim               Black Victim
                     Death Penalty              Death Penalty
Defendant's Race     Yes          No            Yes          No
White                53 (0.11)    414           0 (0.00)     16
Black                11 (0.23)    37            4 (0.03)     139
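The reversal in the Florida table can be checked directly from the four victim-by-defendant cells. The sketch below is illustrative only: the counts are the ones shown on the slide, while the function and variable names are hypothetical. It computes the death-penalty rate within each victim group and again after collapsing over victim's race.

```python
# Counts from the Florida table on this slide:
# (victim race, defendant race) -> (death penalty yes, death penalty no)
counts = {
    ("White", "White"): (53, 414),
    ("White", "Black"): (11, 37),
    ("Black", "White"): (0, 16),
    ("Black", "Black"): (4, 139),
}


def rate(yes, no):
    """Proportion of convictions that resulted in the death penalty."""
    return yes / (yes + no)


def conditional_rates(table):
    """Death-penalty rate for each (victim race, defendant race) cell."""
    return {key: rate(*cell) for key, cell in table.items()}


def marginal_rates(table):
    """Death-penalty rate by defendant race after collapsing over victim race."""
    totals = {}
    for (_victim, defendant), (yes, no) in table.items():
        y, n = totals.get(defendant, (0, 0))
        totals[defendant] = (y + yes, n + no)
    return {defendant: rate(*cell) for defendant, cell in totals.items()}


cond = conditional_rates(counts)
marg = marginal_rates(counts)

for (victim, defendant), r in cond.items():
    print(f"{victim} victim, {defendant} defendant: {r:.2f}")
for defendant, r in marg.items():
    print(f"Collapsed over victim, {defendant} defendant: {r:.2f}")
```

Within each victim group the rate is higher for Black defendants, yet the collapsed table shows a higher rate for White defendants. This pattern is commonly called Simpson's paradox, and it is why collapsing over a variable needs the collapsibility conditions discussed later in the talk.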

Multigraph Representation of LLMs
- Vertices = generators of the LLM
- Multiedges = edges that are equal in number to the number of indices shared by the two vertices being joined
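The two rules above translate almost directly into code. The following is a minimal sketch, not the author's software: it assumes each factor is a single letter, and the function names `parse_generating_class` and `build_multigraph` are hypothetical.

```python
from itertools import combinations


def parse_generating_class(text):
    """Split a string such as "[XY][XZ]" into its generators: ["XY", "XZ"].

    Assumes every factor is a single character.
    """
    return text.strip("[]").split("][")


def build_multigraph(generators):
    """Return {(g1, g2): multiplicity} for each pair of generators.

    The multiplicity is the number of factors the two generators share;
    pairs with no shared factor get no edge.
    """
    edges = {}
    for g1, g2 in combinations(generators, 2):
        shared = set(g1) & set(g2)
        if shared:
            edges[(g1, g2)] = len(shared)
    return edges


for generating_class in ("[XY][XZ]", "[AS][ACR][MCS][MAC]"):
    vertices = parse_generating_class(generating_class)
    print(generating_class, "->", build_multigraph(vertices))
```

A plain dictionary keyed by vertex pairs is enough here because the only information a multiedge carries is its multiplicity.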

Multigraph: three variables
[XY][XZ]

XY ----- XZ

Examples of Multigraphs
[AS][ACR][MCS][MAC]
(figure: multigraph on the vertices AS, ACR, MAC, MCS)

Examples of Multigraphs
[ABCD][ACE][BCG][CDF]
(figure: multigraph on the vertices ABCD, CDF, ACE, BCG)

Maximum Spanning Tree
The maximum spanning tree of a multigraph M:
- tree (connected graph with no circuits)
- includes each vertex
- sum of the edges is maximum
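One way to find such a tree is to treat each multiedge's multiplicity as an edge weight and run Kruskal's algorithm on the edges in decreasing order of weight. The slides do not say which algorithm is used, so this choice is an assumption, and the function name `maximum_spanning_tree` is hypothetical. The example weights are the shared-factor counts for [AS][ACR][MCS][MAC].

```python
def maximum_spanning_tree(vertices, edges):
    """Kruskal's algorithm with edges taken in decreasing order of weight.

    vertices: iterable of vertex names
    edges:    dict {(u, v): weight}
    Returns a list of (u, v, weight) tuples. If the graph is disconnected,
    the result is a maximum spanning forest.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, compressing the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (u, v), weight in sorted(edges.items(), key=lambda item: -item[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this edge creates no circuit
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Multigraph for [AS][ACR][MCS][MAC]; weight = number of shared factors.
vertices = ["AS", "ACR", "MCS", "MAC"]
edges = {
    ("AS", "ACR"): 1, ("AS", "MCS"): 1, ("AS", "MAC"): 1,
    ("ACR", "MCS"): 1, ("ACR", "MAC"): 2, ("MCS", "MAC"): 2,
}
tree = maximum_spanning_tree(vertices, edges)
print(tree, "total weight:", sum(w for _, _, w in tree))
```

The maximum spanning tree need not be unique: here AS can attach through any of its weight-1 edges without changing the total.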

Examples of maximum spanning trees
[XY][XZ]
(figure: maximum spanning tree on the vertices XY, XZ)

Examples of maximum spanning trees
[AS][ACR][MCS][MAC]
(figure: maximum spanning tree on the vertices AS, ACR, MAC, MCS)

Examples of maximum spanning trees
[ABCD][ACE][BCG][CDF]
(figure: maximum spanning tree on the vertices ABCD, CDF, ACE, BCG)

Fundamental Conditional Independencies (FCIs) for a Decomposable LLM
1. Let S be the set of indices in a branch of the maximum spanning tree
2. Remove each factor of S from the multigraph, M; the resulting multigraph is M/S
3. An FCI is determined as [C_1 ⊗ C_2 ⊗ … ⊗ C_k | S], where C_1, C_2, …, C_k are the sets of factors in the components of M/S
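Steps 2 and 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's implementation: factors are single letters, the branch set S is supplied by hand (choosing branches from the maximum spanning tree is not automated here), and the names `fci_components` and `format_fci` are hypothetical.

```python
def fci_components(generators, S):
    """Form M/S and return the factor sets of its connected components.

    Each generator loses the factors in S; generators left empty are dropped.
    Two reduced vertices are connected when they share a factor, so the
    components are found by merging overlapping factor sets.
    """
    reduced = [set(g) - set(S) for g in generators]
    reduced = [g for g in reduced if g]

    components = []
    for g in reduced:
        merged = set(g)
        untouched = []
        for comp in components:
            if comp & merged:
                merged |= comp        # g links this component to the others
            else:
                untouched.append(comp)
        components = untouched + [merged]
    return components


def format_fci(components, S):
    """Render the components and S in the slides' notation, e.g. [Y ⊗ Z | X]."""
    parts = sorted(",".join(sorted(c)) for c in components)
    return "[" + " ⊗ ".join(parts) + " | " + ",".join(sorted(S)) + "]"


# Generating classes and branch sets taken from the examples in this talk.
examples = [
    (["XY", "XZ"], "X"),
    (["ACE", "MAC", "MCG"], "AC"),
    (["EG", "CP", "RC", "PG"], "C"),
]
for generators, S in examples:
    print(generators, "S =", set(S), "->", format_fci(fci_components(generators, S), S))
```

If M/S has only one component, the chosen S yields no conditional independence, and the procedure applies only to decomposable models, as the slide title states.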

FCIs
[XY][XZ]
Multigraph M: XY ----- XZ (the single edge corresponds to the shared index X)
S = {X}
M/S: Y   Z
FCI: [Y ⊗ Z|X]

Collapsibility Conditions
Consider a conditional independence relationship of the form [C_1 ⊗ C_2 | S]. If the levels of all factors in C_1 are collapsed, then all relationships among the remaining factors are undistorted EXCEPT for relationships among factors in S.


Example: Ob-Gyn Study (Darrocca, et al., 1996)
n = 201 pregnant mothers
Variables:
- E: EGA (Early, Late)
- B: Bishop score (High, Low)
- T: Treatment (Prostin, Placebo)

Example: Ob-Gyn Study

                  BISHOP SCORE (B)
                  High                Low
                  EGA (E)             EGA (E)
TREATMENT (T)     Early    Late       Early    Late
Prostin
Placebo

Best-fitting model: [E][TB]

Example: Ob-Gyn Study
Generating Class: [E][TB]
Multigraph: E   TB
FCI: [E ⊗ T,B]

Example: Ob-Gyn Study
Collapsed Table (collapse over EGA):

                  BISHOP SCORE (B)
TREATMENT (T)     High         Low     Total
Prostin           58 (0.55)
Placebo           38 (0.40)

P =

Example: WSU-United Way Study
- M: Marijuana (No, Yes)
- A: Alcohol (No, Yes)
- C: Cigarettes (No, Yes)
- R: Race (Other, White)
- S: Sex (Female, Male)
Observed cell frequencies (n = 2,276):

Example: WSU-United Way Study
Generating class: [ACE][MAC][MCG]
Multigraph, M: vertices ACE, MAC, MCG

Example: WSU-United Way Study
M: vertices ACE, MAC, MCG
S = {A,C}
M/S: E   M   MG
FCI: [E ⊗ M,G|A,C]
A = Alcohol, C = Cigarette, E = Ethnic, G = Gender, M = Marijuana

Example: WSU PASS Program
“Preparing for Academic Success”
GPA below 2.0 at the end of first quarter

Example: WSU PASS Program
Variables (n = 972):

FACTOR                LABEL    LEVELS
Retention             R        1=No, 2=Yes
Cohort                C        1, 2, 3, 4
PASS Participation    P        1=No, 2=Yes
Ethnic Group          E        1=Caucasian, 2=African-American, 3=Other
Gender                G        1=Male, 2=Female

Example: WSU PASS Program
The best-fitting LLM has generating class [EG][CP][RC][PG]
Multigraph, M:

EG --(G)-- PG --(P)-- CP --(C)-- RC

Example: WSU PASS Program
M: vertices EG, PG, RC, CP
S = {C}
M/S: EG   PG   P   R
FCI: [E,G,P ⊗ R|C]
C = Cohort, E = Ethnic, G = Gender, P = PASS Participation, R = Retention

Example: Affinal Relations in Bosnia-Herzegovina
Data courtesy of Dr. Keith Doubt, Department of Sociology, Wittenberg University, Springfield, Ohio
N = 861 couples from Bosnia-Herzegovina are surveyed concerning affinal relations.
- M: Marriage Type (traditional, elopement)
- L: Location of Man and Wife (same, different)
- E: Ethnicity (Bosniak, Serb, Croat)
- S: Settlement (rural, urban)
Best-fitting model: [MLES]
Consider structural associations among M, L, and S for each ethnic group (E) separately.

Example: Affinal Relations in Bosnia-Herzegovina
- Bosniaks: [ML][LS]
- Serbs: [MS][SL]
- Croats: [M][L][S]
M: Marriage Type, L: Location of Man and Wife, S: Settlement

Conclusions
- The generator multigraph uses mathematical graph theory to analyze and interpret LLMs in a facile manner
- Properties of the multigraph allow one to:
  - Find all conditional independencies
  - Determine all collapsibility conditions

REFERENCE
Khamis, H.J. (2011). The Association Graph and the Multigraph for Loglinear Models, SAGE series Quantitative Applications in the Social Sciences, No.

Without data, you’re just one more person with an opinion