1
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
2
References: Fienberg, S. (1980), The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, Mass. Fingleton, B. (1984), Models for Category Counts, Cambridge University Press. Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.
3
Example 1: In this study we examine n = 1237 individuals, measuring X = Systolic Blood Pressure and Y = Serum Cholesterol.
4
Example 2: The following data were taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).
5
The study involved a dichotomous response Y: Success (no major parole violation) or Failure (returned to prison either as a technical violator or with a new conviction), based on a one-year follow-up. The predictors of parole success included: type of offense committed (person offense or other offense), age (25 or older, or under 25), prior record (no prior sentence or prior sentence), and drug or alcohol dependency (no drug or alcohol dependency, or drug and/or alcohol dependency).
6
The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses. The second part of the data was set aside for a validation study of the model fitted to the first part.
7
Table
8
Multiway Frequency Tables
Two-way table (A × B)
9
Three-way table (A × B × C)
10
Three-way table (A × B × C)
11
Four-way table (A × B × C × D)
12
Models for count data: Binomial, Hypergeometric, Poisson, Multinomial
13
Univariate models for count data
14
The Binomial distribution
We observe a Bernoulli trial (S, F) n times. Let X denote the number of successes in the n trials. Then X has a binomial distribution, i.e. p(x) = P(X = x) = C(n, x) p^x q^(n − x), x = 0, 1, …, n, where C(n, x) = n! / (x!(n − x)!), p = the probability of success (S), and q = 1 − p = the probability of failure (F).
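As a quick check of this probability function, here is a minimal sketch (hypothetical n and p, assuming SciPy is available) comparing a direct evaluation of C(n, x) p^x q^(n − x) with scipy.stats.binom:

```python
# Minimal check of the binomial probability function; n and p are hypothetical.
from math import comb

from scipy.stats import binom

n, p = 10, 0.3
q = 1 - p

# p(x) = C(n, x) p^x q^(n-x)
pmf_direct = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
pmf_scipy = binom.pmf(range(n + 1), n, p)

print(max(abs(a - b) for a, b in zip(pmf_direct, pmf_scipy)))  # ~0: the two agree
print(sum(pmf_direct))                                         # ~1: probabilities sum to one
```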
15
The Poisson distribution
Suppose events are occurring randomly and uniformly in time. Let X be the number of events occurring in a fixed period of time. Then X has a Poisson distribution with parameter λ, i.e. p(x) = P(X = x) = λ^x e^(−λ) / x!, x = 0, 1, 2, …
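A small sketch of the Poisson model (hypothetical rate λ, assuming SciPy); it also illustrates the standard property that the mean and variance of a Poisson variable both equal λ:

```python
# Sketch of the Poisson distribution with a hypothetical rate parameter.
from scipy.stats import poisson

lam = 2.5                                    # hypothetical rate parameter
mean, var = poisson.stats(lam, moments="mv")
print(mean, var)                             # both equal lambda = 2.5
print(poisson.pmf(range(5), lam))            # p(x) = lambda^x e^(-lambda) / x! for x = 0..4
```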
16
The Hypergeometric distribution
Suppose we have a population containing N objects, partitioned into two groups. Let a = the number of elements in group A and let b = the number of elements in the other group (group B); note N = a + b. Now suppose that n elements are selected from the population at random. Let X denote the number of selected elements that come from group A (n − X will be the number of elements from group B).
17
Diagram: a population of N = a + b elements (Group A with a elements, Group B with b elements); a sample of n elements contains x elements from Group A and n − x from Group B.
18
Thus the probability function of X is:
p(x) = C(a, x) C(b, n − x) / C(N, n): the number of ways x elements can be chosen from Group A, times the number of ways n − x elements can be chosen from Group B, divided by the total number of ways n elements can be chosen from N = a + b elements. A random variable X with this distribution is said to have the Hypergeometric distribution. The possible values of X are the integers from max(0, n − b) to min(n, a).
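A minimal sketch (hypothetical a, b and n, assuming SciPy) evaluating this probability function directly and via scipy.stats.hypergeom, over the support max(0, n − b) to min(n, a):

```python
# Sketch of the hypergeometric probability function in the slide's notation
# (N = a + b objects, a in group A, sample of size n); values are hypothetical.
from math import comb

from scipy.stats import hypergeom

a, b, n = 7, 5, 6
N = a + b

# p(x) = C(a, x) C(b, n - x) / C(N, n) on max(0, n - b) <= x <= min(n, a)
support = range(max(0, n - b), min(n, a) + 1)
pmf_direct = [comb(a, x) * comb(b, n - x) / comb(N, n) for x in support]

# SciPy's parameterization: hypergeom(M=N, n=a, N=n)
pmf_scipy = hypergeom.pmf(list(support), N, a, n)

print(max(abs(d - s) for d, s in zip(pmf_direct, pmf_scipy)))  # ~0
print(sum(pmf_direct))                                         # ~1
```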
19
Mean and Variance of the Hypergeometric distribution: E(X) = n(a/N) and Var(X) = n(a/N)(b/N)(N − n)/(N − 1).
20
Multivariate models for count data
21
The Multinomial distribution
Suppose that we observe an experiment that has k possible outcomes {O1, O2, …, Ok} independently n times. Let p1, p2, …, pk denote the probabilities of O1, O2, …, Ok respectively, and let Xi denote the number of times that outcome Oi occurs in the n repetitions of the experiment. Then the joint probability function of the random variables X1, X2, …, Xk is p(x1, x2, …, xk) = [n! / (x1! x2! ⋯ xk!)] p1^x1 p2^x2 ⋯ pk^xk, where x1 + x2 + ⋯ + xk = n. This distribution is called the Multinomial distribution with parameters n, p1, …, pk.
22
Comments The marginal distribution of Xi is Binomial with parameters n and pi. Multivariate Analogs of the Poisson and Hypergeometric distributions also exist
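A short sketch (hypothetical n and p, assuming SciPy) evaluating the multinomial probability function and verifying numerically that the marginal distribution of X1 is Binomial(n, p1):

```python
# Sketch of the multinomial pmf and of the comment that the marginal of X_i is binomial.
from scipy.stats import binom, multinomial

n = 6
p = [0.2, 0.5, 0.3]                      # k = 3 hypothetical outcome probabilities

# Joint probability of one particular outcome vector (x1, x2, x3) with x1 + x2 + x3 = n
print(multinomial.pmf([1, 3, 2], n=n, p=p))

# Marginal of X1: sum the joint pmf over all (x2, x3) with x2 + x3 = n - x1
x1 = 2
marginal = sum(multinomial.pmf([x1, x2, n - x1 - x2], n=n, p=p)
               for x2 in range(n - x1 + 1))
print(marginal, binom.pmf(x1, n, p[0]))  # the two values agree
```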
23
Analysis of a Two-way Frequency Table:
24
Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)
25
Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure)
The Marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.
26
Conditional Distributions ( Systolic Blood Pressure given Serum Cholesterol )
The conditional distribution allows you to look at the effect of one variable, when the other variable is held fixed or known.
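To make the joint, marginal and conditional distributions concrete, here is a minimal NumPy sketch; the 3 × 3 table of counts is hypothetical, not the blood pressure/cholesterol data:

```python
# Joint, marginal, and conditional distributions from a two-way frequency table.
import numpy as np

# rows = one categorical variable, columns = the other (hypothetical counts)
counts = np.array([[120, 80, 40],
                   [150, 170, 90],
                   [60, 110, 130]], dtype=float)
N = counts.sum()

joint = counts / N                       # joint distribution p_ij
row_marginal = joint.sum(axis=1)         # row variable, ignoring the column variable
col_marginal = joint.sum(axis=0)         # column variable, ignoring the row variable

# Conditional distribution of the column variable given each row (each row sums to 1)
cond_col_given_row = counts / counts.sum(axis=1, keepdims=True)

print(joint.round(3))
print(row_marginal.round(3), col_marginal.round(3))
print(cond_col_given_row.round(3))
```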
27
Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure)
28
GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol
29
Notation: Let xij denote the frequency (number of cases) where X (the row variable) is i and Y (the column variable) is j.
30
Different Models The Multinomial Model:
Here the total number of cases N is fixed and the cell counts xij jointly follow a multinomial distribution with parameters pij.
31
The Product Multinomial Model:
Here the row (or column) totals Ri are fixed, and for a given row i the counts xij follow a multinomial distribution with parameters pj|i.
32
The Poisson Model: In this case we observe over a fixed period of time and all counts in the table (including Row, Column and overall totals) follow a Poisson distribution. Let mij denote the mean of xij.
33
Independence
34
Multinomial Model: if the rows and columns are independent, then pij = pi. p.j and mij = N pi. p.j. The estimated expected frequency in cell (i, j) in the case of independence is Eij = xi. x.j / N, where xi. and x.j denote the row and column totals.
35
The same can be shown for the other two models – the Product Multinomial model and the Poisson model
namely, the estimated expected frequency in cell (i, j) in the case of independence is again Eij = xi. x.j / N. Standardized residuals are defined for each cell: rij = (xij − Eij) / √Eij.
36
The Chi-Square Statistic: χ² = Σij (xij − Eij)² / Eij = Σij rij²
The Chi-Square test for independence: reject H0 (independence) if χ² exceeds the critical value of the chi-square distribution with (r − 1)(c − 1) degrees of freedom.
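A sketch of this test using SciPy's chi2_contingency, which returns the expected frequencies xi. x.j / N; the observed table is hypothetical, and the standardized residuals are computed from the returned expected counts:

```python
# Chi-square test of independence with expected frequencies and standardized residuals.
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[120, 80, 40],
                     [150, 170, 90],
                     [60, 110, 130]], dtype=float)   # hypothetical counts

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Standardized residuals r_ij = (x_ij - E_ij) / sqrt(E_ij)
std_resid = (observed - expected) / np.sqrt(expected)

print(chi2_stat, dof, p_value)
print(expected.round(1))
print(std_resid.round(2))                  # cells with |residual| > 2 are noteworthy
print(chi2_stat > chi2.ppf(0.95, dof))     # reject H0 at the 5% level?
```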
37
Table: Expected frequencies, Observed frequencies, Standardized residuals
χ² = … (p = …)
38
Example: N = 57,407 cases were studied in which individuals were victimized twice by crimes. The crime of the first victimization (X) and the crime of the second victimization (Y) were noted. The data are tabulated on the following slide.
39
Table 1: Frequencies
40
Table 2: Expected Frequencies (assuming independence)
41
Table 3: Standardized residuals
42
Table 4: Conditional distribution of the second victimization given the first victimization (%)
43
Log Linear Model
44
Recall: if the two variables, rows (X) and columns (Y), are independent, then pij = pi. p.j and hence mij = N pi. p.j.
45
In general, let ln mij = u + u1(i) + u2(j) + u12(i,j)   (1), where Σi u1(i) = 0, Σj u2(j) = 0 and Σi u12(i,j) = Σj u12(i,j) = 0. Equation (1) is called the log-linear model for the frequencies xij.
46
Note: X and Y are independent if u12(i,j) = 0 for all i and j. In this case the log-linear model becomes ln mij = u + u1(i) + u2(j).
47
Comment: The log-linear model for a two-way frequency table, ln mij = u + u1(i) + u2(j) + u12(i,j), is similar to the model for a two-factor experiment, yijk = μ + αi + βj + (αβ)ij + εijk.
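This analogy can be made concrete by fitting the independence log-linear model as a Poisson regression. The sketch below uses statsmodels with a hypothetical 3 × 3 table; the fitted values reproduce the closed-form estimates xi. x.j / N:

```python
# Two-way log-linear model of independence, ln m_ij = u + u1(i) + u2(j), as a Poisson GLM.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "row":   ["r1", "r1", "r1", "r2", "r2", "r2", "r3", "r3", "r3"],
    "col":   ["c1", "c2", "c3"] * 3,
    "count": [120, 80, 40, 150, 170, 90, 60, 110, 130],   # hypothetical counts
})

# Independence model: main effects only (analogous to a two-factor ANOVA without interaction)
indep = smf.glm("count ~ C(row) + C(col)", data=df,
                family=sm.families.Poisson()).fit()

print(indep.fittedvalues.round(1))      # equal to (row total * column total) / N
print(indep.deviance, indep.df_resid)   # likelihood-ratio goodness-of-fit statistic and its d.f.
```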
48
Three-way Frequency Tables
49
Example: Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962]). Variables: Systolic Blood Pressure: < 127, …, …, 167+; Serum Cholesterol: < 200, …, …, 260+; Heart Disease: Present, Absent. The data are tabulated on the next slide.
50
Three-way Frequency Table
51
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), where each u-term sums to zero over each of its indices.
52
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
53
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.
54
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
55
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
56
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 1 with variables 2 and 3.
57
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
58
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
59
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
60
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
61
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
62
Hierarchical Log-linear models for 3 way table
Model – Description
[1][2][3] – Mutual independence between all three variables.
[1][23] – Independence of Variable 1 with variables 2 and 3.
[2][13] – Independence of Variable 2 with variables 1 and 3.
[3][12] – Independence of Variable 3 with variables 1 and 2.
[12][13] – Conditional independence between variables 2 and 3 given variable 1.
[12][23] – Conditional independence between variables 1 and 3 given variable 2.
[13][23] – Conditional independence between variables 1 and 2 given variable 3.
[12][13][23] – Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
[123] – The saturated model.
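The sketch below (hypothetical 2 × 2 × 2 counts, assuming statsmodels) shows how several of these hierarchical models can be fitted as Poisson GLMs, with the bracket notation mapped to model formulas:

```python
# Fitting several hierarchical log-linear models for a three-way table as Poisson GLMs.
import itertools

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

levels = {"v1": ["a", "b"], "v2": ["x", "y"], "v3": ["p", "q"]}
cells = list(itertools.product(*levels.values()))
df = pd.DataFrame(cells, columns=list(levels))
df["count"] = [35, 20, 15, 30, 25, 40, 20, 15]   # hypothetical 2x2x2 table

models = {
    "[1][2][3]":    "count ~ C(v1) + C(v2) + C(v3)",
    "[12][3]":      "count ~ C(v1) * C(v2) + C(v3)",
    "[12][13]":     "count ~ C(v1) * C(v2) + C(v1) * C(v3)",
    "[12][13][23]": "count ~ (C(v1) + C(v2) + C(v3)) ** 2",
    "[123]":        "count ~ C(v1) * C(v2) * C(v3)",       # saturated: fits the data exactly
}

for name, formula in models.items():
    fit = smf.glm(formula, data=df, family=sm.families.Poisson()).fit()
    print(f"{name:14s} G2 = {fit.deviance:6.2f}  d.f. = {fit.df_resid:.0f}")
```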
63
Maximum Likelihood Estimation
Log-Linear Model
64
For any model it is possible to determine the maximum likelihood estimators of the parameters.
Example: two-way table, independence, multinomial model.
The likelihood is L = [N! / (Πij xij!)] Πij pij^xij, so the log-likelihood is ln L = constant + Σij xij ln pij.
With the model of independence, pij = pi. p.j, and hence ln L = constant + Σi xi. ln pi. + Σj x.j ln p.j, subject to Σi pi. = 1 and Σj p.j = 1.
Maximizing the log-likelihood subject to these constraints (for example with Lagrange multipliers) gives the estimates of pi. and p.j as xi./N and x.j/N respectively.
Hence the estimated expected frequencies under independence are Eij = N (xi./N)(x.j/N) = xi. x.j / N.
73
Comments: Maximum likelihood estimates can be computed for any hierarchical log-linear model (i.e. with more than 2 variables). In certain situations the equations need to be solved numerically. For the saturated model (all interactions and main effects), the estimate of mijk… is xijk….
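The classic case in which the likelihood equations must be solved numerically is the no-three-factor-interaction model [12][13][23]. Here is a minimal sketch of iterative proportional fitting for a hypothetical 2 × 2 × 2 table:

```python
# Iterative proportional fitting (IPF) for the model [12][13][23], which has no
# closed-form MLE; the observed counts below are hypothetical.
import numpy as np

x = np.array([[[35., 20.], [15., 30.]],
              [[25., 40.], [20., 15.]]])      # observed 2x2x2 counts

m = np.ones_like(x)                            # starting values
for _ in range(100):
    # Scale the fitted values to match each set of observed two-way margins in turn
    m *= x.sum(axis=2, keepdims=True) / m.sum(axis=2, keepdims=True)   # [12] margins
    m *= x.sum(axis=1, keepdims=True) / m.sum(axis=1, keepdims=True)   # [13] margins
    m *= x.sum(axis=0, keepdims=True) / m.sum(axis=0, keepdims=True)   # [23] margins

print(m.round(2))                              # MLEs of the expected frequencies m_ijk
print(np.allclose(m.sum(axis=2), x.sum(axis=2)))   # two-way margins are reproduced
```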
74
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
75
Multiway Frequency Tables
Two-way table (A × B)
76
Four-way table (A × B × C × D)
77
Log Linear Model
78
Two-way table: ln mij = u + u1(i) + u2(j) + u12(i,j), where each u-term sums to zero over each of its indices. The multiplicative form: mij = e^u · e^u1(i) · e^u2(j) · e^u12(i,j).
79
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), where each u-term sums to zero over each of its indices.
80
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), or, in the multiplicative form, mijk = e^u · e^u1(i) · e^u2(j) · e^u3(k) · e^u12(i,j) · e^u13(i,k) · e^u23(j,k) · e^u123(i,j,k).
81
Comments: The log-linear model is similar to the ANOVA models for factorial experiments. The ANOVA models are used to understand the effects of categorical independent variables (factors) on a continuous dependent variable (Y). The log-linear model is used to understand dependence amongst categorical variables. The presence of interactions indicates dependence between the variables involved in the interactions.
82
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
83
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.
84
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
85
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
86
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 1 with variables 2 and 3.
87
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
88
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
89
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
90
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
91
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
92
Hierarchical Log-linear models for 3 way table
Model – Description
[1][2][3] – Mutual independence between all three variables.
[1][23] – Independence of Variable 1 with variables 2 and 3.
[2][13] – Independence of Variable 2 with variables 1 and 3.
[3][12] – Independence of Variable 3 with variables 1 and 2.
[12][13] – Conditional independence between variables 2 and 3 given variable 1.
[12][23] – Conditional independence between variables 1 and 3 given variable 2.
[13][23] – Conditional independence between variables 1 and 2 given variable 3.
[12][13][23] – Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
[123] – The saturated model.
93
Goodness of Fit Statistics
These statistics can be used to check if a log-linear model will fit the observed frequency table
94
Goodness of Fit Statistics
The Chi-squared statistic: χ² = Σ (observed − expected)² / expected, summed over all cells. The Likelihood Ratio statistic: G² = 2 Σ observed × ln(observed / expected), summed over all cells, where the expected frequencies are those estimated under the model. d.f. = # cells − # parameters fitted. We reject the model if χ² or G² is greater than the critical chi-square value with these degrees of freedom.
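A short sketch computing both statistics from observed and model-fitted counts (both arrays hypothetical) and comparing G² with the chi-squared critical value:

```python
# Pearson chi-square and likelihood-ratio goodness-of-fit statistics for a fitted model.
import numpy as np
from scipy.stats import chi2

observed = np.array([35., 20., 15., 30., 25., 40., 20., 15.])
fitted = np.array([32.1, 22.9, 17.9, 27.1, 27.9, 37.1, 17.1, 17.9])   # hypothetical fit

pearson = ((observed - fitted) ** 2 / fitted).sum()          # chi-squared statistic
g2 = 2 * (observed * np.log(observed / fitted)).sum()        # likelihood-ratio statistic G^2

df = observed.size - 7          # d.f. = number of cells - number of parameters fitted
critical = chi2.ppf(0.95, df)
print(pearson, g2, critical)
print(g2 > critical)            # reject the model if chi-square or G^2 exceeds the critical value
```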
95
Example: Variables Systolic Blood Pressure (B) Serum Cholesterol (C) Coronary Heart Disease (H)
96
Goodness of fit testing of Models
Table columns: Model, d.f., Likelihood-Ratio χ², Prob., Pearson χ², Prob., for the models [B][C][H], [B][CH], [C][BH], [H][BC], [BC][BH], [BH][CH] (n.s.), [CH][BC], [BC][BH][CH] (n.s.).
Possible models:
1. [BH][CH] – B and C independent given H.
2. [BC][BH][CH] – the all two-factor interaction model.
97
Model 1: [BH][CH] Log-linear parameters
Heart Disease – Blood Pressure Interaction
98
Multiplicative effect
Log-Linear Model
99
Heart Disease - Cholesterol Interaction
100
Multiplicative effect
101
Model 2: [BC][BH][CH] Log-linear parameters
Blood pressure-Cholesterol interaction:
102
Multiplicative effect
103
Heart Disease – Blood Pressure Interaction
104
Multiplicative effect
105
Heart Disease - Cholesterol Interaction
106
Multiplicative effect
107
Another Example: In this study the following were determined for N = 4353 males:
Occupation category, Educational level, Academic aptitude
108
Occupation categories
Self-employed Business; Teacher/Education; Self-employed Professional; Salaried Employed. Education levels: Low, Low/Med, Med, High/Med, High.
109
Academic Aptitude: Low, Low/Med, High/Med, High
110
Table: counts of Academic Aptitude (Low, LMed, Med, HMed, High) by Education (Low, LMed, HMed, High), with totals, for each occupation category: Self-employed Business, Teacher/Education, Self-employed Professional, Salaried Employed.
112
It is common to handle a multiway table by testing for independence in all two-way tables. This is similar to looking at all the bivariate correlations. In this example we learn that: Education is related to Aptitude; Education is related to Occupational category. Can we do better than this?
113
Fitting various log-linear models
Simplest model that fits is: [Apt,Ed][Occ,Ed] This model implies conditional independence between Aptitude and Occupation given Education.
114
Log-linear Parameters
Aptitude – Education Interaction
115
Aptitude – Education Interaction (Multiplicative)
116
Occupation – Education Interaction
117
Occupation – Education Interaction (Multiplicative)
118
Conditional Test Statistics
119
Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1.
That is the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to adequately fit the data.
120
In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is G²(2|1) = G²(2) − G²(1), with degrees of freedom equal to the difference in the degrees of freedom of the two models.
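A sketch of this conditional test using two nested Poisson GLMs in statsmodels (hypothetical 2 × 2 × 2 counts); the deviance of each fit is its G² statistic:

```python
# Conditional likelihood-ratio test G^2(2|1) = G^2(2) - G^2(1) for nested log-linear models.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

df = pd.DataFrame({
    "v1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "v2": ["x", "x", "y", "y"] * 2,
    "v3": ["p", "q"] * 4,
    "count": [35, 20, 15, 30, 25, 40, 20, 15],   # hypothetical counts
})

model1 = smf.glm("count ~ (C(v1) + C(v2) + C(v3)) ** 2", data=df,
                 family=sm.families.Poisson()).fit()          # [12][13][23]
model2 = smf.glm("count ~ C(v1) * C(v2) + C(v3)", data=df,
                 family=sm.families.Poisson()).fit()          # [12][3], a special case

g2_cond = model2.deviance - model1.deviance
df_cond = model2.df_resid - model1.df_resid
print(g2_cond, df_cond, chi2.sf(g2_cond, df_cond))   # large G^2(2|1) => Model 2 is too simple
```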
121
Example
122
Goodness of Fit test for the all k-factor models
Conditional tests for zero k-factor interactions
123
Conclusions:
The four-factor interaction is not significant: G²(3|4) = 0.7 (p = 0.705).
The all three-factor model provides an adequate fit: G²(3) = 0.7 (p = 0.705).
The three-factor interactions are not significantly different from 0: G²(2|3) = 9.2 (p = 0.239).
The all two-factor model provides an adequate fit: G²(2) = 9.9 (p = 0.359).
There are significant two-factor interactions: G²(1|2) = 33.0 (p = …).
Conclude that the model should contain the main effects and some two-factor interactions.
124
There may also be a natural sequence of progressively more complicated models that one might want to identify. In the laundry detergent example the variables are: 1. Softness of the laundry used; 2. Previous use of Brand M; 3. Temperature of the laundry water used; 4. Preference for Brand X over Brand M.
125
A natural order for increasingly complex models which should be considered might be:
[1][2][3][4] – the all-main-effects model: independence amongst all four variables.
[1][3][24] – since previous use of Brand M may be highly related to preference for Brand M, add the 2–4 interaction first.
[1][34][24] – Brand M is recommended for hot water, so add the 3–4 interaction second.
[13][34][24] – Brand M is also recommended for soft laundry, so add the 1–3 interaction third.
[13][234], [134][234] – finally, add some possible three-factor interactions.
126
Likelihood Ratio G2 for various models
Model | d.f. | G²
[1][3][24] | 17 | 22.4
[1][24][34] | 16 | 18.0
[13][24][34] | 14 | 11.9
[13][23][24][34] | 13 | 11.2
[12][13][23][24][34] | 11 | 10.1
[1][234] | … | 14.5
[134][24] | 10 | 12.2
[13][234] | 12 | 8.4
[24][34][123] | 9 | …
[123][234] | 8 | 5.6
128
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
129
Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), where each u-term sums to zero over each of its indices.
130
Hierarchical Log-linear models for categorical Data
For three way tables The hierarchical principle: If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction
131
Models for three-way tables
132
1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables. Comment: For any model the parameters (u, u1(i) , u2(j) , u3(k)) can be estimated in addition to the expected frequencies (mijk) in each cell
133
2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.
134
3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.
135
4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 1 with variables 2 and 3.
136
5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.
137
6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.
138
7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.
139
8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two variable interaction unaffected by the value of the third variable.
140
9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.
141
Goodness of Fit Statistics
The Chi-squared statistic: χ² = Σ (observed − expected)² / expected, summed over all cells. The Likelihood Ratio statistic: G² = 2 Σ observed × ln(observed / expected), summed over all cells, where the expected frequencies are those estimated under the model. d.f. = # cells − # parameters fitted. We reject the model if χ² or G² is greater than the critical chi-square value with these degrees of freedom.
142
Conditional Test Statistics
143
In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is G²(2|1) = G²(2) − G²(1), with degrees of freedom equal to the difference in the degrees of freedom of the two models.
144
Stepwise selection procedures
Forward Selection Backward Elimination
145
Forward Selection: Starting with a model that underfits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model. To determine the significance of an added parameter we use the statistic G²(2|1) = G²(2) − G²(1), where Model 1 contains the parameter and Model 2 does not.
146
Backward Elimination:
Starting with a model that overfits the data, log-linear parameters that are in the model are deleted step by step until a model that continues to fit the data and has the smallest number of significant parameters is achieved. At each step the log-linear parameter that is least significant is deleted from the model. To determine the significance of a deleted parameter we use the statistic G²(2|1) = G²(2) − G²(1), where Model 1 contains the parameter and Model 2 does not.
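A rough sketch of backward elimination over the two-factor interactions of a hypothetical 2 × 2 × 2 table, using the conditional statistic G²(2|1) at each step (statsmodels and SciPy assumed; the 5% cut-off and the data are illustrative):

```python
# Backward elimination of two-factor interactions from the all two-factor model,
# using the conditional likelihood-ratio statistic G^2(2|1) at each step.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

df = pd.DataFrame({
    "v1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "v2": ["x", "x", "y", "y"] * 2,
    "v3": ["p", "q"] * 4,
    "count": [35, 20, 15, 30, 25, 40, 20, 15],   # hypothetical counts
})

def g2(terms):
    """Fit the log-linear model with the given terms; return (deviance, residual d.f.)."""
    fit = smf.glm("count ~ " + " + ".join(terms), data=df,
                  family=sm.families.Poisson()).fit()
    return fit.deviance, fit.df_resid

main = ["C(v1)", "C(v2)", "C(v3)"]
interactions = ["C(v1):C(v2)", "C(v1):C(v3)", "C(v2):C(v3)"]

current_g2, current_df = g2(main + interactions)
while interactions:
    # For each candidate deletion compute G^2(2|1) = G^2(reduced) - G^2(current)
    tests = []
    for term in interactions:
        reduced = [t for t in interactions if t != term]
        red_g2, red_df = g2(main + reduced)
        p = chi2.sf(red_g2 - current_g2, red_df - current_df)
        tests.append((p, term, red_g2, red_df))
    p, term, red_g2, red_df = max(tests)          # least significant deletion
    if p < 0.05:                                  # every remaining term is needed
        break
    interactions.remove(term)
    current_g2, current_df = red_g2, red_df
    print("dropped", term, "p =", round(p, 3))

print("final model terms:", main + interactions)
```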
148
K = knowledge N = Newspaper R = Radio S = Reading L = Lectures
151
Continuing after 10 steps
152
The final step
153
The best model was found at the previous step:
[LN][KLS][KR][KN][LR][NR][NS]
154
Modelling of response variables
Independent → Dependent
155
Logit Models: To date we have not worried about whether any of the variables were dependent or independent variables. The logit model is used when we have a single binary dependent variable.
157
The variables:
Type of seedling (T): Longleaf seedling, Slash seedling
Depth of planting (D): Too low, Too high
Mortality (M) (the dependent variable): Dead, Alive
158
The Log-linear Model Note: mij1 = # dead when T = i and D = j.
mij2 = # alive when T = i and D = j, and mij1 / mij2 = the mortality ratio when T = i and D = j.
159
Hence ln(mij1 / mij2) = ln mij1 − ln mij2; since the u-terms not involving M cancel, only the u-terms involving M (its main effect and its interactions with T and D) remain.
160
The logit model: ln(mij1 / mij2) = constant + (effect of T) + (effect of D), where each logit parameter equals twice the corresponding log-linear parameter (using the sum-to-zero constraints, with M binary).
161
Thus corresponding to a log-linear model there is a logit model predicting the log ratio of the expected frequencies of the two categories of the dependent variable. A (k + 1)-factor interaction with the dependent variable in the log-linear model determines a k-factor interaction in the logit model: k + 1 = 1 gives the constant term in the logit model, k + 1 = 2 gives the main effects in the logit model, and so on.
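A sketch (hypothetical seedling counts, assuming statsmodels/patsy) checking this correspondence numerically: the Poisson log-linear model [TM][TD][DM] and a binomial logit for mortality with main effects of T and D give the same fitted log ratios ln(mij1 / mij2):

```python
# Loglinear-logit correspondence check on a hypothetical seedling mortality table.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from patsy import dmatrix

cells = pd.DataFrame({
    "T": ["longleaf", "longleaf", "slash", "slash"],
    "D": ["too_low", "too_high", "too_low", "too_high"],
    "dead":  [41, 11, 28, 12],      # hypothetical counts
    "alive": [59, 89, 72, 88],
})
long = cells.melt(id_vars=["T", "D"], value_vars=["dead", "alive"],
                  var_name="M", value_name="count")

# Loglinear model [TM][TD][DM]: all two-factor interactions, no three-factor term
loglin = smf.glm("count ~ (C(T) + C(D) + C(M)) ** 2", data=long,
                 family=sm.families.Poisson()).fit()
fitted = loglin.fittedvalues.values.reshape(2, 4)       # row 0 = dead, row 1 = alive
logit_from_loglin = np.log(fitted[0] / fitted[1])

# Equivalent logit model: binomial response (dead, alive) with main effects of T and D
endog = cells[["dead", "alive"]].values
exog = dmatrix("C(T) + C(D)", data=cells, return_type="dataframe")
logit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
logit_direct = np.log(logit.fittedvalues / (1 - logit.fittedvalues))

print(np.allclose(logit_from_loglin, logit_direct))     # True: the fitted log ratios agree
```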
162
1 = Depth, 2 = Mort, 3 = Type
163
Log-Linear parameters for Model: [TM][TD][DM]
164
Logit Model for predicting the Mortality
166
The best model found by forward selection was
[LN][KLS][KR][KN][LR][NR][NS]. To fit a logit model to predict K (Knowledge) we need to fit a log-linear model with the important interactions with K, namely [LNRS][KLS][KR][KN]. The logit model will contain: main effects for L (Lectures), N (Newspapers), R (Radio), and S (Reading), and a two-factor interaction effect for L and S.
167
The Logit Parameters for the Model [LNRS][KLS][KR][KN]
(Multiplicative effects are given in brackets; logit parameters = 2 × log-linear parameters)
The constant term: (0.798)
The main effects on Knowledge:
Lectures: Lect (1.307), None (0.765)
Newspaper: News (1.383), None (0.723)
Reading: Solid (1.405), Not (0.712)
Radio: Radio (1.162), None (0.861)
The two-factor interaction effect of Reading and Lectures on Knowledge
168
Fitting a Logit Model with a Polytomous Response Variable
169
Example: NA – Not available
170
The variables:
Race: white, black
Age: < 22, ≥ 22
Father's education: GS, some HS, HS grad, NA
Respondent's education: GS, some HS, HS grad (the response/dependent variable)
172
Techniques for handling a Polytomous Response Variable
Approaches: 1. Consider the categories two at a time; do this for all possible pairs of the categories. 2. Look at the continuation ratios: 1 vs 2; 1,2 vs 3; 1,2,3 vs 4; etc.
176
Causal or Path Analysis for Categorical Data
177
When the data are continuous, a causal pattern may be assumed to exist amongst the variables.
The path diagram: a diagram summarizing causal relationships. A straight arrow is drawn from a variable that has a causal effect on another variable (X → Y). A curved, double-headed arrow is drawn between variables that are simply correlated (X ↔ Y).
178
Example 1: The variables are Job Stress, Smoking, and Heart Disease. The path diagram links Job Stress, Smoking, and Heart Disease. In path analysis for continuous variables, one is interested in determining the contribution along each path (the path coefficients).
179
Example 2: The variables are Job Stress, Alcoholic Drinking, Smoking, and Heart Disease. The path diagram links Job Stress, Drinking, Smoking, and Heart Disease.
180
In the analysis of categorical data there are no path coefficients, but path diagrams can point to the appropriate logit analysis. Example: Here the data consist of two-wave, two-variable panel data for a sample of n = 3398 schoolboys, looking at "membership in" and "attitude towards" the leading crowd.
181
The path diagram (variables A, B, C, D): this suggests predicting B from A, then C from A and B, and finally D from A, B, and C.
185
Example 2: In this example we are looking at Socioeconomic Status (SES), Sex, IQ, Parental Encouragement for Higher Education (PE), and College Plans (CP).
187
The path diagram links Sex, SES, IQ, Parental Encouragement (PE), and College Plans (CP).
188
The path diagram suggests
Predicting Parental Encouragement from Sex, SocioEconomic status, and IQ, then Predicting College Plans from Parental Encouragement, Sex, SocioEconomic status, and IQ.
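A sketch of this two-stage logit analysis in statsmodels; the case-level data frame here is simulated purely to make the code self-contained, and any real analysis would use the observed data instead:

```python
# Two-stage logit analysis suggested by the path diagram: PE from Sex, SES, IQ,
# then CP from PE, Sex, SES, IQ.  The data below are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "sex": rng.choice(["M", "F"], n),
    "ses": rng.choice(["low", "med", "high"], n),
    "iq":  rng.choice(["low", "med", "high"], n),
})
df["pe"] = rng.binomial(1, 0.5, n)     # simulated binary Parental Encouragement
df["cp"] = rng.binomial(1, 0.4, n)     # simulated binary College Plans

# Stage 1: predict PE from the prior variables
stage1 = smf.glm("pe ~ C(sex) + C(ses) + C(iq)", data=df,
                 family=sm.families.Binomial()).fit()

# Stage 2: predict CP from PE and the prior variables
stage2 = smf.glm("cp ~ C(pe) + C(sex) + C(ses) + C(iq)", data=df,
                 family=sm.families.Binomial()).fit()

print(stage1.summary().tables[1])
print(stage2.summary().tables[1])
```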
190
Logit Parameters: Model [ABC][ABD][ACD][BCD]
191
Two factor Interactions
194
Logit Parameters for Predicting College Plans Using Model 9: [ABCD][BCE][AE][DE]