F73DB3 CATEGORICAL DATA ANALYSIS

Workbook contents: preface; aims; summary; content/structure/syllabus plus other information; background computing (R).
Examples: single classifications (1-13), two-way classifications (14-27), three-way classifications (28-32).
Example 1  Eye colours

Colour              A    B    C    D
Frequency observed  89   66   60   85
Example 2  Prussian cavalry deaths
(a) Numbers killed in each unit in each year - frequency table

Number killed       0     1    2    3    4   5   Total
Frequency observed  144   91   32   11   2   0   280
Example 2  Prussian cavalry deaths
(b) Numbers killed in each unit in each year - raw data

0 0 1 0 0 2 0 0 0 0 ... 0 0 0 2 0 1 0 1 2 0 1 ... 0 ... 3 0 0 1 0 0 2 1 0 0 1 0 0 1 0 0 1 1 2 0 1 0 1 1
Example 2  Prussian cavalry deaths
(c) Total numbers killed each year

Year   1875 '76 '77 '78 '79 '80 '81 '82 '83 '84 '85 '86 '87 '88 '89 '90 '91 '92 '93 '94
Total   3    5   7   9  10  18   6  14  11   9   5  11  15   6  11  17  12  15   8   4
Example 4  Political views

View       1 (very L)  2    3    4 (centre)  5    6    7 (very R)  Don't know  Total
Frequency  46          179  196  559         232  150  35          93          1490
Example 7  Vehicle repair visits

Number of visits    0    1    2    3   4   5   6   Total
Frequency observed  295  190  53   5   5   2   0   550
Example 15  Patients in clinical trial

                 Drug  Placebo  Total
Side-effects      15      4       19
No side-effects   35     46       81
Total             50     50      100
§1 INTRODUCTION

Data are counts/frequencies (not measurements). Categories (explanatory variable); distribution in the cells (response); frequency distribution. Single classifications; two-way classifications.
Illustration 1.1

                              B: Cause of death
                              Cancer   Other
A: Smoking status  Smoker       30       20
               Not smoker       15       35
Data may arise as: Bernoulli/binomial data (2 outcomes); multinomial data (more than 2 outcomes); Poisson data [+ negative binomial data - the version with range x = 0, 1, 2, ...].
§2 POISSON PROCESS AND ASSOCIATED DISTRIBUTIONS
2.1 Bernoulli trials and related distributions

Number of successes - binomial distribution. [Time before the kth success - negative binomial distribution; time to first success - geometric distribution.] Conditional distribution of success times.
2.2 Poisson process and related distributions
Poisson process with rate λ: the number of events in a time interval of length t, N_t, has a Poisson distribution with mean λt.
Poisson process with rate λ: the inter-event time, T, has an exponential distribution with parameter λ (mean 1/λ).
Conditional distribution of number of events: given n events in time (0,t), how many fall in time (0,s) (s < t)?

Answer: N_s | N_t = n ~ B(n, s/t)
Splitting into subprocesses
Realisation of a Poisson process (step plot of number of events against time)
X ~ Pn(λ), Y ~ Pn(μ), with X and Y independent. Then we know X + Y ~ Pn(λ + μ). Given X + Y = n, what is the distribution of X?

Answer: X | X + Y = n ~ B(n, p), where p = λ/(λ + μ)
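The conditional-binomial identity can be checked numerically. A sketch in Python (outside the course's R workflow); the rates 2.0 and 3.0 and the total n = 7 are arbitrary choices for illustration:

```python
from math import exp, comb, factorial

def pois_pmf(k, lam):
    # P(X = k) for X ~ Pn(lam)
    return exp(-lam) * lam**k / factorial(k)

lam, mu, n = 2.0, 3.0, 7
p = lam / (lam + mu)

for k in range(n + 1):
    # P(X = k | X + Y = n), computed from the joint Poisson probabilities
    cond = pois_pmf(k, lam) * pois_pmf(n - k, mu) / pois_pmf(n, lam + mu)
    # binomial pmf B(n, p) with p = lam/(lam + mu)
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    assert abs(cond - binom) < 1e-12
```

The factors e^-λ, e^-μ and the normalising e^-(λ+μ) cancel exactly, which is why the identity holds term by term.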
2.3 Inference for the Poisson distribution

N_i, i = 1, 2, ..., r, i.i.d. Pn(λ); N = ΣN_i
CI for λ: with λ̂ = N/r, an approximate 95% confidence interval is λ̂ ± 1.96 √(λ̂/r).
2.4 Dispersion and LR tests for Poisson data

Homogeneity hypothesis H0: the N_i's are i.i.d. Pn(λ) (for some unknown λ).

Dispersion statistic (M = sample mean): D = Σ (N_i - M)² / M
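For the Prussian cavalry data of Example 2(a) the dispersion statistic is easy to compute directly. A quick Python check of the arithmetic (under H0 it is compared with χ² on 279 df):

```python
# Example 2(a): frequency of unit-years with 0,1,...,5 deaths (280 in total)
freq = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2, 5: 0}
n = sum(freq.values())                             # 280 observations
mean = sum(x * f for x, f in freq.items()) / n     # sample mean M = 0.7
# dispersion statistic D = sum_i (N_i - M)^2 / M over all 280 observations
D = sum(f * (x - mean) ** 2 for x, f in freq.items()) / mean
print(n, mean, round(D, 1))   # 280 0.7 304.0
```

D = 304.0 on 279 df is close to its null expectation, consistent with the classical finding that these counts are well described by a Poisson distribution.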
Likelihood ratio statistic: Y² = 2 Σ N_i log(N_i / M) (form for calculation - see p18).
§3 SINGLE CLASSIFICATIONS

Binary classifications: (a) N_1, N_2 independent Poisson, with N_i ~ Pn(λ_i); or (b) fixed sample size, N_1 + N_2 = n, with N_1 ~ B(n, p_1), where p_1 = λ_1/(λ_1 + λ_2).
Qualitative categories: (a) N_1, N_2, ..., N_r independent Poisson, with N_i ~ Pn(λ_i); or (b) fixed sample size n, with joint multinomial distribution Mn(n; p).
Testing goodness of fit H0: p_i = π_i, i = 1, 2, ..., r

X² = Σ (N_i - nπ_i)² / (nπ_i)

This is the (Pearson) chi-square statistic. The statistic often appears as X² = Σ (O - E)²/E, summed over cells (O = observed frequency, E = expected frequency).
An alternative statistic is the LR statistic Y² = 2 Σ O log(O/E).
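Both statistics can be computed for the eye-colour data of Example 1 under H0 of equal category probabilities. A Python check of the arithmetic (X² and Y² are close here, as is typical when the fit is moderate):

```python
from math import log

obs = [89, 66, 60, 85]            # Example 1 eye-colour frequencies
n = sum(obs)                      # 300
exp_freq = [n / 4] * 4            # H0: equal probabilities -> 75 per cell
X2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp_freq))     # Pearson
Y2 = 2 * sum(o * log(o / e) for o, e in zip(obs, exp_freq))   # LR statistic
print(round(X2, 2), round(Y2, 2))   # 8.03 8.09
```

Each is referred to χ² on r - 1 = 3 df, so both give a P-value of about 0.045.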
Sparse data/small expected frequencies: ensure m_i ≥ 1 for all cells, and m_i ≥ 5 for at least about 80% of the cells; if not, combine adjacent cells sensibly.
Goodness-of-fit tests for frequency distributions - a very well-known application of the statistic (see Illustration 3.4, pp 22-23).
Residuals (standardised): r_i = (O_i - m_i) / √(m_i(1 - m_i/n)); simpler version r_i = (O_i - m_i) / √m_i.
MAJOR ILLUSTRATION 1  Publish and be modelled

Number of papers per author  1     2    3    4   5   6  7  8  9  10  11
Number of authors            1062  263  120  50  22  7  6  2  0  1   1

Model
MAJOR ILLUSTRATION 2  Birds in hedges

Hedge type i          A     B     C     D     E     F     G
Hedge length (m) l_i  2320  2460  2455  2805  2335  2645  2099
Number of pairs n_i   14    16    14    26    15    40    71

Model: N_i ~ Pn(θ_i l_i)
§4 TWO-WAY CLASSIFICATIONS

Example 14  Numbers of mice bearing tumours in treated and control groups

             Treated  Control  Total
Tumours         4        5       9
No tumours     12       74      86
Total          16       79      95
Example 15  Patients in clinical trial

                 Drug  Placebo  Total
Side-effects      15      4       19
No side-effects   35     46       81
Total             50     50      100
Patients in clinical trial - take 2

                 Drug  Placebo  Total
Side-effects      15     15       30
No side-effects   35     35       70
Total             50     50      100
4.1 Factors and responses

F × R tables; also R × F, R × R (F × F?). Categories may be qualitative, ordered, or quantitative. The analysis is the same - the interpretation may be different.
A two-way table is often called a "contingency table" (especially in the R × R case).
Notation (2 × 2 case, easily extended)

             Exposed  Not exposed  Total
Disease       n_11      n_12       n_1●
No disease    n_21      n_22       n_2●
Total         n_●1      n_●2       n_●● = n
Three possibilities: one overall sample, each subject classified according to 2 attributes - this is R × R; a retrospective study; a prospective study (use of treated and control groups; drug and placebo, etc.).
4.2 Distribution theory and tests for r × s tables

(a) R × R case: (a1) N_ij ~ Pn(λ_ij), independent; or, with fixed table total, (a2) condition on n = Σ n_ij: N | n ~ Mn(n; p), where N = {N_ij}, p = {p_ij}.
(b) F × R case: condition on the observed marginal totals n_●j = Σ_i n_ij for the s categories of F (i.e. condition on n and the column totals), giving s independent multinomials.
Usual hypotheses: (a1) N_ij ~ Pn(λ_ij), independent; H0: variables/responses are independent, λ_ij = λ_i● λ_●j / λ_●●, i.e. of product form k_i l_j. (a2) Multinomial data (table total fixed); H0: variables/responses are independent, P(row i and column j) = P(row i) P(column j).
(b) Condition on n and the n_●j (fixed column totals): N_ij ~ Bi(n_●j, p_ij), j = 1, 2, ..., s, independent. H0: the response is homogeneous (p_ij = p_i for all j), i.e. the response has the same distribution for all levels of the factor.
Tests of H0

The χ² (Pearson) statistic: X² = Σ_ij (n_ij - m_ij)² / m_ij, where m_ij = n_i● n_●j / n as before.
OR: a test based on the LR statistic Y². Illustration: tonsils data - see p27.

In R: for Pearson/X², read the data in using "matrix", then use "chisq.test"; for the LR Y², calculate it directly (or get it from the results of fitting a "log-linear model" - see later).
4.3 The 2 × 2 table

Statistical tests: (a) using Pearson's χ²

                 Drug  Placebo  Total
Side-effects      15      4       19
No side-effects   35     46       81
Total             50     50      100
X² = Σ (n_ij - m_ij)² / m_ij, where m_ij = n_i● n_●j / n
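For the drug/placebo table this gives the following (a Python check of the arithmetic; the second value includes the Yates continuity correction, which is what R's chisq.test reports by default on a 2 × 2 table):

```python
table = [[15, 4], [35, 46]]     # Example 15: side-effects x drug/placebo
row = [sum(r) for r in table]                     # row totals 19, 81
col = [sum(c) for c in zip(*table)]               # column totals 50, 50
n = sum(row)                                      # 100
m = [[ri * cj / n for cj in col] for ri in row]   # expected counts m_ij
X2 = sum((table[i][j] - m[i][j]) ** 2 / m[i][j]
         for i in range(2) for j in range(2))
# Yates correction: subtract 0.5 from |O - E| before squaring
X2_yates = sum((abs(table[i][j] - m[i][j]) - 0.5) ** 2 / m[i][j]
               for i in range(2) for j in range(2))
print(round(X2, 3), round(X2_yates, 3))   # 7.862 6.498
```

Either version is referred to χ² on 1 df, so the association is significant at the 1% level.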
Yates (continuity) correction: subtract 0.5 from |O - E| before squaring it.

Performing the test in R:

n.pat = matrix(c(15, 35, 4, 46), 2, 2)
chisq.test(n.pat)
(b) Using the deviance/LR statistic Y²; (c) comparing binomial probabilities; (d) Fisher's exact test.
                 Drug  Placebo   Total
Side-effects      15     4 (= N)   19
No side-effects   35    46         81
Total             50    50        100
Under a random allocation, the one-sided P-value = P(N ≤ 4) = 0.0047.
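Under random allocation N is hypergeometric: the 19 side-effect patients are distributed at random over the 50 placebo and 50 drug slots. The tail probability can be computed exactly, sketched here in Python with math.comb:

```python
from math import comb

def hyper_pmf(k, total=100, successes=19, draws=50):
    # P(N = k): k of the 19 side-effect patients land in the placebo group
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)

p_value = sum(hyper_pmf(k) for k in range(5))   # P(N <= 4)
print(round(p_value, 4))
```

This is the one-sided version of Fisher's exact test; in R the same answer comes from fisher.test.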
4.4 Log odds, combining and collapsing tables, interactions

In the 2 × 2 table, the H0: independence condition is equivalent to p_11 p_22 = p_12 p_21. Let λ = log(p_11 p_22 / (p_12 p_21)). Then we have H0: λ = 0. λ is the "log odds ratio".
The "λ = 0" hypothesis is often called the "no association" hypothesis.
The odds ratio is p_11 p_22 / (p_12 p_21). The sample equivalent is n_11 n_22 / (n_12 n_21).
The odds ratio (or log odds ratio) provides a measure of association for the factors in the table: no association corresponds to odds ratio = 1, i.e. log odds ratio = 0.
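For the drug/placebo table of Example 15, the sample odds ratio and log odds ratio are quickly checked in Python:

```python
from math import log

n11, n12, n21, n22 = 15, 4, 35, 46        # Example 15 cell counts
odds_ratio = (n11 * n22) / (n12 * n21)    # sample odds ratio
log_or = log(odds_ratio)                  # sample log odds ratio
print(round(odds_ratio, 3), round(log_or, 3))   # 4.929 1.595
```

The estimated odds of side-effects are about five times higher on the drug than on the placebo, well away from the no-association value of 1.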
Don't combine heterogeneous tables!
Interaction: an interaction exists between two factors when the effect of one factor is different at different levels of the other factor.
§5 INTRODUCTION TO GENERALISED LINEAR MODELS (GLMs)

Normal linear model: Y|x ~ N with E[Y|x] = α + βx, or E[Y|x] = β_0 + β_1 x_1 + β_2 x_2 + ... + β_r x_r = βᵀx; i.e. E[Y|x] = μ(x) = βᵀx.
We are explaining μ(x) using a linear predictor (a linear function of the explanatory data).

Generalised linear model: now we set g(μ(x)) = βᵀx for some function g. We explain g(μ(x)) using a linear function of the explanatory data, where g is called the link function.
e.g. modelling a Poisson mean we use a log link: g(λ) = log λ. We use a linear predictor to explain log λ rather than λ itself: the model is Y|x ~ Pn with mean λ_x, where log λ_x = α + βx (or log λ_x = βᵀx). This is a log-linear model.
An example is a trend model, in which we use log λ_i = α + βi. Another example is a cyclic model, in which we use log λ_i = β_0 + β_1 cos θ_i + β_2 sin θ_i.
§6 MODELS FOR SINGLE CLASSIFICATIONS

6.1 Single classifications - trend models

Data: numbers in r categories. Model: N_i, i = 1, 2, ..., r, independent Pn(λ_i).
Basic case: H0: the λ_i's are equal v H1: the λ_i's follow a trend. Let X_j be the category of observation j; under H0, P(X_j = i) = 1/r. Test based on the statistic given in Illustration 6.1.
A more general model: N_i independent Pn(λ_i) with log λ_i = α + βi - a log-linear model.
It is a linear regression model for log λ_i and a non-linear regression model for λ_i. It is a generalised linear model. Here the link between the parameter we are estimating and the linear predictor is the log function - it is a "log link".
Fitting in R. Example 13: stressful events data

> n = c(15, 11, ..., 1, 4)                 # response vector
> r = length(n)
> i = 1:r                                  # explanatory vector
> stress = glm(n ~ i, family = poisson)    # model
> summary(stress)

Call:
glm(formula = n ~ i, family = poisson)     (model being fitted)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9886  -0.9631   0.1737   0.5131   2.0362     (summary information on the residuals)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.80316    0.14816  18.920  < 2e-16 ***
i           -0.08377    0.01680  -4.986 6.15e-07 ***
(information on the fitted parameters)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 50.843 on 17 degrees of freedom
Residual deviance: 24.570 on 16 degrees of freedom     (deviances - Y² statistics)
AIC: 95.825

Number of Fisher Scoring iterations: 4
Fitted mean is exp(2.80316 - 0.08377 i); e.g. for date 6, i = 6 and the fitted mean is exp(2.30054) = 9.980.
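The fitted value at i = 6 follows directly from the reported coefficients (a Python check of the arithmetic):

```python
from math import exp

a, b = 2.80316, -0.08377   # fitted intercept and slope from summary(stress)
i = 6
eta = a + b * i            # linear predictor: 2.30054
fitted = exp(eta)          # fitted Poisson mean for date 6
print(round(eta, 5), round(fitted, 3))   # 2.30054 9.98
```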
Fitted model (plot of observed and fitted frequencies)
Test of H0: no trend. The null fit has all fitted values equal (to the observed mean): Y² = 50.84 (~ χ² on 17 df). The trend model has fitted values exp(2.80316 - 0.08377 i): Y² = 24.57 (~ χ² on 16 df). A crude 95% CI for the slope is -0.084 ± 2(0.0168), i.e. -0.084 ± 0.034.
The lower the value of the residual deviance, the better in general is the fit of the model.
Basic residuals: r_i = (n_i - m̂_i) / √m̂_i (observed minus fitted, scaled by the square root of the fitted value).
6.2 Taking into account a deterministic denominator - using an "offset" for the "exposure"

Model: N_x ~ Pn(λ_x) where E[N_x] = λ_x = E_x bθ^x, so log λ_x = log E_x + c + dx (with c = log b, d = log θ). See the Gompertz model example (p40; data in Example 26).
We include a term "offset(log(E))" in the formula for the linear predictor; in R:

model = glm(n.deaths ~ age + offset(log(exposure)), family = poisson)

The fitted value is the estimate of the expected response per unit of exposure (i.e. per unit of the offset E).
§7 LOGISTIC REGRESSION

For modelling proportions: we have a binary response for each item and a quantitative explanatory variable. For example: dependence of the proportion of insects killed in a chamber on the concentration of a chemical present - we want to predict the proportion killed from the concentration.
Further examples: dependence of the proportion of women who smoke on age; of metal bars on test which fail on the pressure applied; of policies which give rise to claims on the sum insured.

Model: the number of successes at value x_i of the explanatory variable is N_i ~ bi(n_i, π_i).
We use a GLM - we do not predict π_i directly; we predict a function of π_i called the logit of π_i. The logit function is given by logit(π) = log(π/(1 - π)). It is the "log odds".
See Illustration 7.1, p43: plots of proportion v dose and of logit(proportion) v dose.
This leads to the "logistic regression" model: N_i ~ bi(n_i, π_i) with logit(π_i) = a + bx_i. [c.f. the log-linear model N_i ~ Poisson(λ_i) with log λ_i = a + bx_i]
We are using a logit link: we use a linear predictor to explain logit(π_i) rather than π_i itself. The method based on the use of this model is called logistic regression.
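The logit link and its inverse (the logistic function) are worth writing out; the inverse maps a fitted linear predictor back to a probability. A small Python sketch:

```python
from math import log, exp

def logit(p):
    # log odds: logit(p) = log(p / (1 - p)), defined for 0 < p < 1
    return log(p / (1 - p))

def inv_logit(x):
    # logistic function, the inverse of the logit link
    return 1 / (1 + exp(-x))

# round-trip check: inverting the link recovers the probability
assert abs(inv_logit(logit(0.25)) - 0.25) < 1e-12
print(round(logit(0.5), 3), round(inv_logit(1.595), 3))   # 0.0 0.831
```

logit(0.5) = 0 (even odds), and a linear predictor of 1.595 (a log odds of about 4.93) corresponds to a probability of about 0.83.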
Data:

explanatory       # successes   group   observed
variable value                  size    proportion
x_1               n_11          n_1     n_11/n_1
x_2               n_21          n_2     n_21/n_2
...
x_s               n_s1          n_s     n_s1/n_s
In R we declare the proportion of successes as the response and include the group sizes as a set of weights:

drug.mod1 = glm(propdead ~ dose, weights = groupsize, family = binomial)

The explanatory vector is dose; note the family declaration.
The RHS of the model can be extended if required to include additional explanatory variables and factors, e.g.

mod3 = glm(mat3 ~ age + socialclass + gender)
drug.mod - see output p44. Coefficients very highly significant (***). Null deviance 298 on 9 df; residual deviance 17.2 on 8 df. But ... see the residual v fitted plot and the fitted v observed proportions plot.
Model with a quadratic term (dose^2).
§8 MODELS FOR TWO-WAY AND THREE-WAY CLASSIFICATIONS

8.1 Log-linear models for two-way classifications

N_ij ~ Pn(λ_ij), i = 1, 2, ..., r; j = 1, 2, ..., s. H0: variables are independent, λ_ij = λ_i● λ_●j / λ_●●.
Taking logs, log λ_ij decomposes into an overall effect + a row effect + a column effect.
We "explain" log λ_ij in terms of additive effects: log λ_ij = μ + α_i + β_j. The fitted values are the expected frequencies, and the fitting process gives us the value of Y² = -2 log λ.
Fitting a log-linear model

N_ij ~ Pn(λ_ij), independent, with log λ_ij = μ + α_i + β_j. Declare the response vector (the cell frequencies) and the row/column codes as factors, then use > name = glm(...).
Tonsils data (Example 16):

n.tonsils = c(19, 497, 29, 560, 24, 269)
rc = factor(c(1, 2, 1, 2, 1, 2))
cc = factor(c(1, 1, 2, 2, 3, 3))
tonsils.mod1 = glm(n.tonsils ~ rc + cc, family = poisson)
Call:
glm(formula = n.tonsils2 ~ rc + cc, family = poisson)

Deviance Residuals:
       1        2        3        4        5        6
-1.54915  0.34153 -0.24416  0.05645  2.11018 -0.53736

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.27998    0.12287  26.696  < 2e-16 ***
rc2          2.91326    0.12094  24.087  < 2e-16 ***
cc2          0.13232    0.06030   2.195   0.0282 *
cc3         -0.56593    0.07315  -7.737 1.02e-14 ***
---
    Null deviance: 1487.217 on 5 degrees of freedom
Residual deviance:    7.321 on 2 degrees of freedom     (Y² = -2 log λ)
The fit of the "independent attributes" model is not good.
Patients data (Example 15):

> n.patients = c(15, 4, 35, 46)
> rc = factor(c(1, 1, 2, 2))
> cc = factor(c(1, 2, 1, 2))
> pat.mod1 = glm(n.patients ~ rc + cc, family = poisson)
Call:
glm(formula = n.patients ~ rc + cc, family = poisson)

Deviance Residuals:
      1       2       3       4
 1.6440 -2.0199 -0.8850  0.8457

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)
(Intercept) 2.251e+00  2.502e-01    8.996  < 2e-16 ***
rc2         1.450e+00  2.549e-01    5.689 1.28e-08 ***
cc2         2.184e-10  2.000e-01 1.09e-09        1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 49.6661 on 3 degrees of freedom
Residual deviance:  8.2812 on 1 degrees of freedom
AIC: 33.172
Fitted coefficients:

> coef(pat.mod1)
 (Intercept)          rc2          cc2
2.251292e+00 1.450010e+00 2.183513e-10

Fitted values:

> fitted(pat.mod1)
   1    2    3    4
 9.5  9.5 40.5 40.5
Estimates: the linear predictor for cells (1,1) and (1,2) is 2.251292, giving fitted values exp(2.251292) = 9.5. The linear predictor for cells (2,1) and (2,2) is 2.251292 + 1.450010 = 3.701302, giving fitted values exp(3.701302) = 40.5.
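The fitted values follow from exponentiating the linear predictor (a Python check; the cc2 coefficient is effectively zero, so the two cells in each row share a fitted value):

```python
from math import exp

intercept, rc2 = 2.251292, 1.450010   # coefficients from pat.mod1
fit_row1 = exp(intercept)             # cells (1,1) and (1,2)
fit_row2 = exp(intercept + rc2)       # cells (2,1) and (2,2)
print(round(fit_row1, 3), round(fit_row2, 3))   # 9.5 40.5
```

These match the marginal expected counts 19/2 = 9.5 and 81/2 = 40.5 under homogeneity.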
Residual deviance: 8.2812 on 1 degree of freedom. This is Y² for testing the model, i.e. for testing H0: the response is homogeneous / the column distributions are the same / there is no association between response and treatment group. The lower the value of the residual deviance, the better in general is the fit of the model. Here the fit of the additive model is very poor (we have of course already concluded that there is an association - P-value about 1%).
8.2 Two-way classifications - taking into account a deterministic denominator

See the grouse data (Illustration 8.3, p50; data in Example 25).

Model: N_ij ~ Pn(λ_ij), where E[N_ij] = λ_ij = E_ij exp(μ + α_i + β_j), so log E[N_ij/E_ij] = μ + α_i + β_j, i.e. log λ_ij = log E_ij + μ + α_i + β_j.
We include a term "offset(log(E))" in the formula for the linear predictor. The fitted value is the estimate of the expected response per unit of exposure (i.e. per unit of the offset E).
8.3 Log-linear models for three-way classifications

Each subject is classified according to 3 factors/variables with r, s, t levels respectively:

N_ijk ~ Pn(λ_ijk) with log λ_ijk = μ + α_i + β_j + γ_k + (αβ)_ij + (αγ)_ik + (βγ)_jk + (αβγ)_ijk

(rst parameters in the saturated model)
Recall "interaction": a model with two factors and an interaction (no longer additive) is log λ_ij = μ + α_i + β_j + (αβ)_ij.
8.4 Hierarchic log-linear models

Range of possible models/dependencies. From 1, complete independence: model formula A + B + C; link log λ_ijk = μ + α_i + β_j + γ_k; notation [A][B][C]; df: rst - r - s - t + 2. Interpretation!
... through 2, one interaction (between B and C, say): model formula A + B*C; link log λ_ijk = μ + α_i + β_j + γ_k + (βγ)_jk; notation [A][BC]; df: rst - r - st + 1.
... to 5, all possible interactions: model formula A*B*C; notation [ABC]; df: 0.
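The quoted degrees of freedom follow from the parameter counts. A Python check of the formulas for a 2 × 2 × 2 table, the shape of the lizards example:

```python
def df_independence(r, s, t):
    # [A][B][C]: rst cells minus 1 + (r-1) + (s-1) + (t-1) free parameters
    return r * s * t - r - s - t + 2

def df_one_interaction(r, s, t):
    # [A][BC]: one two-way interaction between the s- and t-level factors
    return r * s * t - r - s * t + 1

print(df_independence(2, 2, 2), df_one_interaction(2, 2, 2))   # 4 3
```

These agree with the residual df of 4 and 3 reported for liz.mod1 and liz.mod2 in the forward selection.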
Model selection: by backward elimination or forward selection through the hierarchy of models containing all 3 variables.
The hierarchy of models:

[ABC]                               (saturated)
[AB][AC][BC]
[AB][AC]    [AB][BC]    [AC][BC]
[AB][C]     [A][BC]     [AC][B]
[A][B][C]                           (independence)
Our models can include: mean (intercept) + factor effects + 2-way interactions + 3-way interaction.
Illustration 8.4  Models for lizards data (Example 29)

liz = array(c(32, 86, 11, 35, 61, 73, 41, 70), dim = c(2, 2, 2))
n.liz = as.vector(liz)
s = factor(c(1, 1, 1, 1, 2, 2, 2, 2))   # species
d = factor(c(1, 1, 2, 2, 1, 1, 2, 2))   # diameter of perch
h = factor(c(1, 2, 1, 2, 1, 2, 1, 2))   # height of perch
Forward selection - candidate models:

liz.mod1 = glm(n.liz ~ s + d + h, family = poisson)
liz.mod2 = glm(n.liz ~ s*d + h, family = poisson)
liz.mod3 = glm(n.liz ~ s + d*h, family = poisson)
liz.mod4 = glm(n.liz ~ s*h + d, family = poisson)
liz.mod5 = glm(n.liz ~ s*d + s*h, family = poisson)
liz.mod6 = glm(n.liz ~ s*d + d*h, family = poisson)
Forward selection - residual deviances:

liz.mod1 = glm(n.liz ~ s + d + h, family = poisson)      25.04 on 4 df
liz.mod2 = glm(n.liz ~ s*d + h, family = poisson)    †   12.43 on 3 df
liz.mod5 = glm(n.liz ~ s*d + s*h, family = poisson)
liz.mod6 = glm(n.liz ~ s*d + d*h, family = poisson)
liz.mod1 = glm(n.liz ~ s + d + h, family = poisson)
liz.mod2 = glm(n.liz ~ s*d + h, family = poisson)    †
liz.mod5 = glm(n.liz ~ s*d + s*h, family = poisson)  †   2.03 on 2 df
> summary(liz.mod5)

Call:
glm(formula = n.liz ~ s * d + s * h, family = poisson)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.4320     0.1601  21.436  < 2e-16 ***
s2            0.5895     0.1970   2.992 0.002769 **
d2           -0.9420     0.1738  -5.420 5.97e-08 ***
h2            1.0346     0.1775   5.827 5.63e-09 ***
s2:d2         0.7537     0.2161   3.488 0.000486 ***
s2:h2        -0.6967     0.2198  -3.170 0.001526 **

    Null deviance: 98.5830 on 7 degrees of freedom
Residual deviance:  2.0256 on 2 degrees of freedom
FIN
MAJOR ILLUSTRATION 1

Number of papers per author  1     2    3    4   5   6  7  8  9  10  11
Number of authors            1062  263  120  50  22  7  6  2  0  1   1

Model
MAJOR ILLUSTRATION 2

Hedge type i          A     B     C     D     E     F     G
Hedge length (m) l_i  2320  2460  2455  2805  2335  2645  2099
Number of pairs n_i   14    16    14    26    15    40    71

Model: N_i ~ Pn(θ_i l_i)
Cyclic models
Model: N_i independent Pn(λ_i) with log λ_i = β_0 + β_1 cos θ_i + β_2 sin θ_i. Explanatory variable: the category/month i has been transformed into an angle θ_i = 2πi/r.
It is another example of a non-linear regression model for Poisson responses. It is a generalised linear model.
Fitting in R:

> n = c(40, 34, ..., 33, 38)                              # response vector
> r = length(n)
> i = 1:r
> th = 2*pi*i/r                                           # explanatory vector
> leuk = glm(n ~ cos(th) + sin(th), family = poisson)     # model
Fitted mean is exp(β̂_0 + β̂_1 cos θ_i + β̂_2 sin θ_i).
Fitted model
F73DB3 CDA  Data from class

               Male  Female
Cinema often    22     21
Not often       20     12
               Male  Female  Total
Cinema often    22     21      43
Not often       20     12      32
Total           42     33      75
               Male  Female  Total
Cinema often    22     21      43
Not often       20     12      32
Total           42     33      75

P(often | male) = 22/42 = 0.524
P(often | female) = 21/33 = 0.636

Significant difference (on these numbers)? Is there an association between gender and cinema attendance?
Null hypothesis H0: no association between gender and cinema attendance. Alternative: not H0. Under H0 we expect 42 × 43/75 = 24.08 in cell (1,1), etc.
> matcinema = matrix(c(22, 20, 21, 12), 2, 2)
> chisq.test(matcinema)

        Pearson's Chi-squared test with Yates' continuity correction

data:  matcinema
X-squared = 0.5522, df = 1, p-value = 0.4574

> chisq.test(matcinema)$expected
      [,1]  [,2]
[1,] 24.08 18.92
[2,] 17.92 14.08
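R's Yates-corrected statistic for the class data can be reproduced by hand. A Python check of the arithmetic:

```python
table = [[22, 21], [20, 12]]    # cinema attendance by gender
row = [sum(r) for r in table]                     # 43, 32
col = [sum(c) for c in zip(*table)]               # 42, 33
n = sum(row)                                      # 75 students
m = [[ri * cj / n for cj in col] for ri in row]   # expected counts
# Yates correction: subtract 0.5 from |O - E| before squaring
X2 = sum((abs(table[i][j] - m[i][j]) - 0.5) ** 2 / m[i][j]
         for i in range(2) for j in range(2))
print(round(m[0][0], 2), round(X2, 4))   # 24.08 0.5522
```

This reproduces both the expected count in cell (1,1) and R's X-squared value.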
Conclusion: the null hypothesis can stand (P-value 0.4574) - no association between gender and cinema attendance.
More students, same proportions:

               Male  Female  Total
Cinema often    110    105     215
Not often       100     60     160
Total           210    165     375

P(often | male) = 110/210 = 0.524
P(often | female) = 105/165 = 0.636

Significant difference (on these numbers)?
> matcinema2 = matrix(c(110, 100, 105, 60), 2, 2)
> chisq.test(matcinema2)

        Pearson's Chi-squared test with Yates' continuity correction

data:  matcinema2
X-squared = 4.3361, df = 1, p-value = 0.03731

> chisq.test(matcinema2)$expected
      [,1] [,2]
[1,] 120.4 94.6
[2,]  89.6 70.4
Conclusion: the null hypothesis is rejected (P-value 0.03731) - there IS an association between gender and cinema attendance.
FIN