F73DB3 CATEGORICAL DATA ANALYSIS Workbook Contents page Preface Aims Summary Content/structure/syllabus plus other information Background – computing (R)


Examples
Single classifications (1-13)
Two-way classifications (14-27)
Three-way classifications (28-32)

Example 1 Eye colours
Colour:              A    B    C    D
Frequency observed:  89   66   60   85

Example 2 Prussian cavalry deaths
(a) Numbers killed in each unit in each year - frequency table
Number killed:       0     1    2    3    4    ≥5   Total
Frequency observed:  144   91   32   11   2    0    280

Example 2 Prussian cavalry deaths
(b) Numbers killed in each unit in each year – raw data
0 0 1 0 0 2 0 0 0 0 ... 0 0 0 2 0 1 0 1 2 0 1 ... 0 ... 3 0 0 1 0 0 2 1 0 0 1 0 0 1 0 0 1 1 2 0 1 0 1 1

Example 2 Prussian cavalry deaths
(c) Total numbers killed each year
Year:    1875  '76  '77  '78  '79  '80  '81  '82  '83  '84  '85  '86  '87  '88  '89  '90  '91  '92  '93  '94
Killed:  3     5    7    9    10   18   6    14   11   9    5    11   15   6    11   17   12   15   8    4

Example 4 Political views
1 (very L)   2     3     4 (centre)   5     6     7 (very R)   Don’t Know   Total
46           179   196   559          232   150   35           93           1490

Example 7 Vehicle repair visits
Number of visits:    0     1     2    3   4   5   ≥6   Total
Frequency observed:  295   190   53   5   5   2   0    550

Example 15 Patients in clinical trial
                  Drug   Placebo   Total
Side-effects      15     4         19
No side-effects   35     46        81
Total             50     50        100

§1 INTRODUCTION
Data are counts/frequencies (not measurements)
Categories (explanatory variable)
Distribution in the cells (response)
Frequency distribution
Single classifications
Two-way classifications

Illustration 1.1
                                B: Cause of death
A: Smoking status               Cancer   Other
Smoker                          30       20
Not smoker                      15       35

Data may arise as
Bernoulli/binomial data (2 outcomes)
Multinomial data (more than 2 outcomes)
Poisson data
[+ Negative binomial data – the version with range x = 0,1,2, …]

§2 POISSON PROCESS AND ASSOCIATED DISTRIBUTIONS

2.1 Bernoulli trials and related distributions
Number of successes – binomial distribution
[Time before kth success – negative binomial distribution
Time to first success – geometric distribution]
Conditional distribution of success times

2.2 Poisson process and related distributions
[Figure: events marked along a time axis]

Poisson process with rate λ
Number of events in a time interval of length t, $N_t$, has a Poisson distribution with mean $\lambda t$

Poisson process with rate λ
Inter-event time, T, has an exponential distribution with parameter λ (mean 1/λ)


Given n events in time (0,t), how many in time (0,s) (s < t)?
Conditional distribution of number of events
Answer: $N_s \mid N_t = n \sim B(n, s/t)$
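The binomial answer can be checked numerically from the Poisson probabilities, using the independence of the counts in (0,s) and (s,t). The sketch below is illustrative only: the rate, times and count are made-up values, not taken from the workbook.

```r
# Check N_s | N_t = n ~ B(n, s/t) directly from Poisson probabilities.
lambda <- 2; s <- 1.5; t <- 4; n <- 6      # hypothetical rate, times and count
k <- 0:n

# Independent increments: N_s ~ Pn(lambda*s) and N_t - N_s ~ Pn(lambda*(t-s))
joint <- dpois(k, lambda * s) * dpois(n - k, lambda * (t - s))
cond  <- joint / dpois(n, lambda * t)      # P(N_s = k | N_t = n)

binom <- dbinom(k, size = n, prob = s / t) # claimed conditional distribution
all.equal(cond, binom)
```

Note that the rate λ cancels, so the result does not depend on the value chosen for `lambda`.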

Splitting into subprocesses
[Figure: events on a time axis, split into subprocesses]

Realisation of a Poisson process
[Figure: number of events against time]


$X \sim \mathrm{Pn}(\lambda)$, $Y \sim \mathrm{Pn}(\mu)$, X, Y independent
then we know $X + Y \sim \mathrm{Pn}(\lambda + \mu)$
Given X + Y = n, what is distribution of X?
Answer: $X \mid X + Y = n \sim B(n, p)$ where $p = \lambda/(\lambda + \mu)$
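The answer follows in a few lines from the Poisson probability functions. The derivation below writes λ and μ for the two Poisson means; the symbols are an assumption, since the originals did not survive extraction.

```latex
\begin{align*}
P(X = k \mid X + Y = n)
  &= \frac{P(X = k)\,P(Y = n - k)}{P(X + Y = n)} \\
  &= \frac{e^{-\lambda}\lambda^{k}/k! \;\cdot\; e^{-\mu}\mu^{n-k}/(n-k)!}
          {e^{-(\lambda+\mu)}(\lambda+\mu)^{n}/n!} \\
  &= \binom{n}{k}\left(\frac{\lambda}{\lambda+\mu}\right)^{k}
     \left(\frac{\mu}{\lambda+\mu}\right)^{n-k},
     \qquad k = 0, 1, \dots, n.
\end{align*}
```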

2.3 Inference for the Poisson distribution
$N_i$, i = 1, 2, …, r, i.i.d. Pn(λ), $N = \sum_i N_i$

CI for λ.

2.4 Dispersion and LR tests for Poisson data
Homogeneity hypothesis H0: the $N_i$ are i.i.d. Pn(λ) (for some unknown λ)
Dispersion statistic (M = sample mean)
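As a sketch of the dispersion test, the code below applies it to the Example 2(a) frequency table. It assumes the usual index-of-dispersion form, sum of (N_i - M)^2 divided by M, referred to a chi-square distribution on r - 1 degrees of freedom; the slide's own formula did not survive, so treat that form as an assumption.

```r
# Dispersion statistic for the Prussian cavalry frequency table (Example 2(a)).
killed <- 0:4
freq   <- c(144, 91, 32, 11, 2)

x <- rep(killed, freq)        # expand the table into the 280 unit-year counts
r <- length(x)
M <- mean(x)                  # sample mean, 196/280

D <- sum((x - M)^2) / M       # assumed form: sum (N_i - M)^2 / M
p.value <- pchisq(D, df = r - 1, lower.tail = FALSE)
c(D = D, df = r - 1, p.value = p.value)
```

A value of D close to its degrees of freedom is consistent with the homogeneity hypothesis.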

Likelihood ratio statistic
form for calculation – see p18

§3 SINGLE CLASSIFICATIONS
Binary classifications
(a) $N_1$, $N_2$ independent Poisson, with $N_i \sim \mathrm{Pn}(\lambda_i)$
or
(b) fixed sample size, $N_1 + N_2 = n$, with $N_1 \sim B(n, p_1)$ where $p_1 = \lambda_1/(\lambda_1 + \lambda_2)$

Qualitative categories
(a) $N_1, N_2, \dots, N_r$ independent Poisson, with $N_i \sim \mathrm{Pn}(\lambda_i)$
or
(b) fixed sample size n, with joint multinomial distribution $\mathrm{Mn}(n; \mathbf{p})$

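Testing goodness of fit
H0: $p_i = \pi_i$, i = 1, 2, …, r
$X^2 = \sum_{i=1}^{r} (n_i - m_i)^2 / m_i$, where $m_i = n\pi_i$
This is the (Pearson) chi-square statistic

The statistic often appears as $\sum (O - E)^2 / E$

An alternative statistic is the LR statistic $Y^2 = 2\sum_{i} n_i \log(n_i/m_i)$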
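Both statistics are easy to compute directly. The sketch below uses the Example 1 eye-colour frequencies and assumes, purely for illustration, a null hypothesis of equal probabilities for the four colours; the workbook's actual hypothesis for this example is not stated in these slides.

```r
# Pearson X^2 and LR Y^2 for a single classification.
n  <- c(A = 89, B = 66, C = 60, D = 85)   # Example 1 observed frequencies
p0 <- rep(1 / 4, 4)                       # assumed null: equal probabilities
m  <- sum(n) * p0                         # expected frequencies under H0

X2 <- sum((n - m)^2 / m)                  # Pearson statistic
Y2 <- 2 * sum(n * log(n / m))             # LR statistic
df <- length(n) - 1

pchisq(c(X2 = X2, Y2 = Y2), df, lower.tail = FALSE)
chisq.test(n, p = p0)$statistic           # agrees with X2
```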

Sparse data/small expected frequencies
ensure $m_i \ge 1$ for all cells, and $m_i \ge 5$ for at least about 80% of the cells
if not - combine adjacent cells sensibly

Goodness-of-fit tests for frequency distributions - very well-known application of the statistic (see Illustration 3.4 p 22/23)

Residuals (standardised)

Residuals (standardised): simpler version

MAJOR ILLUSTRATION 1 Publish and be modelled
Number of papers per author:  1     2    3    4   5   6  7  8  9  10  11
Number of authors:            1062  263  120  50  22  7  6  2  0  1   1
Model

MAJOR ILLUSTRATION 2 Birds in hedges
Hedge type i:              A     B     C     D     E     F     G
Hedge length (m) $l_i$:    2320  2460  2455  2805  2335  2645  2099
Number of pairs $n_i$:     14    16    14    26    15    40    71
Model: $N_i \sim \mathrm{Pn}(\lambda_i l_i)$

§4 TWO-WAY CLASSIFICATIONS
Example 14 Numbers of mice bearing tumours in treated and control groups
              Treated   Control   Total
Tumours       4         5         9
No tumours    12        74        86
Total         16        79        95

Example 15 Patients in clinical trial
                  Drug   Placebo   Total
Side-effects      15     4         19
No side-effects   35     46        81
Total             50     50        100

Patients in clinical trial – take 2
                  Drug   Placebo   Total
Side-effects      15     15        30
No side-effects   35     35        70
Total             50     50        100

4.1 Factors and responses
F × R tables
R × F, R × R (F × F ?)
Qualitative, ordered, quantitative
Analysis the same - interpretation may be different

A two-way table is often called a “contingency table” (especially in R × R case).

Notation (2 × 2 case, easily extended)
              Exposed                Not exposed            Total
Disease       $n_{11}$               $n_{12}$               $n_{1\bullet}$
No disease    $n_{21}$               $n_{22}$               $n_{2\bullet}$
Total         $n_{\bullet 1}$        $n_{\bullet 2}$        $n_{\bullet\bullet} = n$

Three possibilities
One overall sample, each subject classified according to 2 attributes - this is R × R
Retrospective study
Prospective study (use of treated and control groups; drug and placebo etc)

4.2 Distribution theory and tests for r × s tables
(a) R × R case
(a1) $N_{ij} \sim \mathrm{Pn}(\lambda_{ij})$, independent
or, with fixed table total
(a2) Condition on $n = \sum_{i,j} n_{ij}$: $\mathbf{N} \mid n \sim \mathrm{Mn}(n; \mathbf{p})$ where $\mathbf{N} = \{N_{ij}\}$, $\mathbf{p} = \{p_{ij}\}$.

(b) F × R case
Condition on the observed marginal totals $n_{\bullet j} = \sum_i n_{ij}$ for the s categories of F (i.e. condition on n and $n_{\bullet 1}$)
⇒ s independent multinomials

Usual hypotheses
(a1) $N_{ij} \sim \mathrm{Pn}(\lambda_{ij})$, independent
H0: variables/responses are independent
$\lambda_{ij} = \lambda_{i\bullet}\lambda_{\bullet j}/\lambda_{\bullet\bullet}$ ($= k\lambda_{i\bullet}$ for fixed j)
(a2) Multinomial data (table total fixed)
H0: variables/responses are independent
P(row i and column j) = P(row i)P(column j)

(b) Condition on n and $n_{\bullet j}$ (fixed column totals)
$N_{ij} \sim \mathrm{Bi}(n_{\bullet j}, p_{ij})$, j = 1, 2, …, s; independent
H0: response is homogeneous ($p_{ij} = p_i$ for all j), i.e. response has the same distribution for all levels of the factor

Tests of H0
The χ² (Pearson) statistic: $X^2 = \sum_{i,j} (n_{ij} - m_{ij})^2 / m_{ij}$
where $m_{ij} = n_{i\bullet} n_{\bullet j}/n$ as before

OR: test based on the LR statistic $Y^2$
Illustration: tonsils data – see p27
In R
Pearson/$X^2$: read data in using “matrix” then use “chisq.test”
LR $Y^2$: calculate it directly (or get it from the results of fitting a “log-linear model” - see later)

4.3 The 2 × 2 table
Statistical tests
(a) Using Pearson’s χ²
                  Drug   Placebo   Total
Side-effects      15     4         19
No side-effects   35     46        81
Total             50     50        100

where $m_{ij} = n_{i\bullet} n_{\bullet j}/n$

Yates (continuity) correction
Subtract 0.5 from |O – E| before squaring it
Performing the test in R
n.pat = matrix(c(15,35,4,46),2,2)
chisq.test(n.pat)
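To see what the correction does, the statistic can be computed by hand and compared with `chisq.test`, which applies Yates' correction by default for a 2 × 2 table. This is a minimal sketch using the slide's `n.pat` matrix; the other object names are illustrative.

```r
# Pearson X^2 for the patients table, with and without Yates' correction.
n.pat <- matrix(c(15, 35, 4, 46), 2, 2)   # rows: side-effects yes/no; cols: drug/placebo

# Expected frequencies m_ij = (row total * column total) / n
m <- outer(rowSums(n.pat), colSums(n.pat)) / sum(n.pat)

X2.yates <- sum((abs(n.pat - m) - 0.5)^2 / m)   # subtract 0.5 from |O - E|
X2.plain <- sum((n.pat - m)^2 / m)              # uncorrected
c(yates = X2.yates, plain = X2.plain)

chisq.test(n.pat)$statistic                     # Yates-corrected
chisq.test(n.pat, correct = FALSE)$statistic    # uncorrected
```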

(b) Using deviance/LR statistic $Y^2$
(c) Comparing binomial probabilities
(d) Fisher’s exact test

                  Drug   Placebo   Total
Side-effects      15     4 (N)     19
No side-effects   35     46        81
Total             50     50        100

Under a random allocation
one-sided P-value = P(N ≤ 4) = 0.0047
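The one-sided P-value is a hypergeometric tail probability. The sketch below assumes N denotes the number of the 19 side-effect cases that fall in the placebo group of 50, and compares the result with `fisher.test`.

```r
# One-sided Fisher exact P-value for the patients table.
n.pat <- matrix(c(15, 35, 4, 46), 2, 2)   # rows: side-effects yes/no; cols: drug/placebo

# Under random allocation the 19 side-effect cases are a random draw from
# 50 placebo and 50 drug patients, so N is hypergeometric.
p.one <- phyper(4, m = 50, n = 50, k = 19)   # P(N <= 4)
p.one

# Same tail, expressed as "at least 15 side-effects in the drug group"
fisher.test(n.pat, alternative = "greater")$p.value
```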

4.4 Log odds, combining and collapsing tables, interactions
In the 2 × 2 table, the H0: independence condition is equivalent to $\lambda_{11}\lambda_{22} = \lambda_{12}\lambda_{21}$
Let $\lambda = \log(\lambda_{11}\lambda_{22}/\lambda_{12}\lambda_{21})$
Then we have H0: λ = 0
λ is the “log odds ratio”

The “λ = 0” hypothesis is often called the “no association” hypothesis.

The odds ratio is $\lambda_{11}\lambda_{22}/\lambda_{12}\lambda_{21}$
Sample equivalent is $n_{11}n_{22}/n_{12}n_{21}$

The odds ratio (or log odds ratio) provides a measure of association for the factors in the table.
no association ⇔ odds ratio = 1 ⇔ log odds ratio = 0
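As a small worked example, the sample odds ratio and log odds ratio for the patients table (Example 15) can be computed as follows; the object names are illustrative.

```r
# Sample odds ratio n11*n22 / (n12*n21) for the patients table.
n.pat <- matrix(c(15, 35, 4, 46), 2, 2)   # rows: side-effects yes/no; cols: drug/placebo

odds.ratio <- (n.pat[1, 1] * n.pat[2, 2]) / (n.pat[1, 2] * n.pat[2, 1])
log.or     <- log(odds.ratio)
c(odds.ratio = odds.ratio, log.odds.ratio = log.or)
```

An odds ratio well away from 1 (log odds ratio away from 0) points to an association between treatment group and side-effects.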

Don’t combine heterogeneous tables!

Interaction
An interaction exists between two factors when the effect of one factor is different at different levels of another factor.

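§5 INTRODUCTION TO GENERALISED LINEAR MODELS (GLMs)
Normal linear model
Y|x ~ N with $E[Y \mid x] = \alpha + \beta x$
or $E[Y \mid \mathbf{x}] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_r x_r = \boldsymbol{\beta}^{\top}\mathbf{x}$
i.e. $E[Y \mid \mathbf{x}] = \mu(\mathbf{x}) = \boldsymbol{\beta}^{\top}\mathbf{x}$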

We are explaining $\mu(\mathbf{x})$ using a linear predictor (a linear function of the explanatory data)
Generalised linear model
Now we set $g(\mu(\mathbf{x})) = \boldsymbol{\beta}^{\top}\mathbf{x}$ for some function g
We explain $g(\mu(\mathbf{x}))$ using a linear function of the explanatory data, where g is called the link function

e.g. modelling a Poisson mean λ we use a log link $g(\lambda) = \log\lambda$
We use a linear predictor to explain log λ rather than λ itself: the model is
Y|x ~ Pn with mean $\lambda_x$, with $\log\lambda_x = \alpha + \beta x$
or $\log\lambda_{\mathbf{x}} = \boldsymbol{\beta}^{\top}\mathbf{x}$
This is a log-linear model

An example is a trend model in which we use $\log\lambda_i = \alpha + \beta i$
Another example is a cyclic model in which we use $\log\lambda_i = \beta_0 + \beta_1\cos\theta_i + \beta_2\sin\theta_i$

§6 MODELS FOR SINGLE CLASSIFICATIONS
6.1 Single classifications - trend models
Data: numbers in r categories
Model: $N_i$, i = 1, 2, …, r, independent $\mathrm{Pn}(\lambda_i)$

Basic case
H0: the $\lambda_i$ are equal v H1: the $\lambda_i$ follow a trend
Let $X_j$ be category of observation j
$P(X_j = i) = 1/r$
Test based on: see Illustration 6.1

A more general model
$N_i$ independent $\mathrm{Pn}(\lambda_i)$ with $\log\lambda_i = \alpha + \beta i$
Log-linear model

It is a linear regression model for $\log\lambda_i$ and a non-linear regression model for $\lambda_i$.
It is a generalised linear model.
Here the link between the parameter we are estimating and the linear predictor is the log function - it is a “log link”.

Fitting in R
Example 13: stressful events data
> n = c(15,11, …, 1, 4)                    # response vector
> r = length(n)
> i = 1:r                                  # explanatory vector
> stress = glm(n ~ i, family = poisson)    # model

> summary(stress)
Call:
glm(formula = n ~ i, family = poisson)             (model being fitted)
Deviance Residuals:                                (summary information on the residuals)
    Min       1Q   Median       3Q      Max
-1.9886  -0.9631   0.1737   0.5131   2.0362
Coefficients:                                      (information on the fitted parameters)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.80316    0.14816  18.920  < 2e-16 ***
i           -0.08377    0.01680  -4.986 6.15e-07 ***
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 50.843 on 17 degrees of freedom
Residual deviance: 24.570 on 16 degrees of freedom (deviances: Y² statistics)
AIC: 95.825
Number of Fisher Scoring iterations: 4

Fitted mean is $\hat\lambda_i = \exp(2.80316 - 0.08377\,i)$
e.g. for date 6, i = 6 and fitted mean is exp(2.30054) = 9.980
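The fitted means can be reproduced from the reported coefficients alone. This sketch hard-codes the two estimates from the `summary(stress)` output, because the full Example 13 data vector is elided in the slides; it takes r = 18 categories, as implied by the null deviance having 17 degrees of freedom.

```r
# Fitted means exp(b0 + b1 * i) from the reported coefficient estimates.
b0 <- 2.80316                 # (Intercept) estimate from summary(stress)
b1 <- -0.08377                # slope estimate for i
i  <- 1:18                    # 18 categories (null deviance is on 17 df)

fitted.mean <- exp(b0 + b1 * i)
round(fitted.mean[6], 3)      # fitted mean for date 6
```

With the fitted model object available, `fitted(stress)` would return the same values without retyping the coefficients.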

[Figure: fitted model]

Test of H0: no trend
The null fit: all fitted values equal (to the observed mean); $Y^2 = 50.84$ (~ $\chi^2$ on 17 df)
The trend model: fitted values exp(2.80316 − 0.08377i); $Y^2 = 24.57$ (~ $\chi^2$ on 16 df)
Crude 95% CI for slope is −0.084 ± 2(0.0168), i.e. −0.084 ± 0.034
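One way to use the two deviances is to refer their difference to a chi-square distribution on the difference in degrees of freedom. The sketch below does this with the numbers reported in the output; comparing nested models by the drop in deviance is standard practice but is not spelled out on the slide, so treat it as an illustration.

```r
# Drop-in-deviance test for the trend, using the reported deviances.
null.dev <- 50.843; null.df <- 17     # null fit
res.dev  <- 24.570; res.df  <- 16     # trend model

drop    <- null.dev - res.dev                                  # on 1 df
p.trend <- pchisq(drop, null.df - res.df, lower.tail = FALSE)  # evidence for a trend
p.fit   <- pchisq(res.dev, res.df, lower.tail = FALSE)         # adequacy of trend model

ci <- -0.08377 + c(-2, 2) * 0.01680   # crude 95% CI for the slope
list(drop = drop, p.trend = p.trend, p.fit = p.fit, ci = ci)
```

The interval excludes zero, in line with the small P-value for the trend term.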

The lower the value of the residual deviance, the better in general is the fit of the model.

Basic residuals

6.2 Taking into account a deterministic denominator – using an “offset” for the “exposure”
Model: $N_x \sim \mathrm{Pn}(\lambda_x)$ where $E[N_x] = \lambda_x = E_x\, b\,\theta^{x}$
$\log\lambda_x = \log E_x + c + dx$ (with $c = \log b$, $d = \log\theta$)
See the Gompertz model example (p 40, data in Example 26)

We include a term “offset(logE)” in the formula for the linear predictor: in R
model = glm(n.deaths ~ age + offset(log(exposure)), family = poisson)
Fitted value is the estimate of the expected response per unit of exposure (i.e. per unit of the offset E)

§7 LOGISTIC REGRESSION
for modelling proportions
we have a binary response for each item and a quantitative explanatory variable
for example: dependence of the proportion of insects killed in a chamber on the concentration of a chemical present – we want to predict the proportion killed from the concentration

for example: dependence of the proportion of
- women who smoke - on age
- metal bars on test which fail - on pressure applied
- policies which give rise to claims – on sum insured
Model: # successes at value $x_i$ of explanatory variable: $N_i \sim \mathrm{bi}(n_i, \pi_i)$

We use a glm – we do not predict $\pi_i$ directly; we predict a function of $\pi_i$ called the logit of $\pi_i$.
The logit function is given by: $\mathrm{logit}(\pi) = \log\bigl(\pi/(1-\pi)\bigr)$
It is the “log odds”.
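The logit and its inverse are one-liners in R. The sketch below defines them by hand (the function names `logit` and `inv.logit` are illustrative) and checks them against the built-in `qlogis` and `plogis`.

```r
# The logit (log odds) transformation and its inverse.
logit     <- function(p)   log(p / (1 - p))
inv.logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
logit(p)                              # log odds; zero at p = 0.5

all.equal(logit(p), qlogis(p))        # built-in logit
all.equal(inv.logit(logit(p)), p)     # inverse recovers the proportions
```

The logit maps proportions in (0, 1) onto the whole real line, which is what allows a linear predictor to be used without constraint.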

See Illustration 7.1 p 43: proportion v dose

logit(proportion) v dose

This leads to the “logistic regression” model $\mathrm{logit}(\pi_i) = a + bx_i$
[c.f. log linear model $N_i \sim \mathrm{Poisson}(\lambda_i)$ with $\log\lambda_i = a + bx_i$]

We are using a logit link
We use a linear predictor to explain logit(π) rather than π itself

The method based on the use of this model is called logistic regression

Data:
explanatory variable value   # successes   group size   observed proportion
$x_1$                        $n_{11}$      $n_1$        $n_{11}/n_1$
$x_2$                        $n_{21}$      $n_2$        $n_{21}/n_2$
…
$x_s$                        $n_{s1}$      $n_s$        $n_{s1}/n_s$

In R we declare the proportion of successes as the response and include the group sizes as a set of weights
drug.mod1 = glm(propdead ~ dose, weights = groupsize, family = binomial)
explanatory vector is dose
note the family declaration

RHS of model can be extended if required to include additional explanatory variables and factors
e.g. mod3 = glm(mat3 ~ age + socialclass + gender, family = binomial)

drug.mod – see output p44
Coefficients very highly significant (***)
Null deviance 298 on 9 df
Residual deviance 17.2 on 8 df
But … residual v fitted plot and … fitted v observed proportions plot

model with a quadratic term (dose^2)

§8 MODELS FOR TWO-WAY AND THREE-WAY CLASSIFICATIONS
8.1 Log-linear models for two-way classifications
$N_{ij} \sim \mathrm{Pn}(\lambda_{ij})$, i = 1, 2, …, r; j = 1, 2, …, s
H0: variables are independent
$\lambda_{ij} = \lambda_{i\bullet}\lambda_{\bullet j}/\lambda_{\bullet\bullet}$

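⇒ $\log\lambda_{ij} = \log\lambda_{i\bullet} + \log\lambda_{\bullet j} - \log\lambda_{\bullet\bullet}$
(row effect: $\log\lambda_{i\bullet}$; column effect: $\log\lambda_{\bullet j}$; overall effect: $-\log\lambda_{\bullet\bullet}$)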

We “explain” $\log\lambda_{ij}$ in terms of additive effects:
$\log\lambda_{ij} = \mu + \alpha_i + \beta_j$
Fitted values are the expected frequencies
Fitting process gives us the value of $Y^2 = -2\log\lambda$ (λ here being the likelihood ratio)

Fitting a log-linear model
$N_{ij} \sim \mathrm{Pn}(\lambda_{ij})$, independent, with $\log\lambda_{ij} = \mu + \alpha_i + \beta_j$
Declare the response vector (the cell frequencies) and the row/column codes as factors, then use
> name = glm(…)

Tonsils data (Example 16)
n.tonsils = c(19,497,29,560,24,269)
rc = factor(c(1,2,1,2,1,2))
cc = factor(c(1,1,2,2,3,3))
tonsils.mod1 = glm(n.tonsils ~ rc + cc, family=poisson)

Call:
glm(formula = n.tonsils ~ rc + cc, family = poisson)
Deviance Residuals:
       1         2         3         4         5         6
-1.54915   0.34153  -0.24416   0.05645   2.11018  -0.53736
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.27998    0.12287  26.696  < 2e-16 ***
rc2          2.91326    0.12094  24.087  < 2e-16 ***
cc2          0.13232    0.06030   2.195   0.0282 *
cc3         -0.56593    0.07315  -7.737 1.02e-14 ***
---
Null deviance: 1487.217 on 5 degrees of freedom
Residual deviance: 7.321 on 2 degrees of freedom   (this is Y² = −2 log λ)

The fit of the “independent attributes” model is not good
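To quantify "not good", the residual deviance can be referred to a chi-square distribution on the residual degrees of freedom. The sketch below refits the model from the tonsils data given on the earlier slide and computes the corresponding P-value.

```r
# Goodness of fit of the independence model for the tonsils data (Example 16).
n.tonsils <- c(19, 497, 29, 560, 24, 269)
rc <- factor(c(1, 2, 1, 2, 1, 2))     # row codes
cc <- factor(c(1, 1, 2, 2, 3, 3))     # column codes
tonsils.mod1 <- glm(n.tonsils ~ rc + cc, family = poisson)

Y2 <- deviance(tonsils.mod1)          # residual deviance = LR statistic
df <- df.residual(tonsils.mod1)
p.value <- pchisq(Y2, df, lower.tail = FALSE)
c(Y2 = Y2, df = df, p.value = p.value)
```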

Patients data (Example 15)
> n.patients = c(15, 4, 35, 46)
> rc = factor(c(1, 1, 2, 2))
> cc = factor(c(1, 2, 1, 2))
> pat.mod1 = glm(n.patients ~ rc + cc, family = poisson)

Call:
glm(formula = n.patients ~ rc + cc, family = poisson)
Deviance Residuals:
      1        2        3        4
 1.6440  -2.0199  -0.8850   0.8457
Coefficients:
             Estimate Std. Error  z value Pr(>|z|)
(Intercept) 2.251e+00  2.502e-01    8.996  < 2e-16 ***
rc2         1.450e+00  2.549e-01    5.689 1.28e-08 ***
cc2         2.184e-10  2.000e-01 1.09e-09        1
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 49.6661 on 3 degrees of freedom
Residual deviance: 8.2812 on 1 degrees of freedom
AIC: 33.172

fitted coefficients:
coef(pat.mod1)
 (Intercept)          rc2          cc2
2.251292e+00 1.450010e+00 2.183513e-10
fitted values:
fitted(pat.mod1)
   1    2    3    4
 9.5  9.5 40.5 40.5

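Estimates are $\hat\mu = 2.251292$, $\hat\alpha_2 = 1.450010$, $\hat\beta_2 = 0$ (with $\hat\alpha_1 = \hat\beta_1 = 0$)
Predictors for cells 1,1 and 1,2 are 2.251292: exp(2.251292) = 9.5
Predictors for cells 2,1 and 2,2 are 2.251292 + 1.450010 = 3.701302: exp(3.701302) = 40.5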

Residual deviance: 8.2812 on 1 degree of freedom
This is $Y^2$ for testing the model, i.e. for testing H0: response is homogeneous / column distributions are the same / no association between response and treatment group
The lower the value of the residual deviance, the better in general is the fit of the model.
Here the fit of the additive model is very poor (we have of course already concluded that there is an association – P-value about 1%).
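The residual deviance reported by `glm` is the same LR statistic that can be calculated directly from observed and fitted frequencies. A minimal sketch for the patients data, reusing the slide's object names:

```r
# LR statistic computed directly versus the glm residual deviance.
n.patients <- c(15, 4, 35, 46)
rc <- factor(c(1, 1, 2, 2))
cc <- factor(c(1, 2, 1, 2))
pat.mod1 <- glm(n.patients ~ rc + cc, family = poisson)

m  <- fitted(pat.mod1)                             # expected frequencies under H0
Y2 <- 2 * sum(n.patients * log(n.patients / m))    # direct calculation
c(direct = Y2, glm = deviance(pat.mod1))

pchisq(Y2, df.residual(pat.mod1), lower.tail = FALSE)
```

The fitted values coincide with (row total × column total)/n, which is why the additive log-linear model is the independence model.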

8.2 Two-way classifications - taking into account a deterministic denominator
See the grouse data (Illustration 8.3 p50, data in Example 25)
Model: $N_{ij} \sim \mathrm{Pn}(\lambda_{ij})$ where $E[N_{ij}] = \lambda_{ij} = E_{ij}\exp(\mu + \alpha_i + \beta_j)$
$\log E[N_{ij}/E_{ij}] = \mu + \alpha_i + \beta_j$
i.e. $\log\lambda_{ij} = \log E_{ij} + \mu + \alpha_i + \beta_j$

We include a term “offset(logE)” in the formula for the linear predictor
Fitted value is the estimate of the expected response per unit of exposure (i.e. per unit of the offset E)

8.3 Log-linear models for three-way classifications
Each subject classified according to 3 factors/variables with r, s, t levels respectively
$N_{ijk} \sim \mathrm{Pn}(\lambda_{ijk})$ with
$\log\lambda_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$
r × s × t parameters

Model with two factors and an interaction (no longer additive) is
$\log\lambda_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$
Recall “interaction”

8.4 Hierarchic log-linear models
Range of possible models/dependencies
From
1 Complete independence
model formula: A + B + C
link: $\log\lambda_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k$
notation: [A][B][C]
df: rst – r – s – t + 2
Interpretation!

…. through
2 One interaction (B and C say)
model formula: A + B*C
link: $\log\lambda_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\beta\gamma)_{jk}$
notation: [A][BC]
df: rst – r – st + 1

…. to
5 All possible interactions
model formula: A*B*C
notation: [ABC]
df: 0
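The degrees of freedom quoted for each model are the number of cells minus the number of free parameters. The helper functions below are hypothetical (not part of the workbook) and simply encode that count for the first two models in the hierarchy.

```r
# Residual df = number of cells - number of free parameters,
# for factors A, B, C with r, s, t levels.
df.indep <- function(r, s, t) {
  # [A][B][C]: intercept + main effects
  r * s * t - (1 + (r - 1) + (s - 1) + (t - 1))
}
df.A.BC <- function(r, s, t) {
  # [A][BC]: as above, plus the B:C interaction
  r * s * t - (1 + (r - 1) + (s - 1) + (t - 1) + (s - 1) * (t - 1))
}

df.indep(2, 2, 2)   # rst - r - s - t + 2 for a 2 x 2 x 2 table
df.A.BC(2, 2, 2)    # rst - r - st + 1 for a 2 x 2 x 2 table
```

For a 2 × 2 × 2 table these agree with the residual degrees of freedom that `glm` reports for the corresponding fits.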

Model selection: by backward elimination or forward selection through the hierarchy of models containing all 3 variables

saturated:     [ABC]
               [AB][AC][BC]
               [AB][AC]    [AB][BC]    [AC][BC]
               [AB][C]     [A][BC]     [AC][B]
independence:  [A][B][C]

Our models can include
mean (intercept)
+ factor effects
+ 2-way interactions
+ 3-way interaction

Illustration 8.4 Models for lizards data (Example 29)
liz = array(c(32, 86, 11, 35, 61, 73, 41, 70), dim = c(2, 2, 2))
n.liz = as.vector(liz)
s = factor(c(1,1,1,1,2,2,2,2))          # species
d = factor(c(1, 1, 2, 2, 1, 1, 2, 2))   # diameter of perch
h = factor(c(1,2,1,2,1,2,1,2))          # height of perch

Forward selection
liz.mod1 = glm(n.liz ~ s + d + h, family = poisson)
liz.mod2 = glm(n.liz ~ s*d + h, family = poisson)
liz.mod3 = glm(n.liz ~ s + d*h, family = poisson)
liz.mod4 = glm(n.liz ~ s*h + d, family = poisson)
liz.mod5 = glm(n.liz ~ s*d + s*h, family = poisson)
liz.mod6 = glm(n.liz ~ s*d + d*h, family = poisson)

Forward selection
liz.mod1 = glm(n.liz ~ s + d + h, family = poisson)       # 25.04 on 4 df
liz.mod2 = glm(n.liz ~ s*d + h, family = poisson)         # † 12.43 on 3 df
liz.mod5 = glm(n.liz ~ s*d + s*h, family = poisson)
liz.mod6 = glm(n.liz ~ s*d + d*h, family = poisson)

Forward selection
liz.mod1 = glm(n.liz ~ s + d + h, family = poisson)
liz.mod2 = glm(n.liz ~ s*d + h, family = poisson)         # †
liz.mod5 = glm(n.liz ~ s*d + s*h, family = poisson)       # † 2.03 on 2 df

> summary(liz.mod5)
Call:
glm(formula = n.liz ~ s * d + s * h, family = poisson)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.4320     0.1601  21.436  < 2e-16 ***
s2            0.5895     0.1970   2.992 0.002769 **
d2           -0.9420     0.1738  -5.420 5.97e-08 ***
h2            1.0346     0.1775   5.827 5.63e-09 ***
s2:d2         0.7537     0.2161   3.488 0.000486 ***
s2:h2        -0.6967     0.2198  -3.170 0.001526 **
Null deviance: 98.5830 on 7 degrees of freedom
Residual deviance: 2.0256 on 2 degrees of freedom
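The forward-selection path can be reproduced and the nested models compared in one table. The sketch below refits the three models on the selected path from the lizards data given above; the call to `anova` with `test = "Chisq"` is one convenient way to tabulate the drops in deviance, not necessarily the workbook's own route.

```r
# Forward selection path for the lizards data (Example 29).
liz   <- array(c(32, 86, 11, 35, 61, 73, 41, 70), dim = c(2, 2, 2))
n.liz <- as.vector(liz)
s <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))   # species
d <- factor(c(1, 1, 2, 2, 1, 1, 2, 2))   # diameter of perch
h <- factor(c(1, 2, 1, 2, 1, 2, 1, 2))   # height of perch

liz.mod1 <- glm(n.liz ~ s + d + h,   family = poisson)   # [S][D][H]
liz.mod2 <- glm(n.liz ~ s*d + h,     family = poisson)   # [SD][H]
liz.mod5 <- glm(n.liz ~ s*d + s*h,   family = poisson)   # [SD][SH]

# Sequential drops in deviance, each referred to chi-square
anova(liz.mod1, liz.mod2, liz.mod5, test = "Chisq")
```

Each row of the table tests whether the interaction added at that step gives a worthwhile reduction in deviance.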


Cyclic models

Model
$N_i$ independent $\mathrm{Pn}(\lambda_i)$ with $\log\lambda_i = \beta_0 + \beta_1\cos\theta_i + \beta_2\sin\theta_i$
Explanatory variable: the category/month i has been transformed into an angle $\theta_i$

It is another example of a non-linear regression model for Poisson responses.
It is a generalised linear model.

Fitting in R
> n = c(40, 34, …, 33, 38)                              # response vector
> r = length(n)
> i = 1:r
> th = 2*pi*i/r                                         # explanatory vector
> leuk = glm(n ~ cos(th) + sin(th), family = poisson)   # model

Fitted mean is

[Figure: fitted model]

F73DB3 CDA Data from class
                Male   Female
Cinema often    22     21
Not often       20     12


                Male   Female   Total
Cinema often    22     21       43
Not often       20     12       32
Total           42     33       75
P(often|male) = 22/42 = 0.524
P(often|female) = 21/33 = 0.636
significant difference (on these numbers)?
is there an association between gender and cinema attendance?

Null hypothesis H0: no association between gender and cinema attendance
Alternative: not H0
Under H0 we expect 42 × 43/75 = 24.08 in cell 1,1 etc.


> matcinema = matrix(c(22,20,21,12),2,2)
> chisq.test(matcinema)
Pearson's Chi-squared test with Yates' continuity correction
data: matcinema
X-squared = 0.5522, df = 1, p-value = 0.4574
> chisq.test(matcinema)$expected
      [,1]  [,2]
[1,] 24.08 18.92
[2,] 17.92 14.08
null hypothesis can stand: no association between gender and cinema attendance
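The expected frequencies and the conditional proportions can be reproduced by hand. In this sketch the row and column labels are added for readability and are not part of the original `matcinema` object.

```r
# Conditional proportions and expected frequencies for the class cinema data.
matcinema <- matrix(c(22, 20, 21, 12), 2, 2,
                    dimnames = list(c("often", "not often"), c("male", "female")))

# P(often | male) and P(often | female): proportions within each column
prop.table(matcinema, margin = 2)["often", ]

# Expected frequencies under H0: (row total * column total) / n
E <- outer(rowSums(matcinema), colSums(matcinema)) / sum(matcinema)
E
```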

                Male   Female   Total
Cinema often    110    105      215
Not often       100    60       160
Total           210    165
P(often|male) = 110/210 = 0.524
P(often|female) = 105/165 = 0.636
significant difference (on these numbers)?
more students, same proportions


> matcinema2 = matrix(c(110,100,105,60),2,2)
> chisq.test(matcinema2)
Pearson's Chi-squared test with Yates' continuity correction
data: matcinema2
X-squared = 4.3361, df = 1, p-value = 0.03731
> chisq.test(matcinema2)$expected
      [,1] [,2]
[1,] 120.4 94.6
[2,]  89.6 70.4
null hypothesis is rejected: there IS an association between gender and cinema attendance
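The change of conclusion is purely a sample-size effect: the second table is the first multiplied by 5, so the proportions are unchanged. The sketch below uses the uncorrected Pearson statistic (`correct = FALSE`, unlike the slides, which use Yates' correction) because it scales exactly with the sample size.

```r
# Same proportions, five times the sample size.
matcinema  <- matrix(c(22, 20, 21, 12), 2, 2)
matcinema2 <- 5 * matcinema                    # 110, 100, 105, 60

X2.small <- unname(chisq.test(matcinema,  correct = FALSE)$statistic)
X2.large <- unname(chisq.test(matcinema2, correct = FALSE)$statistic)
c(small = X2.small, large = X2.large, ratio = X2.large / X2.small)
```

With the continuity correction the ratio is not exactly 5, but the same pattern holds: identical proportions give a larger statistic as n grows.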

FIN
