Multiple Testing Tukey’s Multiple comparison procedure


1 Multiple Testing Tukey’s Multiple comparison procedure
Scheffe’s multiple comparison procedure

2 Multiple Testing – a Simple Example
Suppose we are interested in testing whether two parameters (θ1 and θ2) are equal to zero. There are two approaches. We could test each parameter separately: H0: θ1 = 0 against HA: θ1 ≠ 0, then H0: θ2 = 0 against HA: θ2 ≠ 0. Or we could develop an overall test: H0: θ1 = 0, θ2 = 0 against HA: θ1 ≠ 0 or θ2 ≠ 0.
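Testing each parameter separately at level α inflates the chance of making at least one false rejection. A minimal numerical sketch, assuming the two tests are independent and both run at α = 0.05 (illustrative numbers, not from the slides):

```python
alpha = 0.05   # per-test Type I error rate
k = 2          # number of separate tests

# If the two tests are independent, the probability of at least one
# false rejection across both tests (the familywise error rate) is:
familywise = 1 - (1 - alpha) ** k
print(familywise)   # 0.0975, noticeably larger than 0.05
```

This is why the overall test, or a multiple comparison procedure, is needed when several hypotheses are tested at once.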

3 To test each parameter separately
We might use the following test: reject H0: θi = 0 when the test statistic for θi exceeds a critical value, where the critical value is chosen so that the probability of a Type I error of each test is α.

4 To perform an overall test
H0: θ1 = 0, θ2 = 0 against HA: θ1 ≠ 0 or θ2 ≠ 0, we might use a single test statistic whose critical value is chosen so that the overall probability of a Type I error is α.

12 Multiple Comparison Tests
Post-hoc Tests

13 Suppose we have p means. An F-test has revealed that there are significant differences among the p means. We want to perform an analysis to determine precisely where the differences exist.

14 Tukey’s Multiple Comparison Test

15 Tukey's Critical Differences: let s.e. denote the standard error of each mean. Two means are declared significantly different if they differ by more than CD = q(α; p, ν) × s.e., where q(α; p, ν) is the tabled value of Tukey's studentized range, p = no. of means, and ν = df for Error.
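A sketch of the computation. The group sizes, MSE, and the tabled q value below are all assumed for illustration:

```python
import math

# Hypothetical ANOVA setup (all numbers assumed for illustration):
# p = 4 group means, n = 10 observations per group, MSE = 12.5,
# so the error degrees of freedom are p*(n-1) = 36.
p, n, mse = 4, 10, 12.5

# Tabled studentized-range value q(0.05; 4, 36), roughly 3.81
# (read from a standard q table; an assumption here):
q = 3.81

se = math.sqrt(mse / n)   # standard error of each mean
cd = q * se               # Tukey's critical difference

# Any two of the p means differing by more than cd are declared
# significantly different.
print(round(cd, 3))
```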

16 Scheffe’s Multiple Comparison Test

17 Scheffe's Critical Differences (for Linear contrasts)
A linear contrast L is declared significant if it exceeds S = √((p − 1) Fα(p − 1, ν)) × s.e.(L), where Fα(p − 1, ν) is the tabled value of the F distribution, p − 1 = df for comparing p means, and ν = df for Error.

18 Scheffe's Critical Differences
(for comparing two means) Two means are declared significantly different if they differ by more than CD = √((p − 1) Fα(p − 1, ν)) × (standard error of the difference between the two means).
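A sketch of the two-mean case. The values of p, n, MSE, and the tabled F value are illustrative assumptions, not from the slides:

```python
import math

# Hypothetical setup: p = 4 means, n = 10 per group, MSE = 12.5,
# error df = 36.  F(0.05; 3, 36) is roughly 2.87 (tabled value;
# an assumption here).
p, n, mse = 4, 10, 12.5
f_crit = 2.87

# Standard error of the difference between two group means:
se_diff = math.sqrt(mse * (1 / n + 1 / n))

# Scheffe's critical difference for comparing two means:
cd = math.sqrt((p - 1) * f_crit) * se_diff
print(round(cd, 3))
```

Because Scheffé's procedure protects every possible linear contrast, its critical difference for a simple pairwise comparison is wider than Tukey's.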

21 Underlined groups have no significant differences

22 There are many multiple (post hoc) comparison procedures
Tukey’s, Scheffé’s, Duncan’s Multiple Range, Newman-Keuls, etc. There is considerable controversy: “I have not included the multiple comparison methods of D.B. Duncan because I have been unable to understand their justification” (H. Scheffé, The Analysis of Variance).

23 Logistic regression

24 Recall the simple linear regression model:
y = β0 + β1x + ε, where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to the multiple linear regression model: y = β0 + β1x1 + β2x2 + … + βpxp + ε. Here we are trying to predict a continuous dependent variable y from several continuous independent variables x1, x2, …, xp.

25 Now suppose the dependent variable y is binary.
It takes on two values: “Success” (1) or “Failure” (0). We are interested in predicting y from a continuous independent variable x. This is the situation in which logistic regression is used.

26 Example We are interested in how the success (y) of a new antibiotic cream in curing “acne problems” depends on the amount (x) that is applied daily. The values of y are 1 (Success) or 0 (Failure). The values of x range over a continuum.

27 The Logistic Regression Model
Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. The ratio p/(1 − p) is called the odds ratio. This quantity will also increase with the value of x, ranging from zero to infinity. The quantity ln[p/(1 − p)] is called the log odds ratio.

28 Example: odds ratio, log odds ratio
Suppose a die is rolled: Success = “roll a six”, p = 1/6. The odds ratio is (1/6)/(5/6) = 1/5 = 0.2. The log odds ratio is ln(1/5) ≈ −1.61.
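The same arithmetic in a few lines of Python:

```python
import math

p = 1 / 6                  # P[roll a six]

odds = p / (1 - p)         # the odds ratio: (1/6)/(5/6) = 1/5
log_odds = math.log(odds)  # the log odds ratio: ln(1/5)

print(odds)                # 0.2 (up to floating-point rounding)
print(log_odds)            # about -1.609
```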

29 The Logistic Regression Model
Assumes the log odds ratio is linearly related to x, i.e. ln[p/(1 − p)] = β0 + β1x. In terms of the odds ratio: p/(1 − p) = e^(β0 + β1x).

30 The Logistic Regression Model
Solving for p in terms of x: p = e^(β0 + β1x) / (1 + e^(β0 + β1x)), or equivalently p = 1 / (1 + e^−(β0 + β1x)).
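The solved form is easy to check numerically. The parameter values below are assumed for illustration, not estimated from data:

```python
import math

def p_success(x, b0, b1):
    """Logistic model solved for p: 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Illustrative parameter values (an assumption):
b0, b1 = -4.0, 2.0

# p = 0.50 exactly where b0 + b1*x = 0, i.e. at x = -b0/b1 = 2.0:
print(p_success(2.0, b0, b1))   # 0.5

# p increases with x, approaching 0 on the left and 1 on the right:
print(p_success(0.0, b0, b1) < p_success(4.0, b0, b1))   # True
```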

31 Interpretation of the parameter β0 (determines the intercept)

32 Interpretation of the parameter β1 (together with β0, determines the value of x at which p = 0.50)

33 Also, β1/4 is the rate of increase in p with respect to x when p = 0.50.

34 Interpretation of the parameter β1 (determines the slope when p = 0.50)

35 The data
For each case the data consist of: a value for x, the continuous independent variable, and a value for y (1 or 0) (Success or Failure). Total of n = 250 cases.

37 Estimation of the parameters
The parameters are estimated by maximum likelihood estimation, which requires a statistical package such as SPSS.
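Maximum likelihood for this model has no closed form; packages solve it iteratively. A minimal pure-Python sketch on made-up data (the x and y values, learning rate, and iteration count are all assumptions), using plain gradient ascent where a package such as SPSS would use a faster Newton-type iteration:

```python
import math

# Made-up data for the acne-cream example: x = daily amount applied,
# y = 1 (Success) or 0 (Failure).  All values are assumptions.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   0,   1,   1,   1]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Maximize the log-likelihood  sum y*ln(p) + (1-y)*ln(1-p)  by plain
# gradient ascent.  The estimates it converges to are the same MLEs a
# Newton-Raphson routine would find.
b0, b1 = 0.0, 0.0
rate = 0.1
for _ in range(5000):
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
    b0 += rate * g0
    b1 += rate * g1

# With these data the fitted slope is positive: success becomes more
# likely as the amount applied increases.
print(b1 > 0)
```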

38 Using SPSS to perform Logistic regression
Open the data file:

39 Choose from the menu: Analyze -> Regression -> Binary Logistic

40 The following dialogue box appears
Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.

41 Here is the output: the estimates and their standard errors (S.E.)

42 The parameter Estimates

43 Interpretation of the parameter β0 (determines the intercept)
Interpretation of the parameter β1 (together with β0, determines the value of x at which p = 0.50)

44 Another interpretation of the parameter β1
β1/4 is the rate of increase in p with respect to x when p = 0.50.

45 The Multiple Logistic Regression model

46 Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2, …

47 Multiple Logistic Regression an example
In this example we are interested in determining the risk that infants who were born prematurely will develop BPD (bronchopulmonary dysplasia). More specifically, we are interested in developing a predictive model that will determine the probability of developing BPD from X1 = gestational age and X2 = birth weight.

48 For n = 223 infants in a prenatal ward the following measurements were determined
X1 = gestational age (weeks), X2 = birth weight (grams), and Y = presence of BPD

49 The data

50 The results

51 Graph: Showing Risk of BPD vs GA and BrthWt

52 Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data

53 Example 1 In this study we examined n = 1237 individuals, measuring X = systolic blood pressure and Y = serum cholesterol.

54 Example 2 The following data was taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).

55 The study involved a dichotomous response Y
Success (no major parole violation) or Failure (returned to prison either as technical violators or with a new conviction) based on a one-year follow-up. The predictors of parole success included are: type of committed offence (Person offense or Other offense), Age (25 or Older or Under 25), Prior Record (No prior sentence or Prior Sentence), and Drug or Alcohol Dependency (No drug or Alcohol dependency or Drug and/or Alcohol dependency).

56 The data were randomly split into two parts
The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses. The second part of the data was set aside for a validation study of the model to be fitted in the first part.

57 Table

58 Analysis of a Two-way Frequency Table:

59 Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)

60 Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure)
The Marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.

61 Conditional Distributions ( Systolic Blood Pressure given Serum Cholesterol )
The conditional distribution allows you to look at the effect of one variable, when the other variable is held fixed or known.
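Computing marginal and conditional distributions from a joint frequency table takes only a few lines; the counts below are hypothetical, chosen for illustration:

```python
# Hypothetical 2x2 joint frequency table of counts; rows index one
# variable, columns the other.  The numbers are made up.
table = [[300, 100],
         [150, 250]]

n = sum(sum(row) for row in table)

# Marginal distribution of the row variable (ignoring the other one):
row_marginal = [sum(row) / n for row in table]

# Conditional distribution of the column variable given row i:
def conditional_given_row(i):
    total = sum(table[i])
    return [count / total for count in table[i]]

print(row_marginal)              # [0.5, 0.5]
print(conditional_given_row(0))  # [0.75, 0.25]
```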

62 Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure)

63 GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol

64 Notation: Let xij denote the frequency (no. of cases) where X (the row variable) is i and Y (the column variable) is j.

65 Different Models The Multinomial Model:
Here the total number of cases N is fixed and xij follows a multinomial distribution with parameters pij

66 The Product Multinomial Model:
Here the row (or column) totals Ri are fixed and for a given row i, xij follows a multinomial distribution with parameters pj|i

67 The Poisson Model: In this case we observe over a fixed period of time and all counts in the table (including Row, Column and overall totals) follow a Poisson distribution. Let mij denote the mean of xij.

68 Independence

69 Multinomial Model: if X and Y are independent then pij = pi·p·j, and the estimated expected frequency in cell (i,j) in the case of independence is Êij = RiCj/N (row total × column total / grand total).

70 The same can be shown for the other two models – the Product Multinomial model and the Poisson model
namely, the estimated expected frequency in cell (i,j) in the case of independence is Êij = RiCj/N. Standardized residuals are defined for each cell: rij = (xij − Êij)/√Êij.
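A sketch of both formulas on a hypothetical 2×2 table (the counts are made up):

```python
import math

# Hypothetical 2x2 observed table (counts are made up):
x = [[30, 70],
     [60, 40]]

n = sum(map(sum, x))
row = [sum(r) for r in x]                         # row totals R_i
col = [sum(r[j] for r in x) for j in range(2)]    # column totals C_j

# Estimated expected frequencies under independence: E_ij = R_i*C_j/n
e = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

# Standardized residuals: (x_ij - E_ij) / sqrt(E_ij)
resid = [[(x[i][j] - e[i][j]) / math.sqrt(e[i][j]) for j in range(2)]
         for i in range(2)]

print(e[0][0])               # 45.0
print(round(resid[0][0], 3))
```

Large standardized residuals (roughly beyond ±2) flag the cells that depart most from independence.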

71 The Chi-Square Statistic
The Chi-Square test for independence: χ² = Σij (xij − Êij)²/Êij. Reject H0 (independence) if χ² exceeds the upper-α critical value of the chi-square distribution with (r − 1)(c − 1) degrees of freedom.

72 Table Expected frequencies, Observed frequencies, Standardized Residuals
χ² = … (p = …)

73 Example In the example N = 57,407 cases in which individuals were victimized twice by crimes were studied. The crime of the first victimization (X) and the crime of the second victimization (Y) were noted. The data were tabulated on the following slide

74 Table 1: Frequencies

75 Table 2: Standardized residuals

76 Table 3: Conditional distribution of second victimization given the first victimization (%)

77 Log Linear Model

78 Recall, if the two variables, rows (X) and columns (Y) are independent then

79 In general, let ln mij = u + u1(i) + u2(j) + u12(i,j)  (1), where the u-terms sum to zero over each of their indices. Equation (1) is called the log-linear model for the frequencies xij.

80 Note: X and Y are independent if u12(i,j) = 0 for all i and j.
In this case the log-linear model becomes ln mij = u + u1(i) + u2(j).

81 Three-way Frequency Tables

82 Example Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962]). Variables: Systolic Blood Pressure (X): <127, …, 167+. Serum Cholesterol: <200, …, 260+. Heart Disease: Present, Absent. The data are tabulated on the next slide.

83 Three-way Frequency Table

84 Log-Linear model for three-way tables
Let mijk denote the expected frequency in cell (i,j,k) of the table; then in general ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k), where the u-terms sum to zero over each of their indices.

85 Hierarchical Log-linear models for categorical Data
For three-way tables, the hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.

86 1. Model: (All Main effects model)
ln mijk = u + u1(i) + u2(j) + u3(k) i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [1][2][3] Description: Mutual independence between all three variables.

87 2. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0. Notation: [12][3] Description: Independence of Variable 3 with variables 1 and 2.

88 3. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0. Notation: [13][2] Description: Independence of Variable 2 with variables 1 and 3.

89 4. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u23(j,k) i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0. Notation: [23][1] Description: Independence of Variable 1 with variables 2 and 3.

90 5. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) i.e. u23(j,k) = u123(i,j,k) = 0. Notation: [12][13] Description: Conditional independence between variables 2 and 3 given variable 1.

91 6. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k) i.e. u13(i,k) = u123(i,j,k) = 0. Notation: [12][23] Description: Conditional independence between variables 1 and 3 given variable 2.

92 7. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k) i.e. u12(i,j) = u123(i,j,k) = 0. Notation: [13][23] Description: Conditional independence between variables 1 and 2 given variable 3.

93 8. Model: ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) i.e. u123(i,j,k) = 0. Notation: [12][13][23] Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.

94 9. Model: (the saturated model)
ln mijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k) Notation: [123] Description: No simplifying dependence structure.

95 Hierarchical Log-linear models for 3 way table
Model and description:
[1][2][3]: Mutual independence between all three variables.
[1][23]: Independence of variable 1 with variables 2 and 3.
[2][13]: Independence of variable 2 with variables 1 and 3.
[3][12]: Independence of variable 3 with variables 1 and 2.
[12][13]: Conditional independence between variables 2 and 3 given variable 1.
[12][23]: Conditional independence between variables 1 and 3 given variable 2.
[13][23]: Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]: The saturated model.
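For several of these models the fitted expected counts have closed forms built from marginal totals. A sketch for the model [12][3] (variable 3 independent of variables 1 and 2), where the fitted count is m̂ijk = x_{ij+} x_{++k} / n, on a made-up 2×2×2 table:

```python
# Hypothetical 2x2x2 table of counts x[i][j][k]; the numbers are
# made up for illustration.
x = [[[10, 20], [30, 40]],
     [[20, 10], [40, 30]]]

n = sum(x[i][j][k] for i in range(2) for j in range(2) for k in range(2))

# Marginal totals the model [12][3] needs: x_{ij+} and x_{++k}.
xij = [[sum(x[i][j]) for j in range(2)] for i in range(2)]
xk = [sum(x[i][j][k] for i in range(2) for j in range(2))
      for k in range(2)]

# Fitted expected counts for [12][3]: m_ijk = x_{ij+} * x_{++k} / n
m = [[[xij[i][j] * xk[k] / n for k in range(2)] for j in range(2)]
     for i in range(2)]

print(m[0][0][0])    # 30 * 100 / 200 = 15.0
fitted_total = sum(m[i][j][k] for i in range(2) for j in range(2)
                   for k in range(2))
print(fitted_total)  # equals n, as it must
```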

96 Maximum Likelihood Estimation
Log-Linear Model

97 For any Model it is possible to determine the maximum Likelihood Estimators of the parameters
Example: two-way table, independence, multinomial model. The MLEs are p̂i· = Ri/N and p̂·j = Cj/N, so m̂ij = N p̂i· p̂·j = RiCj/N.

98 Log-likelihood
With the model of independence, the log-likelihood is
l = Σi Σj xij ln pij + constant = Σi Σj xij (ln pi· + ln p·j) + constant = Σi Ri ln pi· + Σj Cj ln p·j + constant.
Maximizing subject to the constraints Σi pi· = 1 and Σj p·j = 1 (for example with Lagrange multipliers) gives
p̂i· = Ri/N and p̂·j = Cj/N, and hence m̂ij = N p̂i· p̂·j = RiCj/N.

106 Comments: Maximum likelihood estimates can be computed for any hierarchical log-linear model (i.e. more than 2 variables). In certain situations the equations need to be solved numerically. For the saturated model (all interactions and main effects), the estimate of mijk… is xijk….

107 Goodness of Fit Statistics
These statistics can be used to check whether a log-linear model fits the observed frequency table.

108 Goodness of Fit Statistics
The chi-squared statistic: χ² = Σ (x − m̂)²/m̂. The likelihood-ratio statistic: G² = 2 Σ x ln(x/m̂). d.f. = # cells − # parameters fitted. We reject the model if χ² or G² is greater than the upper-α critical value of the chi-square distribution with those d.f.
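Both statistics are one-liners once the fitted expected counts are in hand. The observed and fitted values below are made up for illustration:

```python
import math

# Observed counts and fitted expected counts from some log-linear
# model (both sets of numbers are made up for illustration):
observed = [25, 75, 55, 45]
fitted = [30, 70, 50, 50]

chi2 = sum((o - m) ** 2 / m for o, m in zip(observed, fitted))
g2 = 2 * sum(o * math.log(o / m) for o, m in zip(observed, fitted))

# Compare both statistics with the upper-alpha chi-square critical
# value on (# cells - # parameters fitted) degrees of freedom.
print(round(chi2, 3))
print(round(g2, 3))
```

When the model fits well, χ² and G² are close to each other and to the degrees of freedom.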

109 Example: Variables Systolic Blood Pressure (B) Serum Cholesterol (C) Coronary Heart Disease (H)

110 Goodness of fit testing of Models
Models fitted: B,C,H; B,CH; C,BH; H,BC; BC,BH; BH,CH (n.s.); CH,BC; BC,BH,CH (n.s.). For each model the output lists DF, the likelihood-ratio χ², the Pearson χ², and their p-values. Possible models: 1. [BH][CH] – B and C independent given H. 2. [BC][BH][CH] – the all two-factor interaction model.

111 Model 1: [BH][CH] Log-linear parameters
Heart Disease - Blood Pressure Interaction

112 Multiplicative effect
Log-Linear Model

113 Heart Disease - Cholesterol Interaction

114 Multiplicative effect

115 Model 2: [BC][BH][CH] Log-linear parameters
Blood pressure-Cholesterol interaction:

116 Multiplicative effect

117 Heart Disease - Blood Pressure Interaction

118 Multiplicative effect

119 Heart Disease - Cholesterol Interaction

120 Multiplicative effect

