The use of the Chi-square test when observations are dependent, by Austina S S Clark, University of Otago, New Zealand.



Outline of the talk: Motivation, Introduction, Methodology, Example, Simulation

Introduction
When the Chi-square test is applied to test the association between two binomial distributions, we usually assume that the cell observations are independent. If some of the cells are dependent, we would like to investigate:
1. how to implement the Chi-square test, and
2. how to find the test statistic and the associated degrees of freedom.

We will use an example of influenza symptoms in two groups of patients to illustrate this method. One group of patients suffered from H1N1 influenza 09 and the other from seasonal influenza. Twelve symptoms were recorded for each patient, and these symptoms were not totally independent.

Methods
We reviewed the medical records of all sixty-four adult patients (aged 18 years or older) with a laboratory-confirmed diagnosis of one of two types of influenza, namely seasonal influenza (F) and H1N1 influenza 09 (S), between 17 June and 31 July 2009 in an Australian hospital. Twelve symptoms were extracted from each patient's records, coded 0 if the symptom was absent and 1 if it was present. Some of the symptoms are not independent.

We examined the correlation matrices for the two groups of patients, F (seasonal influenza) and S (H1N1 09). If the correlation was significant, we calculated the covariance matrix of each group and then pooled the two to form a pooled covariance matrix. Next we found the mean proportion of patients showing each of the p symptoms, separately for each group.
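The pooling step can be sketched in Python as follows. This is a minimal illustration, not the author's code: the slides do not show the pooling formula, so the usual degrees-of-freedom weighting is assumed here, and the function name `pooled_covariance` and the random 0/1 data are hypothetical.

```python
import numpy as np

def pooled_covariance(x_f, x_s):
    """Pool the sample covariance matrices of two groups.

    x_f, x_s: 2-D arrays of 0/1 symptom indicators, one row per patient.
    Assumption: each group is weighted by its degrees of freedom (n - 1).
    """
    n_f, n_s = x_f.shape[0], x_s.shape[0]
    cov_f = np.cov(x_f, rowvar=False)  # unbiased (n - 1) estimate
    cov_s = np.cov(x_s, rowvar=False)
    return ((n_f - 1) * cov_f + (n_s - 1) * cov_s) / (n_f + n_s - 2)

# Hypothetical data: 16 and 48 patients, 12 binary symptoms each.
rng = np.random.default_rng(0)
x_f = rng.integers(0, 2, size=(16, 12))
x_s = rng.integers(0, 2, size=(48, 12))

pooled = pooled_covariance(x_f, x_s)
p_f = x_f.mean(axis=0)  # mean proportion of each symptom in group F
p_s = x_s.mean(axis=0)  # mean proportion of each symptom in group S
```

The group sizes 16 and 48 match the study, but the symptom values are random placeholders, so the resulting numbers carry no clinical meaning.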

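The layout of the results is shown below.

Group   S1   S2   ...   Sp
F       mean proportion of each symptom among seasonal influenza patients
S       mean proportion of each symptom among H1N1 09 patients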

To find the true difference in proportions between the two groups, we need the difference between the two vectors of mean proportions, one for group F and one for group S. Since there is correlation between the p variables, we cannot use the Penrose distance (Manly B F J, 1994). Instead, we have two alternatives that incorporate the correlation. First, we apply the Mahalanobis distance (Manly, 1994), which takes the correlations between variables into account: it is the quadratic form of the difference vector in the inverse of the covariance matrix.

The Mahalanobis distance can be thought of as a multivariate difference between the two observations, taking account of all p variables. We assume that the populations from which the two observations come are multivariate normally distributed; the values of the distance then follow a chi-square distribution with p degrees of freedom. Alternatively, we may apply the transformation method suggested by Greenhouse S W and Geisser S (1959).
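A minimal Python sketch of the first method is given here. It uses the standard definition of the squared Mahalanobis distance; the function name, the three-symptom vectors and the covariance matrix are hypothetical, and whether the covariance matrix should first be scaled by the group sizes is not stated in the slides, so that choice is left to the caller.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(mean_a, mean_b, cov):
    """Squared Mahalanobis distance between two mean vectors."""
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    # Solve cov @ w = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical three-symptom example (not the study's data).
p_f = np.array([0.50, 0.75, 0.25])
p_s = np.array([0.40, 0.80, 0.35])
cov = np.array([[0.25, 0.05, 0.00],
                [0.05, 0.20, 0.02],
                [0.00, 0.02, 0.22]])

d2 = mahalanobis_sq(p_f, p_s, cov)
# Upper-tail probability under a chi-square with p degrees of freedom.
p_value = chi2.sf(d2, df=len(p_f))
```

With an identity covariance matrix the function reduces to the squared Euclidean distance, which is a quick sanity check on any implementation.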

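Let the difference between the two groups' mean proportions be written as a vector whose components are not independent. Now define the test statistic from this vector. The values of the statistic follow a chi-square distribution scaled by a multiplier, where the multiplier and the degrees of freedom can be approximated (Satterthwaite F E, 1941, 1946).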
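The Satterthwaite approximation matches the first two moments of the statistic to those of a scaled chi-square variable. The symbols below (Q for the statistic, c for the multiplier, f for the degrees of freedom) are labels introduced for this sketch; the notation on the original slides has not survived.

```latex
Q \approx c\,\chi^2_f, \qquad
\mathrm{E}[Q] = c f, \qquad
\mathrm{Var}[Q] = 2 c^2 f
\;\Longrightarrow\;
c = \frac{\mathrm{Var}[Q]}{2\,\mathrm{E}[Q]}, \qquad
f = \frac{2\,\mathrm{E}[Q]^2}{\mathrm{Var}[Q]} .
```

Substituting the two moment expressions back in confirms the result: Var[Q] / (2 E[Q]) = 2c²f / (2cf) = c, and 2 E[Q]² / Var[Q] = 2c²f² / (2c²f) = f.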

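Next we find the eigenvectors and eigenvalues of the covariance matrix. Transforming the difference vector by the eigenvectors gives new components that are independent. The test statistic is then re-expressed in terms of these independent components.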
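The effect of the eigenvector transformation can be checked numerically. The sketch below uses a small hypothetical covariance matrix and shows that rotating by the eigenvectors produces a diagonal covariance matrix, so the rotated components are uncorrelated (and independent under multivariate normality).

```python
import numpy as np

# Hypothetical 3 x 3 covariance matrix (not the study's data).
cov = np.array([[0.25, 0.05, 0.00],
                [0.05, 0.20, 0.02],
                [0.00, 0.02, 0.22]])

# eigh is intended for symmetric matrices; the columns of eigvecs
# are orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(cov)

# If d has covariance cov, then y = eigvecs.T @ d has this covariance:
cov_rotated = eigvecs.T @ cov @ eigvecs
```

The off-diagonal entries of `cov_rotated` are zero up to rounding, and its diagonal holds the eigenvalues.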

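This indicates that the values of the re-expressed statistic also follow the scaled chi-square distribution. The expected value and variance of the statistic and of the scaled chi-square variable can be used to find the values of the multiplier and the degrees of freedom. It can be deduced that these moments depend only on the eigenvalues of the covariance matrix.

Equating the two expected values and the two variances, it follows that both the multiplier and the degrees of freedom can be written in terms of the eigenvalues.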
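The moment matching can be sketched in Python under one explicit assumption: the statistic Q is taken to be the sum of squares of a zero-mean normal vector with the given covariance matrix, so that E[Q] is the sum of the eigenvalues and Var[Q] is twice the sum of their squares. The slides' own formulas are not recoverable, so the function names and this choice of statistic are illustrative only.

```python
import numpy as np
from scipy.stats import chi2

def satterthwaite_from_eigenvalues(eigvals):
    """Multiplier c and degrees of freedom f with Q approx. c * chi2(f)."""
    lam = np.asarray(eigvals, dtype=float)
    mean_q = lam.sum()               # E[Q] = sum of eigenvalues
    var_q = 2.0 * (lam ** 2).sum()   # Var[Q] = 2 * sum of squared eigenvalues
    c = var_q / (2.0 * mean_q)
    f = 2.0 * mean_q ** 2 / var_q
    return c, f

def scaled_chi2_p_value(q, c, f):
    """Upper-tail p-value of an observed q under the approximation."""
    return float(chi2.sf(q / c, df=f))

# Hypothetical eigenvalues.
c, f = satterthwaite_from_eigenvalues([0.5, 0.3, 0.2])
```

When all eigenvalues are equal, the multiplier equals the common eigenvalue and f equals the number of variables, so the approximation becomes exact in that case.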

Example
As mentioned earlier, we reviewed the medical records of sixty-four adult patients with a laboratory-confirmed diagnosis of one of two types of influenza. Of these 64 patients, 16 had seasonal influenza (F) and 48 had H1N1 09 (S). All patients were admitted between 17 June and 31 July 2009 to an Australian hospital. The aim here is to compare the twelve clinical symptoms presented by these two groups of patients.

These 12 symptoms are listed below:
S1: coryza
S2: fever
S3: cough
S4: breathlessness
S5: chest pain
S6: sore throat
S7: lethargy
S8: myalgia
S9: vomiting
S10: diarrhoea
S11: abdominal pain
S12: other gastrointestinal upset

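Since these symptoms are not totally independent, we use the two methods described above. The results are as follows (apart from the value 0.2873, the numerical results are not preserved in this transcript):
Method 1: the Mahalanobis distance statistic, which follows a chi-square distribution, with its p-value.
Method 2: the transformed statistic, which follows a scaled chi-square distribution with two fitted parameters, one of them equal to 0.2873, with its p-value.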

Results
Both methods showed no significant difference in the twelve symptoms between the two types of influenza. Patients with H1N1 09 (S) were significantly younger than patients with seasonal influenza (F). The mean duration of symptoms prior to presentation was 4 days, with fever, cough and dyspnoea being the most common symptoms in both groups. Pneumonia occurred in 44% and 38% of H1N1 09 and seasonal influenza patients respectively.

Conclusion
This study shows that the H1N1 09 influenza virus causes clinical disease in humans comparable to the seasonal influenza strains in this Australian city during the period 17 June to 31 July 2009.

Simulation
We used MATLAB to simulate the proportions of the twelve symptoms 200,000 times for each of the two influenza groups, under both methods. The results were shown in the talk.
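The talk's MATLAB simulation is not available, so the following Python sketch is a stand-in rather than a reproduction. It assumes a hypothetical 3 x 3 covariance matrix, draws correlated normal difference vectors instead of binary symptom data, and checks how often the sum-of-squares statistic exceeds the 5% critical value of the moment-matched scaled chi-square distribution.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2009)
cov = np.array([[0.25, 0.05, 0.00],
                [0.05, 0.20, 0.02],
                [0.00, 0.02, 0.22]])  # hypothetical covariance matrix
n_sims = 20_000  # far fewer than the 200,000 replications in the talk

# Draw correlated normal difference vectors and form Q = sum of squares.
d = rng.multivariate_normal(np.zeros(3), cov, size=n_sims)
q = (d ** 2).sum(axis=1)

# Moment-matched multiplier c and degrees of freedom f from the eigenvalues.
lam = np.linalg.eigvalsh(cov)
c = (lam ** 2).sum() / lam.sum()
f = lam.sum() ** 2 / (lam ** 2).sum()

# Share of simulated Q values beyond the approximate 5% critical value.
critical = c * chi2.ppf(0.95, df=f)
rejection_rate = float((q > critical).mean())
```

If the approximation is adequate, `rejection_rate` should lie close to the nominal 0.05; a large gap would suggest the scaled chi-square is a poor fit for that covariance structure.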

References
Greenhouse S. W. and Geisser S. (1959). On methods in the analysis of profile data. Psychometrika, 24.
Huynh H. and Feldt L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomized block and split-plot designs. JEBS, 1.
Manly B. F. J. (1994). Multivariate Statistical Methods: A Primer. Chapman & Hall.
Satterthwaite F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2.

The end and thank you.