Hypothesis testing. Chi-square test


Hypothesis testing. Chi-square test Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine

Parametric and non-parametric tests
Parametric test – estimates at least one population parameter from sample statistics.
Assumption: the variable we have measured in the sample is normally distributed in the population to which we plan to generalize our findings.
Non-parametric test – distribution-free; makes no assumption about the distribution of the variable in the population.

Choosing a statistical test
The choice of a statistical test depends on:
– Level of measurement of the dependent and independent variables
– Number of groups or dependent measures
– Number of units of observation
– Type of distribution
– The population parameter of interest (mean, variance, differences between means and/or variances)

Parametric and non-parametric tests

Normality test
Normality tests are used to determine whether a data set is well modeled by a normal distribution and to compute how likely it is that the random variable underlying the data set is normally distributed.
In descriptive statistics terms, a normality test measures the goodness of fit of a normal model to the data: if the fit is poor, the data are not well modeled in that respect by a normal distribution, without making a judgment on any underlying variable.
In frequentist hypothesis testing, the data are tested against the null hypothesis that they are normally distributed.

Normality test. Graphical methods
An informal approach to testing normality is to compare a histogram of the sample data to a normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This might be difficult to see if the sample is small.

Normality test. Frequentist tests
Tests of univariate normality include the following:
– D'Agostino's K-squared test
– Jarque–Bera test
– Anderson–Darling test
– Cramér–von Mises criterion
– Lilliefors test
– Kolmogorov–Smirnov test
– Shapiro–Wilk test
– etc.

Normality test. Kolmogorov–Smirnov test
The K–S test is a nonparametric test of the equality of distributions that can be used to compare a sample with a reference distribution (1-sample K–S test), or to compare two samples (2-sample K–S test).
The K–S statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
The null hypothesis is that the sample is drawn from the reference distribution (in the 1-sample case) or that the two samples are drawn from the same distribution (in the 2-sample case).
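As an illustration, the 1-sample K–S statistic can be computed directly: it is the largest gap between the empirical CDF of the (standardized) sample and the standard normal CDF. A minimal pure-Python sketch, with made-up sample values:

```python
import math

def normal_cdf(x):
    # CDF of the standard normal distribution, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def ks_statistic(sample):
    """1-sample K-S statistic D against the standard normal:
    the largest gap between the empirical CDF and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x)
        # the empirical CDF jumps from (i-1)/n to i/n at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Standardize with the sample mean and SD first (as on the next slide,
# this is the Lilliefors variant); the data here are made up.
data = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2, 2.3, 1.7, 2.5, 2.1]
mean = sum(data) / len(data)
sd = (sum((x - mean) ** 2 for x in data) / (len(data) - 1)) ** 0.5
z = [(x - mean) / sd for x in data]
print(ks_statistic(z))
```

The p-value would then come from the K–S (or, after standardizing, the Lilliefors) null distribution, which is omitted here.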

Normality test. Kolmogorov–Smirnov test
In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates. It is known, however, that using the sample estimates to define the reference distribution changes the null distribution of the test statistic; the Lilliefors test corrects for this.

Is there an association? Chi-square test
The chi-square test is used to check for an association between two categorical variables.
H0: There is no association between the variables.
HA: There is an association between the variables.
If two categorical variables are associated, the chance that an individual falls into a particular category for one variable depends on the category they fall into for the other variable.

Assumptions
– A large sample of independent observations
– All expected counts should be ≥ 1 (no zeros)
– At least 80% of expected counts should be ≥ 5

Chi-square test
The following table presents data on place of birth and alcohol consumption. The two variables of interest, place of birth and alcohol consumption, have r = 4 and c = 2 levels, resulting in 4 x 2 = 8 combinations of categories.

Place of birth   Alcohol   No alcohol
Big city         620       75
Rural            240       41
Small town       130       29
Suburban         190       38

Chi-square test

Place of birth   Alcohol       No alcohol    Total
Big city         O11 = 620     O12 = 75      R1 = 695
Rural            O21 = 240     O22 = 41      R2 = 281
Small town       O31 = 130     O32 = 29      R3 = 159
Suburban         O41 = 190     O42 = 38      R4 = 228
Total            C1 = 1180     C2 = 183      n = 1363

Expected counts: Eij = (Ri x Cj) / n
E11 = (695 x 1180) / 1363     E12 = (695 x 183) / 1363
E21 = (281 x 1180) / 1363     E22 = (281 x 183) / 1363
E31 = (159 x 1180) / 1363     E32 = (159 x 183) / 1363
E41 = (228 x 1180) / 1363     E42 = (228 x 183) / 1363

Chi-square test
The test statistic measures the difference between the observed and the expected counts under the assumption of independence. If the statistic is large, the observed counts are not close to the counts we would expect to see if the two variables were independent. Thus, a 'large' χ2 gives evidence against H0 and supports HA.
The p-value of the χ2 test is the probability, if H0 is true, that the χ2 statistic is as large as or larger than the value we obtained. To get this probability we use a chi-square distribution with (r – 1) x (c – 1) df.
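The computation for the place-of-birth table above can be sketched in pure Python (statistic and df only; the p-value would then be read from the chi-square distribution with 3 df):

```python
# Expected counts and chi-square statistic for the place-of-birth /
# alcohol table (no external libraries needed).
observed = [[620, 75], [240, 41], [130, 29], [190, 38]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E_ij = (row total x column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)  # ~ 9.71 with 3 df
```

With 3 df the 5% critical value is about 7.81, so a statistic of roughly 9.7 gives evidence against H0.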

Beware! Association is not causation. The observed association between two variables might be due to the action of a third, unobserved variable.

Special case
In many cases the categorical variables of interest have two levels each. The data can then be summarized in a contingency table having 2 rows and 2 columns (a 2x2 table):

           Column 1   Column 2   Total
Row 1      A          B          R1
Row 2      C          D          R2
Total      C1         C2         n

In this case, the χ2 statistic has a simplified form:

χ2 = n x (AD – BC)2 / (R1 x R2 x C1 x C2)

Under the null hypothesis, the χ2 statistic has a chi-square distribution with (2 – 1) x (2 – 1) = 1 degree of freedom.

Special case

Gender   Alcohol   No alcohol   Total
Male     540       52           592
Female   325       31           356
Total    865       83           948
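A quick sketch applying the 2x2 shortcut formula to the gender table above, cross-checked against the general (O – E)2 / E form:

```python
# 2x2 shortcut: chi2 = n * (A*D - B*C)**2 / (R1 * R2 * C1 * C2),
# checked against the general (O - E)^2 / E form for the gender table.
A, B = 540, 52
C, D = 325, 31
R1, R2 = A + B, C + D
C1, C2 = A + C, B + D
n = A + B + C + D

chi2_short = n * (A * D - B * C) ** 2 / (R1 * R2 * C1 * C2)

# General form for the same table, for comparison
expected = [[R1 * C1 / n, R1 * C2 / n], [R2 * C1 / n, R2 * C2 / n]]
observed = [[A, B], [C, D]]
chi2_general = sum((o - e) ** 2 / e
                   for o_row, e_row in zip(observed, expected)
                   for o, e in zip(o_row, e_row))
print(round(chi2_short, 4))  # ~ 0.0016
```

The statistic is far below the 1-df critical value of 3.84, so for these data there is no evidence of an association between gender and alcohol consumption.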

Limitations
– No expected count should be less than 1.
– No more than 1/5 of the expected counts should be less than 5.
– To correct for this, collect larger samples or combine your data for the smaller expected categories until their combined value is 5 or more.
Yates correction
– When there is only 1 degree of freedom, the regular chi-square test should not be used.
– Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O – E term, then continue as usual with the new corrected values.

Fisher’s exact test
This test is only available for 2 x 2 tables.
For small n, the probability can be computed exactly by counting all possible tables that can be constructed from the marginal frequencies. The Fisher exact test thus computes the exact probability, under the null hypothesis, of obtaining the observed distribution of frequencies across cells, or one that is more extreme.

Fisher’s Exact Test

Gender   Dieting   Non-dieting   Total
Male     1         11            12
Female   9         3             12
Total    10        14            24

In symbols:

Gender   Dieting   Non-dieting   Total
Male     a         c             a + c
Female   b         d             b + d
Total    a + b     c + d         n = a + b + c + d

Fisher’s Exact Test

Gender   Dieting   Non-dieting   Total
Male     1         11            12
Female   9         3             12
Total    10        14            24
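Under the null hypothesis, the probability of a given 2x2 table with fixed margins follows the hypergeometric distribution, which can be computed directly with the stdlib `math.comb`; applied to the dieting table above:

```python
from math import comb

# Exact probability of a 2x2 table [[a, b], [c, d]] with fixed margins
# (hypergeometric distribution).
def table_prob(a, b, c, d):
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Observed: 1 of 12 men and 9 of 12 women are dieting.
p_obs = table_prob(1, 11, 9, 3)

# One-sided p-value: the observed table plus the only more extreme one
# (0 men dieting), with the margins held fixed.
p = table_prob(1, 11, 9, 3) + table_prob(0, 12, 10, 2)
print(round(p, 5))  # ~ 0.00138
```

The small p-value says that a split this uneven is very unlikely if dieting is independent of gender, even though the expected counts here are too small for the chi-square test.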

Ordinal data, independent samples. Mann-Whitney test
Null hypothesis: the two sampled populations are equivalent in location (they have the same mean ranks).
The observations from both groups are combined and ranked, with the average rank assigned in the case of ties. If the populations are identical in location, the ranks should be randomly mixed between the two samples.
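A minimal sketch of the rank-and-sum computation behind the Mann-Whitney U statistic (the group values are made up for illustration):

```python
# Mann-Whitney U by hand: pool, rank (average ranks for ties),
# then U1 = R1 - n1*(n1+1)/2.
def ranks(values):
    """Average ranks (1-based) for a list; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

group1 = [3, 4, 2, 6, 2, 5]
group2 = [9, 7, 5, 10, 6, 8]
r = ranks(group1 + group2)
n1, n2 = len(group1), len(group2)
R1 = sum(r[:n1])                       # rank sum of group 1
U1 = R1 - n1 * (n1 + 1) / 2
U2 = n1 * n2 - U1                      # the two U values sum to n1*n2
print(U1, U2)  # -> 2.0 34.0
```

The smaller U is then compared against a Mann-Whitney table (or a normal approximation for larger samples), which is omitted here.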

Ordinal data, independent samples. Kruskal-Wallis test
Null hypothesis: the K sampled populations are equivalent in location.
The observations from all groups are combined and ranked, with the average rank assigned in the case of ties. If the populations are identical in location, the ranks should be randomly mixed between the K samples.

Ordinal data, 2 related samples. Wilcoxon signed rank test
Two related variables; no assumptions about the shape of their distributions.
Null hypothesis: the two variables have the same distribution.
The test takes into account the magnitude of the differences within pairs and gives more weight to pairs that show large differences than to pairs that show small differences. It is based on the ranks of the absolute values of the differences between the two variables.
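A minimal sketch of the signed-rank computation (the paired values are made up for illustration):

```python
# Wilcoxon signed-rank statistic by hand: drop zero differences,
# rank |d| (average ranks for ties), W = smaller of the two rank sums.
before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
after = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]

# signed differences, zeros excluded
d = [a - b for a, b in zip(after, before) if a != b]
sorted_abs = sorted(abs(x) for x in d)

def avg_rank(v):
    """Average 1-based rank of value v among the absolute differences."""
    idx = [i + 1 for i, x in enumerate(sorted_abs) if x == v]
    return sum(idx) / len(idx)

w_plus = sum(avg_rank(abs(x)) for x in d if x > 0)
w_minus = sum(avg_rank(abs(x)) for x in d if x < 0)
W = min(w_plus, w_minus)
print(W)  # -> 18.0
```

W is then compared against a Wilcoxon table (or a normal approximation for larger samples); the two rank sums always add up to m(m+1)/2, where m is the number of non-zero differences.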