Tutorial: Chi-Square Distribution Presented by: Nikki Natividad Course: BIOL 5081 - Biostatistics.

Presentation transcript:


Purpose
To measure discontinuous categorical/binned data in which a number of subjects fall into categories.
We want to compare our observed data to what we expect to see. Is a difference due to chance, or due to association?
When can we use the Chi-Square Test?
◦ Testing the outcome of Mendelian crosses
◦ Testing independence: is one factor associated with another?
◦ Testing a population for expected proportions

Assumptions:
◦ 2 or more categories
◦ Independent observations
◦ A sample size of at least 10
◦ Random sampling
◦ All observations must be used
◦ For the test to be accurate, the expected frequency in each category should be at least 5

Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: (O-E)²/E
4) Find the degrees of freedom: (c-1)(r-1) for a table with r rows and c columns; for a single row of c categories this is c-1
5) Find the critical chi-square value for those degrees of freedom in the Chi-Square Distribution table
6) If the critical value > your calculated chi-square value, you do not reject your null hypothesis; if your calculated value is larger, you reject it.
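Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the function name and the four-category counts are made up for the example and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for four categories (illustrative, not from the slides).
observed = [18, 22, 30, 10]
expected = [20, 20, 20, 20]

statistic = chi_square_statistic(observed, expected)
# A single row of k categories has k - 1 degrees of freedom.
degrees_of_freedom = len(observed) - 1

print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}")
```

The calculated value is then compared with the critical value from the table for the same degrees of freedom.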

Example 1: Testing for Proportions
H0: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one species of ant than of the others.
Table columns: Leaf Cutter Ants, Carpenter Ants, Black Ants, Total. Table rows: Observed, Expected (20 per species, 60 in total), O-E, (O-E)²/E.
χ² = sum of all (O-E)²/E = 1.90
Calculate degrees of freedom: with a single row of 3 categories, c-1 = 3-1 = 2
At a significance level of your choice (e.g. α = 0.05, or 95% confidence), look up the critical chi-square value on a Chi-square distribution table.

Example 1: Testing for Proportions
Critical value from the table for 2 degrees of freedom: χ² (α = 0.05) = 5.991

Example 1: Testing for Proportions
Critical value from the table: χ² = 5.991
Our calculated value: χ² = 1.90
*If the critical value > your calculated value, then you do not reject your null hypothesis. If your calculated value is larger, there is a significant difference that is not due to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.
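The decision for Example 1 can be checked with a short Python sketch. The observed counts did not survive in this transcript, so the counts below are assumed values, chosen only because they sum to 60 and reproduce the reported χ² = 1.90. The closed-form critical value and p-value used here hold only for 2 degrees of freedom.

```python
import math

# ASSUMED observed counts (leaf cutter, carpenter, black ants); the slide's
# real counts are not preserved. These merely reproduce chi-square = 1.90.
observed = [25, 18, 17]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 each under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05
# For df = 2 only, the chi-square upper tail is exp(-x / 2), so the
# critical value and p-value have closed forms.
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

reject_null = chi2 > critical_value
print(f"chi-square = {chi2:.2f}, critical = {critical_value:.3f}, "
      f"p = {p_value:.3f}, reject H0: {reject_null}")
```

Because the calculated value is below the critical value, the sketch reaches the same conclusion as the slide: the null hypothesis is not rejected.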

SAS: Example 1
◦ Included to format the table
◦ Define your data
◦ Indicate what you want in your output

SAS: Example 1

SAS: What does the p-value mean?
“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”
High p-value: if the null hypothesis is true, there is a high probability of a test statistic at least as large as the observed test statistic. Do not reject the null hypothesis.
Low p-value: if the null hypothesis is true, there is a low probability of a test statistic at least as large as the observed test statistic. Reject the null hypothesis.
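One way to see what this definition means is to simulate the null hypothesis directly. The sketch below is not how SAS computes its p-value; it is a Monte Carlo approximation under assumed settings (60 ants, 3 equally likely species, an observed statistic of 1.90, a fixed random seed) that counts how often a random sample gives a statistic at least as large as the observed one.

```python
import random


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed_stat, n, k, trials=5000, seed=0):
    """Estimate P(statistic >= observed_stat) when all k categories
    are equally likely and n observations are drawn."""
    rng = random.Random(seed)
    expected = [n / k] * k
    hits = 0
    for _ in range(trials):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1
        # Small tolerance so ties with the observed value count as ">=".
        if chi_square_statistic(counts, expected) >= observed_stat - 1e-9:
            hits += 1
    return hits / trials


p_estimate = simulated_p_value(1.90, n=60, k=3)
print(f"simulated p-value: {p_estimate:.3f}")
```

A large share of the simulated samples exceed 1.90, which is what a high p-value expresses.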

SAS: Example 1 High probability that Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.

SAS: Example 1

Example 2: Testing Association
SAS table options:
◦ cellchi2 = displays how much each cell contributes to the overall chi-squared value
◦ nocol = do not display column percentages
◦ norow = do not display row percentages
◦ chisq = display chi-square statistics
H0: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.

Example 2: More SAS Examples

Degrees of freedom: (2-1)(3-1) = 1*2 = 2
High probability (78.25%) that a Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.

Example 2: More SAS Examples
If there were an association, we could check which cells account for it by looking at how much each cell contributes to the overall Chi-square value.
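The per-cell contributions can be illustrated with a small Python sketch. The gender by eye colour counts from the slides are not preserved in this transcript, so the 2 x 3 table below is hypothetical, and the function names are chosen for this example only.

```python
def expected_counts(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_contributions(table):
    """(O - E)^2 / E for every cell of the table."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


# HYPOTHETICAL counts: rows = gender, columns = three eye colours.
table = [[20, 15, 10],
         [18, 17, 12]]

contributions = cell_contributions(table)
chi2 = sum(sum(row) for row in contributions)
df = (len(table) - 1) * (len(table[0]) - 1)

for row in contributions:
    print([round(value, 3) for value in row])
print(f"chi-square = {chi2:.3f}, df = {df}")
```

Cells with the largest contributions are the ones where observed and expected counts disagree most.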

Limitations
◦ No expected frequency should be less than 1
◦ No more than 1/5 of the expected frequencies should be less than 5
◦ To correct for this, collect larger samples or combine the categories with small expected frequencies until their combined value is 5 or more
Yates Correction*
◦ When there is only 1 degree of freedom, the regular chi-square test should not be used
◦ Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the new corrected values
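The two rules about expected frequencies are easy to check before running a test. The helper below is a minimal sketch; its name and return format are assumptions made for this example.

```python
def check_expected_frequencies(expected):
    """Check the two rules of thumb for expected frequencies.

    Returns a dict with one boolean per rule:
    - none_below_1: no expected frequency is less than 1
    - at_most_fifth_below_5: no more than 1/5 of them are less than 5
    """
    below_one = [e for e in expected if e < 1]
    below_five = [e for e in expected if e < 5]
    return {
        "none_below_1": not below_one,
        "at_most_fifth_below_5": len(below_five) <= len(expected) / 5,
    }


# Illustrative expected frequencies (hypothetical values).
print(check_expected_frequencies([12.65, 9.35, 10.35, 7.65]))
print(check_expected_frequencies([0.8, 4.0, 20.0, 30.0, 45.0]))
```

If either check fails, the slide's advice applies: collect more data or combine sparse categories.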

What do these mean?

Likelihood Ratio Chi Square
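The content of this slide is not preserved in the transcript. As a hedged illustration, the likelihood ratio chi-square is commonly defined as G² = 2 Σ O ln(O/E), with the same expected counts as the Pearson test. The sketch below applies that definition to the 2 x 2 heart disease counts used later in this tutorial (15, 7 / 8, 10); the function name is an assumption for this example.

```python
import math


def likelihood_ratio_chi_square(table):
    """G^2 = 2 * sum(O * ln(O / E)) with E from the independence model.
    Cells with O = 0 contribute nothing to the sum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:
                total += observed * math.log(observed / expected)
    return 2 * total


heart_table = [[15, 7],
               [8, 10]]
g2 = likelihood_ratio_chi_square(heart_table)
print(f"likelihood ratio chi-square = {g2:.3f}")
```

For this table the result is close to the uncorrected Pearson value of 2.28, which is typical when no cell is very small.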

Continuity-Adjusted Chi-Square Test

Mantel-Haenszel Chi-Square Test
Q_MH = (n-1)r²
◦ r is the Pearson correlation coefficient between the row and column variables (which also measures their linear association), and n is the sample size
◦ ault/viewer.htm#procstat_freq_a htm
◦ Tests the alternative hypothesis that there is a linear association between the row and column variable
◦ Follows a Chi-square distribution with 1 degree of freedom
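The formula above can be sketched in Python by treating row and column positions as numeric scores. Using the integer scores 0, 1, 2, ... is an assumption of this example, not something the slides specify, and the 2 x 2 heart disease counts from later in the tutorial serve as input.

```python
import math


def mantel_haenszel_chi_square(table, row_scores=None, col_scores=None):
    """Q_MH = (n - 1) * r^2, where r is the Pearson correlation between
    row scores and column scores, weighted by the cell counts."""
    row_scores = row_scores or list(range(len(table)))
    col_scores = col_scores or list(range(len(table[0])))
    n = sum(sum(row) for row in table)

    mean_x = sum(row_scores[i] * sum(row) for i, row in enumerate(table)) / n
    col_totals = [sum(col) for col in zip(*table)]
    mean_y = sum(col_scores[j] * total for j, total in enumerate(col_totals)) / n

    sxx = syy = sxy = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            dx = row_scores[i] - mean_x
            dy = col_scores[j] - mean_y
            sxx += count * dx * dx
            syy += count * dy * dy
            sxy += count * dx * dy

    r = sxy / math.sqrt(sxx * syy)
    return (n - 1) * r * r


heart_table = [[15, 7],
               [8, 10]]
q_mh = mantel_haenszel_chi_square(heart_table)
print(f"Q_MH = {q_mh:.3f}")
```

For a 2 x 2 table this equals the uncorrected Pearson chi-square multiplied by (n-1)/n.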

Phi Coefficient

Contingency Coefficient

Cramer’s V
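The slides for the phi coefficient, the contingency coefficient and Cramer's V are not preserved in this transcript. As a hedged sketch, the usual textbook definitions are φ = √(χ²/n), C = √(χ²/(χ²+n)) and V = √(χ²/(n·min(r-1, c-1))). The code below applies them to the 2 x 2 heart disease counts used later in the tutorial and reports magnitudes only; statistical software may report a signed phi for 2 x 2 tables.

```python
import math


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def association_measures(table):
    """Return (phi, contingency coefficient, Cramer's V) as magnitudes."""
    chi2, n = pearson_chi_square(table)
    smaller_dim = min(len(table) - 1, len(table[0]) - 1)
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * smaller_dim))
    return phi, contingency, cramers_v


heart_table = [[15, 7],
               [8, 10]]
phi, contingency, cramers_v = association_measures(heart_table)
print(f"phi = {phi:.3f}, C = {contingency:.3f}, V = {cramers_v:.3f}")
```

All three rescale the chi-square value so that tables with different sample sizes can be compared; for a 2 x 2 table, V and the magnitude of phi coincide.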

Yates & 2 x 2 Contingency Tables
H0: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.
Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1
We need to use the YATES CORRECTION
Uncorrected chi-square table:
                     High Cholesterol   Low Cholesterol   Total
Heart Disease               15                  7           22
  Expected                  12.65               9.35
  Chi-Square                 0.44               0.59
No Heart Disease             8                 10           18
  Expected                  10.35               7.65
  Chi-Square                 0.53               0.72
TOTAL                       23                 17           40
Chi-Square Total: 2.28

Yates & 2 x 2 Contingency Tables
H0: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.
Yates-corrected chi-square for each cell: (|O-E| - 0.5)²/E, e.g. (|15 - 12.65| - 0.5)²/12.65 = 0.27
                     High Cholesterol   Low Cholesterol   Total
Heart Disease               15                  7           22
  Expected                  12.65               9.35
  Chi-Square                 0.27               0.37
No Heart Disease             8                 10           18
  Expected                  10.35               7.65
  Chi-Square                 0.33               0.45
TOTAL                       23                 17           40
Chi-Square Total: 1.42

Yates & 2 x 2 Contingency Tables
Critical value from the table for 1 degree of freedom: χ² (α = 0.05) = 3.841

Yates & 2 x 2 Contingency Tables
H0: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.
Yates-corrected Chi-Square Total: 1.42
3.841 > 1.42 ∴ We do not reject our null hypothesis.
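The Yates-corrected calculation can be reproduced with a short Python sketch, using the observed counts from the table (15, 7 / 8, 10). The function name is an assumption for this example, and the p-value uses a closed form that holds only for 1 degree of freedom. Without intermediate rounding the total is about 1.41; the slides' 1.42 is what results from summing the cell values after rounding each to two decimals.

```python
import math


def yates_chi_square(table):
    """Yates-corrected chi-square for a 2 x 2 table:
    sum of (|O - E| - 0.5)^2 / E over the four cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (abs(observed - expected) - 0.5) ** 2 / expected
    return total


heart_table = [[15, 7],
               [8, 10]]
chi2_yates = yates_chi_square(heart_table)

critical_value = 3.841  # table value for alpha = 0.05, df = 1
# For df = 1 only, the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_yates / 2))
reject_null = chi2_yates > critical_value

print(f"Yates chi-square = {chi2_yates:.2f}, p = {p_value:.3f}, "
      f"reject H0: {reject_null}")
```

The corrected statistic stays below 3.841, so the sketch agrees with the slide's conclusion.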

Fisher’s Exact Test
Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = Likely negative association.
Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = Likely positive association.
Two-Tail: Use this when there is no prior alternative.
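The three p-values can be sketched with the hypergeometric distribution of the upper left cell, given fixed row and column totals. This is a minimal illustration using the heart disease counts (15, 7 / 8, 10), not a reproduction of the SAS output; the two-tail rule used here (sum of all tables no more probable than the observed one) is a common convention and is stated as an assumption.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (left, right, two_tail) p-values for a 2 x 2 table,
    based on the upper left cell with all margins held fixed."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    denominator = comb(n, row1)
    # Hypergeometric probability of each possible upper left count k.
    prob = {
        k: comb(col1, k) * comb(n - col1, row1 - k) / denominator
        for k in range(lowest, highest + 1)
    }
    p_observed = prob[a]
    left = sum(p for k, p in prob.items() if k <= a)
    right = sum(p for k, p in prob.items() if k >= a)
    # Small relative tolerance so ties with the observed table are included.
    two_tail = sum(p for p in prob.values() if p <= p_observed * (1 + 1e-7))
    return left, right, min(two_tail, 1.0)


heart_table = [[15, 7],
               [8, 10]]
left_p, right_p, two_tail_p = fisher_exact_2x2(heart_table)
print(f"left = {left_p:.3f}, right = {right_p:.3f}, two-tail = {two_tail_p:.3f}")
```

Here the upper left count (15) is above its expected value, so the right-sided p-value is the smaller of the two one-sided values.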


Conclusion
◦ The Chi-square test is important in testing the association between variables and/or checking if one’s expected proportions meet the reality of one’s experiment
◦ There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
◦ We can use SAS to conduct Chi-square tests on our data by utilizing the command proc freq

References
◦ Chi-Square Test Descriptions: %20%20The%20Chi-Square%20TEst.pdf
◦ Ozdemir T and Eyduran E. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):
◦ SAS Support website: “FREQ procedure”
◦ YouTube Chi-square SAS Tutorial (user: mbate001)