Statistics for Everyone Workshop Fall 2010 Part 5 Comparing the Proportion of Scores in Different Categories With a Chi Square Test Workshop presented.

Slides:



Advertisements
Similar presentations
Hypothesis Testing and Comparing Two Proportions Hypothesis Testing: Deciding whether your data shows a “real” effect, or could have happened by chance.
Advertisements

Bivariate Analysis Cross-tabulation and chi-square.
Hypothesis Testing IV Chi Square.
Statistical Inference for Frequency Data Chapter 16.
Chapter 11 Contingency Table Analysis. Nonparametric Systems Another method of examining the relationship between independent (X) and dependant (Y) variables.
Analysis of frequency counts with Chi square
Copyright ©2011 Brooks/Cole, Cengage Learning More about Inference for Categorical Variables Chapter 15 1.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15.
PSY 340 Statistics for the Social Sciences Chi-Squared Test of Independence Statistics for the Social Sciences Psychology 340 Spring 2010.
CJ 526 Statistical Analysis in Criminal Justice
Chi-square Test of Independence
Aaker, Kumar, Day Seventh Edition Instructor’s Presentation Slides
PSY 307 – Statistics for the Behavioral Sciences Chapter 19 – Chi-Square Test for Qualitative Data Chapter 21 – Deciding Which Test to Use.
Statistics for Everyone Workshop Fall 2010 Part 6A Assessing the Relationship Between 2 Numerical Variables Using Correlation Workshop presented by Linda.
8/15/2015Slide 1 The only legitimate mathematical operation that we can use with a variable that we treat as categorical is to count the number of cases.
Statistics for Everyone Workshop Fall 2010 Part 3 Comparing 2 Conditions With a T Test Workshop by Linda Henkel and Laura McSweeney of Fairfield University.
Inferential Statistics
Presentation 12 Chi-Square test.
1 of 27 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2013, Michael Kalsher Michael J. Kalsher Department of Cognitive Science Adv. Experimental.
Cross Tabulation and Chi-Square Testing. Cross-Tabulation While a frequency distribution describes one variable at a time, a cross-tabulation describes.
How Can We Test whether Categorical Variables are Independent?
Estimation and Hypothesis Testing Faculty of Information Technology King Mongkut’s University of Technology North Bangkok 1.
Inferential Statistics: SPSS
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
Hypothesis Testing:.
CHP400: Community Health Program - lI Research Methodology. Data analysis Hypothesis testing Statistical Inference test t-test and 22 Test of Significance.
1 Psych 5500/6500 Chi-Square (Part Two) Test for Association Fall, 2008.
Copyright © 2012 Pearson Education. All rights reserved Copyright © 2012 Pearson Education. All rights reserved. Chapter 15 Inference for Counts:
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 11-1 Chapter 11 Chi-Square Tests Business Statistics, A First Course 4 th Edition.
HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 10.7.
CJ 526 Statistical Analysis in Criminal Justice
Statistics for Everyone Workshop Summer 2011 Part 3 Two-Way Analysis of Variance: Examining the Individual and Joint Effects of 2 Independent Variables.
Statistics for Everyone Workshop Fall 2010 Part 6B Assessing the Relationship Between More than 2 Numerical Variables Using Correlation and Multiple Regression.
Chapter 9: Non-parametric Tests n Parametric vs Non-parametric n Chi-Square –1 way –2 way.
Copyright © 2009 Cengage Learning 15.1 Chapter 16 Chi-Squared Tests.
Two Way Tables and the Chi-Square Test ● Here we study relationships between two categorical variables. – The data can be displayed in a two way table.
Chapter 20 For Explaining Psychological Statistics, 4th ed. by B. Cohen 1 These tests can be used when all of the data from a study has been measured on.
Education 793 Class Notes Presentation 10 Chi-Square Tests and One-Way ANOVA.
Analysis of Qualitative Data Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia FK6163.
Nonparametric Tests: Chi Square   Lesson 16. Parametric vs. Nonparametric Tests n Parametric hypothesis test about population parameter (  or  2.
Chapter 13 - ANOVA. ANOVA Be able to explain in general terms and using an example what a one-way ANOVA is (370). Know the purpose of the one-way ANOVA.
CHI SQUARE TESTS.
HYPOTHESIS TESTING BETWEEN TWO OR MORE CATEGORICAL VARIABLES The Chi-Square Distribution and Test for Independence.
Chapter 13 CHI-SQUARE AND NONPARAMETRIC PROCEDURES.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 11-1 Chapter 11 Chi-Square Tests and Nonparametric Tests Statistics for.
4 normal probability plots at once par(mfrow=c(2,2)) for(i in 1:4) { qqnorm(dataframe[,1] [dataframe[,2]==i],ylab=“Data quantiles”) title(paste(“yourchoice”,i,sep=“”))}
Nonparametric Tests of Significance Statistics for Political Science Levin and Fox Chapter Nine Part One.
Chapter Outline Goodness of Fit test Test of Independence.
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Chi-Square X 2. Review: the “null” hypothesis Inferential statistics are used to test hypotheses Whenever we use inferential statistics the “null hypothesis”
Chi-Square Analyses.
Chi-Square X 2. Review: the “null” hypothesis Inferential statistics are used to test hypotheses Whenever we use inferential statistics the “null hypothesis”
PART 2 SPSS (the Statistical Package for the Social Sciences)
Chapter 14 – 1 Chi-Square Chi-Square as a Statistical Test Statistical Independence Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
Lecture #8 Thursday, September 15, 2016 Textbook: Section 4.4
Chapter 12 Chi-Square Tests and Nonparametric Tests
Presentation 12 Chi-Square test.
INTRODUCTORY STATISTICS FOR CRIMINAL JUSTICE
Hypothesis Testing Review
Qualitative data – tests of association
The Chi-Square Distribution and Test for Independence
Hypothesis Testing and Comparing Two Proportions
Chapter 10 Analyzing the Association Between Categorical Variables
UNDERSTANDING RESEARCH RESULTS: STATISTICAL INFERENCE
Analyzing the Association Between Categorical Variables
Presentation transcript:

Statistics for Everyone Workshop Fall 2010 Part 5 Comparing the Proportion of Scores in Different Categories With a Chi Square Test Workshop presented by Linda Henkel and Laura McSweeney of Fairfield University Funded by the Core Integration Initiative and the Center for Academic Excellence at Fairfield University

What if the data we want to analyze are categorical? Gender People who have vs. do not have this disease The nature of the research question is about “how many…” Are there more male than female fish in this bay compared to in other bays? Are there more people who have this disease when people drink lots of water vs. little water? Statistics as a Tool in Scientific Research: Comparing the Proportion of Scores in Different Categories With a Chi Square Test

The Chi Square Test Statistical test:  2 (pronounced with a hard “k” sound like kid, rhymes with tie) Used for: Analyzing categorical variables to determine if how many instances there are in the different categories is about what one would expect based on chance or some known expected distribution, or if the number of instances in some of the categories really are different Use when: One variable is categorical (with 2 or more levels) and is measured by the frequency of instances in each category e.g., the number of overweight, normal, or underweight people in a sample: Do people fall equally into the 3 categories? # of people with a disease as a function of gender: Do women show higher, lower, or similar rates of lung cancer as do men?

One-way Table : Used to determine if the probability of the outcomes of one categorical variable are significantly different from the predetermined probabilities (also called a goodness of fit test) Two-way Table : Used to determine if there is evidence of a relationship between 2 categorical variables (also called a test for independence) Same basic type of test is used for both cases, although the null (H 0 ) and research (H A ) hypotheses are different Chi Square Tests for:

Hypothesis Testing for a One-Way Table The chi square test on a one-way table allows a scientist to determine whether their research hypothesis was supported. The research hypothesis is pitted against a null hypothesis, which is based on predetermined probabilities: Equal probabilities in each category (e.g., you have 3 different weight classifications [underweight, normal, obese] thus 1/3 of the people in your sample should fall into each of those different categories) Unequal probabilities in each category (e.g., you already know from baseline rates of weight in the population at large that there are more overweight people than the other two categories, so if you want to find out if your sample is especially overweight, you need to consider those baseline proportions)

Example 1: Expected Proportions Are Same Is the incidence of gastrointestinal disease during an epidemic related to water consumption? Daily consumption of 8 oz. of water or more Total Observed Number with GI Disease

Calculating Expected Frequencies When Equal Proportions Are Expected Based on Chance How many people with GI disease would you expect to fall into each category if water consumption was not related to them having the disease? Divide the total number of observations you have (N) and divide by the number of categories (k) N = 40 # of categories = 4 So, the expected frequency in each category = N/k = 40/4 = 10

Is the incidence of gastrointestinal disease during an epidemic related to water consumption? H 0 : Proportion of population with GI disease should be the same (p = 1/4) for each water consumption level H A : At least one proportion is different from 1/4 Daily consumption of 8 oz. of water or more Total Observed Number with GI Disease Expected Number ¼ of 40 = 10 Example 1: Expected Proportions Are Same

Example 2: Expected Proportions Are Not the Same In 2009 a study of the primary disabilities of special education students in CA revealed that 43% of all special education students had learning disabilities, 25% had speech or language impairments, 8% had autism and the remaining students had other disabilities. The following table lists the primary disabilities of a random sample of special education students in Is there evidence that the distribution of primary disabilities is now different from the 2009 distribution? DisabilityObserved Number Learning Disability680 Speech/Language390 Autism120 Other435 Total1625

Calculating Expected Frequencies Based On Comparison Population H 0 : p learning = 43%, p speech = 25%, p autism = 8%, p other = 24% H A : At least one proportion is different DisabilityObserved NumberExpected Number Learning Disability680 Speech/Language390 Autism120 Other435 Total1625

Example 2: Expected Proportions are Not the Same DisabilityObserved NumberExpected Number Learning Disability *.43 =699 Speech/Language *.25 =406 Autism *.08=130 Other *.24 =390 Total1625

Hypothesis Testing for a One-Way Table The chi square test allows a researcher to determine whether the research hypothesis was supported Null hypothesis (H 0 ): There is no real difference in the proportions observed in each category in your sample and what you would expect to observe based on chance or prespecified values Research Hypothesis (H A ): There is a real difference in at least one proportion in your categories

Hypothesis Testing for a One-Way Table Do the data support the research hypothesis or not? Is there a real difference in the proportions seen in the categories, or are the obtained differences in proportions between categories no different than the comparison population (based on either chance or already known rates?)  Teaching tip: In order to understand what we mean by “a real difference,” students must understand probability

Understanding Probability p value = probability of results being due to chance When the p value is high (p >.05), the obtained difference is probably due to chance When the p value is low (p <.05), the obtained difference is probably NOT due to chance and more likely reflects a real difference

Understanding Probability p value = probability of results being due to chance [Probability of observing your data (or more severe) if H 0 were true] When the p value is high (p >.05), the obtained difference is probably due to chance [Data likely if H 0 were true] When the p value is low (p <.05), the obtained difference is probably NOT due to chance and more likely reflects a real difference [Data unlikely if H 0 were true, so data support H A ]

Understanding Probability In science, a p value of.05 is a conventionally accepted cutoff point for saying when a result is more likely due to chance or more likely due to a real effect Not significant = the obtained difference is probably due to chance; the proportions in the different categories do not appear to really differ from what would be expected based on chance; p >.05 Statistically significant = the obtained difference is probably NOT due to chance and is likely a real difference between your sample and what would be expected based on chance; p <.05

The Essence of a  2 Test To determine whether the data support the research hypothesis or the null hypothesis, a  2 value is calculated The  2 test basically examines how different the observed scores are from the expected scores; whether the observed scores are different enough from the expected scores for the researcher to confidently conclude that the research hypothesis was supported, that there is a real difference here Are the obtained values likely due to chance or not?

Chi-Square Test Statistic We want to see if the values we observed are close to what we would expect if H 0 were true (i.e., if there really was no difference) This can be taught without formulas but for the sake of thoroughness, here is the formula for the chi square test statistic: Here, O = observed frequency and E = expected frequency

Chi-Square Test Statistic Excel will calculate the value of chi square but you need to know the values of the observed frequencies (your data) and the expected proportions (based on either chance or some predetermined comparison population) Excel will also calculate a piece of information that is standard to report when reporting chi square which is the degrees of freedom (df). Here, df = # of categories – 1 = k – 1

Running a  2 test for a One-Way Table in Excel One-way: Use when one variable is categorical (with 2 or more levels) and is measured by the frequency of instances in each category Need: Observed frequencies (from your data) Note: These should be raw #s, not converted to percentages Expected proportions (ex: 1/3 =.33) based on chance or comparison population Sample size (N) and number of categories (k) To run: Open Excel file “SFE Statistical Tests” and go to page called Chi square test for One-way table Enter observed frequencies, expected proportions, the sample size and the number of categories Output: Computer calculates chi square value, df, and p value

Running a  2 test for a One-Way Table in SPSS One-way: Use when one variable is categorical (with 2 or more levels) and is measured by the frequency of instances in each category To run: Enter data in SPSS Analyze  Nonparametric  Chi-Square; Move the data to Test Variable List If each subcategory is equally likely, select “All expected frequencies equal”. If each subcategory is not equally likely, select “Values” then type in the probability for the first subcategory and then click Add. Continue adding probabilities until all subcategories have an associated probability. Output: Computer calculates chi square value, df, and p value

The Essence of a  2 Test Each  2 test gives you a  2 score, which can range from 0 and up The smaller the  2 score, the more likely the difference between the observed and expected proportions in the different categories are just due to chance The bigger the  2 score, the less likely the difference between the observed and expected proportions in the different categories are due to chance and reflect a real difference So big values of  2 will be associated with small p values that indicate the differences or relationship are significant (p <.05) Little values of  2 (i.e., close to 0) will be associated with larger p values that indicate that the differences are not significant (p >.05)

Example 1: Is the incidence of gastrointestinal disease during an epidemic related to water consumption? Chi-Square Test Statistic: X^2 =9.8 degrees of freedom: df = k – 1 =3 p-value =0.021 The p value of.021 is less than.05, thus you do have evidence of a relationship between incidence of gastrointestinal disease and water consumption.

Example 2: Is there evidence that the distribution of primary disabilities of CA special education students in 2010 is different than in 2009? Chi-Square Test Statistic: X^2 =7.115 degrees of freedom: df = k – 1 =3 p-value = The p value of.068 is greater than.05, thus you do not have evidence that the distribution of primary disabilities in 2010 is different from the 2009 distribution.

Interpreting  2 Tests Cardinal rule: Scientists do not say “prove”! Conclusions are based on probability (likely due to chance, likely a real effect…). Be explicit about this to your students. Based on p value, determine whether you have evidence to conclude the difference was probably real or was probably due to chance: Which hypothesis is supported? p <.05: Significant Reject null hypothesis and support research hypothesis (the difference between at least one observed and expected frequency was probably real) p >.05: Not significant Retain null hypothesis and reject research hypothesis (any difference between observed and expected frequencies was probably due to chance)

Teaching Tips Students have trouble understanding what is less than.05 and what is greater, so a little redundancy will go a long way! Whenever you say “p is less than point oh-five” also say, “so the probability that this is due to chance is less than 5%, so it’s probably a real effect.” Whenever you say “p is greater than point oh-five” also say, “so the probability that this is due to chance is greater than 5%, so there’s just not enough evidence to conclude that it’s a real effect – the observed and expected frequencies are not really different” In other words, read the p value as a percentage, as odds, “the odds that this difference is due to chance is only 1%, so it’s probably not chance…”

Reporting  2 Tests Results State key findings in understandable sentences Use descriptive and inferential statistics to supplement verbal description by putting them in parentheses and at the end of the sentence Use a table and/or figure to illustrate findings

Reporting  2 Tests Results Step 1: Write a sentence that clearly indicates what statistical analysis you used A chi square test was conducted to determine whether [state your research question] A chi square test was conducted to determine whether the incidence of gastrointestinal disease during an epidemic was related to the amount of water consumed. A chi square test was conducted to determine whether the distribution of primary disabilities of CA special education students in 2010 follows the 2009 distribution (43% of students have learning disabilities, 25% have speech/language impairments, 8% have autism and the rest have other primary disabilities).

Reporting  2 Tests Results Step 2: Write a sentence that clearly indicates what pattern you saw in your data analysis If a significant difference was obtained, describe what observed frequencies differed The number of people with gastrointestinal disease was higher when more water was consumed. If no significant difference was obtained, say so, and explain that the observed frequencies did not differ. The distribution of primary disabilities in 2010 was not significantly different from the distribution in 2009.

Reporting  2 Tests Results Step 3: Tack the inferential statistics onto the end Put the chi square test results at the end of the sentence using this format: chi square(df) = x.xx, p =.xx The number of people with gastrointestinal disease was higher when more water was consumed, chi square(3) = 9.80, p =.02. The distribution of primary disabilities in 2010 was not significantly different from the distribution in 2009, chi square(3) = 7.12, p =.07. Step 4: Make a table of observed frequencies

More Teaching Tips You can ask your students to report either: the exact p value (p =.03, p =.45) the cutoff: say either p.05 (not significant) You should specify which style you expect. Ambiguity confuses them! Tell students they can only use the word “significant” only when they mean it (i.e., the probability the results are due to chance is less than 5%) and to not use it with adjectives (i.e., they often mistakenly think one test can be “more significant” or “less significant” than another). Emphasize that “significant” is a cutoff that is either met or not met -- Just like you are either found guilty or not guilty, pregnant or not pregnant. There are no gradients. Lower p values = less likelihood results are due to chance, not “more significant”

One-way Table: Used to determine if the probability of the outcomes of one categorical variable are significantly different from the predetermined probabilities Two-way Table : Used to determine if there is evidence of a relationship between 2 categorical variables Same basic type of test is used for both cases, although the null (H 0 ) and research (H A ) hypotheses are different Chi Square Tests for:

Let your students know that it doesn’t matter which variable they use for columns and which they use for rows But if one variable has a lot of levels, it looks better in the table to have that be the variable represented in rows Teaching Tip:

Example of Two Way Table Is there evidence to indicate that diet and the presence/absence of cancer are independent? High Fat, No Fiber High Fat, Fiber Low Fat, No Fiber Low Fat, Fiber Totals Tumors No Tumors Totals30 120

Hypotheses Null hypothesis H 0 : The two categorical variables are independent; there is no relationship between the 2 categorical variables Research hypothesis H A : The two categorical variables are dependent; there is a relationship between the 2 categorical variables

Calculating Expected Counts in a Given Cell E = (row total * column total)/total number of observations High Fat, No Fiber High Fat, Fiber Low Fat, No Fiber Low Fat, Fiber Totals Tumor27 (30*80/120 = 20) 20 (20) 19 (20) 14 (20) 80 No Tumor 3 (30*40/120 = 10) 10 (10) 11 (10) 16 (10) 40 Totals30 120

Expected Counts in a Given Cell This can be taught without formulas but for the sake of thoroughness, here is the formula for the chi square test statistic O = observed frequency E = expected frequency df = (# of rows – 1)*(# of columns – 1)

Running the  2 Test for a Two-Way Table in Excel Two-way  2 : Use when two variables are categorical (with 2 or more levels each) and are measured by the frequency of instances in each category Need: Observed frequencies (from your data) Note: These should be raw #s, not converted to percentages df = (# of rows – 1)*(# of columns – 1). Note: Count rows and columns only for the data themselves, not for the totals, so in above example, (2-1)*(4-1)= 1*3=3 To run: Open Excel file “SFE Statistical Tests” and go to page called Chi square test for Two-way Table Enter in the observed frequencies (do not include totals!) and the degrees of freedom Output: Computer calculates chi square value and p value

Running the  2 Test for a Two-Way Table in SPSS Two-way  2 : Use when two variables are categorical (with 2 or more levels each) and are measured by the frequency of instances in each category To run: Enter data in SPSS Analyze  Descriptives  Crosstabs Choose one categorical variable to be “Row” (usually the IV) and the other one to the be “Column” (usually the DV) Check Statistics and choose Chi-Square and then Continue. Then click OK. Output: Computer calculates chi square value and p value

Example of Two Way Table Is there evidence to indicate that diet and the presence/absence of cancer are independent? The p value of.005 is less than.05 thus there is evidence of a relationship between diet and presence/absence of cancer. X^2 = df = (r-1)*(c-1)3.0 P-value =0.005

Reporting  2 Tests Results Step 1: Write a sentence that clearly indicates what statistical analysis you used A chi square test was conducted to determine whether [state your research question] A chi square test was conducted to determine whether people’s diets were related to the presence or absence of cancerous tumors.

Reporting  2 Tests Results Step 2: Write a sentence that clearly indicates what pattern you saw in your data analysis If a significant difference was obtained, say there was a relation: The presence and absence of cancerous tumors was found to be related to different type of diets. -or- People who ate high fat/low fiber diets had a higher incidence of tumors compared to the other types of diets, whereas people who ate low fat/high fiber diets had a lower incidence of tumors If no significant difference was obtained, say there was no relation: The presence or absence of cancerous tumors was not related to different types of diet.

Reporting  2 Tests Results Step 3: Tack the inferential statistics onto the end Put the chi square test results at the end of the sentence using this format:  2 (df) = x.xx, p =.xx There was a relationship found between people’s diet (high/low fiber, high/low fat) and the presence or absence of tumors,  2 (3) = 12.90, p =.005. The presence or absence of cancerous tumors was not related to different types of diet,  2 (3) = 1.29, p =.25. Step 4: Make a table of observed frequencies

Issues to Consider Check if assumptions are met. For chi square, all expected counts should be at least 5 Note: The expected counts are given in the Excel output. The p-value does not give any information about the strength of the relationship only whether there is evidence of a relationship. You can compute the effect size for this by hand using the chi square value from the Excel output.

Effect Size for 2x2 Contingency Table The effect size is a measure of the strength of the relationship Phi Coefficient: N = total number of observations Conventions for effect size [Cohen].10 small effect.30 medium effect.50 large effect

Effect Size for Bigger Contingency Table Contingency Coefficient: N = total number of observations

Reporting  2 Tests Results Step 5: OPTIONAL: Report the Effect Size if Chi Square was Significant People who ate had high fat/low fiber diets had a higher incidence of tumors compared to the other types of diets, whereas people who ate low fat/high fiber diets had a lower incidence of tumors,  2 (3) = 12.90, p =.005. This effect size was medium,  =.31.

Time to Practice Running and reporting chi square tests