The 2 (chi-squared) test for independence

The 2 (chi-squared) test for independence IB Math Studies SL West Hall High School

A random sample of 200 teachers in higher education, secondary schools and primary schools gave the following numbers of men and women in each sector:

            Higher Education   Secondary Education   Primary Education
  Male             21                  39                    20
  Female           13                  55                    52

We might want to find out whether or not there is an association between 'age-group taught' and 'gender'. One way of finding out is to perform a χ² (chi-squared) test for independence.

To set up the test: We first set up a null hypothesis, H0, and an alternative hypothesis, H1. H0 always states that the data sets are independent, and H1 always states that they are related.

In this case, H0 could be "The age-group taught is independent of gender". H1 could be "There is an association between age-group taught and gender."

We put the data into a table. The elements in the table are our observed data, and the table is known as a contingency table. Adding the row and column totals gives:

            Higher Education   Secondary Education   Primary Education   TOTAL
  Male             21                  39                    20             80
  Female           13                  55                    52            120
  TOTAL            34                  94                    72            200

From the observed data we can calculate the expected frequencies. The expected frequency for each cell will be:

  expected frequency = (row total × column total) / total sample size

For example, the first two cells in the Male row are:

  Male, Higher Education:     80 × 34 / 200 = 13.6
  Male, Secondary Education:  80 × 94 / 200 = 37.6

In fact, for this table we only need to work out two of the expected values; the rest follow from the totals. This gives us the degrees of freedom for this table: 2.

Expected frequencies (row total × column total / total sample size):

            Higher Education   Secondary Education   Primary Education   TOTAL
  Male            13.6                37.6                  28.8            80
  Female          20.4                56.4                  43.2           120
  TOTAL           34                  94                    72             200
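The expected-frequency rule (row total × column total / total sample size) can be sketched in Python as follows. This is an illustrative sketch rather than part of the original slides; the function name expected_frequencies is an assumption made for the example.

```python
# Observed counts from the teacher example
# (rows: Male, Female; columns: Higher, Secondary, Primary education).
observed = [
    [21, 39, 20],
    [13, 55, 52],
]


def expected_frequencies(table):
    """Return row total x column total / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    # Round only for display; the unrounded values are kept in `expected`.
    print([round(value, 1) for value in row])
```

The printed rows should match the table of expected frequencies worked out by hand.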

This tells us that the number of degrees of freedom for this table is 2. You can always find the degrees of freedom by going back to the original table (without the totals): cross off one column and one row, and the number of cells left is the degrees of freedom.

  df = (No. of columns – 1) × (No. of rows – 1)

For this table, df = (3 – 1) × (2 – 1) = 2.
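The counting rule for degrees of freedom can be checked with a few lines of Python. This is a minimal sketch; degrees_of_freedom is a hypothetical helper name, and the table is assumed to be given without its totals.

```python
def degrees_of_freedom(table):
    """(number of rows - 1) x (number of columns - 1) for a table without totals."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)


# Teacher example: 2 rows (Male, Female) and 3 columns (sectors).
observed = [
    [21, 39, 20],
    [13, 55, 52],
]
df = degrees_of_freedom(observed)
print(df)
```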

Contingency Table – Observed Data

            Higher Education   Secondary Education   Primary Education
  Male             21                  39                    20
  Female           13                  55                    52

Expected Frequencies

            Higher Education   Secondary Education   Primary Education
  Male            13.6                37.6                  28.8
  Female          20.4                56.4                  43.2

Now we are ready to calculate the χ² value using the formula:

  χ²calc = Σ (fo – fe)² / fe

where fo is the observed value and fe is the expected value, and the sum is taken over every cell of the table.

Finally, look at the critical value that you have been given. If the χ²calc value is less than the critical value, we accept H0, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1.

In this case the χ²calc value is 11.3, and the critical value at 5% is 5.991. So we do not accept H0, the null hypothesis. There is an association between age-group taught and gender.
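The calculation of χ²calc and the comparison with the critical value can be sketched in Python. This is an illustration only; the function name chi_squared_statistic is an assumption, and the critical value 5.991 is taken from the text rather than computed.

```python
# Observed and expected frequencies for the teacher example.
observed = [
    [21, 39, 20],
    [13, 55, 52],
]
expected = [
    [13.6, 37.6, 28.8],
    [20.4, 56.4, 43.2],
]


def chi_squared_statistic(observed, expected):
    """Sum (fo - fe)^2 / fe over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


chi2_calc = chi_squared_statistic(observed, expected)
critical_value = 5.991  # given critical value at 5% for df = 2

print(round(chi2_calc, 1))
# A value above the critical value means we do not accept H0.
reject_h0 = chi2_calc > critical_value
print("Do not accept H0" if reject_h0 else "Accept H0")
```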

Using the critical value:

  If the χ²calc value is less than the critical value, we accept H0, the null hypothesis. (The χ²calc value is small: there is nothing of significance going on!)

  If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1. (The χ²calc value is large: there is something of significance going on!)

Using the p-value:

  If the p-value is less than the significance level, we do not accept H0, the null hypothesis. We accept H1. (The probability of this happening just by chance is small: there is probably something of significance going on!)

  If the p-value is more than the significance level, we accept H0, the null hypothesis. (The probability of this happening just by chance is large: there is probably nothing of significance going on!)
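As a rough illustration of the p-value rule, the sketch below uses the fact that, for 2 degrees of freedom only, the upper-tail probability of the χ² distribution has the closed form exp(–χ²/2). The function name p_value_df2 is an assumption for this example, and for other degrees of freedom a statistics table, library or GDC would be needed instead.

```python
import math


def p_value_df2(chi2_calc):
    """Upper-tail probability of the chi-squared distribution, valid for df = 2 only."""
    return math.exp(-chi2_calc / 2)


chi2_calc = 11.3           # value quoted for the teacher example (df = 2)
significance_level = 0.05  # 5% significance level

p_value = p_value_df2(chi2_calc)
print(p_value)

# p-value below the significance level: do not accept H0.
reject_h0 = p_value < significance_level
print("Do not accept H0" if reject_h0 else "Accept H0")
```

The same function applied to the critical value 5.991 gives a probability very close to 0.05, which is why the critical-value rule and the p-value rule lead to the same decision.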

You can do all this on the GDC:

  1. Enter the data into a matrix: press MATRIX, move to [EDIT] and press ENTER.
  2. Enter the size of your matrix; in this case 2 x 3 (2 rows, 3 columns).
  3. Enter your data, pressing ENTER after every value.
  4. Press STAT, move to [TESTS], scroll up to find the χ² test and press ENTER.
  5. You will now see where your table of expected values will be stored; change it if you wish. Otherwise scroll down to Calculate and press ENTER.

On the results screen, χ² is given to you, p is the probability (the p-value) and df is the degrees of freedom.

To see the table of expected values, go back to MATRIX and press ENTER.

Finally, look at the critical value that you have been given. If the χ²calc value is less than the critical value, we accept the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1.

Suppose we collect data on the favourite colour of car for men and women.

            Black   White   Red   Blue
  Male        51      22     33     24
  Female      45      36     22     27

We may want to find out whether favourite colour of car and gender are independent or related. One way of finding out is to perform a χ² (chi-squared) test for independence.

To set up the test: We first set up a null hypothesis, H0, and an alternative hypothesis, H1. H0 always states that the data sets are independent, and H1 always states that they are related.

In this case, H0 could be "The favourite colour of car is independent of gender". H1 could be "There is an association between favourite colour of car and gender."

            Black   White   Red   Blue   TOTAL
  Male        51      22     33     24     130
  Female      45      36     22     27     130
  TOTAL       96      58     55     51     260

From the observed data we can calculate the expected frequencies. The expected frequency for each cell will be:

  expected frequency = (row total × column total) / total sample size

For example, the first three cells in the Male row are:

  Male, Black:  130 × 96 / 260 = 48
  Male, White:  130 × 58 / 260 = 29
  Male, Red:    130 × 55 / 260 = 27.5

In fact, for this table we only need to work out three of the expected values; the rest follow from the totals. This gives us the degrees of freedom for this table: 3.

Expected frequencies (row total × column total / total sample size):

            Black   White   Red    Blue   TOTAL
  Male        48      29    27.5   25.5    130
  Female      48      29    27.5   25.5    130
  TOTAL       96      58    55     51      260

This tells us that the number of degrees of freedom for this table is 3. You can always find the degrees of freedom by going back to the original table (without the totals): cross off one column and one row, and the number of cells left is the degrees of freedom.

  df = (No. of columns – 1) × (No. of rows – 1)

For this table, df = (4 – 1) × (2 – 1) = 3.

Contingency Table – Observed Data

            Black   White   Red   Blue
  Male        51      22     33     24
  Female      45      36     22     27

Expected Frequencies

            Black   White   Red    Blue
  Male        48      29    27.5   25.5
  Female      48      29    27.5   25.5

Now we are ready to calculate the χ² value using the formula:

  χ²calc = Σ (fo – fe)² / fe

where fo is the observed value and fe is the expected value.

Finally, look at the critical value that you have been given. If the χ²calc value is less than the critical value, we accept H0, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1.

In this case the χ²calc value is 6.13, and the critical value at 5% is 7.815. So we accept H0, the null hypothesis. There is no association between favourite colour of car and gender.
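The whole car-colour example can be reproduced end to end with a short Python sketch. It is an illustration under stated assumptions: the function name chi_squared_test is hypothetical, the Female/Red count of 22 is inferred from the row and column totals, and the critical value 7.815 is taken from the text rather than computed.

```python
# Observed counts (rows: Male, Female; columns: Black, White, Red, Blue).
# The Female/Red count of 22 is inferred from the table totals.
observed = [
    [51, 22, 33, 24],
    [45, 36, 22, 27],
]


def chi_squared_test(table):
    """Return (chi-squared statistic, degrees of freedom, expected table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency: row total x column total / total sample size.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Statistic: sum of (fo - fe)^2 / fe over every cell.
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


chi2_calc, df, expected = chi_squared_test(observed)
critical_value = 7.815  # given critical value at 5% for df = 3

print(round(chi2_calc, 2), df)
accept_h0 = chi2_calc < critical_value
print("Accept H0" if accept_h0 else "Do not accept H0")
```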

The entries in the contingency table must be frequencies. The expected frequencies must not be less than 1, and no more than 20% of the entries can be between 1 and 5. Otherwise the test is invalid.
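The validity conditions on the expected frequencies can be checked mechanically. The sketch below is one possible reading: expected_counts_are_valid is a hypothetical helper, and "between 1 and 5" is assumed to mean at least 1 but less than 5.

```python
def expected_counts_are_valid(expected):
    """Check the two conditions on expected frequencies stated above."""
    cells = [value for row in expected for value in row]
    # Condition 1: no expected frequency may be less than 1.
    if any(value < 1 for value in cells):
        return False
    # Condition 2: at most 20% of the cells may be below 5
    # (integer comparison avoids floating-point rounding issues).
    small = sum(1 for value in cells if value < 5)
    return 5 * small <= len(cells)


teacher_expected = [
    [13.6, 37.6, 28.8],
    [20.4, 56.4, 43.2],
]
print(expected_counts_are_valid(teacher_expected))
```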