Test of Independence.


Parametric test: In earlier lectures, we discussed procedures for drawing conclusions about population parameters on the basis of sample information. Such procedures require certain assumptions about the population from which the sample is drawn. First, applying the t-test to small samples requires that the parent population be normally distributed. Second, the hypothesis is formulated by specifying a particular value for the parameter. When it is not possible to make any assumption about the value of a parameter, or when the population does not follow a normal distribution, we use non-parametric tests.

Non-parametric test: In many cases, particularly for nominal variables, we do not have parameters. For categorical data, the variable (or attribute) takes values from a finite number of categories. For instance, a gender variable has only two categories: male and female. We can count the number of observations in each category. To draw inferences from nominal data, the chi-square test statistic is used to test independence between two categorical variables.

Contingency table: A contingency table is an arrangement of data in a two-way classification. The data are sorted into cells, and the count for each cell is reported. The contingency table involves two factors (or variables), and a common question concerning such tables is whether the data indicate that the two variables are independent or dependent. To illustrate a test of independence, let’s consider a random sample that shows the gender of liberal arts college students and their favorite academic area.

Looking at the bar chart, it appears that the proportions of females and males who prefer each subject differ. We need a statistical test to verify our visual impression, that is, to check whether there is a statistical dependence between gender and subject preference.

A group of 300 students was randomly selected, and each was asked whether he or she prefers taking liberal arts courses in the area of math–science, social science, or humanities. Does this sample present sufficient evidence to reject the null hypothesis “Preference for math–science, social science, or humanities is independent of the gender of a college student”? (In layman's terms, the null hypothesis says that the preference for subject area is not related to gender.)

Parameter of interest: Determining the independence of the variables “gender” and “favorite subject area” requires us to discuss the probability of the various cases and the effect that answers about one variable have on the probability of answers about the other variable. Two random variables X and Y are called independent if the probability distribution of one is not affected by the value of the other. In terms of conditional probability, independence requires P(MS | M) = P(MS | F) = P(MS), where MS denotes a math–science preference, M male, and F female; that is, gender has no effect on the probability of a person’s choice of subject area. Equivalently, if two events A and B are independent, then P(A and B) = P(A) × P(B).

Steps in hypothesis testing: Ho: Preference for math–science, social science, or humanities is independent of the gender of a college student. Ha: Subject area preference is not independent of the gender of the student. Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed frequency and E = expected frequency, and the sum runs over all cells of the table. If the observed counts are “close enough” to the expected counts, then $\chi^2$, the sum of their squared deviations scaled by the expected counts, should not be much bigger than 0. Otherwise, $\chi^2$ would be very big, suggesting that the original hypothesis of independence between the two random variables is not valid.

E is the expected count if X and Y are independent. Since the null hypothesis asserts that these factors are independent, we would expect the values to be distributed in proportion to the marginal totals. The expected value is calculated on the assumption that H0 is true, i.e. that gender and subject choice are independent. There are 122 males; we would expect them to be distributed among MS, SS, and H proportionally to the column totals 72, 113, and 115. Thus, the expected cell counts for males are 122 × 72/300 = 29.28, 122 × 113/300 = 45.95, and 122 × 115/300 = 46.77. For the remaining 178 females we would expect 178 × 72/300 = 42.72, 178 × 113/300 = 67.05, and 178 × 115/300 = 68.23.
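The proportional allocation described on this slide can be sketched in Python. This is only an illustration: `expected_counts` is a made-up helper name, and the female total of 178 is inferred as 300 − 122 rather than read from the slide.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Marginal totals from the slide; 178 females is inferred as 300 - 122.
row_totals = [122, 178]        # male, female
col_totals = [72, 113, 115]    # MS, SS, H

expected = expected_counts(row_totals, col_totals)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
```

Each row of expected counts adds back up to its row total, which is a quick sanity check on the arithmetic.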

Typically, the contingency table is written so that it contains all this information. The calculated chi-square is χ² = 2.035 + 0.533 + 0.164 + 1.395 + 0.365 + 0.112 = 4.604.
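The sum on this slide can be checked with a short Python sketch. The observed counts are not listed in the transcript (they appeared in the table image), so the `observed` values below are an assumption: they are chosen to agree with the stated totals (122 males, 300 students, column totals 72, 113, 115) and with the six reported contributions. With unrounded expected counts the total comes out near 4.61 rather than exactly 4.604, because the slide sums rounded terms.

```python
# ASSUMED observed counts (not given in the transcript).
# Rows: male, female; columns: MS, SS, H.
observed = [[37, 41, 44],
            [35, 72, 71]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count under H0
        contributions.append((o - e) ** 2 / e)            # (O - E)^2 / E for this cell

chi_square = sum(contributions)
print([round(c, 3) for c in contributions])
print(round(chi_square, 3))
```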

χ² distribution table and degrees of freedom: Like the t distribution, the chi-square distribution is indexed by its degrees of freedom. For a test of independence, the degrees of freedom are (C − 1)(R − 1), where C = number of columns and R = number of rows in the two-way table. In the displayed table, there are 3 columns and 2 rows, so the degrees of freedom are (3 − 1)(2 − 1) = 2.
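As a small illustration of the degrees-of-freedom rule, the following Python sketch wraps it in a function. The name `degrees_of_freedom` is hypothetical, and rows and columns are counted without the "Total" margins.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence on an R x C table."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a test of independence needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)

# Gender (2 rows) by subject area (3 columns).
print(degrees_of_freedom(2, 3))
```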

A chi-square statistic was computed for a two-way table having 4 degrees of freedom. The value of the statistic was 9.49. What is the p-value? A. 0.005 B. 0.01 C. 0.05
A chi-square statistic was computed for a two-way table having 20 degrees of freedom. The value of the statistic was 29.69. What is the p-value? A. 0.025 B. 0.05 C. 0.075
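A p-value of this kind is the upper-tail area of the chi-square distribution beyond the observed statistic. As a rough sketch using only the Python standard library, the tail area has a closed form when the degrees of freedom are even, which covers both questions here. The helper name `chi2_upper_tail_even_df` is hypothetical; for odd degrees of freedom a statistics library or a printed table would be needed instead.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), valid only for even df.

    For even df the tail area equals exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    half = x / 2.0
    term, total = 1.0, 1.0            # k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k              # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

print(round(chi2_upper_tail_even_df(9.49, 4), 3))
print(round(chi2_upper_tail_even_df(29.69, 20), 3))
```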

Decision making: Given a 5% level of significance and df = 2, the critical value is 5.99. The computed χ² of 4.604 is smaller than the critical value. As it lies inside the non-rejection region, we fail to reject the null hypothesis at the 5% significance level. Therefore, we conclude that the sample does not provide sufficient evidence that the preference for academic subject areas depends on the sex of the respondents. Alternatively, we say that the sex of the students has no statistically significant relationship with the preference for subject areas. P-value approach: The null hypothesis of independence is rejected if the p-value of the chi-squared test statistic is less than a given significance level α. Given the χ² test statistic of 4.604, the corresponding p-value at df = 2 is around 10%. If we set α = 0.05, we cannot reject the null hypothesis.
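For 2 degrees of freedom both approaches are easy to reproduce, because the upper-tail area of the chi-square distribution is exactly exp(−x/2) in that case. The sketch below (Python, standard library only; the variable names are illustrative) recovers the 5.99 critical value and the roughly 10% p-value quoted on this slide.

```python
import math

alpha = 0.05
chi_square = 4.604                        # computed statistic from the gender/subject example

# With df = 2 the upper-tail area is exp(-x / 2),
# so the critical value c solves exp(-c / 2) = alpha.
critical_value = -2.0 * math.log(alpha)
p_value = math.exp(-chi_square / 2.0)

reject_by_critical_value = chi_square > critical_value
reject_by_p_value = p_value < alpha

print(round(critical_value, 2), round(p_value, 3))
print("reject H0" if reject_by_p_value else "fail to reject H0")
```

The two rules always agree, since the p-value falls below α exactly when the statistic exceeds the critical value.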

A researcher wants to know with whom it is easiest to make friends. 205 first-year students from the FSS were randomly selected to participate in the interview. Do we have evidence, at the 5% significance level, that men and women differ in whether they find it easier to make friends with the same sex or the opposite sex? Observed counts, with expected values in parentheses:
          Opposite sex   Same sex     No difference   Total
Female    58 (48.79)     16 (19.38)   63 (68.83)      137
Male      15 (24.21)     13 (9.62)    40 (34.17)      68
Total     73             29           103             205

Ho: In the population, there is no relationship between the sex of students and their perception of with whom it is easiest to make friends. Ha: In the population, there is a relationship between the sex of students and their perception of with whom it is easiest to make friends. Test statistic: χ² = 8.515. At a 5% significance level and df = (3 − 1)(2 − 1) = 2, the critical value is 5.99. As the χ² test statistic (8.515) is larger than the critical value, we reject the null hypothesis and conclude that, in the population, there is a relationship between the respondents’ sex and their response to the question asked.
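The whole test for this example can be sketched in a few lines of Python, using the observed counts from the friendship table. The function name `chi_square_independence` is a hypothetical helper, and the p-value line relies on the closed form exp(−x/2), which holds only because df = 2 here.

```python
import math

# Observed counts; rows: female, male; columns: opposite sex, same sex, no difference.
observed = [[58, 16, 63],
            [15, 13, 40]]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

statistic, df = chi_square_independence(observed)
p_value = math.exp(-statistic / 2.0)   # closed form valid only because df == 2 here
print(round(statistic, 3), df, round(p_value, 4))
```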

Suppose that a two-way table displaying sample information about gender and opinion about the legalization of marijuana (yes or no) is examined using a chi-square test. The necessary conditions are met and the chi-square value is calculated to be 15. What conclusion can be made? A. Gender and opinion have a statistically significant relationship. B. Gender and opinion do not have a statistically significant relationship. C. It is impossible to make a conclusion because we don’t know the sample size. D. It is impossible to make a conclusion because we don’t know the degrees of freedom.

The value of the χ²-test statistic is 5.33. Are the results statistically significant at the 5% significance level if the degrees of freedom are 2? A. Yes, because 5.33 is greater than the critical value of 3.84. B. Yes, because 5.33 is greater than the critical value of 4.01. C. No, because 5.33 is smaller than the critical value of 5.99. D. No, because 5.33 is smaller than the critical value of 11.07.

Suppose a random sample of 650 of the 1 million residents of a city is taken, in which every resident of each of four neighborhoods, A, B, C, and D, is equally likely to be chosen. A null hypothesis says the randomly chosen person's neighborhood of residence is independent of the person's occupational classification, which is either "blue collar", "white collar", or "service". Assume α = 5%. A B C D Total Blue collar 90 60 104 95 349 White 30 50 51 20 151 Service 40 45 35 150 200 650

Based on the following table, use the chi-square test to determine whether the smoking habit and exercise level of students are independent at the 5% significance level. Smoking habit Exercise Frequently Never Some Heavy 7 1 3 87 18 84 Occasionally 12 4 Regularly 9

Exercise: Which of the following relationships could be analyzed using a chi-square test? A. The relationship between height (inches) and weight (pounds). B. The relationship between satisfaction with K-12 schools (satisfied or not) and political party affiliation. C. The relationship between gender and amount willing to spend on a stereo system (in dollars). D. The relationship between opinion on gun control and income earned last year (in thousands of dollars).

A student survey was done to study the relationship between where students live (dormitory, apartment, house, co-op, or parent’s home) and how they usually get to campus (walking, bus, bicycle, car, or subway). What are the degrees of freedom for the chi-square statistic? A. 5 B. 16 C. 20 D. 25
A chi-square test involves a set of counts called “expected counts.” What are the expected counts? A. Hypothetical counts that would occur if the alternative hypothesis were true. B. Hypothetical counts that would occur if the null hypothesis were true. C. The actual counts that did occur in the observed data. D. The long-run counts that would be expected if the observed counts are representative.

In the General Social Survey, respondents were asked what they thought was most important to get ahead: hard work, lucky breaks, or both. What is the null hypothesis for this situation? A. There is a relationship between gender and opinion on what is important to get ahead in the sample. B. There is no relationship between gender and opinion on what is important to get ahead in the sample. C. There is a relationship between gender and opinion on what is important to get ahead in the population. D. There is no relationship between gender and opinion on what is important to get ahead in the population. What is the alternative hypothesis?

What is the value of the test statistic? What are the degrees of freedom for this test? At a significance level of 0.05, what is the conclusion?

A researcher conducted a study on college students to see if there was a link between gender and how often they have cheated on an exam. She asked two questions on a survey: (1) What is your gender? Male ___ Female ___ (2) How many times have you cheated on an exam while in college? Never ___ 1 or 2 times ___ 3 or more times ___ What is the appropriate null hypothesis? What are the degrees of freedom for the test statistic?

Using SPSS for 2 test Analyse → descriptive → crosstab→ drag a variable into column and a variable to row → statistic choose chi-square, → cell by default the system gives observed count, choose expected count and see the result. But the expected value is normally not reported. Choose either column or row percentage. This is to report the descriptive statistics in the write up.