Test of Categorical Data / Proportion


1 Test of Categorical Data / Proportion

2 Estimation: estimate population means; estimate population proportion; estimate population variance. Hypothesis testing: testing population means; testing categorical data / proportion; testing population variances. Hypotheses about many population means: one-way ANOVA; two-way ANOVA.

3 Test a proportion of interest in a population, e.g. the proportion of defects in production, or the proportion of people travelling by Skytrain in Bangkok (BKK). The test uses data collected from a sample to check whether the population proportion is as expected.

4 Binomial proportion: single population; two populations. Multiple-group proportions (chi-square test): single population (test of homogeneity, goodness-of-fit test); two populations (test of homogeneity, test of independence).

5 The sample is categorized into two groups. Single population: determine whether the proportion in one of the two categories differs from a specified proportion. Two populations: compare the proportions of two populations. The steps are similar to those for testing a population mean. Assumptions: the sample proportion (or the difference between sample proportions, in the case of two populations) is approximately normally distributed; the sample size (of both samples, in the case of two populations) is sufficiently large (n ≥ 30).

6

7 A plastic product factory takes a sample of 400 plastic containers, 12 of which are defective. From these data, test whether the proportion of defects is more than 2% at significance level 0.05. Hypotheses: $H_0: P \le 0.02$; $H_1: P > 0.02$; $\alpha = 0.05$.

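8 Calculate the test statistic: $\hat{p} = 12/400 = 0.03$, so $z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.03 - 0.02}{\sqrt{0.02 \times 0.98 / 400}} \approx 1.43$. z-score from table: $z_{0.05} = 1.645$. The calculated z-score is 1.43 < 1.645, so it does not fall in the right-tailed critical region. Do not reject $H_0$. The data do not show that the proportion of defects is more than 2% at significance level 0.05.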

9

10 In an observation of students wearing and not wearing safety helmets when riding motorcycles, 75 of 500 sampled students wear a helmet. Can we conclude that the proportion of students wearing a helmet is less than 20% at significance level 0.01? Hypotheses: $H_0: P \ge 0.2$; $H_1: P < 0.2$; $\alpha = 0.01$.

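11 Calculate the test statistic: $\hat{p} = 75/500 = 0.15$, so $z = \frac{0.15 - 0.20}{\sqrt{0.20 \times 0.80 / 500}} \approx -2.80$. z-score from table: $z_{0.01} = 2.326$. The calculated z-score is -2.80 < -2.326, falling in the left-tailed critical region. Reject $H_0$ and accept $H_1$. The proportion of students wearing a helmet is less than 20% at significance level 0.01.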
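The two single-population examples above (defective containers and helmet use) can be checked with a few lines of Python. This is a minimal sketch, assuming the usual one-sample z statistic with the hypothesized proportion in the standard error; the function name `one_proportion_z` is illustrative and does not come from the slides.

```python
import math

def one_proportion_z(x, n, p0):
    """z statistic for a single proportion, using p0 in the standard error."""
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Defect example: 12 defective out of 400, tested against 2% (right-tailed).
z_defect = one_proportion_z(12, 400, 0.02)
reject_defect = z_defect > 1.645      # table value z_0.05

# Helmet example: 75 of 500 wear a helmet, tested against 20% (left-tailed).
z_helmet = one_proportion_z(75, 500, 0.20)
reject_helmet = z_helmet < -2.326     # table value z_0.01

print(round(z_defect, 2), reject_defect)
print(round(z_helmet, 2), reject_helmet)
```

The direction of the comparison follows the alternative hypothesis: a right-tailed test compares z with the positive table value, and a left-tailed test compares it with the negative one.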

12 A shampoo company expects that after advertising its new product for 2 months, the product will be popular among 60% of consumers. After the advertising period, 300 bottles are given out to 300 sampled consumers, 220 of whom respond positively. Test whether the expectation holds at significance level 0.05.

13 In a sample of 90 students, 28 have private cars. Test whether the proportion of students who have private cars is more than 25% at significance level 0.05.

14

15

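16 If $d_0 = 0$, e.g. $H_0: P_1 = P_2$ or $P_1 - P_2 = 0$, use the pooled estimated proportion $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$, where $x_1, x_2$ are the numbers of items of interest in the first and second samples and $n_1, n_2$ are the sizes of the first and second samples. Additional assumption: the counts $n_1\hat{p}$, $n_1\hat{q}$, $n_2\hat{p}$ and $n_2\hat{q}$ (with $\hat{q} = 1 - \hat{p}$) must be sufficiently large.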

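17 If $d_0 = 0$, e.g. $H_0: P_1 = P_2$ or $P_1 - P_2 = 0$, the estimated variance of the proportion difference is $\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$, where $\hat{p}$ is the pooled estimated proportion and $\hat{q} = 1 - \hat{p}$. The z-score calculation is adjusted to $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$.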

18 In a survey, 70 of 100 IT students have a smartphone, and 72 of 150 art students have a smartphone. Test whether the proportion of IT students who have a smartphone is more than 10% higher than that of art students at significance level 0.05. Hypotheses (with $P_1$ for IT students and $P_2$ for art students): $H_0: P_1 - P_2 \le 0.1$; $H_1: P_1 - P_2 > 0.1$; $\alpha = 0.05$.

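19 Calculate the test statistic: $\hat{p}_1 = 70/100 = 0.70$ and $\hat{p}_2 = 72/150 = 0.48$, so $z = \frac{(0.70 - 0.48) - 0.10}{\sqrt{\frac{0.70 \times 0.30}{100} + \frac{0.48 \times 0.52}{150}}} \approx 1.96$. z-score from table: $z_{0.05} = 1.645$. The calculated z-score is 1.96 > 1.645, falling in the right-tailed critical region. Reject $H_0$ and accept $H_1$. The proportion of IT students who have a smartphone is more than 10% higher than that of art students at significance level 0.05.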
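The smartphone example has a non-zero hypothesized difference ($d_0 = 0.1$), so the sketch below uses the unpooled standard error, which is the common textbook choice in that case. That choice and the function name `two_proportion_z_unpooled` are assumptions for illustration, not taken from the slides.

```python
import math

def two_proportion_z_unpooled(x1, n1, x2, n2, d0):
    """z statistic for H0: P1 - P2 = d0, with separate variance estimates."""
    p1 = x1 / n1
    p2 = x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - d0) / standard_error

# 70 of 100 IT students vs. 72 of 150 art students, hypothesized difference 0.10.
z_phone = two_proportion_z_unpooled(70, 100, 72, 150, 0.10)
reject_phone = z_phone > 1.645    # right-tailed test, table value z_0.05

print(round(z_phone, 2), reject_phone)
```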

20 In a survey, 120 of 200 university students have a notebook computer, and 240 of 500 high school students have a notebook computer. Test whether the proportion of university students who have a notebook computer is higher than that of high school students at significance level 0.025. Hypotheses (with $P_1$ for university students and $P_2$ for high school students): $H_0: P_1 \le P_2$; $H_1: P_1 > P_2$ (*$d_0 = 0$); $\alpha = 0.025$.

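21 Calculate the pooled estimated proportion: $\hat{p} = \frac{120 + 240}{200 + 500} = \frac{360}{700} \approx 0.51$, so $\hat{q} = 0.49$. Check the additional assumption: $n_1\hat{p} = 200 \times 0.51 = 102$; $n_1\hat{q} = 200 \times 0.49 = 98$; $n_2\hat{p} = 500 \times 0.51 = 255$; $n_2\hat{q} = 500 \times 0.49 = 245$. Calculate the test statistic: $z = \frac{0.60 - 0.48}{\sqrt{0.51 \times 0.49 \times \left(\frac{1}{200} + \frac{1}{500}\right)}} \approx 2.9$.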

22 z-score from table: $z_{0.025} = 1.96$. The calculated z-score is 2.9 > 1.96, falling in the right-tailed critical region. Reject $H_0$ and accept $H_1$. The proportion of university students who have a notebook computer is higher than that of high school students at significance level 0.025.
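The pooled calculation for the notebook-computer example can be sketched as follows. The function name `two_proportion_z_pooled` is illustrative. The code keeps the pooled proportion unrounded (360/700), so the result differs slightly in the second decimal from a hand calculation that rounds it to 0.51.

```python
import math

def two_proportion_z_pooled(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2, using the pooled estimated proportion."""
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    standard_error = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / standard_error

# 120 of 200 university students vs. 240 of 500 high school students.
z_notebook = two_proportion_z_pooled(120, 200, 240, 500)
reject_notebook = z_notebook > 1.96   # right-tailed test, table value z_0.025

print(round(z_notebook, 2), reject_notebook)
```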

23 In the previous observation of students wearing and not wearing safety helmets when riding motorcycles, the 500 sampled students are grouped by gender as shown in the table. Can we conclude that the proportion of female students wearing a helmet is higher than that of male students at significance level 0.05?

                      Male   Female
Wearing helmet          40       35
Not wearing helmet     260      165
Total                  300      200

24 Hypotheses: $H_0: P_f \le P_m$; $H_1: P_f > P_m$.

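25 Calculate the test statistic: the pooled estimated proportion is $\hat{p} = \frac{35 + 40}{200 + 300} = 0.15$, so $z = \frac{35/200 - 40/300}{\sqrt{0.15 \times 0.85 \times \left(\frac{1}{200} + \frac{1}{300}\right)}} \approx 1.28$. z-score from table: $z_{0.05} = 1.645$. The calculated z-score is 1.28 < 1.645, so it does not fall in the right-tailed critical region. Do not reject $H_0$. The data do not show that the proportion of female students wearing a helmet is higher than that of male students at significance level 0.05.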

26

27 In a polio vaccination program at a school, 16 of 100 vaccinated female students are infected, and 20 of 200 vaccinated male students are infected. Test whether the proportion of infected female students is 5% higher than the proportion of infected male students at significance level 0.10.

28 Categorical data cannot be measured numerically but can be grouped, e.g. a 5-point rating scale, religion, occupation, and gender. The data for each group are then frequencies, which can be tested using the chi-square test ($\chi^2$) to determine whether the observed proportions of the groups differ from a specified expected ratio.

29 Assumptions: the sample size must be sufficiently large, 4-5 times the number of groups. The frequency of each group must not be less than 5; if such a group exists, combine it with an adjacent group (which reduces the degrees of freedom). The test cannot be applied to repeated-measures designs, such as measuring the same sample after a time period (e.g. measuring the effect of a drug after the 1st, 2nd, and 3rd hour) or measuring the same variable after changing the treatment (e.g. measuring the blood pressure of the same sample after administering different drug dosages).

30 If the sample contains 2 groups (degrees of freedom = 1) and the total frequency is less than 50, Frank Yates suggested using the corrected chi-square. *If the total frequency is 50 or more, there is no need to use the corrected chi-square. This topic is not covered further here.

31 Single variable: test of homogeneity; goodness-of-fit test. Two variables: test of homogeneity; test of independence. $df = k - 1 - m$.

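32 Used to determine whether the proportions of two or more groups in a population are similar. Hypotheses: $H_0$: the proportions of all groups are equal; $H_1$: the proportions are not all equal. Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed frequency in each group, $E_i$ is the expected frequency in each group, and $k$ is the number of groups.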

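33 Reject $H_0$ when the calculated $\chi^2$ is greater than the critical $\chi^2$ from the table. [Figure: chi-square distribution showing the acceptance region and the right-tail rejection region]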

34 In the teaching evaluation of a course, out of a total of 200 students, 72 are very satisfied, 60 are satisfied, 22 are indifferent and 46 are unsatisfied. Are the proportions of the satisfaction levels similar at significance level 0.01? Hypotheses: $H_0$: the frequencies of the satisfaction levels are not different; $H_1$: the frequencies of the satisfaction levels are different.

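35 Calculate the test statistic. Under $H_0$ each level has the same expected frequency, $E = 200/4 = 50$.

Level             O     E    (O-E)   (O-E)^2   (O-E)^2/E
Very satisfied    72    50     22      484        9.68
Satisfied         60    50     10      100        2.00
Indifferent       22    50    -28      784       15.68
Unsatisfied       46    50     -4       16        0.32
Total            200   200      0                27.68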

36 Degrees of freedom = k - 1 = 4 - 1 = 3. Critical chi-square: $\chi^2_{0.01,3} = 11.345$. The calculated chi-square is 27.68 > 11.345, falling in the critical region. Reject $H_0$ and accept $H_1$. The proportions of the satisfaction levels are not similar at significance level 0.01.
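The teaching-evaluation calculation can be reproduced with a short script. This is a sketch under the assumption that every level has the same expected frequency under $H_0$; the critical value 11.345 is the standard chi-square table entry for $\alpha = 0.01$ with 3 degrees of freedom, and the helper name `chi_square_statistic` is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [72, 60, 22, 46]    # very satisfied, satisfied, indifferent, unsatisfied
total = sum(observed)
expected = [total / len(observed)] * len(observed)   # equal frequencies under H0

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
critical = 11.345              # chi-square table value for alpha = 0.01, df = 3
reject = chi2 > critical

print(round(chi2, 2), df, reject)
```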

37

38 A coffee bean reseller assumes that the sale proportions of 4 types of coffee beans are equal. 500 customers are sampled and the number of sales of each type of coffee bean is shown in the table. Test whether the assumption is true at significance level [value missing]. Table columns: Type (A, B, C, D) and Sale count [values missing].

39 Used to determine whether the proportions of two or more groups in a population fit a specified proportion. Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed frequency in each group; $E_i$ is the expected frequency in each group, $E_i = np_i$, with $n$ the total frequency and $p_i$ the probability of the group under the specified distribution; $k$ is the number of groups; and $m$ is the number of parameters to be estimated (we only study the non-parametric chi-square, so ignore this). $df = k - 1 - m$.

40 A financial institution studies the history of its loan clients. It finds that 80% of the clients repay their loan within 1 year, 10% in 2 years, 6% in 3 years, and 4% in over 3 years. To assess the current situation, 400 recent loan clients are sampled: 287 repay within 1 year, 49 in 2 years, 30 in 3 years, and 34 in over 3 years. Test whether the clients' ability to repay loans has changed.

41 Hypotheses: $H_0: p_1 : p_2 : p_3 : p_4 = 0.8 : 0.1 : 0.06 : 0.04$; $H_1: p_1 : p_2 : p_3 : p_4 \ne 0.8 : 0.1 : 0.06 : 0.04$. Or: $H_0$: the clients' ability to repay loans has not changed; $H_1$: the clients' ability to repay loans has changed. $\alpha = 0.05$. Calculate the test statistic.

42 Degrees of freedom = 4 - 1 = 3. The calculated chi-square is 27.18 > 7.81, falling in the critical region. Reject $H_0$ and accept $H_1$. The clients' ability to repay loans has changed at significance level 0.05.

Time        O_i    p_i    E_i = np_i   O_i - E_i   (O_i - E_i)^2   (O_i - E_i)^2/E_i
1 year      287    0.80      320          -33          1089              3.403
2 years      49    0.10       40            9            81              2.025
3 years      30    0.06       24            6            36              1.500
> 3 years    34    0.04       16           18           324             20.250
Total       400    1.00      400            0                           27.178
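The loan example differs from the equal-proportion case only in how the expected frequencies are obtained ($E_i = np_i$). A minimal sketch, with an illustrative function name `goodness_of_fit` and a check of the "frequency not less than 5" assumption stated earlier:

```python
def goodness_of_fit(observed, proportions):
    """Return the expected frequencies n * p_i and the chi-square statistic."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2

observed = [287, 49, 30, 34]            # 1 year, 2 years, 3 years, > 3 years
proportions = [0.80, 0.10, 0.06, 0.04]  # historical repayment proportions

expected, chi2 = goodness_of_fit(observed, proportions)
large_enough = min(expected) >= 5       # assumption: no group frequency below 5
reject = chi2 > 7.81                    # critical value used in the slide (alpha = 0.05, df = 3)

print([round(e, 1) for e in expected], round(chi2, 2), large_enough, reject)
```

Most of the statistic comes from the "> 3 years" group, where 34 clients were observed against 16 expected.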

43

44 In the exam of a sales training program with 150 participants, the manager expects the results, categorized into 3 groups (very good, good, and fair), to be in the proportion 2:1:2. After the exam, the actual frequencies in the 3 groups are 70, 30, and 50 participants respectively. Are the actual and expected proportions different at significance level 0.05?

45 Test of Homogeneity: used to determine whether the proportions of the groups of one variable are similar when grouped by another variable, with two or more groups in each variable. $H_0: p_1 = p_2 = p_3 = \dots = p_n$; $H_1$: at least one proportion differs from the others. E.g. the proportions of occupations in three countries.

46 Test of Independence: used to determine whether the effects of one variable depend on the value of another variable (2 variables). $H_0$: variable x and variable y are independent of each other (are not related); $H_1$: variable x and variable y are dependent on each other (are related).

47 Data are grouped in the rows and columns of a two-way table.

                         Occupation
Country      Researcher   Business   Programmer   Sum
Thailand       O_11         O_12       O_13       R_1
USA            O_21         O_22       O_23       R_2
Australia      O_31         O_32       O_33       R_3
Sum            C_1          C_2        C_3        N

48 $r$: number of rows; $c$: number of columns; $O_{ij}$: observed frequency of row $i$, column $j$; $E_{ij}$: expected frequency of row $i$, column $j$.
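For reference, the symbols defined above combine into the usual two-way chi-square calculation. The block below is a reconstruction from the notation used elsewhere in this transcript (the expected-frequency rule $E_{ij} = r_i c_j / N$ and the degrees of freedom $(r-1)(c-1)$ both appear on later slides), with $R_i$ and $C_j$ denoting the row and column sums of the two-way table; it is not a copy of an original slide.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)
```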

49

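50 Reject $H_0$ when the calculated $\chi^2$ is greater than the critical $\chi^2$ from the table. [Figure: chi-square distribution showing the acceptance region and the right-tail rejection region]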

51 In a survey of 1200 sampled individuals grouped by four occupations, the numbers of smokers and non-smokers are listed in the table. Test whether the proportion differs among the occupations. (The individual cell counts are missing from this transcript.)

Occupation    Non-smoker   Smoker   Freq.
Engineer                              300
Educator                              250
Accountant                            300
Scientist                             350
Total              233        967    1200

52 Hypotheses: $H_0: p_1 = p_2 = p_3 = p_4$; $H_1$: at least one proportion differs from the others; $\alpha = 0.05$.

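53 Calculate the expected frequencies: $E_{11} = (300 \times 233)/1200 = 58.25$; $E_{12} = (300 \times 967)/1200 = 241.75$; $E_{21} = (250 \times 233)/1200 = 48.54$; $E_{22} = (250 \times 967)/1200 = 201.46$; $E_{31} = (300 \times 233)/1200 = 58.25$; $E_{32} = (300 \times 967)/1200 = 241.75$; $E_{41} = (350 \times 233)/1200 = 67.96$; $E_{42} = (350 \times 967)/1200 = 282.04$. Row totals (Freq.): Engineer 300, Educator 250, Accountant 300, Scientist 350. Column totals: Non-smoker 233, Smoker 967, Total 1200.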
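The expected frequencies depend only on the row and column totals, so they can be computed without the individual cell counts. A small sketch, using the totals that appear in the expressions above; the function name `expected_frequencies` is illustrative.

```python
def expected_frequencies(row_totals, column_totals):
    """E_ij = (row total i) * (column total j) / N for every cell."""
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]

occupations = ["Engineer", "Educator", "Accountant", "Scientist"]
row_totals = [300, 250, 300, 350]
column_totals = [233, 967]       # non-smoker, smoker

expected = expected_frequencies(row_totals, column_totals)
for occupation, row in zip(occupations, expected):
    print(occupation, [round(e, 2) for e in row])
```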

54 Calculate the test statistic. Table columns: Row-Column, $O_{ij}$, $E_{ij}$, $O_{ij} - E_{ij}$, $(O_{ij} - E_{ij})^2$, $(O_{ij} - E_{ij})^2 / E_{ij}$, with a Total row [values missing].

55 Degrees of freedom = (r-1)(c-1) = 3 × 1 = 3. The calculated chi-square is greater than the critical value 7.81, falling in the critical region. Reject $H_0$ and accept $H_1$. The proportion of smokers and non-smokers differs among the occupations at significance level 0.05.

56 To test whether the achievement score of a training program is related to the achievement score of the actual operation at significance level 0.01, 400 employees are sampled. The scores are listed in the table. (The individual cell counts and row totals are missing from this transcript.)

                            Training score
Operation score   Below Average   Average   Above Average   Total
Fair
Good
Very good
Total                   60           188         152          400

57 Hypotheses: $H_0$: the score of the training program and the score of the actual operation are not related; $H_1$: the score of the training program and the score of the actual operation are related; $\alpha = 0.01$.

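58 Calculate the test statistic. Table columns: Row-Column, $O_{ij}$, $E_{ij} = r_i c_j / N$, $O_{ij} - E_{ij}$, $(O_{ij} - E_{ij})^2$, $(O_{ij} - E_{ij})^2 / E_{ij}$, with a Total row. In each row $i$ the expected frequencies are $E_{i1} = r_i \times 60/400$, $E_{i2} = r_i \times 188/400$ and $E_{i3} = r_i \times 152/400$ [row totals $r_i$ and cell values missing].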

59 Degrees of freedom = (r-1)(c-1) = 2 × 2 = 4. The calculated chi-square is greater than the critical value $\chi^2_{0.01,4} = 13.277$, falling in the critical region. Reject $H_0$ and accept $H_1$. The score of the training program and the score of the actual operation are related (or are dependent on each other) at significance level 0.01.

60

61 A factory manager believes that the efficiency of workers depends on how long they have worked in the factory. To test this belief, 100 sample products are inspected. The quality of the samples is listed in the table. Test the belief at significance level [value missing]. (The cell counts are missing from this transcript.)

                    Employee Experience (years)
Product Quality       1      2-5     6-10     Total
Good
Minor damage
Major damage
Total                                          100

62 A toothpaste company wants to know whether the color of the toothpaste is related to the gender of buyers. Samples of 500 males and 500 females are randomly selected and asked for their favored toothpaste color. Test whether the color of the toothpaste is related to gender at significance level [value missing]. (The cell counts are missing from this transcript.)

                Color
Gender     White    Other    Total
Male                           500
Female                         500
Total                         1000

63 The Test of Homogeneity and the Test of Independence use the same calculation. The Test of Homogeneity tells whether the proportions are the same: $H_0$: the proportions are similar for all groups; $H_1$: the proportions are not similar for some or all groups. The Test of Independence tells whether two variables are dependent: $H_0$: the two variables are independent; $H_1$: the two variables are dependent.

64 Consider this: if the proportion of selected majors is the same for every gender, then no matter the gender, the proportions remain the same. That means gender has no effect on the selection of a major, and therefore the two are independent.
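Since both tests share one calculation, a single routine covers them. The sketch below applies it to the helmet-by-gender counts from the earlier example, assuming the "not wearing" counts are the column totals minus the wearers (300 - 40 = 260 and 200 - 35 = 165). The function name `chi_square_contingency` is illustrative.

```python
def chi_square_contingency(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(column_totals) - 1)
    return chi2, df

# Rows: wearing helmet, not wearing helmet. Columns: male, female.
helmet_table = [[40, 35],
                [260, 165]]

chi2, df = chi_square_contingency(helmet_table)
print(round(chi2, 3), df)
```

For a 2 × 2 table the statistic equals the square of the pooled two-proportion z-score (about 1.28 squared here), which is one way to see that the z-test and the chi-square test describe the same comparison.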