1/71 Statistics Tests of Goodness of Fit and Independence.

Presentation transcript:

1/71 Statistics Tests of Goodness of Fit and Independence

2/71 Contents: STATISTICS in PRACTICE; Goodness of Fit Test: A Multinomial Population; Test of Independence; Goodness of Fit Test: Poisson and Normal Distributions

3/71 STATISTICS in PRACTICE United Way of Greater Rochester is a nonprofit organization dedicated to improving the quality of life for people. Because of enormous volunteer involvement, United Way of Greater Rochester is able to hold its operating costs at just eight cents of every dollar raised.

4/71 STATISTICS in PRACTICE One of the statistical tests was to determine whether perceptions of administrative expenses were independent of occupation. In this chapter, you will learn how a statistical test of independence is conducted.

5/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Consider the case in which each element of a population is assigned to one and only one of several classes or categories. Such a population is a multinomial population. We want to test whether the population follows a multinomial distribution with specified probabilities for each of the k categories.

6/71 Hypothesis Test for Proportions of a Multinomial Population: Procedures
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, $f_i$, for each of the k categories.
3. Assuming $H_0$ is true, compute the expected frequency, $e_i$, in each category by multiplying the category probability by the sample size.

7/71 Hypothesis Test for Proportions of a Multinomial Population: Procedures
4. Compute the value of the test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where:
$f_i$ = observed frequency for category i
$e_i$ = expected frequency for category i
$k$ = number of categories
Note: The test statistic has a chi-square distribution with (k - 1) df provided that the expected frequencies are 5 or more for all categories.

8/71 Hypothesis Test for Proportions of a Multinomial Population: Procedures
5. Rejection rule:
p-value approach: Reject $H_0$ if p-value $< \alpha$
Critical value approach: Reject $H_0$ if $\chi^2 > \chi^2_\alpha$
where $\alpha$ is the significance level and there are k - 1 degrees of freedom.
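Steps 1 through 4 of this procedure can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `chi_square_gof` and the sample counts are hypothetical, and the final comparison against a critical value or p-value (step 5) is left to a chi-square table or statistical software.

```python
def chi_square_gof(observed, probs):
    """Return (statistic, df) for a multinomial goodness of fit test."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    # Step 3: expected frequency = category probability * sample size.
    expected = [n * p for p in probs]
    if min(expected) < 5:
        raise ValueError("every expected frequency should be at least 5")
    # Step 4: sum of (f_i - e_i)^2 / e_i over the k categories.
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, len(observed) - 1  # k - 1 degrees of freedom


# Hypothetical counts for three categories (illustrative only, not slide data).
stat, df = chi_square_gof([25, 55, 20], [0.30, 0.50, 0.20])
print(stat, df)
```

The function refuses to run when any expected frequency falls below 5, which mirrors the condition stated in the note on the test statistic.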

9/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Example: The market share study being conducted by Scott Marketing Research. Over the past year market shares stabilized at 30% for company A, 50% for company B, and 20% for company C. Recently company C developed a “new and improved” product to replace its current entry in the market. Company C retained Scott Marketing Research to determine whether the new product will alter market shares.

10/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Scott Marketing Research will conduct a sample survey and compute the proportion preferring each company's product. A hypothesis test will then be conducted to see whether the new product caused a change in market shares. The null and alternative hypotheses are
$H_0$: $p_A = .30$, $p_B = .50$, and $p_C = .20$
$H_a$: The population proportions are not $p_A = .30$, $p_B = .50$, and $p_C = .20$

11/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Perform a goodness of fit test that will determine whether the sample of 200 customer purchase preferences is consistent with the null hypothesis. Data

12/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Expected: the expected frequency for each category is found by multiplying the sample size of 200 by the hypothesized proportion for the category.
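Applied to the market shares of 30%, 50%, and 20% stated earlier for companies A, B, and C, this rule gives the following expected frequencies (the subscripts A, B, and C are labels introduced here for the three companies):

$$e_A = 200(.30) = 60, \qquad e_B = 200(.50) = 100, \qquad e_C = 200(.20) = 40$$

All three values exceed 5, so the chi-square approximation is applicable.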

13/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Computation of the Chi-square Test Statistic for The Scott Marketing Research Market Share Study

14/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population p-Value Approach: The test statistic is $\chi^2 = 7.34$, and with 2 degrees of freedom $5.991 < 7.34 < 7.378$. Thus, the corresponding upper tail area, or p-value, must be between .05 and .025. With p-value < .05, we reject $H_0$. Minitab or Excel can be used to show that $\chi^2 = 7.34$ provides a p-value of .0255.
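The software p-value can also be checked by hand. For 2 degrees of freedom the chi-square upper tail area has the closed form exp(-x/2), so a short Python sketch is enough; the helper name below is an illustrative assumption.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper tail area of the chi-square distribution with 2 degrees of freedom."""
    # With df = 2 the chi-square distribution is exponential with mean 2,
    # so P(X > x) = exp(-x / 2).
    return math.exp(-x / 2)


p_value = chi2_upper_tail_df2(7.34)
print(round(p_value, 4))  # 0.0255
```

This shortcut holds only for df = 2; other degrees of freedom need a table or a statistics library.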

15/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Critical Value Approach: With α = .05 and 2 degrees of freedom, the critical value for the test statistic is $\chi^2_{.05} = 5.991$. The upper tail rejection rule becomes: Reject $H_0$ if $\chi^2 > 5.991$. With 7.34 > 5.991, we reject $H_0$. Comparisons of the observed and expected frequencies for the other two companies indicate that company C's gain in market share will hurt company A more than company B.

16/71 Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population Note: The chi-square test is an approximate test and the test result may not be valid when the expected value for a category is less than 5. If one or more categories have expected values less than 5, you can combine them with adjacent categories to achieve the minimum required expected value.
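One way to carry out this pooling is a simple left-to-right pass that keeps accumulating adjacent categories until the running expected value reaches 5. The sketch below is an assumed greedy strategy, not a method prescribed by the slides; the function name and the counts are hypothetical.

```python
def merge_small_categories(observed, expected, minimum=5.0):
    """Pool adjacent categories until each pooled expected value is >= minimum."""
    merged_obs, merged_exp = [], []
    obs_acc, exp_acc = 0, 0.0
    for f, e in zip(observed, expected):
        obs_acc += f
        exp_acc += e
        if exp_acc >= minimum:
            # The running group is large enough: close it and start a new one.
            merged_obs.append(obs_acc)
            merged_exp.append(exp_acc)
            obs_acc, exp_acc = 0, 0.0
    if obs_acc or exp_acc:
        # A leftover tail is too small on its own: fold it into the last group.
        if merged_exp:
            merged_obs[-1] += obs_acc
            merged_exp[-1] += exp_acc
        else:
            merged_obs.append(obs_acc)
            merged_exp.append(exp_acc)
    return merged_obs, merged_exp


# Hypothetical observed and expected counts (illustrative only).
obs_m, exp_m = merge_small_categories(
    [1, 3, 10, 12, 4, 2], [2.0, 4.0, 9.0, 11.0, 3.5, 2.5]
)
print(obs_m, exp_m)  # [4, 10, 12, 6] [6.0, 9.0, 11.0, 6.0]
```

After pooling, the degrees of freedom must be computed from the number of categories that remain, not the original number.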

17/71 Multinomial Distribution Goodness of Fit Test Example: Finger Lakes Homes (A) Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

18/71 Multinomial Distribution Goodness of Fit Test Example: Finger Lakes Homes (A) The number of homes sold of each model for 100 sales over the past two years is shown below. (Table: # Sold for each Model: Colonial, Log, Split-Level, A-Frame.)

19/71 Multinomial Distribution Goodness of Fit Test Hypotheses
$H_0$: $p_C = p_L = p_S = p_A = .25$
$H_a$: The population proportions are not $p_C = .25$, $p_L = .25$, $p_S = .25$, and $p_A = .25$
where:
$p_C$ = population proportion that purchase a colonial
$p_L$ = population proportion that purchase a log cabin
$p_S$ = population proportion that purchase a split-level
$p_A$ = population proportion that purchase an A-frame

20/71 Multinomial Distribution Goodness of Fit Test Rejection Rule: With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom, reject $H_0$ if p-value < .05 or $\chi^2 > 7.815$. (Figure: chi-square distribution with a "Do Not Reject $H_0$" region to the left of the critical value and a "Reject $H_0$" region to the right.)

21/71 Multinomial Distribution Goodness of Fit Test Expected Frequencies: $e_1 = .25(100) = 25$, $e_2 = .25(100) = 25$, $e_3 = .25(100) = 25$, $e_4 = .25(100) = 25$. Test Statistic: $\chi^2 = \sum_{i=1}^{4} (f_i - e_i)^2 / e_i = 10$.

22/71 Multinomial Distribution Goodness of Fit Test Conclusion Using the p-Value Approach: Because $\chi^2 = 10$ is between 9.348 and 11.345 in the df = 3 row of the chi-square table, the area in the upper tail of the distribution is between .025 and .01. The p-value < α, so we can reject the null hypothesis.
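The table bracketing can be tightened with a direct calculation. For 3 degrees of freedom the chi-square upper tail area has a closed form in terms of the complementary error function, so the standard library is enough. The helper name is an illustrative assumption, and the formula applies to df = 3 only.

```python
import math


def chi2_upper_tail_df3(x):
    """Upper tail area of the chi-square distribution with 3 degrees of freedom."""
    # Closed form for df = 3:
    # P(X > x) = erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_upper_tail_df3(10)
print(round(p_value, 4))
```

The result is roughly .0186, inside the .01 to .025 range read from the table and below α = .05.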

23/71 Multinomial Distribution Goodness of Fit Test Conclusion Using the Critical Value Approach: $\chi^2 = 10 > 7.815$. We reject, at the .05 level of significance, the assumption that there is no home style preference.

24/71 Test of Independence: Contingency Tables Another important application of the chi-square distribution involves using sample data to test for the independence of two variables. A contingency table is a table that lists all possible combinations of the categories of the two variables.

25/71 Test of Independence: Contingency Tables Example: A test of independence addresses the question of whether beer preference (light, regular, or dark) is independent of the gender of the beer drinker (male, female). If the independence assumption is valid, the fractions of drinkers preferring light, regular, and dark beer must be the same for male and female beer drinkers.

26/71 Test of Independence: Contingency Tables
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, $f_{ij}$, for each cell of the contingency table.
3. Compute the expected frequency, $e_{ij}$, for each cell:
$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

27/71 Test of Independence: Contingency Tables
4. Compute the test statistic:
$$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
5. Determine the rejection rule: Reject $H_0$ if p-value $< \alpha$ or $\chi^2 > \chi^2_\alpha$,
where $\alpha$ is the significance level and, with n rows and m columns, there are (n - 1)(m - 1) degrees of freedom.
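The computational steps of the independence test can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the 2 x 2 table are hypothetical, and the expected frequency for each cell is taken to be (row total × column total) / sample size, the usual rule under independence.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / sample size.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    # (rows - 1) * (columns - 1) degrees of freedom.
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 table of observed counts (illustrative only, not slide data).
stat, df, expected = chi_square_independence([[10, 20], [30, 40]])
print(stat, df, expected)
```

The returned statistic is then compared with the chi-square critical value for the computed degrees of freedom, as in step 5.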

28/71 Test of Independence: Contingency Tables Example: A test of independence addresses the question of whether beer preference (light, regular, or dark) is independent of the gender of the beer drinker (male, female). The hypotheses for this test of independence are:
$H_0$: Beer preference is independent of the gender of the beer drinker
$H_a$: Beer preference is not independent of the gender of the beer drinker

29/71 Test of Independence: Contingency Tables Sample Results and Expected Frequencies. Proportions shown: 1/3, 7/15, and 1/5 (total 1). Example expected frequency: 80 × 1/3 = 26.67.

30/71 Test of Independence: Contingency Tables

31/71 Test of Independence: Contingency Tables Computation of the Chi-square Test Statistic. At the .05 level of significance, p-value < α = .05, so we reject the null hypothesis.

32/71 Test of Independence: Contingency Tables Example: Finger Lakes Homes (B) Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes' manager would like to determine whether the price of the home and the style of the home are independent variables.

33/71 Test of Independence: Contingency Tables Example: Finger Lakes Homes (B) The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $99,000 or less or more than $99,000. (Table: rows are the two price categories; columns are Colonial, Log, Split-Level, A-Frame.)

34/71 Test of Independence: Contingency Tables Hypotheses
$H_0$: Price of the home is independent of the style of the home that is purchased
$H_a$: Price of the home is not independent of the style of the home that is purchased

35/71 Test of Independence: Contingency Tables Expected Frequencies (Table: rows are $99K or less, more than $99K, and Total; columns are Colonial, Log, Split-Level, A-Frame, and Total.)

36/71 Contingency Table (Independence) Test Rejection Rule: With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.815$. Reject $H_0$ if p-value < .05 or $\chi^2 > 7.815$. Test Statistic: $\chi^2 = 9.149$.

37/71 Contingency Table (Independence) Test Conclusion Using the p-Value Approach: Because $\chi^2 = 9.149$ is between 7.815 and 9.348 in the df = 3 row of the chi-square table, the area in the upper tail of the distribution is between .05 and .025. The p-value < α, so we can reject the null hypothesis.

38/71 Contingency Table (Independence) Test Conclusion Using the Critical Value Approach: $\chi^2 = 9.149 > 7.815$. We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.

39/71 Goodness of Fit Test: Poisson Distribution In general, the goodness of fit test can be used with any hypothesized probability distribution. Here we will demonstrate the cases of Poisson distribution and Normal distribution.

40/71 Goodness of Fit Test: Poisson Distribution
1. Set up the null and alternative hypotheses.
$H_0$: Population has a Poisson probability distribution
$H_a$: Population does not have a Poisson distribution
2. Select a random sample and
a. Record the observed frequency $f_i$ for each value of the Poisson random variable.
b. Compute the mean number of occurrences $\mu$.

41/71 Goodness of Fit Test: Poisson Distribution
3. Compute the expected frequency of occurrences $e_i$ for each value of the Poisson random variable.
4. Compute the value of the test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where:
$f_i$ = observed frequency for category i
$e_i$ = expected frequency for category i
$k$ = number of categories

42/71 Goodness of Fit Test: Poisson Distribution
5. Rejection rule:
p-value approach: Reject $H_0$ if p-value $< \alpha$
Critical value approach: Reject $H_0$ if $\chi^2 > \chi^2_\alpha$
where $\alpha$ is the significance level and there are k - 2 degrees of freedom.

43/71 Goodness of Fit Test: Poisson Distribution (Example) Consider the arrival of customers at Dubek's Food Market in Tallahassee, Florida. A statistical test is conducted to see whether an assumption of a Poisson distribution for arrivals is reasonable.

44/71 Goodness of Fit Test: Poisson Distribution (Example) The hypotheses are
$H_0$: The number of customers entering the store during 5-minute intervals has a Poisson probability distribution
$H_a$: The number of customers entering the store during 5-minute intervals does not have a Poisson distribution

45/71 Goodness of Fit Test: Poisson Distribution (Example) The Poisson probability function is
$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
where
$\mu$ represents the expected number of customers arriving per 5-minute period,
$x$ is the random variable indicating the number of customers arriving during a 5-minute period, and
$f(x)$ is the probability that x customers will arrive in a 5-minute interval.

46/71 Goodness of Fit Test: Poisson Distribution (Example) Sample Data. Because μ is unknown, it is estimated from the sample: an estimate of μ is 640/128 = 5 customers per 5-minute period.

47/71 Goodness of Fit Test: Poisson Distribution (Example) Expected frequency
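The expected frequencies for this example follow from the Poisson probability function with the estimated mean of 5. The sketch below assumes that 128 in the calculation 640/128 is the number of 5-minute periods sampled, and it pools everything from 10 arrivals upward into one category; both the cut-off and the function name are assumptions made for illustration.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu**x * exp(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu = 640 / 128  # estimated mean arrivals per 5-minute period
n = 128         # assumed number of 5-minute periods in the sample

# Probabilities for x = 0..9, then one pooled "10 or more" category.
probs = [poisson_pmf(x, mu) for x in range(10)]
probs.append(1 - sum(probs))

# Expected frequency for each category is n * f(x).
expected = [n * p for p in probs]
for x, e in enumerate(expected):
    print(x, round(e, 2))
```

Some of these expected frequencies fall below 5 (for x = 0 the value is under 1), so adjacent categories would need to be combined before the chi-square statistic is computed.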

48/71 Goodness of Fit Test: Poisson Distribution (Example) Computation of the Chi-square Test Statistic. With p-value > α = .05, we cannot reject $H_0$.

49/71 Goodness of Fit Test: Poisson Distribution Example: Troy Parking Garage In studying the need for an additional entrance to a city parking garage, a consultant has recommended an analysis approach that is applicable only in situations where the number of cars entering during a specified time period follows a Poisson distribution.

50/71 Goodness of Fit Test: Poisson Distribution A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable. (Table columns: # Arrivals, Frequency.)

51/71 Goodness of Fit Test: Poisson Distribution Hypotheses
$H_0$: Number of cars entering the garage during a one-minute interval is Poisson distributed
$H_a$: Number of cars entering the garage during a one-minute interval is not Poisson distributed

52/71 Goodness of Fit Test: Poisson Distribution Estimate of Poisson Probability Function
Total Arrivals = $0(0) + 1(1) + 2(4) + \cdots = 600$
Total Time Periods = 100
Estimate of $\mu$ = 600/100 = 6
Hence, $f(x) = \dfrac{6^x e^{-6}}{x!}$

53/71 Goodness of Fit Test: Poisson Distribution Expected Frequencies (Table columns: x, f(x), nf(x), with a Total row.)

54/71 Goodness of Fit Test: Poisson Distribution Observed and Expected Frequencies (Table columns: i, $f_i$, $e_i$, $f_i - e_i$; the smallest arrival counts are pooled into one category and the largest into an "or more" category.)
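The pooled expected frequencies can be sketched with the estimated mean of 6 and the 100 one-minute intervals. The exact grouping is not legible in this transcript, so the code assumes the categories "0, 1, or 2", then 3 through 9 individually, then "10 or more". That choice yields 9 categories, which is consistent with the 7 degrees of freedom used in the rejection rule (9 - 1 - 1 = 7), but it remains an assumption.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu**x * exp(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu = 600 / 100  # estimated mean arrivals per one-minute interval
n = 100         # number of one-minute intervals sampled

low = sum(poisson_pmf(x, mu) for x in range(0, 3))   # assumed pooled: 0, 1, or 2
middle = [poisson_pmf(x, mu) for x in range(3, 10)]  # exactly 3, 4, ..., 9
high = 1 - low - sum(middle)                         # assumed pooled: 10 or more

expected = [n * p for p in [low, *middle, high]]
print([round(e, 2) for e in expected])
```

With this grouping every expected frequency is at least 5, which is the condition the chi-square approximation requires.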

55/71 Goodness of Fit Test: Poisson Distribution Rejection Rule: With α = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f. (where k = number of categories and p = number of population parameters estimated), reject $H_0$ if p-value < .05 or $\chi^2 > 14.067$. Test Statistic: $\chi^2 = \sum_{i=1}^{k} (f_i - e_i)^2 / e_i$.

56/71 Goodness of Fit Test: Poisson Distribution Conclusion Using the p-Value Approach: Because the computed $\chi^2$ is between 2.833 and 12.017 in the df = 7 row of the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10. The p-value > α, so we cannot reject the null hypothesis. There is no reason to doubt the assumption of a Poisson distribution.

57/71 Goodness of Fit Test: Normal Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Compute the mean and standard deviation.
b. Define intervals of values so that the expected frequency is at least 5 for each interval.
c. For each interval, record the observed frequencies.
3. Compute the expected frequency, $e_i$, for each interval.

58/71 Goodness of Fit Test: Normal Distribution
4. Compute the value of the test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
5. Reject $H_0$ if $\chi^2 > \chi^2_\alpha$ (where $\alpha$ is the significance level and there are k - 3 degrees of freedom).
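The normal goodness of fit procedure can be sketched with Python's standard library. This is an illustration under stated assumptions: the function name and the sample are hypothetical, and the intervals are chosen to have equal probability under the fitted normal distribution, which is one convenient way to satisfy the requirement of at least 5 expected observations per interval.

```python
from statistics import NormalDist, mean, stdev


def normal_gof_statistic(sample, k):
    """Chi-square statistic and df for a normal goodness of fit test with k intervals."""
    n = len(sample)
    if n / k < 5:
        raise ValueError("need an expected frequency of at least 5 per interval")
    # Step 2a: estimate the mean and standard deviation from the sample.
    fitted = NormalDist(mean(sample), stdev(sample))
    # Step 2b: k equal-probability intervals, so each expects n / k observations.
    cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]
    # Step 2c: observed frequency per interval (a value equal to a cut
    # point is counted in the upper interval).
    observed = [0] * k
    for x in sample:
        observed[sum(x >= c for c in cuts)] += 1
    expected = n / k  # step 3
    statistic = sum((f - expected) ** 2 / expected for f in observed)  # step 4
    return statistic, k - 3  # two estimated parameters: df = k - 2 - 1


# Hypothetical sample of 20 values (illustrative only, not slide data).
sample = [52, 55, 58, 60, 61, 63, 64, 66, 67, 68,
          69, 70, 72, 73, 75, 76, 78, 81, 84, 88]
stat, df = normal_gof_statistic(sample, k=4)
print(stat, df)
```

The degrees of freedom are k - 3 because two parameters, the mean and the standard deviation, are estimated from the sample.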

59/71 Goodness of Fit Test: Normal Distribution (Example) Chemline Employee Aptitude Scores. We test the null hypothesis that the population of test scores has a normal distribution. Sample Data

60/71 Goodness of Fit Test: Normal Distribution (Example) Estimates of $\mu$ and $\sigma$ are $\bar{x} = 68.42$ and $s = 10.41$. Hypotheses
$H_0$: The population of test scores has a normal distribution with mean 68.42 and standard deviation 10.41
$H_a$: The population of test scores does not have a normal distribution with mean 68.42 and standard deviation 10.41

61/71 Goodness of Fit Test: Normal Distribution (Example) Expected Frequency: Normal Distribution with 10 Equal-Probability Intervals (The Chemline Example). Note: the lowest interval boundary is $68.42 - 1.28(10.41) = 55.10$.

62/71 Goodness of Fit Test: Normal Distribution (Example)

63/71 Goodness of Fit Test: Normal Distribution (Example) Computation of the Chi-square Test Statistic. p-value = .4084 > α = .10, so we cannot reject $H_0$.

64/71 Normal Distribution Goodness of Fit Test Example: IQ Computers IQ Computers (one better than HP?) manufactures and sells a general purpose microcomputer. As part of a study to evaluate sales personnel, management wants to determine, at a .05 significance level, if the annual sales volume (number of units sold by a salesperson) follows a normal probability distribution.

65/71 Normal Distribution Goodness of Fit Test Example: IQ Computers A simple random sample of 30 of the salespeople was taken, and their numbers of units sold are shown below (mean = 71, standard deviation = 18.54).

66/71 Normal Distribution Goodness of Fit Test Hypotheses
$H_0$: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54
$H_a$: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54

67/71 Interval Definition Normal Distribution Goodness of Fit Test To satisfy the requirement of an expected frequency of at least 5 in each interval we will divide the normal distribution into 30/5 = 6 equal probability intervals.

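68/71 Normal Distribution Goodness of Fit Test Interval Definition: Each interval has area 1.00/6 = .1667. The interval boundaries are
71 - .97(18.54) = 53.02
71 - .43(18.54) = 63.03
71
71 + .43(18.54) = 78.97
71 + .97(18.54) = 88.98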
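The boundaries of the six equal-probability intervals can be recomputed from the stated mean of 71 and standard deviation of 18.54. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it uses unrounded z-values, its results differ slightly from a hand calculation with table values rounded to two decimals, such as .43.

```python
from statistics import NormalDist

# Sample mean and standard deviation stated for the IQ Computers example.
fitted = NormalDist(mu=71, sigma=18.54)
k = 30 // 5  # 6 equal-probability intervals

# Interior boundaries: values with cumulative probability 1/6, 2/6, ..., 5/6.
boundaries = [fitted.inv_cdf(i / k) for i in range(1, k)]
print([round(b, 2) for b in boundaries])

# Each of the 6 intervals then has expected frequency 30 / 6 = 5.
expected_per_interval = 30 / k
```

Five interior boundaries define six intervals, and the middle boundary coincides with the mean because the normal distribution is symmetric.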

69/71 Normal Distribution Goodness of Fit Test Observed and Expected Frequencies (Table columns: interval i, $f_i$, $e_i$, $f_i - e_i$, with a Total row; the intervals run from "Less than" the lowest boundary to "More than" the highest.)

70/71 Normal Distribution Goodness of Fit Test Rejection Rule: With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f. (where k = number of categories and p = number of population parameters estimated), reject $H_0$ if p-value < .05 or $\chi^2 > 7.815$. Test Statistic: $\chi^2 = \sum_{i=1}^{k} (f_i - e_i)^2 / e_i$.

71/71 Normal Distribution Goodness of Fit Test Conclusion Using the p-Value Approach: Because the computed $\chi^2$ is between .584 and 6.251 in the df = 3 row of the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10. The p-value > α, so we cannot reject the null hypothesis. There is little evidence to support rejecting the assumption that the population is normally distributed with $\mu = 71$ and $\sigma = 18.54$.