BCOR 1020 Business Statistics

BCOR 1020 Business Statistics
Lecture 27 – April 29, 2008

Overview Chapter 14 – Chi-Square Tests Chi-Square Distribution
Chi-Square Test for Independence Chi-Square Test for Goodness of Fit

Chapter 14 – Chi-Square Distribution
For tests that have a test statistic involving a sum of squared differences, we will often use a chi-square distribution. Our test critical values come from the chi-square probability distribution with n degrees of freedom. n = degrees of freedom (will vary depending on the application) Appendix E contains critical values for right-tail areas of the chi-square distribution. The mean of a chi-square distribution is n with variance 2n.

Chapter 14 – Chi-Square Distribution
Consider the shape of the chi-square distribution: c2.1 = 6.251 c2.1 = 18.55 c2.1 = 4.605 Example: Find the upper 10% critical point for each of these distributions.

Clicker Using the chi-square table, find the upper 5%
critical point for a chi-square distribution with n = 5 degrees of freedom. (A) 1.610 (B) 9.488 (C) 9.236 (D) 11.07

Chapter 14 – Chi-Square Test for Independence
Contingency Tables: A contingency table is a cross-tabulation of n paired observations into categories. Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

Contingency Tables: For example: (overhead)

In a test of independence for an r x c contingency table, the hypotheses are H0: Variable A is independent of variable B H1: Variable A is not independent of variable B Use the chi-square test for independence to test these hypotheses. This non-parametric test is based on frequencies. The n data pairs are classified into c columns and r rows and then the observed frequency fjk is compared with the expected frequency ejk.

Chi-Square Distribution: The critical value comes from the chi-square probability distribution with n degrees of freedom. n = degrees of freedom = (r – 1)(c – 1) where r = number of rows in the table c = number of columns in the table

Expected Frequencies: Assuming that H0 is true, the expected frequency of row j and column k is: ejk = RjCk/n where Rj = total for row j (j = 1, 2, …, r) Ck = total for column k (k = 1, 2, …, c) n = sample size

Expected Frequencies: The table of expected frequencies is: The ejk always sum to the same row and column frequencies as the observed frequencies.

Steps in Testing the Hypotheses: Step 1: State the Hypotheses H0: Variable A is independent of variable B H1: Variable A is not independent of variable B Step 2: State the Decision Rule Calculate n = (r – 1)(c – 1) For a given a, look up the right-tail critical value (c2a,n) from Appendix E or by using Excel. Reject H0 if test statistic > c2a,n (or if p-value < a).

Steps in Testing the Hypotheses: Step 3: Calculate the Expected Frequencies ejk = RjCk/n For example,

Steps in Testing the Hypotheses: Step 4: Calculate the Test Statistic The chi-square test statistic is Step 5: Make the Decision Reject H0 if test statistic > c2a,n or if the p-value < a.

Example: Privacy Disclaimer Location and Web Site Nationality (on overhead) The actual frequencies are on the overhead (and slide #4). Our Hypotheses are: H0: Privacy disclaimer location is independent of Web site nationality. H1: Privacy disclaimer location is dependent on Web site nationality. Decision Rule (at a = 5%): Degrees of Freedom: n = (r – 1) x (c – 1) = (4 – 1) x (3 – 1) = 6 Reject H0 if c2 > c2a,n = c2.05,6 =

Example: Privacy Disclaimer Location and Web Site Nationality (on overhead) We computed the expected frequencies… We can use these and the actual frequencies to calculate our test statistic…

Clickers (A) Fail to reject H0. (B) Reject H0. (C) Start Laughing.
Example (continued)… Decision: Based on our test statistic and decision criteria, we should… (A) Fail to reject H0. (B) Reject H0. (C) Start Laughing. (D) Abandon Hope.

Another Example… Fill in the missing elements for the contingency table below (problem 14.2 on page 541.)… Our chi-square test will have n = (r – 1) x (c – 1) = 3 d.f. Running Shoe Ownership in World Regions Owned by U.S. Europe Asia Latin America Row Total Teens 80 89 69 ___ 303 Adults 11 31 35 97 Col Total 100 400 65 20

Running Shoe Ownership in World Regions (Actual Frequencies) Owned by U.S. Europe Asia Latin America Row Total Teens 80 89 69 65 303 Adults 20 11 31 35 97 Col Total 100 400 Running Shoe Ownership in World Regions (Expected Frequencies) Owned by U.S. Europe Asia Latin America Row Total Teens 75.75 (303x100)/400 = 75.75 303 Adults 24.25 (97x100)/400 = 24.25 97 Col Total 100 400

Example (continued)… To conduct the test for independence, State the hypotheses: H0: Running shoe ownership by age-group is independent of world region. H1: Running shoe ownership by age-group is dependent on world region. Decision Rule (at a = 5%): Reject H0 if c2 > c2a,n = c2.05,3 =

Clickers (A) Fail to reject H0. (B) Reject H0. (C) Too close to call.
Example (continued)… Our chi-square statistic is computed as c2 = What should our decision be? (A) Fail to reject H0. (B) Reject H0. (C) Too close to call.

Clickers (A) p-value > 0.05 (B) 0.01 < p-value < 0.05
Example (continued)… For our computed chi-square statistic, c2 = , which has n = 3 d.f. under H0, what is the best bound for the p-value for this test using the chi- square table? (A) p-value > 0.05 (B) 0.01 < p-value < 0.05 (C) < p-value < 0.01 (D) p-value < 0.005

Chapter 14 – Chi-Square Test for Goodness-of-Fit
Purpose of the Test: The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population. The chi-square test will be used because it is versatile and easy to understand. The test statistic is intuitive… It involves differences between observed frequencies in the data and expected frequencies (assuming the assumed distribution is correct).

Hypotheses for GOF: The hypotheses are: H0: The population follows a _______ distribution H1: The population does not follow a _______ distribution The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

Test Statistic and Degrees of Freedom for GOF: Assuming n observations, the observations are grouped into c classes and then the chi-square test statistic is found using: where fj = the observed frequency of observations in class j ej = the expected frequency in class j if H0 were true

Test Statistic and Degrees of Freedom for GOF: If the proposed distribution gives a good fit to the sample, the test statistic will be near zero. The test statistic follows the chi-square distribution with degrees of freedom n = c – m – 1 where c is the no. of classes used in the test m is the no. of parameters estimated

Eyeball Tests: A simple “eyeball” inspection of the histogram or dot plot may suffice to rule out a hypothesized population. Small Expected Frequencies: Goodness-of-fit tests may lack power in small samples. As a guideline, a chi-square goodness-of-fit test should be avoided if n < 25.

Multinomial Distribution A multinomial distribution is defined by any k probabilities p1, p2, …, pk that sum to unity. For example, consider the following “official” proportions of M&M colors.

Multinomial Distribution: The hypotheses are H0: p1 = .30, p2 = .20, p3 = .10, p4 = .10, p5 = .10, p6 = .20 H1: At least one of the pj differs from the hypothesized value No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are n = c – m – 1 = 6 – 0 – 1 = 5 degrees of freedom Our test statistic (from the table on the previous slide) is c2 = We will compare this to the appropriate critical point of the chi-square distribution with n = 5 d.f.

Clicker Our test statistic for the M&Ms example was
c2 = Under H0, this statistic has a chi-square distribution with n = 5 d.f. Use the chi-square table to bound the p-value for this hypothesis test. (A) < p-value < 0.01 (B) 0.01 < p-value < 0.025 (C) < p-value < 0.05 (D) 0.05 < p-value < 0.10

Uniform Distribution: The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence. The chi-square test for a uniform distribution compares all c groups simultaneously. The hypotheses are: H0: p1 = p2 = …, pc = 1/c H1: Not all pj are equal

Uniform GOF Test: Grouped Data The test can be performed on data that are already tabulated into groups. Calculate the expected frequency eij for each cell. The degrees of freedom are n = c – 1 since there are no parameters for the uniform distribution. Obtain the critical value c2a from Appendix E for the desired level of significance a. The p-value can be obtained from Excel. Reject H0 if p-value < a.

Uniform GOF Test: Raw Data First form c bins of equal width and create a frequency distribution. Calculate the observed frequency fj for each bin. Define ej = n/c. Perform the chi-square calculations. The degrees of freedom are n = c – 1 since there are no parameters for the uniform distribution. Obtain the critical value from Appendix E for a given significance level a and make the decision.

Uniform GOF Test: Raw Data Maximize the test’s power by defining bin width as As a result, the expected frequencies will be as large as possible. Calculate the mean and standard deviation of the uniform distribution as: m = (a + b)/2 If the data are not skewed and the sample size is large (n > 30), then the mean is approximately normally distributed. So, test the hypothesized uniform mean using s = [(b – a + 1)2 – 1)/12

General GOF Tests: Goodness-of-Fit tests can be constructed in a similar manner for other distributions (Poisson, Normal, etc.) We will generally conduct these tests using a software package.

BCOR 1020 Business Statistics

Similar presentations

Presentation on theme: "BCOR 1020 Business Statistics"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

BCOR 1020 Business Statistics

Similar presentations

Presentation on theme: "BCOR 1020 Business Statistics"— Presentation transcript:

Similar presentations

About project

Feedback