Week 10 Nov 3-7 Two Mini-Lectures QMM 510 Fall 2014.

15-2 Chapter 15 Chi-Square Tests (ML 10.1). Chapter Contents: 15.1 Chi-Square Test for Independence; 15.2 Chi-Square Tests for Goodness-of-Fit; 15.3 Uniform Goodness-of-Fit Test; 15.4 Poisson Goodness-of-Fit Test; 15.5 Normal Chi-Square Goodness-of-Fit Test; 15.6 ECDF Tests (Optional). So many topics, so little time …

15-3 Chi-Square Test for Independence: Contingency Tables. A contingency table is a cross-tabulation of n paired observations into categories. Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.
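As a minimal sketch of the cross-tabulation itself, the Python function below counts (row category, column category) pairs into an r × c table of cell counts using only the standard library. The function name `contingency_table` and the sample pairs are hypothetical, chosen for illustration rather than taken from the slides.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row category, column category) pairs into an r x c table of counts."""
    rows = sorted({a for a, _ in pairs})   # distinct row categories
    cols = sorted({b for _, b in pairs})   # distinct column categories
    counts = Counter(pairs)                # number of observations per (row, column) cell
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical paired observations: (smoker?, exercise level)
pairs = [("No", "High"), ("No", "Low"), ("Yes", "Low"),
         ("No", "High"), ("Yes", "Low"), ("Yes", "High")]
rows, cols, table = contingency_table(pairs)
print(rows, cols, table)   # ['No', 'Yes'] ['High', 'Low'] [[2, 1], [1, 2]]
```

The cell counts always add up to n, the number of paired observations.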

15-4 Chi-Square Test for Independence: Contingency Tables. For example: (example table not included in this transcript)

15-5 Chi-Square Test for Independence: Chi-Square Test. In a test of independence for an r × c contingency table, the hypotheses are $H_0$: Variable A is independent of variable B; $H_1$: Variable A is not independent of variable B. Use the chi-square test for independence to test these hypotheses. This nonparametric test is based on frequencies. The n data pairs are classified into r rows and c columns, and then the observed frequency $f_{jk}$ in each cell is compared with the expected frequency $e_{jk}$.

15-6 Chi-Square Test for Independence: Chi-Square Distribution. The critical value comes from the chi-square probability distribution with d.f. degrees of freedom, where d.f. = (r − 1)(c − 1), r = number of rows in the table, and c = number of columns in the table. Appendix E contains critical values for right-tail areas of the chi-square distribution, or use Excel's =CHISQ.INV.RT(α, d.f.). The mean of a chi-square distribution is d.f. and its variance is 2·d.f.
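The critical value and the mean and variance of the chi-square distribution can also be checked in Python. This sketch assumes SciPy is installed: `chi2.isf(alpha, df)` returns the value with right-tail area α (the counterpart of Excel's =CHISQ.INV.RT), while `chi2.sf(x, df)` returns the right-tail area beyond x (the counterpart of Excel's =CHISQ.DIST.RT, which is what a p-value needs). The helper name and the 4 × 3 table size are hypothetical.

```python
from scipy.stats import chi2

def independence_critical_value(alpha, r, c):
    """Return (d.f., right-tail critical value) for an r x c contingency table."""
    df = (r - 1) * (c - 1)
    return df, chi2.isf(alpha, df)  # value whose right-tail area equals alpha

df, crit = independence_critical_value(0.05, r=4, c=3)  # d.f. = (4 - 1)(3 - 1) = 6
print(df, round(crit, 3))             # critical value for d.f. = 6, alpha = .05
print(chi2.mean(df), chi2.var(df))    # mean = d.f., variance = 2 d.f.
print(round(chi2.sf(crit, df), 3))    # right-tail area at the critical value is alpha
```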

15-7 Chi-Square Test for Independence: Chi-Square Distribution. Consider the shape of the chi-square distribution: (figure not included in this transcript)

15-8 Chi-Square Test for Independence: Expected Frequencies. Assuming that $H_0$ is true, the expected frequency of row j and column k is $e_{jk} = R_j C_k / n$, where $R_j$ = total for row j (j = 1, 2, …, r), $C_k$ = total for column k (k = 1, 2, …, c), and n = sample size.
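A minimal NumPy sketch of the expected-frequency formula $e_{jk} = R_j C_k / n$: the outer product of the row totals and column totals, divided by n, fills in every cell at once. The function name and the 2 × 2 counts are hypothetical.

```python
import numpy as np

def expected_frequencies(observed):
    """Expected counts e_jk = R_j * C_k / n under the independence hypothesis."""
    f = np.asarray(observed, dtype=float)
    row_totals = f.sum(axis=1)   # R_j, one per row
    col_totals = f.sum(axis=0)   # C_k, one per column
    n = f.sum()                  # sample size
    return np.outer(row_totals, col_totals) / n

# Hypothetical 2 x 2 table: R = (30, 70), C = (40, 60), n = 100
e = expected_frequencies([[10, 20], [30, 40]])
print(e)  # e_11 = 30 * 40 / 100 = 12, e_12 = 18, e_21 = 28, e_22 = 42
```

Note that the expected table keeps the same row and column totals as the observed table.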

15-9 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Step 1: State the hypotheses. $H_0$: Variable A is independent of variable B; $H_1$: Variable A is not independent of variable B. Step 2: Specify the decision rule. Calculate d.f. = (r − 1)(c − 1). For a given α, look up the right-tail critical value $\chi^2_R$ in Appendix E or use Excel's =CHISQ.INV.RT(α, d.f.). Reject $H_0$ if the test statistic exceeds $\chi^2_R$.

15-10 Chi-Square Test for Independence: Steps in Testing the Hypotheses. For example, for d.f. = 6 and α = .05, $\chi^2_{.05} = 12.59$.

15-11 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Here is the rejection region: (figure not included in this transcript)

15-12 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Step 3: Calculate the expected frequencies, $e_{jk} = R_j C_k / n$. For example: (worked table not included in this transcript)

15-13 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Step 4: Calculate the test statistic. The chi-square test statistic is

$$\chi^2_{\text{calc}} = \sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$$

Step 5: Make the decision. Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_R$ or if the p-value ≤ α.
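The decision procedure (Steps 2 through 5) can be sketched in a few lines of Python, assuming NumPy and SciPy are available. The function name and the 2 × 3 counts are hypothetical; the result is cross-checked against SciPy's `chi2_contingency`, with `correction=False` so that no continuity correction is applied.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def chi_square_independence(observed, alpha=0.05):
    """Steps 2-5 of the chi-square test for independence on an r x c table."""
    f = np.asarray(observed, dtype=float)
    r, c = f.shape
    df = (r - 1) * (c - 1)                                # Step 2: degrees of freedom
    critical = chi2.isf(alpha, df)                        # Step 2: right-tail critical value
    e = np.outer(f.sum(axis=1), f.sum(axis=0)) / f.sum()  # Step 3: e_jk = R_j C_k / n
    stat = ((f - e) ** 2 / e).sum()                       # Step 4: sum of (f_jk - e_jk)^2 / e_jk
    p_value = chi2.sf(stat, df)                           # right-tail area beyond the statistic
    reject = bool(stat > critical)                        # Step 5: decision
    return stat, df, critical, p_value, reject

observed = [[20, 30, 50], [30, 30, 40]]   # hypothetical counts, n = 200
stat, df, critical, p_value, reject = chi_square_independence(observed)

# Cross-check with SciPy's built-in test (no continuity correction)
ref_stat, ref_p, ref_df, _ = chi2_contingency(observed, correction=False)
print(round(stat, 4), df, round(critical, 3), round(p_value, 3), reject)
```

Here the statistic is 28/9 ≈ 3.11 with d.f. = 2, well below the critical value of about 5.99, so this hypothetical table gives no reason to reject independence at α = .05.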

15-14 Chi-Square Test for Independence. Example: MegaStat. The p-value is not small enough to reject the hypothesis of independence at α = .05. All cells have $e_{jk} \ge 5$, so Cochran's Rule is met. Caution: When selecting the data in MegaStat, don't highlight the row or column totals.

15-15 Chi-Square Test for Independence: Test of Two Proportions. For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions. The hypotheses are $H_0: \pi_1 = \pi_2$ versus $H_1: \pi_1 \ne \pi_2$ (see Figure 14.6).
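The equivalence can be seen numerically: for a 2 × 2 table, the square of the pooled two-proportion z statistic equals the chi-square statistic computed without a continuity correction. This is a minimal sketch with hypothetical counts (30 of 100 successes versus 45 of 100); both helper names are illustrative.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sample z statistic for proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(table):
    """Chi-square statistic (no continuity correction) for a 2 x 2 table."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [table[0][k] + table[1][k] for k in range(2)]
    return sum((table[j][k] - rows[j] * cols[k] / n) ** 2 / (rows[j] * cols[k] / n)
               for j in range(2) for k in range(2))

# Hypothetical data: group 1 has 30 successes and 70 failures, group 2 has 45 and 55
z = two_proportion_z(30, 100, 45, 100)
chi_sq = chi_square_2x2([[30, 70], [45, 55]])
print(z ** 2, chi_sq)   # both about 4.8, up to floating-point rounding
```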

15-16 Chi-Square Test for Independence: Small Expected Frequencies. The chi-square test is unreliable if the expected frequencies are too small. Rules of thumb: Cochran's Rule requires that $e_{jk} \ge 5$ for all cells; a more lenient rule allows up to 20% of the cells to have $e_{jk} < 5$. Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell. If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
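A small helper can report which of these rules of thumb a table of expected frequencies satisfies. This is a hypothetical sketch, not a MegaStat feature; it treats Cochran's Rule as $e_{jk} \ge 5$ in every cell and leaves the judgment about combining rows or columns to the analyst.

```python
import numpy as np

def expected_frequency_check(expected):
    """Summarize the small-expected-frequency rules of thumb for a table of e_jk."""
    e = np.asarray(expected, dtype=float)
    return {
        "cochran_met": bool((e >= 5).all()),     # every cell has e_jk >= 5
        "share_below_5": float((e < 5).mean()),  # compare with the 20% guideline
        "any_below_1": bool((e < 1).any()),      # test considered infeasible
    }

ok = expected_frequency_check([[12, 18], [28, 42]])         # hypothetical e_jk
weak = expected_frequency_check([[0.8, 4.2], [6.0, 9.0]])   # hypothetical e_jk
print(ok)
print(weak)   # half the cells are below 5 and one is below 1
```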

15-17 Chi-Square Test for Independence: Cross-Tabulating Raw Data. Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories. For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories: (table not included in this transcript)
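Coding two numeric variables into categories and then cross-tabulating them can be sketched with NumPy's `digitize`. The cut points and the six data values below are hypothetical and are not the data used on the slide: infant deaths are coded as under 10, 10 to under 20, and 20 or more, and doctors as under 100, 100 to under 250, and 250 or more.

```python
import numpy as np

def code_and_crosstab(x, y, x_cuts, y_cuts):
    """Code two numeric variables into categories and cross-tabulate the pairs."""
    xi = np.digitize(x, x_cuts)     # category index 0..len(x_cuts) for each x
    yi = np.digitize(y, y_cuts)     # category index 0..len(y_cuts) for each y
    table = np.zeros((len(x_cuts) + 1, len(y_cuts) + 1), dtype=int)
    np.add.at(table, (xi, yi), 1)   # add one count per (x category, y category) pair
    return table

# Hypothetical values, NOT the data from the slides
infant_deaths = [4, 6, 12, 25, 8, 30]    # per 1,000
doctors = [300, 250, 120, 40, 200, 30]   # per 100,000
table = code_and_crosstab(infant_deaths, doctors, x_cuts=[10, 20], y_cuts=[100, 250])
print(table)   # rows: infant-death categories, columns: doctor categories
```

The resulting table can then be passed to the same chi-square test for independence as any other contingency table, although with only six observations the expected frequencies here would be far too small to meet Cochran's Rule.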

15-18 Chi-Square Test for Independence: Why Do a Chi-Square Test on Numerical Data? (1) The researcher may believe there's a relationship between X and Y but doesn't want to use regression. (2) There are outliers or anomalies that prevent us from assuming that the data came from a normal population. (3) The researcher has numerical data for one variable but not the other.

15-19 Chi-Square Test for Independence: 3-Way Tables and Higher. More than two variables can be compared using contingency tables. However, it is difficult to visualize a higher-order table. For example, you could visualize a three-way table as a cube, that is, as a stack of tiled two-way contingency tables. Major computer packages permit three-way tables.

15-20 Chi-Square Tests for Goodness-of-Fit (ML 10.2): Purpose of the Test. The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population. The chi-square test is versatile and easy to understand. Hypotheses for GOF tests: $H_0$: The population follows a _____ distribution; $H_1$: The population does not follow a _____ distribution. The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

15-21 Chi-Square Tests for Goodness-of-Fit: Test Statistic and Degrees of Freedom. Assuming n observations, the observations are grouped into c classes, and then the chi-square test statistic is found using

$$\chi^2_{\text{calc}} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$$

where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if the sample came from the hypothesized population.

15-22 Chi-Square Tests for Goodness-of-Fit: Test Statistic and Degrees of Freedom. If the proposed distribution gives a good fit to the sample, the test statistic will be near zero. The test statistic follows the chi-square distribution with d.f. = c − m − 1 degrees of freedom, where c is the number of classes used in the test and m is the number of parameters estimated.
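A minimal Python sketch of the GOF statistic and its degrees of freedom, assuming NumPy and SciPy are available. The counts are hypothetical (n = 100 in c = 5 classes, tested against a uniform distribution with no estimated parameters, so m = 0). SciPy's `chisquare` uses d.f. = c − 1 − ddof, so its `ddof` argument plays the role of m.

```python
import numpy as np
from scipy.stats import chi2, chisquare

def gof_chi_square(observed, expected, m=0):
    """Chi-square GOF statistic, d.f. = c - m - 1, and right-tail p-value."""
    f = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    stat = ((f - e) ** 2 / e).sum()   # sum of (f_j - e_j)^2 / e_j
    df = len(f) - m - 1               # c classes, m estimated parameters
    return stat, df, chi2.sf(stat, df)

observed = [18, 22, 20, 25, 15]       # hypothetical counts, n = 100
expected = [20, 20, 20, 20, 20]       # uniform H0: e_j = n / c
stat, df, p = gof_chi_square(observed, expected, m=0)

ref = chisquare(observed, expected, ddof=0)   # SciPy cross-check
print(stat, df, round(p, 3))
```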

15-23 Normal Chi-Square GOF Test: Is the Sample from a Normal Population? Many statistical tests assume a normal population, so this is the most common GOF test. Two parameters, the mean μ and the standard deviation σ, fully describe a normal distribution. Unless μ and σ are known a priori, they must be estimated from a sample in order to perform a GOF test for normality.

15-24 Normal Chi-Square GOF Test: Method 1, Standardize the Data. Transform the sample observations $x_1, x_2, \ldots, x_n$ into standardized z-values. Count the sample observations within each interval on the z-scale and compare them with the expected normal frequencies $e_j$. Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran's Rule and seems inefficient).
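Method 1 can be sketched as follows, assuming NumPy and SciPy. The z cut points (−1, 0, +1), the open-ended first and last bins, and the simulated sample of 20 values are all assumptions made for illustration. With n = 20 the end bins expect only about 3.17 observations each, which shows the Cochran's Rule problem the slide describes.

```python
import numpy as np
from scipy.stats import norm

def standardized_counts(x, z_cuts):
    """Method 1: standardize the data, then count observations between z cut points."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)                        # standardized z-values
    observed = np.bincount(np.digitize(z, z_cuts), minlength=len(z_cuts) + 1)
    edges = np.concatenate(([-np.inf], z_cuts, [np.inf]))     # open-ended end bins (assumption)
    expected = len(x) * np.diff(norm.cdf(edges))              # n times the normal area per bin
    return observed, expected

rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=20)   # hypothetical sample
observed, expected = standardized_counts(x, z_cuts=[-1.0, 0.0, 1.0])
print(observed, expected.round(2))   # end bins expect only about 3.17 each
```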

15-25 Normal Chi-Square GOF Test: Method 2, Equal Bin Widths. Step 1: Divide the exact data range into c groups of equal width, and count the sample observations in each bin to get the observed bin frequencies $f_j$. Step 2: Convert the bin limits into standardized z-values using $z = (x - \bar{x})/s$. Step 3: Find the normal area within each bin, assuming a normal distribution. Step 4: Find the expected frequencies $e_j$ by multiplying each normal area by the sample size n. Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran's Rule and seems inefficient).
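The four steps of Method 2 translate almost line for line into NumPy and SciPy. This sketch follows the steps literally; the function name, the choice of c = 6 bins, and the simulated sample are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def equal_width_expected(x, c):
    """Method 2: c equal-width bins over the exact data range."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), c + 1)   # Step 1: bin limits
    observed = np.histogram(x, bins=edges)[0]      # Step 1: f_j (last bin includes the max)
    z = (edges - x.mean()) / x.std(ddof=1)         # Step 2: limits as z-values
    area = np.diff(norm.cdf(z))                    # Step 3: normal area per bin
    expected = len(x) * area                       # Step 4: e_j = n * area
    return observed, expected

rng = np.random.default_rng(1)
x = rng.normal(100, 15, size=60)   # hypothetical sample
observed, expected = equal_width_expected(x, c=6)
print(observed, expected.round(2))
```

Because the bins stop at the smallest and largest observations, the normal tail areas beyond them fall in no bin, so the $e_j$ in this literal sketch sum to a little less than n. Extending the first and last bins to −∞ and +∞ is one possible adjustment, but that is an assumption, not something the slide specifies.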

15-26 Normal Chi-Square GOF Test: Method 3, Equal Expected Frequencies. Define histogram bins in such a way that an equal number of observations would be expected under the hypothesis of a normal population, i.e., so that $e_j = n/c$. A normal area of 1/c is expected in each bin. The first and last classes must be open-ended, so to define c bins we need c − 1 cut points. Count the observations $f_j$ within each bin and compare them with the expected frequencies $e_j = n/c$. Advantage: Makes efficient use of the sample. Disadvantage: The cut points on the z-scale may seem strange.

15-27 Normal Chi-Square GOF Test: Method 3, Equal Expected Frequencies. Standard normal cut points for equal-area bins: (table not included in this transcript)
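The standard normal cut points for equal-area bins are the quantiles $z_k = \Phi^{-1}(k/c)$ for k = 1, …, c − 1, so they can be computed instead of looked up. This sketch assumes NumPy and SciPy and runs the whole Method 3 test, using d.f. = c − 3 because the mean and standard deviation are estimated from the sample. The function name, the simulated sample of 80 values, and the choice of c = 8 (so that $e_j = 10$) are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, norm

def equal_expected_gof(x, c):
    """Method 3: normal GOF test with c bins of equal expected frequency n/c."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z_cuts = norm.ppf(np.arange(1, c) / c)       # c - 1 cut points, normal area 1/c per bin
    cuts = x.mean() + z_cuts * x.std(ddof=1)     # same cut points on the data scale
    observed = np.bincount(np.digitize(x, cuts), minlength=c)   # f_j per bin
    expected = np.full(c, n / c)                 # e_j = n / c
    stat = ((observed - expected) ** 2 / expected).sum()
    df = c - 2 - 1                               # mean and std. dev. estimated (m = 2)
    return z_cuts, observed, expected, stat, df, chi2.sf(stat, df)

rng = np.random.default_rng(2)
x = rng.normal(120, 18, size=80)   # hypothetical sample
z_cuts, observed, expected, stat, df, p = equal_expected_gof(x, c=8)
print(z_cuts.round(3))             # standard normal cut points for 8 equal-area bins
print(observed, expected, round(stat, 2), df, round(p, 3))
```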

15-28 Normal Chi-Square GOF Test: Critical Values for Normal GOF Test. Two parameters, μ and σ, are estimated from the sample, so the degrees of freedom are d.f. = c − m − 1 = c − 2 − 1 = c − 3. We need at least four bins to ensure at least one degree of freedom. Small Expected Frequencies: Cochran's Rule suggests $e_j \ge 5$ in each bin (e.g., with 4 bins we would want n ≥ 20, and so on).

15-29 Normal Chi-Square GOF Test: Visual Tests. The fitted normal superimposed on a histogram gives visual clues as to the likely outcome of the GOF test. A simple "eyeball" inspection of the histogram may suffice to rule out a normal population by revealing outliers or other non-normality issues.

15-30 ECDF Tests for Normality (ML 10.3). There are alternatives to the chi-square test for normality based on the empirical cumulative distribution function (ECDF). ECDF tests are done by computer; details are omitted here. A small p-value casts doubt on the normality of the population. The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequencies of the n data values. The Anderson-Darling (A-D) test also compares the ECDF with the hypothesized distribution and is usually displayed with a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test is widely used because of its power and attractive visual.
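Although the slides leave the details to software, both statistics are short enough to sketch directly and compare with SciPy's `kstest` and `anderson`. This sketch assumes NumPy and SciPy and uses a simulated sample of 80 values, not the baby-weight data. The K-S statistic is the largest gap between the ECDF and the fitted normal CDF. The A-D statistic $A^2$ weights the squared gaps more heavily in the tails. One caution: when μ and σ are estimated from the same sample, the standard K-S p-value is conservative (this is the Lilliefors issue), so treat `ks.pvalue` as approximate.

```python
import numpy as np
from scipy.stats import anderson, kstest, norm

def ks_statistic(x, cdf):
    """Largest absolute gap between the ECDF of x and a hypothesized CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    F = cdf(x)
    above = np.arange(1, n + 1) / n - F   # ECDF just after each data point minus F
    below = F - np.arange(0, n) / n       # F minus ECDF just before each data point
    return max(above.max(), below.max())

def anderson_darling_normal(x):
    """A-D statistic A^2 for normality, using the sample mean and std. dev. (ddof=1)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    w = (x - x.mean()) / x.std(ddof=1)
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1) / n * (norm.logcdf(w) + norm.logsf(w[::-1])))

rng = np.random.default_rng(3)
x = rng.normal(120, 18, size=80)   # hypothetical data, not the slides' baby weights

mean, sd = x.mean(), x.std(ddof=1)
d = ks_statistic(x, norm(mean, sd).cdf)
ks = kstest(x, "norm", args=(mean, sd))
a2 = anderson_darling_normal(x)
ad = anderson(x, dist="norm")
print(d, ks.statistic, ks.pvalue)
print(a2, ad.statistic)
```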

15-31 ECDF Tests Example: Minitab's Anderson-Darling Test for Normality. Data: weights of 80 babies (in ounces). The near-linear probability plot suggests a good fit to the normal distribution, and the p-value is not small enough to reject a normal population at α = .05.

15-32 ECDF Tests Example: MegaStat's Normality Tests. Data: weights of 80 babies (in ounces). The near-linear probability plot suggests a good fit to the normal distribution, and the p-value in this chi-square test is not small enough to reject a normal population at α = .05. Note: MegaStat's chi-square test is not as powerful as the A-D test, so we would prefer the A-D test if software is available. The MegaStat probability plot is good, but it shows no p-value.