Categorical Data (Chapter 10): inference about one population proportion (§10.2), inference about two population proportions (§10.3), chi-square goodness-of-fit.

Presentation transcript:

1 Categorical Data (Chapter 10)
Inference about one population proportion (§10.2).
Inference about two population proportions (§10.3).
Chi-square goodness-of-fit test (§10.4).
Contingency tables: tests of independence and homogeneity (§10.5).
Generalized linear models: logistic regression (§12.8) and Poisson regression.
Problem: The response variable is now categorical.
Goals: (i) Extend the analyses for comparing means of quantitative data (one-sample t-test, two-sample t-test, ANOVA) to comparing proportions of categorical data. (ii) Build regression models for predicting a categorical response in a similar style to those for a quantitative response.

2 Chi-Square Goodness-of-Fit Test (§10.4)
We want to compare several (k) observed proportions (π̂_i) with hypothesized proportions (π_i0). Do the observed proportions agree with the hypothesized ones, i.e. does π_i = π_i0 hold for categories i = 1, 2, ..., k?
H_0: π_i = π_i0 for i = 1, 2, ..., k.
H_a: At least two of the true cell proportions differ from their hypothesized values.

3 Example: Do Birds Forage Randomly?
Mannan & Meslow (1984) studied bird foraging behavior in a forest in Oregon. In a managed forest, 54% of the canopy volume was Douglas fir, 40% was ponderosa pine, 5% was grand fir, and 1% was western larch. They made 156 observations of foraging by red-breasted nuthatches: 70 observations (45%) in Douglas fir, 79 (51%) in ponderosa pine, 3 (2%) in grand fir, and 4 (3%) in western larch.
H_0: The birds forage randomly.
H_a: The birds do NOT forage randomly (they prefer certain trees).

4 Bird Example: Summary
If the birds forage randomly, we would expect to find them in the following proportions:
Douglas fir 54%, ponderosa pine 40%, grand fir 5%, western larch 1%.
But the following proportions (counts) were observed:
Douglas fir 45% (70), ponderosa pine 51% (79), grand fir 2% (3), western larch 3% (4).
Do the birds prefer certain trees? Perform a test using Pr(Type I error) = 0.05.

5 Questions to Ask
What are the key characteristics of the sample data collected? The data are counts in different categories.
What is the basic experiment? The type of tree in which each of the 156 birds was observed foraging was recorded. Before observation, each bird has a certain probability of being found in each of the four tree types; after observation, it is placed in the appropriate class.
We call any experiment of n trials in which each trial can result in one of k possible outcomes a multinomial experiment. For individual j, the response y_j indicates which outcome was observed; the possible outcomes are the integers 1, 2, ..., k.

6 The Multinomial Experiment
The experiment consists of n identical trials.
Each trial results in one of k possible outcomes.
The probability that a single trial results in outcome i is π_i, i = 1, 2, ..., k (with Σ π_i = 1), and remains constant from trial to trial.
The trials are independent (the response of one trial does not depend on the response of any other).
The response of interest is n_i, the number of trials resulting in outcome i (with Σ n_i = n).
The multinomial distribution provides the probability distribution for the numbers of observations in the k outcome classes, i.e. the probability of observing exactly n_1, n_2, ..., n_k:
P(n_1, n_2, ..., n_k) = [ n! / (n_1! n_2! ... n_k!) ] π_1^n_1 π_2^n_2 ... π_k^n_k   (recall 0! = 1).
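As a quick check of this formula, R (used later in these slides) evaluates multinomial probabilities directly with dmultinom(); the counts and probabilities below are the bird-foraging numbers, used purely as an illustration.

obs   <- c(70, 79, 3, 4)             # observed counts n_1,...,n_4 (n = 156)
probs <- c(0.54, 0.40, 0.05, 0.01)   # hypothesized cell probabilities pi_1,...,pi_4

# Multinomial probability of exactly these counts under the hypothesized probabilities
dmultinom(x = obs, prob = probs)

# The same value from the formula itself, on the log scale to avoid overflow
n <- sum(obs)
exp(lfactorial(n) - sum(lfactorial(obs)) + sum(obs * log(probs)))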

7 From the Bird Foraging Example
Hypothesized proportions: Douglas fir 54% (π_1 = 0.54), ponderosa pine 40% (π_2 = 0.40), grand fir 5% (π_3 = 0.05), western larch 1% (π_4 = 0.01).
Observed counts: Douglas fir n_1 = 70, ponderosa pine n_2 = 79, grand fir n_3 = 3, western larch n_4 = 4.
If the probability of the observed counts under the hypothesized proportions is high, there is a good likelihood that the observed data come from a multinomial experiment with the hypothesized probabilities; otherwise we have the probabilities wrong. How do we measure the goodness of fit between the hypothesized probabilities and the observed data?

8 In a multinomial experiment of n trials with hypothesized probabilities π_i0, i = 1, 2, ..., k, the expected number of responses in outcome class i is
E_i = n π_i0   (expected cell count = sample size × hypothesized cell probability).
A reasonable measure of goodness of fit compares the observed cell counts n_i to the expected cell counts E_i. It turns out (Pearson, 1900) that the statistic
χ² = Σ_{i=1}^{k} (n_i − E_i)² / E_i
is one of the best for this purpose. Under H_0 it has approximately a chi-square distribution with df = k − 1, provided there are no sparse counts: (i) no E_i is less than 1, and (ii) no more than 20% of the E_i are less than 5.
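A minimal R sketch of this computation for the bird data (the object names are just illustrative); it reproduces the expected counts, the Pearson statistic, and the df = k − 1 critical value used on the next slides.

obs <- c(70, 79, 3, 4)             # observed counts n_i
pi0 <- c(0.54, 0.40, 0.05, 0.01)   # hypothesized proportions pi_i0
n   <- sum(obs)

E <- n * pi0                       # expected counts E_i = n * pi_i0
E                                  # 84.24 62.40 7.80 1.56

chi.sq <- sum((obs - E)^2 / E)     # Pearson goodness-of-fit statistic
chi.sq                             # about 13.59

qchisq(0.95, df = length(obs) - 1)                        # critical value, about 7.81
pchisq(chi.sq, df = length(obs) - 1, lower.tail = FALSE)  # approximate p-value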

9 Class        Hypothesized   Observed   Expected
Douglas fir    54%            70         E_1 = 156(0.54) = 84.24
Pond pine      40%            79         E_2 = 156(0.40) = 62.40
Grand fir       5%             3         E_3 = 156(0.05) =  7.80
West larch      1%             4         E_4 = 156(0.01) =  1.56
χ² = Σ (n_i − E_i)²/E_i = 13.59. With Pr(Type I error) = α = 0.05 and df = 3, the critical value is 7.81. Since 13.59 > 7.81 we reject H_0. Conclude: it is unlikely that the birds are foraging randomly. (But: more than 20% of the E_i are less than 5… use an exact test.)

10 R
> birds = chisq.test(x=c(70,79,3,4), p=c(.54,.40,.05,.01))
> birds

        Chi-squared test for given probabilities

data:  c(70, 79, 3, 4)
X-squared = 13.593, df = 3, p-value = 0.003506

Warning message:
In chisq.test(x = c(70, 79, 3, 4), p = c(0.54, 0.4, 0.05, 0.01)) :
  Chi-squared approximation may be incorrect

> birds$resid
[1] -1.551497  2.101434 -1.718676  1.953563

Looks like the birds prefer pond pine and west larch to the other two!
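Given the warning (one expected count, 1.56, is below 5), one option is the Monte Carlo version of the same test built into chisq.test(); this is a simulation-based approximation rather than a fully exact multinomial test, but it does not rely on the df = 3 chi-square approximation.

# Simulate B tables from the hypothesized multinomial and compare the
# observed chi-square statistic with the simulated null distribution.
set.seed(1)   # arbitrary seed, only so the simulated p-value is reproducible
chisq.test(x = c(70, 79, 3, 4), p = c(0.54, 0.40, 0.05, 0.01),
           simulate.p.value = TRUE, B = 10000)

The simulated p-value varies slightly from run to run but should still lead to rejecting H_0 at α = 0.05 here.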

11 Summary: Chi-Square Goodness-of-Fit Test
H_0: π_i = π_i0 for categories i = 1, 2, ..., k (specified cell proportions for k categories).
H_a: At least two of the true population cell proportions differ from the specified proportions.
Test statistic: χ² = Σ_{i=1}^{k} (n_i − E_i)² / E_i, where E_i = n π_i0.
Rejection region: reject H_0 if χ² exceeds the tabulated critical value of the chi-square distribution with df = k − 1 and Pr(Type I error) = α.

12 Example: Genotype Frequencies in Oysters (A Nonstandard Chi-Square GOF Problem)
McDonald et al. (1996) examined variation at the CVJ5 locus in the American oyster (Crassostrea virginica). There were two alleles, L and S, and the genotype counts observed in a sample of 60 were:
LL: 14   LS: 21   SS: 25
Using an estimate of the L allele proportion of p = 0.408, the Hardy-Weinberg formula gives the following expected genotype proportions:
LL: p² = 0.167   LS: 2p(1 − p) = 0.483   SS: (1 − p)² = 0.350
Here there are 3 classes (LL, LS, SS), but all of the expected proportions are functions of only one estimated parameter (p). Hence the chi-square distribution has only one (1) degree of freedom, and NOT 3 − 1 = 2.

13 R
> chisq.test(x=c(14,21,25), p=c(.167,.483,.350))

        Chi-squared test for given probabilities

data:  c(14, 21, 25)
X-squared = 4.5402, df = 2, p-value = 0.1033

This p-value is WRONG! Because p was estimated from the data, the statistic must be compared with a chi-square distribution with 1 df. Since 4.54 > 3.841 (the 0.05 critical value with 1 df), we should reject H_0. Conclude that the genotype frequencies do NOT follow the Hardy-Weinberg formula.
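A minimal R sketch of the corrected comparison, reusing the statistic computed by chisq.test() but with the adjusted degrees of freedom (the object name oys is just illustrative):

oys  <- chisq.test(x = c(14, 21, 25), p = c(0.167, 0.483, 0.350))
stat <- unname(oys$statistic)              # about 4.54

# chisq.test() assumed df = 3 - 1 = 2; with one parameter (p) estimated
# from the data, the correct reference distribution has df = 3 - 1 - 1 = 1.
pchisq(stat, df = 1, lower.tail = FALSE)   # corrected p-value, about 0.033
qchisq(0.95, df = 1)                       # critical value, 3.841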

14 Power Analysis in Chi-Square GOF Tests
Suppose you want to do a genetic cross of snapdragons with an expected 1:2:1 ratio, and you want to be able to detect a pattern with 5% more heterozygotes than expected.

Class   Hypothesized       Want to detect
aa      25% (π_1 = 0.25)   22.5%
aA      50% (π_2 = 0.50)   55.0%
AA      25% (π_3 = 0.25)   22.5%

The sample size (n) needed to detect this difference can be computed by the more comprehensive packages (SAS, SPSS, R; see the sketch below). There is also a free program, G*Power 3 (correct as of Spring 2010). Inputs: (0.25, 0.50, 0.25) for the hypothesized proportions, (0.225, 0.55, 0.225) for the proportions to detect, the desired α and power, and df = 2. You should get n of approximately 1,000.
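One way to do this in R is with the pwr package (an assumption here: the slide names R only generically, not this package). Cohen's effect size w is computed from the two sets of proportions, and pwr.chisq.test() then solves for the sample size; power is not stated on the slide, so 0.80 is assumed below.

library(pwr)

p0 <- c(0.25, 0.50, 0.25)    # hypothesized 1:2:1 proportions
p1 <- c(0.225, 0.55, 0.225)  # pattern we want to be able to detect

w <- ES.w1(p0, p1)           # Cohen's w = sqrt(sum((p1 - p0)^2 / p0)), here 0.1

# Solve for N at alpha = 0.05 and an assumed power of 0.80
pwr.chisq.test(w = w, df = length(p0) - 1, sig.level = 0.05, power = 0.80)
# N comes out near 1,000 (about 964 with these settings)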

15 Tests and Confidence Intervals for One and Two Proportions (§10.2, §10.3)
First look at the case of a single population proportion (π). A random sample of size n is taken, and the number of "successes" (y) is noted.

16 Binomial Experiment = Multinomial Experiment with Two Classes
If π is the probability of a "success" and y is the number of "successes" in n trials, then with k = 2 classes:
π_1 = π and π_2 = 1 − π, since the proportions sum to 1;
n_1 = y and n_2 = n − y, since the cell frequencies sum to the total sample size.
The estimate of the success probability is π̂ = y/n.

17 Normal Approximation to the Binomial and CI for π
In general, the probability of observing y or more successes can be approximated by an appropriate normal distribution (see §4.13). What about a confidence interval (CI) for π? Using a similar argument as for y, we obtain the (1 − α)100% CI
π̂ ± z_{α/2} √( π̂(1 − π̂)/n ).
(π̂ is used in the standard error because π is unknown.)

18 Approximate Statistical Test for π
H_0: π = π_0 (π_0 specified)
H_a: 1. π > π_0   2. π < π_0   3. π ≠ π_0
Test statistic: z = (π̂ − π_0) / √( π_0(1 − π_0)/n ).
Rejection region:
1. Reject H_0 if z > z_α
2. Reject H_0 if z < −z_α
3. Reject H_0 if |z| > z_{α/2}
Note: under H_0, the standard error of π̂ is √( π_0(1 − π_0)/n ), so π_0 (not π̂) is used in the denominator of the test statistic.
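A minimal R sketch of this z test together with the Wald CI from the previous slide. The numbers are borrowed from the later algebra example (y = 39 successes in n = 60 trials), and π_0 = 0.5 is a purely hypothetical null value; base R's prop.test() gives a closely related test (a continuity-corrected chi-square form), included for comparison.

y <- 39; n <- 60; pi0 <- 0.5   # pi0 = 0.5 is a hypothetical null value
pihat <- y / n

# z statistic with the null standard error, and its one-sided p-value (Ha: pi > pi0)
z <- (pihat - pi0) / sqrt(pi0 * (1 - pi0) / n)
z
pnorm(z, lower.tail = FALSE)

# 95% Wald confidence interval for pi (uses pihat in the standard error)
pihat + c(-1, 1) * qnorm(0.975) * sqrt(pihat * (1 - pihat) / n)

# Related built-in test, for comparison
prop.test(x = y, n = n, p = pi0, alternative = "greater")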

19 Sample Size Needed to Meet a Pre-Specified Confidence in π
Suppose we wish to estimate π to within ±E with confidence 100(1 − α)%. What sample size should we use?
n = z_{α/2}² π(1 − π) / E²
Since π is unknown, do one of the following:
1. Substitute our best guess for π.
2. Use π = 0.5 (the worst-case estimate).
Example: We have been contracted to perform a survey to determine what fraction of students eat lunch on campus. How many students should we interview if we wish to be 95% confident of being within ±2% (E = 0.02) of the true proportion?
Worst case (π = 0.5): n = (1.96)²(0.5)(0.5)/(0.02)² = 2401.
Best guess (π = 0.2): n = (1.96)²(0.2)(0.8)/(0.02)² ≈ 1537.
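A quick R check of these two sample sizes; the helper function is just a direct transcription of the formula above (its name is made up for this sketch).

# n = z_{alpha/2}^2 * p * (1 - p) / E^2, rounded up to the next whole person
n.for.proportion <- function(p, E, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling(z^2 * p * (1 - p) / E^2)
}

n.for.proportion(p = 0.5, E = 0.02)   # worst case: 2401
n.for.proportion(p = 0.2, E = 0.02)   # best guess: 1537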

20 Comparing Two Binomial Proportions
Situation: Two groups of 60 ninth-graders were taught Algebra I by different methods (self-paced versus formal lectures). At the end of the 4-month period, a comprehensive, standardized test was given to both groups, with results:
Experimental group: n = 60, 39 scored above 80%.
Traditional group: n = 60, 28 scored above 80%.
Is this sufficient evidence to conclude that the experimental group performed better than the traditional group?
Each student is a Bernoulli trial, with probability π_1 of success (a high test score) in the experimental group and π_2 of success in the traditional group. We test
H_0: π_1 = π_2 versus H_a: π_1 > π_2.

21 Example:
                        Population 1 (experimental)   Population 2 (traditional)
Population proportion   π_1                           π_2
Sample size             n_1 = 60                      n_2 = 60
Number of successes     y_1 = 39                      y_2 = 28
Sample proportion       π̂_1 = 39/60 = 0.650           π̂_2 = 28/60 = 0.467

(1 − α)100% confidence interval for π_1 − π_2 (use the sample proportions in the standard error, since π_1 and π_2 are unknown):
(π̂_1 − π̂_2) ± z_{α/2} √( π̂_1(1 − π̂_1)/n_1 + π̂_2(1 − π̂_2)/n_2 )
Ex: the 90% CI is 0.183 ± 1.645(0.089), i.e. (.036, .330). Interpret…
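A minimal R transcription of this interval (a plain Wald CI with no continuity correction; prop.test() would give a slightly different interval because it adds one):

y1 <- 39; n1 <- 60   # experimental group
y2 <- 28; n2 <- 60   # traditional group

p1 <- y1 / n1        # 0.650
p2 <- y2 / n2        # 0.467

se.diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # about 0.089

# 90% Wald CI for pi_1 - pi_2
(p1 - p2) + c(-1, 1) * qnorm(0.95) * se.diff               # about (0.037, 0.330)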

22 Statistical Test for Comparing Two Binomial Proportions
H_0: π_1 − π_2 = 0 (i.e. π_1 = π_2 = π)
H_a: 1. π_1 − π_2 > 0   2. π_1 − π_2 < 0   3. π_1 − π_2 ≠ 0
Test statistic: z = (π̂_1 − π̂_2) / σ̂_{π̂_1 − π̂_2}, where
σ̂_{π̂_1 − π̂_2} = √( π̂_1(1 − π̂_1)/n_1 + π̂_2(1 − π̂_2)/n_2 ).
Rejection region:
1. Reject H_0 if z > z_α
2. Reject H_0 if z < −z_α
3. Reject H_0 if |z| > z_{α/2}
Note: under H_0, π̂_1 − π̂_2 has mean 0 and is approximately normal with standard error σ_{π̂_1 − π̂_2}.

23 Example (continued):
                        Population 1 (experimental)   Population 2 (traditional)
Population proportion   π_1                           π_2
Sample size             n_1 = 60                      n_2 = 60
Number of successes     y_1 = 39                      y_2 = 28
Sample proportion       π̂_1 = 0.650                   π̂_2 = 0.467

Test statistic: z = (0.650 − 0.467) / 0.089 ≈ 2.06.
Since 2.06 is greater than z_0.05 = 1.645, we reject H_0 and conclude H_a: π_1 > π_2.
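A minimal R sketch of this comparison. The hand computation below uses the unpooled standard error shown on the slides; prop.test() uses a pooled chi-square form (here without continuity correction), so its p-value will differ slightly.

y1 <- 39; n1 <- 60   # experimental group
y2 <- 28; n2 <- 60   # traditional group
p1 <- y1 / n1; p2 <- y2 / n2

# z statistic with the unpooled standard error, and one-sided p-value (Ha: pi1 > pi2)
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z  <- (p1 - p2) / se
z                              # about 2.06
pnorm(z, lower.tail = FALSE)   # about 0.02

# Built-in alternative, for comparison
prop.test(x = c(y1, y2), n = c(n1, n2), alternative = "greater", correct = FALSE)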