Lecture 4. The Multinomial Distribution (II)


Outline for Today: Testing Composite Hypotheses
1. Definition
2. Examples
3. Testing Procedure
11/22/2018 SA3202, Lecture 4

Definition

Composite hypothesis: a hypothesis that does not completely specify the values of the parameters. It arises in two cases.

Case I: the hypothesis imposes restrictions on the possible values of the parameters.

Example 1. Let X1, X2, …, X7 denote the numbers of Singaporeans who go to the movies on Monday, Tuesday, …, Saturday, and Sunday respectively. Then X=(X1,X2,…,X7) ~ M(n; p1,p2,…,p7), where n is the total population of Singapore and p1, p2, …, p7 are the population proportions of Singaporeans who go to the movies on Monday, Tuesday, …, Saturday, and Sunday. The hypothesis that the proportions are the same across the weekdays and the same across the two weekend days is
H0: p1=p2=p3=p4=p5, p6=p7,
which is a composite hypothesis: it states which proportions are equal but not their common values.

Example 2. The following table shows the classification of suicides in France by day of the week:

Day          # of Suicides
Monday       1001
Tuesday      1035
Wednesday    982
Thursday     1033
Friday       905
Saturday     737
Sunday       894
Total        6587

For this data set, we may consider the hypothesis that essentially the only difference between the days is the difference between the working days (Monday, Tuesday, …, Friday) and the weekend (Saturday and Sunday):
H0: p1=p2=p3=p4=p5, p6=p7

Example 3. The following table shows the political views of 1397 Americans in 1975.

Response                 Code   Frequency
Extremely Liberal        1      46
Liberal                  2      179
Slightly Liberal         3      196
Moderate                 4      559
Slightly Conservative    5      232
Conservative             6      150
Extremely Conservative   7      35
Total                           1397

A hypothesis of interest is that there is a balance between the "left" (liberals) and the "right" (conservatives):
H0: p1=p7, p2=p6, p3=p5.

Case II: the hypothesis postulates a model for the pj's in terms of a small number of new parameters. That is, the hypothesis specifies the values of the pj's in terms of a set of new parameters, theta, say.

Example (the Number of Boys Data). The following table shows the number of boys among the first 4 children in 3343 Swedish families of size 4 or more.

Number of boys   0     1     2      3     4     Total
Frequency        183   789   1250   875   246   3343

We may consider the hypothesis that the number of boys, Y, among the first four children follows a binomial distribution, but not necessarily with parameter .5: Y~Binom(4, theta). That is, the hypothesis leaves the parameter theta of the binomial distribution unspecified, so it has to be estimated from the data.

Testing Procedure

A composite hypothesis is also tested by comparing the observed frequencies Xi with their expected frequencies mi under H0, using Pearson's Goodness of Fit Test Statistic

T = sum over i of (Xi-mi)^2/mi

or Wilks' Likelihood Ratio Test Statistic

G = 2 * sum over i of Xi*log(Xi/mi).

But:
1. The mi's have to be estimated under H0.
2. The df's of the test statistics have to be adjusted:
For Case I: df = the number of (independent) restrictions that H0 imposes on the pi's, which equals k-1 minus the number of free parameters left under H0.
For Case II: df = k-1 - the number of "free" parameters estimated under H0.
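The two statistics and the Case II df adjustment can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names are hypothetical, and treating cells with a zero observed count as contributing 0 to G is an assumption (the usual convention 0*log 0 = 0).

```python
import math

def pearson_statistic(observed, expected):
    """Pearson's T: sum of (x - m)^2 / m over all categories."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))

def wilks_statistic(observed, expected):
    """Wilks' G: 2 * sum of x * log(x / m); cells with x = 0 add nothing (assumed convention)."""
    return 2.0 * sum(x * math.log(x / m)
                     for x, m in zip(observed, expected) if x > 0)

def df_model_hypothesis(k, n_estimated_parameters):
    """Case II degrees of freedom: k - 1 - number of free parameters estimated under H0."""
    return k - 1 - n_estimated_parameters
```

Both functions expect the expected frequencies mi to have been estimated under H0 beforehand; they do not estimate anything themselves.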

Examples

Example 1. Consider the Suicide Data. To test H0: p1=p2=…=p5, p6=p7, we have n=6587. The estimates under H0 are
p1=…=p5=[(x1+x2+x3+x4+x5)/n]/5=(4956/6587)/5=.1505,
p6=p7=[(x6+x7)/n]/2=(1631/6587)/2=.1238,
so that
m1=…=m5=n*p1=4956/5=991.2,
m6=m7=n*p6=1631/2=815.5.

Pearson's Goodness of Fit Test Statistic: T=26.489, df=7-2=5 (7 categories minus 2 groups of equal proportions, that is, 5 independent restrictions), 95% table value with 5 df=11.07, so H0 is rejected. Why? The individual contributions are
(X-m)^2/m=[0.097, 1.935, 0.085, 1.763, 7.496, 7.556, 7.556],
so most of T comes from Friday, Saturday, and Sunday: Friday differs from the other working days, and Saturday and Sunday differ from each other.

Wilks' Likelihood Ratio Test Statistic: G=26.69, df=7-2=5, so H0 is rejected.

After-class exercises: For the above data set, test the following hypotheses and interpret the results:
1. H0: p1=p2=p3=p4, p5=p6=p7
2. H0: p1=p2=p3=p4, p5=p7.
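The suicide-data calculation can be reproduced with a short script. This is a sketch under stated assumptions: the counts are copied from the table in the slides, the `groups` labelling and all variable names are illustrative, and the critical value 11.07 is the table value quoted in the slides rather than something the script computes.

```python
import math

# Suicide counts, Monday through Sunday, copied from the table in the slides.
counts = [1001, 1035, 982, 1033, 905, 737, 894]
# Group label per day under H0: 0 = working day, 1 = weekend (illustrative coding).
groups = [0, 0, 0, 0, 0, 1, 1]

n = sum(counts)

# Under H0 every day in a group shares one proportion, so the estimated
# expected count for a day is its group total divided by the group size.
group_total, group_size = {}, {}
for x, g in zip(counts, groups):
    group_total[g] = group_total.get(g, 0) + x
    group_size[g] = group_size.get(g, 0) + 1
expected = [group_total[g] / group_size[g] for g in groups]

# Per-day contributions to Pearson's statistic, then the two test statistics.
contributions = [(x - m) ** 2 / m for x, m in zip(counts, expected)]
T = sum(contributions)
G = 2.0 * sum(x * math.log(x / m) for x, m in zip(counts, expected))

CRITICAL_5DF = 11.07  # 95% chi-squared table value with 5 df, as quoted in the slides
reject = T > CRITICAL_5DF

print("n =", n)
print("expected =", [round(m, 1) for m in expected])
print("contributions =", [round(c, 3) for c in contributions])
print("T = %.3f, G = %.3f, reject H0: %s" % (T, G, reject))
```

Changing `groups` to `[0, 0, 0, 0, 1, 1, 1]` gives the expected counts for the first after-class exercise; the df and table value then have to be adjusted by hand.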

Examples

Example 2. For the Number of Boys Data, consider the hypothesis that the number of boys follows a binomial distribution, Binom(4, theta). The natural estimator of theta is
theta=total number of boys/total number of children=(0*183+1*789+…+4*246)/(4*3343)=6898/13372=.5159.
Then the estimated expected numbers of families in each category are (n=# of families)
m0=n(1-theta)^4=3343*(1-.5159)^4=183.6,
m1=n(4*theta(1-theta)^3)=782.8,
m2=1251.1, m3=888.7, m4=236.72.

Pearson's Goodness of Fit Test Statistic: T=.6269, df=5-1-1=3, 95% table value with 3 df=7.815, so H0 is not rejected.

Wilks' Likelihood Ratio Test Statistic: G=.6232, df=5-1-1=3, so H0 is not rejected.
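The binomial fit can be checked the same way. The sketch below assumes Python 3.8 or later for `math.comb`; the frequencies come from the table in the slides, the variable names are illustrative, and the critical value 7.815 is the table value quoted in the slides.

```python
import math

# Families by number of boys (0..4) among the first four children, from the slides.
freq = [183, 789, 1250, 875, 246]
size = 4  # children counted per family

n_families = sum(freq)
total_boys = sum(j * f for j, f in enumerate(freq))

# Natural estimator of theta: total boys / total children.
theta_hat = total_boys / (size * n_families)

# Estimated expected frequencies under Binom(4, theta_hat).
expected = [
    n_families * math.comb(size, j) * theta_hat ** j * (1 - theta_hat) ** (size - j)
    for j in range(size + 1)
]

T = sum((x - m) ** 2 / m for x, m in zip(freq, expected))
G = 2.0 * sum(x * math.log(x / m) for x, m in zip(freq, expected))

# Case II: k - 1 - (number of parameters estimated under H0), here one parameter.
df = len(freq) - 1 - 1
CRITICAL_3DF = 7.815  # 95% chi-squared table value with 3 df, as quoted in the slides
reject = T > CRITICAL_3DF

print("theta_hat = %.4f" % theta_hat)
print("expected =", [round(m, 1) for m in expected])
print("T = %.4f, G = %.4f, df = %d, reject H0: %s" % (T, G, df, reject))
```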

Remarks about Pearson's Goodness of Fit Test and Wilks' Likelihood Ratio Test

Under H0, both test statistics are asymptotically chi-squared distributed, so the accuracy of the table values depends on the sample size n. A rough rule for the approximation to be valid:
a) Most of the expected frequencies must be >5.
b) None of them may be <1.
c) If some of them are <1, combine those categories with neighboring categories so that b) is satisfied.
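The rough rule can be turned into a small check before a test is run. This is only a sketch: the slides do not say how many cells count as "most", so the 80% threshold below is an assumption, the left-to-right merging strategy is one of several reasonable choices, and the function names and numbers in the usage note are hypothetical.

```python
def check_expected_frequencies(expected, most=0.8):
    """Check rules a) and b); 'most' = 80% of cells is an assumed threshold."""
    share_above_5 = sum(m > 5 for m in expected) / len(expected)
    return {
        "share_above_5": share_above_5,
        "most_above_5": share_above_5 >= most,
        "none_below_1": all(m >= 1 for m in expected),
    }

def merge_small_cells(observed, expected, floor=1.0):
    """Rule c): pool neighboring cells, left to right, until each pooled
    cell has an expected frequency of at least `floor`."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp, pending = 0, 0.0, False
    for x, m in zip(observed, expected):
        acc_obs += x
        acc_exp += m
        pending = True
        if acc_exp >= floor:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp, pending = 0, 0.0, False
    if pending:
        # A small tail remains: fold it into the last pooled cell if there is one.
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp
```

For made-up frequencies such as observed `[0, 3, 10, 12, 1]` with expected `[0.4, 2.6, 11.0, 11.5, 0.5]`, the merge pools the first two cells and the last two cells, leaving three categories. The number of categories k used in the df formula must then be the number of categories after merging.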