Lecture 4. The Multinomial Distribution (II)


1 Outline for Today
Testing composite hypotheses:
- Definition
- Examples
- Testing procedure
11/22/2018 SA3202, Lecture 4

2 Definition
Composite hypothesis: a hypothesis that does not completely specify the values of the parameters. Composite hypotheses arise in two cases.
Case I: the hypothesis imposes restrictions on the possible values of the parameters.
Example 1. Let X1, X2, …, X7 denote the numbers of Singaporeans who go to the movies on Monday, Tuesday, …, Saturday, and Sunday, respectively. Then X = (X1, X2, …, X7) ~ M(n; p1, p2, …, p7), where n is the total population of Singapore and p1, p2, …, p7 are the population proportions of Singaporeans who go to the movies on Monday, Tuesday, …, Saturday, and Sunday. The hypothesis that the proportions are equal across the weekdays and equal across the weekend days is
H0: p1 = p2 = p3 = p4 = p5, p6 = p7,
which is a composite hypothesis: it restricts the pi's but does not fix their common values.

3 Example 2. The following table shows the classification of suicides in France by day of the week:

Day             Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday  Total
# of suicides                                                                   6586

For this data set, we may consider the hypothesis that the only essential difference between the days is the one between the working days (Monday, Tuesday, …, Friday) and the weekend (Saturday and Sunday):
H0: p1 = p2 = p3 = p4 = p5, p6 = p7.

Example 3. The following table shows the political views of 1397 Americans in 1975.

Response                 Code   Frequency
Extremely liberal          1
Liberal                    2
Slightly liberal           3
Moderate                   4
Slightly conservative      5
Conservative               6
Extremely conservative     7
Total                           1397

One hypothesis of interest is that there is a balance between the "left" (liberals) and the "right" (conservatives):
H0: p1 = p7, p2 = p6, p3 = p5.
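The Case I hypotheses in Examples 1 to 3 all set cell probabilities equal within groups of cells, so each can be encoded as a partition of the k cells. The minimal Python sketch below assumes cells are indexed from 0 and that H0 only equates probabilities within groups; the function name independent_restrictions is hypothetical. A group of size s imposes s - 1 equalities (the sum-to-one constraint is not counted), and the total is the degrees of freedom of the test.

```python
def independent_restrictions(k, groups):
    """Count the independent restrictions of an 'equal within groups' hypothesis.

    `groups` must partition the cell indices 0..k-1. Each group of size s
    contributes s - 1 equalities, so the total equals k - len(groups).
    """
    cells = sorted(i for g in groups for i in g)
    if cells != list(range(k)):
        raise ValueError("groups must partition the cells 0..k-1")
    return sum(len(g) - 1 for g in groups)


# Examples 1 and 2: p1=...=p5, p6=p7 (cells indexed from 0)
weekday_weekend = [[0, 1, 2, 3, 4], [5, 6]]
# Example 3: p1=p7, p2=p6, p3=p5; p4 (Moderate) is left unrestricted
left_right_balance = [[0, 6], [1, 5], [2, 4], [3]]

print(independent_restrictions(7, weekday_weekend))     # 5
print(independent_restrictions(7, left_right_balance))  # 3
```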

4 Case II: the hypothesis postulates a model for the pj's in terms of a small number of new parameters. That is, the hypothesis specifies the values of the pj's as functions of a set of new parameters, say θ.
Example: the Number of Boys Data. The following table shows the number of boys among the first 4 children in 3343 Swedish families of size 4 or more.

Number of boys   0     1     2     3     4     Total
Frequency        183   789               256   3343

We may consider the hypothesis that the number of boys, Y, among the first four children follows a binomial distribution, but not necessarily with parameter .5: Y ~ Binom(4, θ). The parameter θ of the binomial distribution is not specified by H0 and must be estimated from the data.
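To see how a Case II hypothesis pins down all five cell probabilities with one parameter, here is a small Python sketch (the helper name binomial_cell_probs is hypothetical). It returns pj = P(Y = j) = C(4, j) θ^j (1 - θ)^(4 - j) for j = 0, …, 4.

```python
from math import comb


def binomial_cell_probs(theta, size=4):
    """Cell probabilities p_j = P(Y = j), j = 0..size, for Y ~ Binom(size, theta).

    Under the Case II hypothesis, all size + 1 multinomial probabilities
    are functions of the single parameter theta.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return [comb(size, j) * theta**j * (1 - theta) ** (size - j)
            for j in range(size + 1)]


print(binomial_cell_probs(0.5))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```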

5 Testing Procedure
A composite hypothesis is also tested by comparing the observed frequencies Xi with their expected frequencies mi under H0, using Pearson's goodness-of-fit test statistic
$T = \sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}$
or Wilks' likelihood ratio test statistic
$G = 2 \sum_{i=1}^{k} X_i \log\frac{X_i}{m_i}$.
But:
1. The mi's have to be estimated, as mi = n p̂i, where p̂i is the (maximum likelihood) estimate of pi under H0.
2. The df of the test statistics have to be adjusted:
   For Case I: df = the number of independent restrictions on the pi's (for hypotheses that set the pi's equal within groups, df = k - the number of groups).
   For Case II: df = k - 1 - the number of "free" parameters estimated under H0.
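The two statistics and a chi-squared tail probability can be sketched in Python as follows. This is a minimal illustration, assuming the mi have already been estimated under H0 and are all positive, and that the caller supplies the adjusted df. The helper names pearson_T, wilks_G and chi2_sf are hypothetical; chi2_sf replaces a printed table with the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1) for integer df.

```python
import math


def pearson_T(observed, expected):
    """Pearson's statistic T = sum (X_i - m_i)^2 / m_i."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))


def wilks_G(observed, expected):
    """Likelihood ratio statistic G = 2 sum X_i log(X_i / m_i).

    Cells with X_i = 0 contribute 0 (the limit of x log x as x -> 0).
    """
    return 2 * sum(x * math.log(x / m) for x, m in zip(observed, expected) if x > 0)


def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared with df degrees of freedom > x), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    k = 2 if df % 2 == 0 else 1
    # Starting values: Q(x; 2) = exp(-x/2), Q(x; 1) = erfc(sqrt(x/2))
    total = math.exp(-half) if k == 2 else math.erfc(math.sqrt(half))
    # term_k = (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    term = half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
    while k < df:
        total += term            # Q(x; k + 2) = Q(x; k) + term_k
        k += 2
        term *= half / (k / 2)   # term_{k} from term_{k-2}
    return total


# Small hypothetical check with two cells
T = pearson_T([10, 20], [15, 15])
G = wilks_G([10, 20], [15, 15])
print(T, G, chi2_sf(T, df=1))
```

H0 is rejected at the 5% level when chi2_sf(T, df) < 0.05, which is equivalent to T exceeding the 95% table value for that df.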

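6 Examples
Example 1. Consider the Suicide Data. To test H0: p1 = p2 = … = p5, p6 = p7, we have n = 6586, and the estimates under H0 are
p̂1 = … = p̂5 = [(x1 + x2 + x3 + x4 + x5)/n]/5 = (4956/6586)/5 = .1505,
p̂6 = p̂7 = [(x6 + x7)/n]/2 = (1630/6586)/2 = .1237.
The estimated expected frequencies are
m1 = … = m5 = n p̂1 = (x1 + … + x5)/5 = 991.2,  m6 = m7 = n p̂6 = (x6 + x7)/2 = 815.0.
Pearson's goodness-of-fit test statistic is $T = \sum_{i=1}^{7} (X_i - m_i)^2/m_i = 26.489$, with df = 7 - 2 = 5. The 95% table value of the chi-squared distribution with 5 df is 11.07, so H0 is rejected. Why? Because T = 26.489 > 11.07. The individual contributions are
(X - m)^2/m = [0.096, 1.935, , 1.763, , 7.556, 7.556],
so Saturday and Sunday each contribute 7.556.
Wilks' likelihood ratio test statistic is $G = 2\sum_{i=1}^{7} X_i \log(X_i/m_i) = 26.69$, with df = 7 - 2 = 5; since 26.69 > 11.07, H0 is rejected.
After-class exercises: for the above data set, test the following hypotheses and interpret the results:
1. H0: p1 = p2 = p3 = p4, p5 = p6 = p7
2. H0: p1 = p2 = p3 = p4, p5 = p7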
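The estimation step in Example 1 generalizes to any "equal within groups" hypothesis: the MLE pools each group, so every mi in a group equals the group's average observed count (for example, m1 = (x1 + … + x5)/5). The Python sketch below illustrates this under that assumption; the helper name fit_equal_within_groups is hypothetical, and the counts are hypothetical illustration values, not the suicide data.

```python
def fit_equal_within_groups(counts, groups):
    """Expected frequencies m_i under H0: p_i equal within each group.

    The MLE is p_hat_i = (group total) / (n * group size), so
    m_i = n * p_hat_i is simply the group's average count.
    Returns (m, df) with df = k - number of groups.
    """
    m = [0.0] * len(counts)
    for g in groups:
        avg = sum(counts[i] for i in g) / len(g)
        for i in g:
            m[i] = avg
    df = len(counts) - len(groups)  # number of independent restrictions
    return m, df


def pearson_T(observed, expected):
    return sum((x - mi) ** 2 / mi for x, mi in zip(observed, expected))


# Hypothetical counts for seven days (illustration only)
counts = [10, 20, 30, 40, 50, 60, 70]
m, df = fit_equal_within_groups(counts, [[0, 1, 2, 3, 4], [5, 6]])
print(m, df)      # [30.0, 30.0, 30.0, 30.0, 30.0, 65.0, 65.0] 5
T = pearson_T(counts, m)
print(T > 11.07)  # True: compare with the 95% table value for 5 df
```

The exercise hypotheses can be fitted the same way by changing the groups argument.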

7 Examples
Example 2. For the Number of Boys Data, consider the hypothesis that the number of boys follows a binomial distribution Binom(4, θ). The natural estimator of θ is
θ̂ = total number of boys / total number of children = (0×183 + 1×789 + … + 4×256)/(4×3343) = 6898/13372 = .5159.
Then the estimated expected frequencies of the categories (n = 3343, the number of families) are
m0 = n(1 - θ̂)^4 = 3343 × (.4841)^4 = 183.6,
m1 = n × 4θ̂(1 - θ̂)^3 = 782.8,  m2 = n × 6θ̂^2(1 - θ̂)^2 = 1251.1,  m3 = n × 4θ̂^3(1 - θ̂) = 888.7,  m4 = n θ̂^4 = 236.72.
Pearson's goodness-of-fit test statistic is $T = \sum_{j=0}^{4} (X_j - m_j)^2/m_j = .6269$, with df = 5 - 1 - 1 = 3. The 95% table value with 3 df is 7.815, so H0 is not rejected.
Wilks' likelihood ratio test statistic is $G = 2\sum_{j=0}^{4} X_j \log(X_j/m_j) = .6232$, with df = 5 - 1 - 1 = 3, so H0 is not rejected either.
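The binomial fit in Example 2 can be sketched in Python as below, assuming freqs[j] is the number of families with j boys (the helper name fit_binomial is hypothetical). It estimates θ̂ as total boys over total children, forms mj = n C(4, j) θ̂^j (1 - θ̂)^(4 - j), and uses df = k - 1 - 1 because one parameter is estimated. The frequencies in the usage lines are hypothetical values chosen to match Binom(4, .5) exactly, so both statistics come out as 0.

```python
import math


def fit_binomial(freqs):
    """Fit Binom(size, theta) to freqs[j] = number of families with j boys.

    theta_hat = total boys / total children;
    m_j = n * C(size, j) * theta^j * (1 - theta)^(size - j).
    Returns (theta_hat, expected, df) with df = k - 1 - 1.
    """
    size = len(freqs) - 1
    n = sum(freqs)
    theta = sum(j * f for j, f in enumerate(freqs)) / (size * n)
    expected = [n * math.comb(size, j) * theta**j * (1 - theta) ** (size - j)
                for j in range(size + 1)]
    df = len(freqs) - 1 - 1
    return theta, expected, df


def pearson_T(observed, expected):
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))


def wilks_G(observed, expected):
    return 2 * sum(x * math.log(x / m) for x, m in zip(observed, expected) if x > 0)


# Hypothetical frequencies proportional to Binom(4, 0.5) probabilities
freqs = [1, 4, 6, 4, 1]
theta, m, df = fit_binomial(freqs)
print(theta, df)  # 0.5 3
print(pearson_T(freqs, m), wilks_G(freqs, m))  # 0.0 0.0
```

Applied to the full Number of Boys frequencies, the same estimator gives θ̂ = 6898/13372 ≈ .5159, since it depends only on the total number of boys and the number of families.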

8 Remarks about Pearson's goodness-of-fit test and Wilks' likelihood ratio test:
Both statistics are only asymptotically chi-squared distributed under H0, so the accuracy of the approximation depends on the sample size n. A rough rule for the approximation to be valid:
a) most of the expected frequencies must be > 5;
b) none of them may be < 1;
c) if some of them are < 1, combine those categories with neighboring categories so that b) is satisfied (this reduces k, so the df must be computed from the combined categories).
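Rule c) can be automated for ordered categories such as the number of boys. The Python sketch below uses one simple greedy scheme, which is an assumption rather than the lecture's prescribed method: scan left to right, keep adding cells to the current bin until its expected count reaches the threshold, and merge any short final bin into the previous one. The helper name pool_small_cells is hypothetical, and the sketch only enforces b); it does not check the "most expected frequencies > 5" guideline in a).

```python
def pool_small_cells(observed, expected, min_expected=1.0):
    """Merge adjacent ordered categories until every pooled expected count >= min_expected.

    Greedy left-to-right scheme: cells accumulate into the current bin until
    its expected count reaches the threshold; a leftover tail below the
    threshold is merged into the previous bin. Returns (pooled_obs, pooled_exp).
    """
    pooled_obs, pooled_exp = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0 or acc_o > 0:  # tail cells that never reached the threshold
        if pooled_exp:
            pooled_obs[-1] += acc_o
            pooled_exp[-1] += acc_e
        else:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
    return pooled_obs, pooled_exp


# Hypothetical counts: the first two cells and the last two cells get pooled
print(pool_small_cells([0, 2, 10, 12, 3], [0.4, 0.8, 11.0, 11.5, 3.3]))
print(pool_small_cells([6, 4, 0, 1], [5.0, 5.0, 0.3, 0.2]))
```

After pooling, k is the number of pooled categories, and the df rules are applied with that smaller k.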

