Multinomial Distribution


1 Multinomial Distribution
Multinomial coefficients
Definition
Marginals are binomial
Maximum likelihood
Hypothesis tests

2 Multinomial Coefficient: From n objects, number of ways to choose
n1 of type 1, n2 of type 2, …, nk of type k
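The coefficient itself appeared as a displayed formula (an image) on the original slide; in standard notation it is

\[ \binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}, \qquad n_1 + n_2 + \cdots + n_k = n. \]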

3 Of 30 graduating students, how many ways are there for 15 to be employed in a job related to their field of study, 10 to be employed in a job unrelated to their field of study, and 5 unemployed?
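The numeric answer was displayed on the original slide; by direct calculation,

\[ \binom{30}{15,\,10,\,5} = \frac{30!}{15!\,10!\,5!} = \binom{30}{15}\binom{15}{10}\binom{5}{5} = 155{,}117{,}520 \times 3003 \times 1 = 465{,}817{,}912{,}560. \]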

4 Multinomial Distribution
Statistical experiment with k outcomes
Repeated independently n times
Pr(Outcome j) = pj, j = 1, …, k
Number of times outcome j occurred is xj, j = 1, …, k
A multivariate distribution
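The probability mass function was shown as an image on the slide; consistent with the notation above, it is

\[ P(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!}\; p_1^{x_1} \cdots p_k^{x_k}, \qquad \sum_{j=1}^k x_j = n, \quad \sum_{j=1}^k p_j = 1. \]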

5 But if one x_j = n, all the others are zero.

6 Marginals are also multinomial
This is too messy -- students are not responsible for it. It uses the binomial theorem …

7 Observe: Adding over x_{k-1} throws it into the “leftover” category.
Labels 1, …, k are arbitrary, so this means you can combine any 2 categories and the result is still multinomial.
k is arbitrary, so you can keep doing it and combine any number of categories.
When only two categories are left, the result is binomial: E(xj) = n pj, Var(xj) = n pj(1 - pj).
You are responsible for these IMPLICATIONS of the last slide.

8 Sample problem
P(Job related to field of study) = 0.60
P(Job unrelated to field of study) = 0.30
P(No job) = 0.10
Of 30 randomly chosen students, what is the probability that 15 are employed in a job related to their field of study, 10 are employed in a job unrelated to their field of study, and 5 are unemployed?
What is the probability that exactly 5 are unemployed?
How did I get that exact answer?!!
An alternative is dmultinom(c(15,10,5), prob=c(60,30,10)); dmultinom rescales prob to sum to one, so this gives the same answer as prob=c(0.60,0.30,0.10).
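A quick check in R (a sketch; the exact answers displayed on the original slide are not reproduced here):

    # Multinomial probability of the split 15 related / 10 unrelated / 5 unemployed
    dmultinom(c(15, 10, 5), prob = c(0.60, 0.30, 0.10))   # roughly 0.013

    # The marginal of x3 is binomial, so P(exactly 5 unemployed out of 30) is
    dbinom(5, size = 30, prob = 0.10)                      # roughly 0.10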

9 Data File
Case    Job   x1   x2   x3
1       .     .    .    .
2       .     .    .    .
3       .     .    .    .
4       .     .    .    .
…       .     .    .    .
N       .     .    .    .
Total         .    .    .
The data file almost always has a variable recording each case's category membership -- it almost never arrives in a true multinomial setup (just the k frequency counts).

10 Lessons from the data file
Cases (N of them) are independent M(1,p), so E(x_{i,j}) = pj.
Column totals count the number of times each category occurs: their joint distribution is M(N,p).
These are the table (cell) frequencies! They are random variables, and now we know their joint distribution.
Each individual table frequency is B(N, pj).
Expected value of frequency j is mj = N pj.
Tables of 2 or more dimensions present no problems -- use combination variables.
Expected frequencies are important. Note the notation mj.

11 More about the frequencies
We are in the familiar situation of estimating expected values with sample means. And these sample means are just sample proportions.

12 Simple Tools for Estimation
So the (multivariate) sample mean is an unbiased estimator of the vector of multinomial probabilities.
The Law of Large Numbers says the sample proportions converge to the true probabilities as N grows.
The CLT says the multivariate sample mean has an approximate multivariate normal distribution for large N.
This is the basis of large-sample tests and confidence intervals.
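In symbols (the displayed results on the slide were images; this is the standard statement, where Σ is the multinomial covariance matrix):

\[ \hat{p} = \left(\frac{x_1}{N}, \ldots, \frac{x_k}{N}\right) \longrightarrow p \;\;\text{(LLN)}, \qquad \sqrt{N}\,(\hat{p} - p) \stackrel{d}{\longrightarrow} N_k\!\left(0, \Sigma\right), \quad \Sigma = \operatorname{diag}(p) - p\,p^\top. \]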

13 Maximum Likelihood
The likelihood is a product of N probability mass functions, each M(1,p).
It depends upon the sample data only through the vector of k frequency counts.
By the factorization theorem, that vector is a sufficient statistic.
All the information about the parameter in the sample data is contained in the sufficient statistic.
Actually it's minimal sufficient and complete -- no need to go there.
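A sketch of that product, writing x_{ij} for case i's indicator of category j and x_j for the column total:

\[ L(p) = \prod_{i=1}^{N} \prod_{j=1}^{k} p_j^{x_{ij}} = \prod_{j=1}^{k} p_j^{x_j}, \]

so the likelihood depends on the data only through the frequency vector (x_1, …, x_k).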

14 Following the book’s notation
Write the frequencies as x1, …, xk. Later, x values with multiple subscripts will refer to frequencies in a multi-dimensional table; for example, x_{i,j,k} will be the frequency in row i and column j of sub-table k.
Write the likelihood function as L(p) = p1^{x1} p2^{x2} ⋯ pk^{xk}, as on the previous slide.
To maximize the likelihood function, we must allow for the facts that the probabilities sum to one and the frequencies sum to N. Substituting p_k = 1 - p1 - … - p_{k-1} is easier than Lagrange multipliers.

15 Log likelihood: k-1 free parameters
The maximum is unique, too, if no frequency equals zero.
Set all k-1 derivatives to zero and solve for p1, …, p_{k-1}; p_k follows because the probabilities sum to one.
Verify that pi = xi/N for i = 1, …, k-1 works: the MLE is the sample mean.
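A minimal sketch of the calculation, substituting p_k = 1 - p_1 - ⋯ - p_{k-1} as described on the previous slide:

\[ \ell(p) = \sum_{j=1}^{k} x_j \log p_j, \qquad \frac{\partial \ell}{\partial p_j} = \frac{x_j}{p_j} - \frac{x_k}{p_k} = 0 \;\Rightarrow\; \frac{x_j}{p_j} = \frac{x_k}{p_k} \text{ for all } j. \]

So each p_j is proportional to x_j; since the probabilities sum to one and the frequencies sum to N, \hat{p}_j = x_j / N.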

16 Likelihood Ratio Tests
Under H0, G2 has an approximate chi-squared distribution for large N.
Degrees of freedom = number of (non-redundant, linear) equalities specified by H0.
Reject H0 when G2 is large.
We need more detail about degrees of freedom.
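The test statistic itself was displayed as a formula image; in the usual notation,

\[ G^2 = -2 \log\!\left( \frac{\max_{H_0} L(p)}{\max L(p)} \right) = 2 \sum_{j=1}^{k} x_j \log\!\left( \frac{x_j}{\hat{m}_j} \right), \]

where \hat{m}_j = N \hat{p}_j is the expected frequency estimated under the null hypothesis.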

17 Degrees of Freedom
Express H0 as a set of linear combinations of the parameters, set equal to constants (usually zeros).
Degrees of freedom = number of non-redundant linear combinations.
Example: df = 3 (see the next slide).

18 p = (p1, p2, p3, p4, p5)
H0: p1 = 0.25, p2 = (p3 + p4)/2, p4 = p5, so df = 3.
H0: p1 = 1/5, p2 = 1/5, p3 = 1/5, p4 = 1/5, p5 = 1/5, so df = 4, not 5, because the probabilities add to one, so one equality is redundant.
The matrix version is just there for completeness: if θ is a k x 1 parameter vector and H0: Cθ = h, where C is an r x k matrix, the degrees of freedom is the row rank (number of linearly independent rows) of C --- usually r.
But remember, if θ = p for the multinomial, there are really only k-1 parameters.
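For illustration only (this matrix is mine, not from the slides), the first hypothesis above can be put in the form Cθ = h with θ = p:

\[ C = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}, \qquad h = \begin{pmatrix} 0.25 \\ 0 \\ 0 \end{pmatrix}, \]

a 3 x 5 matrix with three linearly independent rows, so df = 3.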

19 Example: University administrators recognize that the percentage of students who are unemployed after graduation will vary depending upon economic conditions, but they claim that still, about twice as many students will be employed in a job related to their field of study, compared to those who get an unrelated job. To test this hypothesis, they select a random sample of 200 students from the most recent class, and observe 106 employed in a job related to their field of study, 74 employed in a job unrelated to their field of study, and 20 unemployed. Test the hypothesis using a large-sample likelihood ratio test and significance level α = 0.05. State your conclusions in symbols and words.

20 What is the model? What is the null hypothesis, in symbols? What are the degrees of freedom for this test?
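For reference, my reading of the setup (not reproduced from the slides): the observed frequencies (x1, x2, x3) are multinomial, x ~ M(200, (p1, p2, p3)); the null hypothesis, stated in symbols on slide 25, is H0: p1 = 2 p2; and since H0 imposes one non-redundant linear equality, df = 1.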

21 What is the restricted MLE?
Your answer is a symbolic expression. It's a vector. Show your work.
There is just one free parameter in this restricted model. That makes sense because x1/N estimates 2p and x2/N estimates p.
Note that H0 does not constrain p3 directly, so its estimate is just the sample mean (proportion), as the sketch below confirms.
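A sketch of the restricted maximization under H0: p1 = 2 p2 (write p2 = p, so p1 = 2p and p3 = 1 - 3p):

\[ \ell_0(p) = x_1 \log(2p) + x_2 \log p + x_3 \log(1 - 3p), \qquad \frac{d\ell_0}{dp} = \frac{x_1 + x_2}{p} - \frac{3 x_3}{1 - 3p} = 0 \;\Rightarrow\; \hat{p} = \frac{x_1 + x_2}{3N}, \]

so the restricted MLE is \(\left( \frac{2(x_1 + x_2)}{3N},\; \frac{x_1 + x_2}{3N},\; \frac{x_3}{N} \right)\).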

22 What is the unrestricted MLE? Your answer is a numeric vector: 3 numbers.
What is the restricted MLE? Your answer is a numeric vector: 3 numbers.
What are the estimated expected frequencies under the null hypothesis? Your answer is a numeric vector: 3 numbers.
Notice the NATURAL way of estimating the expected frequencies.
When our text says expected frequencies, they almost always mean estimated expected frequencies.
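Working these out from the data on slide 19 (N = 200; observed counts 106, 74, 20), under H0: p1 = 2 p2:

Unrestricted MLE: (106/200, 74/200, 20/200) = (0.53, 0.37, 0.10).
Restricted MLE: (2(106 + 74)/600, (106 + 74)/600, 20/200) = (0.60, 0.30, 0.10).
Estimated expected frequencies under H0: 200 × (0.60, 0.30, 0.10) = (120, 60, 20).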

23 Calculate G2. Show your work.
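A worked version, using the estimated expected frequencies above (the original slide showed this calculation as an image):

\[ G^2 = 2 \sum_j x_j \log\frac{x_j}{\hat{m}_j} = 2\left[ 106 \log\frac{106}{120} + 74 \log\frac{74}{60} + 20 \log\frac{20}{20} \right] \approx 4.74. \]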

24 Or, with R. Note that log() is the natural log. You don't need tables; the p-value is easy.
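The R code on the slide was an image; a sketch that reproduces the calculation (variable names are mine):

    obs      <- c(106, 74, 20)                 # observed frequencies
    expected <- c(120, 60, 20)                 # estimated expected frequencies under H0
    G2 <- 2 * sum(obs * log(obs / expected))   # log() is the natural log
    G2                                         # roughly 4.74
    1 - pchisq(G2, df = 1)                     # p-value, roughly 0.03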

25 State your conclusions
In symbols: Reject H0: p1 = 2 p2 at alpha = 0.05.
In words: More graduates appear to be employed in jobs unrelated to their fields of study than expected. (Of course that's estimated expected.)
THE STATEMENT IN WORDS IS VALUABLE!
Obs - Exp is kind of like a residual in regression.
The statement in words is justified by the observed and (estimated) expected frequencies:
              Related   Unrelated   No job
Observed        106        74         20
Expected        120        60         20
Obs - Exp       -14       +14          0

26 For a general hypothesis about a multinomial
Summation is over all cells.
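The displayed formula (an image on the original slide) is presumably the general form of G2:

\[ G^2 = 2 \sum_{\text{cells}} \text{observed} \times \log\!\left( \frac{\text{observed}}{\text{expected}} \right), \]

with the expected frequencies estimated under the null hypothesis.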

27 Two chi-square formulas
Likelihood ratio (G2) and Pearson (X2); the summation is over all cells.
By expected frequency, we mean estimated expected frequency.
The two statistics are asymptotically equivalent and have the same degrees of freedom.
The book's formula for df applies only to log-linear models. Use the approach given here, for now.
X2 comes directly from the CLT; G2 comes from it indirectly.
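In standard notation, the two statistics (shown as images on the original slide) are

\[ G^2 = 2 \sum_{j} x_j \log\frac{x_j}{\hat{m}_j}, \qquad X^2 = \sum_{j} \frac{(x_j - \hat{m}_j)^2}{\hat{m}_j}, \]

where x_j is the observed and \hat{m}_j the estimated expected frequency in cell j.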

28 Pearson Chi-square on the jobs data
Observed: 106, 74, 20. Estimated expected under H0: 120, 60, 20.
The two statistics are asymptotically equivalent, and this is only N = 200, so X2 and G2 are close but not identical.
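A sketch of the calculation, using the observed and expected counts above:

\[ X^2 = \frac{(106-120)^2}{120} + \frac{(74-60)^2}{60} + \frac{(20-20)^2}{20} = \frac{196}{120} + \frac{196}{60} + 0 = 4.9, \]

compared with G2 ≈ 4.74. With df = 1 both exceed the 0.05 critical value of 3.84, so both tests reject H0.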

