Lecture 2. The Binomial Distribution
Outline for Today
  The Equi-probability Model
  The Binomial Distribution
11/17/2018 SA3202, Lecture 2
The Equi-probability Model
The equi-probability model postulates that all the categories are equally likely.
For the Random Number Data, the question of interest is: are the "random digits" generated by the calculator genuinely random? This is equivalent to testing whether the digits follow an equi-probability model.
Number of possible digits: 10
Expected frequency for a digit (E): 100/10 = 10
Observed frequency for a digit (O):  7   8   8   15   13   11   12   8   5   13   (total 100)
(E-O)^2/E:                           .9  .4  .4  2.5  .9   .1   .4   .4  2.5  .9   (total 9.4)
Sum of 2*O*log(O/E): 9.6436
df of Pearson's goodness-of-fit test statistic: 10 - 1 = 9
df of Wilks' likelihood ratio test statistic: 10 - 1 = 9
Table values:
Conclusions:
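The two statistics on this slide can be recomputed directly from the observed counts. The sketch below is a minimal check using only the Python standard library. The 5% significance level and the critical value 16.919 (upper 5% point of the chi-square distribution with 9 df) are assumptions added for illustration; the slide does not state which level to use.

```python
import math

# Observed digit counts from the slide (100 digits in total).
observed = [7, 8, 8, 15, 13, 11, 12, 8, 5, 13]
n = sum(observed)          # total number of digits
k = len(observed)          # number of categories (possible digits)
expected = n / k           # expected count per digit under equi-probability

# Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E.
pearson = sum((o - expected) ** 2 / expected for o in observed)

# Wilks' likelihood ratio statistic: 2 * sum of O * log(O / E), natural log.
wilks = 2 * sum(o * math.log(o / expected) for o in observed)

df = k - 1

# Assumption (not on the slide): test at the 5% level.
# 16.919 is the upper 5% point of the chi-square distribution with 9 df.
CRITICAL_5PCT_DF9 = 16.919
reject_pearson = pearson > CRITICAL_5PCT_DF9
reject_wilks = wilks > CRITICAL_5PCT_DF9

print(f"Pearson X^2 = {pearson:.4f}, Wilks G^2 = {wilks:.4f}, df = {df}")
print(f"Reject at 5% (Pearson / Wilks): {reject_pearson} / {reject_wilks}")
```

Under that assumed level, both statistics fall below the critical value, so the data give no evidence against the equi-probability model.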
The Binomial Distribution
The binomial distribution is an important distribution in categorical data analysis.
Bernoulli Trial: a trial with two possible outcomes, "success" (S) and "failure" (F). e.g.
Let p be the probability of the event "success". Let X be a random variable which takes the value 1 when the Bernoulli trial is a "success" and 0 when it is a "failure". That is,
  P(X=1) = P("S") = p,  P(X=0) = P("F") = 1-p = q.
X is called a Bernoulli variable, denoted X ~ Binom(1,p).
The Binomial Distribution
Binomial R.V.: the sum of n independent Bernoulli random variables X1, X2, ..., Xn, each with success probability p, i.e.
  S = X1 + X2 + ... + Xn ~ Binom(n,p), with index n and parameter p.
A Bernoulli r.v. is a special binomial r.v. with index 1 and parameter p.
Binomial Distribution: the distribution of a binomial r.v. For X ~ Binom(n,p),
  $P(X=k) = P(k \text{ "successes" in } n \text{ Bernoulli trials}) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \dots, n,$
where q = 1-p and $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient.
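The binomial probabilities above can be evaluated with a few lines of code. This is a minimal sketch using the standard library; the function name `binom_pmf` is a hypothetical helper introduced for illustration, not something from the lecture.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p): C(n, k) * p^k * q^(n-k)."""
    q = 1.0 - p
    return math.comb(n, k) * p ** k * q ** (n - k)

# The probabilities over k = 0, ..., n should sum to 1.
total = sum(binom_pmf(k, 10, 0.3) for k in range(11))
print(f"Sum of pmf over k = 0..10: {total:.6f}")

# With index n = 1 the binomial reduces to the Bernoulli distribution.
print(f"P(X=1) for Binom(1, 0.3): {binom_pmf(1, 1, 0.3):.2f}")
print(f"P(X=0) for Binom(1, 0.3): {binom_pmf(0, 1, 0.3):.2f}")
```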
The Binomial Distribution
Example 1: Assume that about 5% of the students in NUS are smokers. Let X be 1 if a student is a smoker and 0 otherwise. Then X ~ Binom(1, 0.05). Let Y be the number of smokers in a class of 50 students. Then Y ~ Binom(50, 0.05).
Example 2: Assume that about 35% of Singaporeans have visited the US. Let X be 1 if a Singaporean has visited the US and 0 otherwise. Then X ~ Binom(1, 0.35). Let Y be the number of Singaporeans who have visited the US in a community of 2000 Singaporeans. Then Y ~ Binom(2000, 0.35).
The Dual Interpretation of p:
  (a) The "population proportion" that possesses some property
  (b) The "probability" that a member chosen at random possesses some property
The Binomial Distribution
Expectation and Variance of X ~ Binom(n,p):
  E(X) = np,  Var(X) = npq
Interpretation of the mean E(X) = np:
  The Expected Frequency = # of Repeats x Probability
  The Expected Frequency = Sample Size x Population Proportion
e.g. When a fair coin is tossed 100 times, the expected # of times that a "head" appears is 100 x 0.5 = 50.
e.g. If the failure proportion for a course is 5%, then in a class of 100 students it is expected that 100 x 0.05 = 5 students fail the course.
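The formulas E(X) = np and Var(X) = npq can be checked numerically by summing over the binomial probabilities. The sketch below does this for the two examples on the slide; `binom_mean_var` is a hypothetical helper name used only for this illustration.

```python
import math

def binom_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of Binom(n, p), computed by summing over the pmf."""
    q = 1.0 - p
    pmf = [math.comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var

# Fair coin tossed 100 times: compare with np and npq.
coin_mean, coin_var = binom_mean_var(100, 0.5)
print(f"coin:  mean = {coin_mean:.4f} (np = {100 * 0.5}), "
      f"var = {coin_var:.4f} (npq = {100 * 0.5 * 0.5})")

# Class of 100 students with a 5% failure proportion.
fail_mean, fail_var = binom_mean_var(100, 0.05)
print(f"class: mean = {fail_mean:.4f} (np = {100 * 0.05}), "
      f"var = {fail_var:.4f} (npq = {100 * 0.05 * 0.95})")
```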
The Binomial Distribution
Asymptotic Normality
If X ~ Binom(n,p), then for large n, X ~ AN(np, npq), provided p is not very close to 0 or 1.
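To see why the condition on p matters, the exact binomial probability P(X <= k) can be compared with its normal approximation. The sketch below is an illustration only. It uses a continuity correction (evaluating the normal cdf at k + 0.5), which is a common refinement that the slide does not mention, and the helper names are hypothetical.

```python
import math

def exact_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binom(n, p)."""
    q = 1.0 - p
    return sum(math.comb(n, j) * p ** j * q ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(X <= k) using X ~ AN(np, npq).

    Assumption: a continuity correction (k + 0.5) is applied; this is
    not part of the slide.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    z = (k + 0.5 - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# p = 0.5: the approximation is expected to be close.
err_mid = abs(exact_cdf(55, 100, 0.5) - normal_approx_cdf(55, 100, 0.5))

# p = 0.01 (very close to 0): the approximation is expected to be poor.
err_edge = abs(exact_cdf(0, 100, 0.01) - normal_approx_cdf(0, 100, 0.01))

print(f"absolute error, n=100, p=0.5,  k=55: {err_mid:.4f}")
print(f"absolute error, n=100, p=0.01, k=0:  {err_edge:.4f}")
```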
The Binomial Distribution
Estimation of p
Given X ~ Binom(n,p), it is natural to estimate p by the sample proportion $\hat{p} = X/n$.
Expectation and Variance of the Sample Proportion:
  $E(\hat{p}) = p, \quad \mathrm{Var}(\hat{p}) = pq/n$
The Estimated Variance and the Standard Error (s.e.) of $\hat{p}$, with $\hat{q} = 1 - \hat{p}$:
  $\widehat{\mathrm{Var}}(\hat{p}) = \hat{p}\hat{q}/n, \quad \mathrm{s.e.}(\hat{p}) = (\hat{p}\hat{q}/n)^{1/2}$
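The mean and variance of the sample proportion follow in one step from E(X) = np and Var(X) = npq, since dividing by the constant n scales the mean by 1/n and the variance by 1/n^2:

```latex
\begin{align*}
E(\hat{p}) &= E\!\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p, \\
\mathrm{Var}(\hat{p}) &= \mathrm{Var}\!\left(\frac{X}{n}\right) = \frac{\mathrm{Var}(X)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}.
\end{align*}
```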
The Binomial Distribution
Asymptotic Normality of the Sample Proportion:
  $\hat{p} \sim AN(p,\; pq/n)$
Confidence Interval: an interval that contains the true parameter with a given confidence level. It has the general form
  the estimate ± table value x s.e.(the estimate)
The table value is determined by the distribution of the estimate and the given confidence level.
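The Binomial Distribution
The Approximate CI for p:
  $\hat{p} \pm z_{\alpha/2}\,(\hat{p}\hat{q}/n)^{1/2}$
where $z_{\alpha/2}$ is the table value from the standard normal distribution (1.96 for a 95% confidence level).
Example: In a sample of 200 students, 120 are in favor of a particular proposal. Then
  The estimated proportion = 120/200 = 0.6
  The estimated s.e. = (0.6 x 0.4/200)^(1/2) ≈ 0.0346
  The 95% CI is 0.6 ± 1.96 x 0.0346, i.e. approximately (0.532, 0.668)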
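The example can be reproduced in code. The sketch below is one possible way to compute the normal-approximation interval; `proportion_ci` is a hypothetical helper name, and the table value is taken from the standard normal quantile function in Python's `statistics` module rather than from a printed table.

```python
import math
from statistics import NormalDist

def proportion_ci(x: int, n: int, level: float = 0.95):
    """Normal-approximation CI for p, given x successes in n trials.

    Returns (p_hat, se, (lower, upper)).
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Table value: upper (1 - level) / 2 point of the standard normal.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# 120 of 200 students in favor of the proposal.
p_hat, se, (lower, upper) = proportion_ci(120, 200)
print(f"estimated proportion = {p_hat:.3f}")
print(f"estimated s.e.       = {se:.4f}")
print(f"95% CI               = ({lower:.3f}, {upper:.3f})")
```

Because the interval relies on the asymptotic normality of the sample proportion, it should be read as approximate, and it is less reliable when n is small or the estimated proportion is close to 0 or 1.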