Chapter 3 Discrete Random Variables and Probability Distributions
3.2 - Probability Distributions for Discrete Random Variables
3.3 - Expected Values
3.4 - The Binomial Probability Distribution
3.5 - Hypergeometric and Negative Binomial Distributions
3.6 - The Poisson Probability Distribution
Recall…
POPULATION
Discrete random variable X
Examples: shoe size, dosage (mg), # cells, …

Pop values x    Probabilities p(x)    Cumul Probs F(x)
x1              p(x1)                 p(x1)
x2              p(x2)                 p(x1) + p(x2)
x3              p(x3)                 p(x1) + p(x2) + p(x3)
⋮               ⋮                     ⋮
Total           1

Probability histogram of X: Total Area = 1
Mean: $\mu = \sum x \, p(x)$
Variance: $\sigma^2 = \sum (x - \mu)^2 \, p(x)$
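The cumulative probabilities, mean, and variance in this table can be computed directly from the columns x and p(x). The following is a minimal sketch in R; the four population values and their probabilities are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```r
# Hypothetical population values and probabilities (illustration only)
x <- c(0, 1, 2, 3)
p <- c(0.1, 0.2, 0.3, 0.4)          # must sum to 1

Fx     <- cumsum(p)                 # F(x) = running total of p(x)
mu     <- sum(x * p)                # mean = sum of x * p(x)
sigma2 <- sum((x - mu)^2 * p)       # variance = sum of (x - mu)^2 * p(x)

print(data.frame(x = x, p = p, Fx = Fx))
print(c(mean = mu, variance = sigma2))
```

The last entry of the cumulative column is always 1, which mirrors the "Total" row of the table.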
~ The Binomial Distribution ~
Used only when dealing with binary outcomes (two categories: “Success” vs. “Failure”), with a fixed probability of Success (π) in the population.
Calculates the probability of obtaining any given number of Successes in a random sample of n independent “Bernoulli trials.”
Has many applications and generalizations, e.g., multiple categories, variable probability of Success, etc.
POPULATION: 40% Male, 60% Female
For any randomly selected individual, define a binary random variable: 1 = Male, 0 = Female.
RANDOM SAMPLE: n = 100
Discrete random variable X = # Males in sample (0, 1, 2, 3, …, 99, 100)

x        p(x)       F(x)
x1       p(x1)      F(x1)
x2       p(x2)      F(x2)
x3       p(x3)      ⋮
⋮        ⋮          1
Total    1

How can we calculate the probability p(x) = P(X = x), for x = 0, 1, 2, 3, …, 100? That is, P(X = 0), P(X = 1), P(X = 2), …, P(X = 99), P(X = 100)?
How can we calculate the cumulative probability F(x) = P(X ≤ x), for x = 0, 1, 2, 3, …, 100?

Example: How can we calculate the probability p(25) = P(X = 25)?
Solution: Model the sample as a sequence of n = 100 independent coin tosses, with 1 = Heads (Male), 0 = Tails (Female), where P(H) = 0.4, P(T) = 0.6.
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
Slots: 1 2 3 4 5 . . . . . . 97 98 99 100
X = 25 Heads: { H1, H2, H3, …, H25 }

There are 100 possible open slots for H1 to occupy.
For each one of them, there are 99 possible open slots left for H2 to occupy.
For each one of them, there are 98 possible open slots left for H3 to occupy.
…etc…
For each one of them, there are 77 possible open slots left for H24 to occupy.
For each one of them, there are 76 possible open slots left for H25 to occupy.
Hence, there are 100 × 99 × 98 × … × 77 × 76 possible outcomes.
This value is the number of permutations of 25 among 100, denoted 100P25.

HOWEVER… This number unnecessarily includes the distinct permutations of the 25 Heads among themselves, all of which have Heads in the same positions. We would not want to count these as distinct outcomes.
How many is that? By the same logic, 25 × 24 × 23 × … × 3 × 2 × 1, i.e., “25 factorial”, denoted 25!

Dividing out the overcount:
$\dfrac{100 \times 99 \times 98 \times \cdots \times 77 \times 76}{25 \times 24 \times 23 \times \cdots \times 3 \times 2 \times 1} = \dfrac{100!}{25! \, 75!} = \dbinom{100}{25}$
R: choose(100, 25)
Calculator: 100 nCr 25
“100-choose-25”, denoted $\binom{100}{25}$ or 100C25. This value counts the number of combinations of 25 Heads among 100 coins.
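The counting argument above can be checked numerically. This is a minimal sketch that follows the slide's steps (permutations, then division by 25!); the variable names are illustrative, and the comparison relies on floating-point arithmetic, so agreement is checked with all.equal() rather than exact equality.

```r
n <- 100
k <- 25

n_perm <- prod(n:(n - k + 1))   # 100 * 99 * ... * 76 = permutations of 25 among 100
n_self <- factorial(k)          # 25! orderings of the Heads among themselves
n_comb <- n_perm / n_self       # remove the overcount

# Same count via the factorial formula and via R's built-in choose()
by_factorials <- factorial(n) / (factorial(k) * factorial(n - k))
print(all.equal(n_comb, by_factorials))
print(all.equal(n_comb, choose(n, k)))
```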
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
Answer: $\binom{100}{25}$

What is the probability of each such outcome?
Recall that, per toss, P(Heads) = π = 0.4 and P(Tails) = 1 – π = 0.6.
Answer: Via independence in binary outcomes between any two coins,
0.4 × 0.6 × 0.6 × 0.4 × 0.6 × … × 0.6 × 0.4 × 0.4 × 0.6 = $(0.4)^{25} (0.6)^{75}$.
Therefore, the probability P(X = 25) is equal to $\binom{100}{25} (0.4)^{25} (0.6)^{75}$.
R: dbinom(25, 100, .4)

Question: What if the coin were “fair” (unbiased), i.e., π = 1 – π = 0.5?
Answer: 0.5 × 0.5 × 0.5 × … × 0.5 × 0.5 = $(0.5)^{100}$ for every outcome. This is the “equally likely” scenario!
Therefore, the probability P(X = 25) is equal to $\binom{100}{25} (0.5)^{100}$.
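As a check on the reasoning above, the product "number of outcomes × probability of each outcome" can be compared with R's built-in dbinom(). This is a sketch only; the variable names are illustrative.

```r
n <- 100
x <- 25

# Biased coin: P(Heads) = 0.4
p_each_biased <- 0.4^x * 0.6^(n - x)          # probability of any one outcome with 25 Heads
p_biased      <- choose(n, x) * p_each_biased # P(X = 25)
print(all.equal(p_biased, dbinom(x, n, 0.4)))

# Fair coin: every one of the 2^100 outcomes has the same probability
p_each_fair <- 0.5^n
p_fair      <- choose(n, x) * p_each_fair
print(all.equal(p_fair, dbinom(x, n, 0.5)))
```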
POPULATION: 40% Male, 60% Female; in general, “Success” vs. “Failure”
For any randomly selected individual, define a binary random variable: “Success” with probability π, “Failure” with probability 1 – π.
RANDOM SAMPLE of size n (here, n = 100)
Discrete random variable X = # “Successes” in sample (0, 1, 2, 3, …, n); here, X = # Males in sample (0, 1, 2, 3, …, 99, 100).

Example: What is the probability P(X = x), for x = 0, 1, 2, 3, …, n?
Solution: Model the sample as n Bernoulli trials (independent, with constant probability π per trial), with P(“Success”) = π, P(“Failure”) = 1 – π.
Then X is said to follow a Binomial distribution, written X ~ Bin(n, π), with “probability mass function”
$p(x) = \dbinom{n}{x} \pi^x (1 - \pi)^{n - x}$, x = 0, 1, 2, …, n.
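The probability mass function of Bin(n, π) can be written as a short R function and compared with the built-ins dbinom() (for p(x)) and pbinom() (for F(x) = P(X ≤ x)). The function name binom_pmf and its argument names are hypothetical, introduced here only for illustration.

```r
# Hand-written Binomial pmf: choose(n, x) * prob^x * (1 - prob)^(n - x)
binom_pmf <- function(x, n, prob) {
  choose(n, x) * prob^x * (1 - prob)^(n - x)
}

x  <- 0:100
p  <- binom_pmf(x, n = 100, prob = 0.4)   # p(0), p(1), ..., p(100)
Fx <- cumsum(p)                           # F(x) = P(X <= x)

print(sum(p))                                   # probabilities sum to 1
print(all.equal(p,  dbinom(x, 100, 0.4)))       # matches the built-in pmf
print(all.equal(Fx, pbinom(x, 100, 0.4)))       # matches the built-in cdf
```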
Example: Blood Type probabilities, revisited

                  Rh Factor
Blood Type      +        –       Total
O             .384     .077      .461
A             .323     .065      .388
B             .094     .017      .111
AB            .032     .007      .039
Total         .833     .166      .999

Suppose n = 10 individuals are to be selected at random from the population. Probability table for X = #(Type O)?
Check:
1. Independent outcomes? Reasonably assume that the outcomes “Type O” vs. “Not Type O” for any two individuals are independent of each other.
2. Constant probability π? From the table, π = P(Type O) = .461 throughout the population.
Binomial model applies?
Example: Blood Type probabilities, revisited
Suppose n = 10 individuals are to be selected at random from the population. Probability table for X = #(Type O)
Binomial model applies. X ~ Bin(10, .461)
$p(x) = \binom{10}{x} (.461)^x (.539)^{10 - x}$, x = 0, 1, 2, …, 10
R: dbinom(0:10, 10, .461)

x     p(x)                                                  F(x)
0     $\binom{10}{0} (.461)^{0} (.539)^{10}$ = 0.00207      0.00207
1     $\binom{10}{1} (.461)^{1} (.539)^{9}$ = 0.01770       0.01977
2     $\binom{10}{2} (.461)^{2} (.539)^{8}$ = 0.06813       0.08790
3     $\binom{10}{3} (.461)^{3} (.539)^{7}$ = 0.15538       0.24328
4     $\binom{10}{4} (.461)^{4} (.539)^{6}$ = 0.23257       0.47585
5     $\binom{10}{5} (.461)^{5} (.539)^{5}$ = 0.23870       0.71455
6     $\binom{10}{6} (.461)^{6} (.539)^{4}$ = 0.17013       0.88468
7     $\binom{10}{7} (.461)^{7} (.539)^{3}$ = 0.08315       0.96783
8     $\binom{10}{8} (.461)^{8} (.539)^{2}$ = 0.02667       0.99450
9     $\binom{10}{9} (.461)^{9} (.539)^{1}$ = 0.00507       0.99957
10    $\binom{10}{10} (.461)^{10} (.539)^{0}$ = 0.00043     1.00000
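The probability table for X ~ Bin(10, .461) can be regenerated in a few lines. This is a sketch that assumes the built-in dbinom() and obtains F(x) as a running sum; the name tab is illustrative.

```r
n    <- 10
prob <- 0.461                       # P(Type O) from the blood type table

x  <- 0:n
p  <- dbinom(x, n, prob)            # p(x) = P(X = x)
Fx <- cumsum(p)                     # F(x) = P(X <= x), same as pbinom(x, n, prob)

tab <- data.frame(x = x, p = round(p, 5), Fx = round(Fx, 5))
print(tab)
```

Small differences in the last decimal place of F(x) relative to a hand-built table can occur, depending on whether the p(x) values are rounded before or after they are accumulated.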
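# Probability histogram of X ~ Bin(10, .461)
n = 10
p = .461
pmf = function(x) dbinom(x, n, p)
N = 100000                               # size of the artificial data set
x = 0:10
bin.dat = rep(x, round(N * pmf(x)))      # repeat each x in proportion to p(x); counts must be whole numbers
hist(bin.dat, freq = FALSE, breaks = c(-.5, x + .5), col = "green", axes = FALSE)
axis(1, at = x)                          # one tick per possible value of X
axis(2)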
Example: Blood Type probabilities, revisited
X = #(Type O) among n = 10 individuals, X ~ Bin(10, .461)
Also, can show:
mean $\mu = \sum x \, p(x) = n\pi$ = (10)(.461) = 4.61
variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi)$ = (10)(.461)(.539) = 2.48
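The two shortcut formulas can be checked against the defining sums for X ~ Bin(10, .461). This is a numerical check under that one set of parameters, not a proof; variable names are illustrative.

```r
n    <- 10
prob <- 0.461

x <- 0:n
p <- dbinom(x, n, prob)

mu     <- sum(x * p)               # definition: sum of x * p(x)
sigma2 <- sum((x - mu)^2 * p)      # definition: sum of (x - mu)^2 * p(x)

print(c(by_sum = mu,     by_formula = n * prob))               # both about 4.61
print(c(by_sum = sigma2, by_formula = n * prob * (1 - prob)))  # both about 2.48
```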
Example: Blood Type probabilities, revisited
Suppose n = 1500 individuals are to be selected at random from the population. Probability table for X = #(Type AB–)
From the table, π = P(Type AB–) = .007. RARE EVENT!
Binomial model applies. X ~ Bin(1500, .007)
Therefore, $p(x) = \binom{1500}{x} (.007)^x (.993)^{1500 - x}$, x = 0, 1, 2, …, 1500.
Also, can show:
mean $\mu = \sum x \, p(x) = n\pi$ = 10.5
variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi)$ = 10.43
Example: Blood Type probabilities, revisited
Therefore, $p(x) = \binom{1500}{x} (.007)^x (.993)^{1500 - x}$, x = 0, 1, 2, …, 1500.
RARE EVENT! Long positive skew as x → 1500 …but contribution ≈ 0.
Is there a better alternative?
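To see how little the long right tail contributes, one can add up the Bin(1500, .007) probabilities beyond some cutoff. The cutoff of 30 used here is an arbitrary illustrative choice (roughly six standard deviations above the mean), not a value from the slides.

```r
n    <- 1500
prob <- 0.007

mu    <- n * prob                          # 10.5
sigma <- sqrt(n * prob * (1 - prob))       # about 3.23

cutoff <- 30                               # arbitrary cutoff, for illustration only
tail_prob <- pbinom(cutoff, n, prob, lower.tail = FALSE)   # P(X > 30)

print(c(mean = mu, sd = sigma))
print(tail_prob)                           # tiny: x = 31, ..., 1500 contribute almost nothing
```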
Example: Blood Type probabilities, revisited
Suppose n = 1500 individuals are to be selected at random from the population. Probability table for X = #(Type AB–). RARE EVENT!
Binomial model applies. X ~ Bin(1500, .007)
Therefore, $p(x) = \binom{1500}{x} (.007)^x (.993)^{1500 - x}$, x = 0, 1, 2, …, 1500, with mean nπ = 10.5 and variance nπ(1 – π) = 10.43.

Is there a better alternative?
Poisson distribution: $p(x) = \dfrac{e^{-\mu} \mu^x}{x!}$, x = 0, 1, 2, …, where mean and variance are μ = nπ and σ² = nπ.
Here μ = 10.5, so X ~ Poisson(10.5).
Notation: Sometimes the symbol λ (“lambda”) is used instead of μ (“mu”).

Ex: Probability of exactly X = 15 Type(AB–) individuals = ?
Poisson: $\dfrac{e^{-10.5} (10.5)^{15}}{15!}$
Binomial: $\dbinom{1500}{15} (.007)^{15} (.993)^{1485}$
(both ≈ .0437)
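The two answers for P(X = 15) can be compared side by side. This sketch uses the built-ins dbinom() and dpois(), plus the Poisson formula written out by hand; the range 0:40 used for the overall comparison is an arbitrary illustrative choice.

```r
n    <- 1500
prob <- 0.007
mu   <- n * prob                                   # 10.5

p_binom  <- dbinom(15, n, prob)                    # exact Binomial probability
p_pois   <- dpois(15, mu)                          # Poisson approximation
p_manual <- exp(-mu) * mu^15 / factorial(15)       # Poisson formula by hand

print(c(binomial = p_binom, poisson = p_pois, manual = p_manual))

# Largest gap between the two pmfs over x = 0, ..., 40
print(max(abs(dbinom(0:40, n, prob) - dpois(0:40, mu))))
```

The approximation is close here because n is large and π is small; it is not expected to hold for parameter values such as Bin(10, .461).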
Example: Deaths in Wisconsin
Assuming deaths among young adults are relatively rare, we know the following:
Average 584 deaths per year, so λ = 584.
Mortality rate (α) seems constant.
Therefore, the Poisson distribution can be used as a good model to make future predictions about the random variable X = “# deaths” per year, for this population (15-24 yrs)… assuming current values will still apply.

Probability of exactly X = 600 deaths next year:
P(X = 600) = $\dfrac{e^{-584} (584)^{600}}{600!}$ = 0.0131
R: dpois(600, 584)

Probability of exactly X = 1200 deaths in the next two years:
Mean of 584 deaths per yr gives a mean of 1168 deaths per two yrs, so let λ = 1168:
P(X = 1200) = $\dfrac{e^{-1168} (1168)^{1200}}{1200!}$ = 0.00746

Probability of at least one death per day:
λ = 584/365 = 1.6 deaths/day
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + … True, but not practical.
P(X ≥ 1) = 1 – P(X = 0) = 1 – $\dfrac{e^{-1.6} (1.6)^0}{0!}$ = 1 – $e^{-1.6}$ = 0.798
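The three Poisson calculations can be reproduced as follows. This is a sketch using the built-in dpois() and ppois(); the variable names are illustrative. Note that typing the formula literally fails for the yearly counts, because 584^600 and 600! both overflow to Inf in floating point, so the hand-written version works on the log scale with lgamma().

```r
lambda_year <- 584                                   # mean deaths per year

p600  <- dpois(600, lambda_year)                     # exactly 600 deaths next year
p1200 <- dpois(1200, 2 * lambda_year)                # exactly 1200 deaths in two years

# Same pmf on the log scale: log p(x) = -lambda + x*log(lambda) - log(x!)
p600_log <- exp(-lambda_year + 600 * log(lambda_year) - lgamma(601))

lambda_day <- lambda_year / 365                      # 1.6 deaths per day
p_day <- 1 - dpois(0, lambda_day)                    # P(X >= 1) = 1 - P(X = 0)

print(c(p600 = p600, p1200 = p1200, at_least_one_per_day = p_day))
```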
Classical Discrete Probability Distributions
Binomial ~ X = # Successes in n trials, P(Success) = π
Poisson ~ As above, but n large, π small, i.e., Success RARE
Negative Binomial ~ X = # trials for k Successes, P(Success) = π
Geometric ~ As above, but specialized to k = 1
Hypergeometric ~ As Binomial, but π changes between trials
Multinomial ~ As Binomial, but for multiple categories, with π1 + π2 + … + πlast = 1 and x1 + x2 + … + xlast = n
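Each distribution in this list has a built-in pmf in R's stats package. The sketch below evaluates one probability from each; every numeric input is hypothetical and chosen only for illustration. One convention to watch: dnbinom() and dgeom() count the number of Failures before the k-th Success, not the number of trials, so a trial count must be converted by subtracting k.

```r
# All numeric inputs below are hypothetical, for illustration only.
prob <- 0.4

# Binomial: P(3 Successes in n = 10 trials)
p_bin <- dbinom(3, size = 10, prob = prob)

# Poisson: n large, prob small, mean = n * prob
p_poi <- dpois(3, lambda = 1500 * 0.007)

# Negative Binomial: X = # trials needed for k = 3 Successes, evaluated at 8 trials.
# R counts Failures, so pass (trials - k).
trials <- 8
k      <- 3
p_nb <- dnbinom(trials - k, size = k, prob = prob)

# Geometric: the k = 1 case; again R counts Failures before the first Success.
p_geo <- dgeom(trials - 1, prob = prob)

# Hypergeometric: sample 5 WITHOUT replacement from 4 Males + 6 Females; P(2 Males)
p_hyp <- dhyper(2, m = 4, n = 6, k = 5)

# Multinomial: counts (2, 1, 1) in n = 4 trials over three categories
p_mul <- dmultinom(c(2, 1, 1), prob = c(0.5, 0.3, 0.2))

print(c(binomial = p_bin, poisson = p_poi, neg_binomial = p_nb,
        geometric = p_geo, hypergeometric = p_hyp, multinomial = p_mul))
```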