Comparing Distributions II: Bayes Rule and Acceptance Sampling
By Peter Woolf, University of Michigan
Michigan Chemical Process Dynamics and Controls Open Textbook, version 1.0
Creative Commons
In the last lecture, we found that variations in the product yield were significantly related to runny feed. One solution is to find a way to identify runny feed before it is fed into the process, and avoid it.
Runnyfeedometer™
You develop an offline tool to detect runny feed using a cone and plate viscometer. The test is inexpensive, but not always accurate due to inhomogeneous feed. You have a more accurate way of measuring runny feed, but it is slow and expensive, so maybe you can get away with multiple reads on the Runnyfeedometer™?
Experimental data: 100 known runny and 100 known normal samples tested in the Runnyfeedometer™:
P(+ test | runny) = 98/100 (true positive)
P(- test | runny) = 2/100 (false negative)
P(+ test | normal) = 3/100 (false positive)
P(- test | normal) = 97/100 (true negative)
What are the odds that exactly 9 of 10 tests on a runny sample come back positive?
P(+ test | runny) = 98/100, P(- test | runny) = 2/100
Question: What are the odds that exactly 9 of 10 tests on a runny sample come back positive?
Possible results (10 combinations):
{+,+,+,+,+,+,+,+,+,-}
{+,+,+,+,+,+,+,+,-,+}
{+,+,+,+,+,+,+,-,+,+}
{+,+,+,+,+,+,-,+,+,+}
{+,+,+,+,+,-,+,+,+,+}
{+,+,+,+,-,+,+,+,+,+}
{+,+,+,-,+,+,+,+,+,+}
{+,+,-,+,+,+,+,+,+,+}
{+,-,+,+,+,+,+,+,+,+}
{-,+,+,+,+,+,+,+,+,+}
Note: the results are hard to list if 2 or more tests fail.
Probability of a particular outcome: $(0.98)^9 (0.02)^1$
Overall probability = (probability of a particular outcome) × (# combinations) = $10\,(0.98)^9 (0.02)^1 \approx 0.167$
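The counting argument above can be checked by brute force. The following is a minimal Python sketch (not part of the original slides; the function name `prob_exactly_k_positive` is made up for illustration) that enumerates every +/- sequence of 10 tests and adds up the ones with the requested number of positives. It also handles the "2 or more fail" case that is tedious to list by hand.

```python
from itertools import product

P_POS_GIVEN_RUNNY = 0.98  # P(+ test | runny), from the experimental data
N_TESTS = 10


def prob_exactly_k_positive(k, n=N_TESTS, p=P_POS_GIVEN_RUNNY):
    """Enumerate all +/- sequences of length n; count and sum those with k positives."""
    count = 0
    total = 0.0
    for outcome in product("+-", repeat=n):
        if outcome.count("+") == k:
            count += 1
            # Each particular sequence has probability p^k * (1-p)^(n-k)
            total += p ** k * (1 - p) ** (n - k)
    return count, total


count_9, prob_9 = prob_exactly_k_positive(9)  # exactly one failed test
count_8, prob_8 = prob_exactly_k_positive(8)  # exactly two failed tests
print(count_9, prob_9)
print(count_8, prob_8)
```

Enumeration scales as 2^n, so it is only practical for small n; it is meant as a check on the counting, not as a general method.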
Binomial Distribution
Describes the probability of obtaining k events from N independent samples of a binary outcome with known probability p:
$$P(k \mid N, p) = \binom{N}{k}\, p^{k} (1-p)^{N-k} = \frac{N!}{k!\,(N-k)!}\, p^{k} (1-p)^{N-k}$$
In our case: p = P(+ test | runny) = 98/100 and (1-p) = P(- test | runny) = 2/100.
Examples:
Odds of getting 20 heads from 30 coin tosses
Odds of finding 3 broken bolts in a box of 100
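As a rough illustration of the binomial distribution, here is a short Python sketch using only the standard library. The function name `binomial_pmf` is an illustrative choice, and the 1% bolt failure rate is a made-up value, since the slides do not give one.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials, each with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exactly 9 positives from 10 tests on a runny sample (p = 0.98 from the data)
print(binomial_pmf(9, 10, 0.98))

# Exactly 20 heads from 30 fair coin tosses
print(binomial_pmf(20, 30, 0.5))

# Exactly 3 broken bolts in a box of 100, ASSUMING a hypothetical 1% failure rate
print(binomial_pmf(3, 100, 0.01))
```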
In Mathematica:
Probability of exactly 5 heads out of 10 tosses
Probability of 0-5 heads out of 10 tosses
Probability test: What are the odds of getting 5 heads out of 10 coin tosses? (a) 25% (b) 50% (c) 62%
Probability of exactly 5 heads out of 10 tosses: 25%
Probability of 0-5 heads out of 10 tosses: 62%
(Note: the axes of the plot are off by 1.)
Probability test: What are the odds of getting 5 heads out of 10 tosses?
(a) 25%: okay, if the question means exactly 5 heads (=5)
(b) 50%: no
(c) 62%: okay, if the question means 5 or fewer heads (≤5)
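The two readings of the coin-toss question can be reproduced with a few lines of Python. This is a stand-in sketch, not the Mathematica input from the slide; the helper names `binomial_pmf` and `binomial_cdf` are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


exactly_5 = binomial_pmf(5, 10, 0.5)  # "exactly 5 heads"
at_most_5 = binomial_cdf(5, 10, 0.5)  # "0-5 heads"
print(f"{exactly_5:.3f} {at_most_5:.3f}")  # prints "0.246 0.623"
```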
Runnyfeedometer™
P(+ test | runny) = 98/100, P(- test | runny) = 2/100
P(+ test | normal) = 3/100, P(- test | normal) = 97/100
Given these data, what acceptance sampling criteria would be required to correctly identify a normal sample with 99.99% confidence?
Example acceptance sampling criterion: accept the sample if, from 10 tests, 3 or fewer are positive. (0 in 10 positive: very likely normal; 10 in 10 positive: very likely runny.)
Translation: we want P(normal | 3 or fewer positive results from 10 tests).
Using our binomial distribution we can calculate a related quantity.
Using our binomial distribution we can calculate a related quantity:
$$P(\le 3 \text{ positive results from 10 tests} \mid \text{normal}) = \sum_{i=0}^{3} \binom{10}{i}\, p^{i} (1-p)^{10-i} \approx 0.9998$$
where i = # of positive results and p = probability of a positive result given a normal feed = 0.03.
(Plot: P(x) versus x.)
If the feed is normal, we will get ≤3 positive tests with >99.9% probability!
Not the same! What we want is P(normal | 3 or fewer positive results from 10 tests).
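The cumulative sum for the normal-feed case can be sketched in Python as follows (an illustrative stand-in for the slide's calculation; the helper name `binomial_cdf` is an assumption). Note that it gives P(≤3 positive | normal), which is the likelihood, not yet the quantity we want.

```python
from math import comb

P_POS_GIVEN_NORMAL = 0.03  # false-positive rate from the experimental data


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


# P(3 or fewer positive results from 10 tests | normal)
p_le3_given_normal = binomial_cdf(3, 10, P_POS_GIVEN_NORMAL)
print(p_le3_given_normal)
```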
Three Probability Definitions
1. Joint Probability
2. Conditional Probability
3. Marginalization
1. Joint Probability: P(A, B), the probability that A and B both occur
What is the probability of drawing an ace first and then a jack from a deck of 52 cards?
What is the probability of a protein being highly expressed and phosphorylated? (# highly expressed and phosphorylated proteins)/(total proteins)
What is the probability that valves A and B both fail? (# times A & B fail)/(total observations)
2. Conditional Probability: P(A | B), the probability of A given that B has occurred
What is the probability of drawing an ace given that you just drew a jack from a deck of 52 cards?
What is the probability of a protein being highly expressed given that it is phosphorylated? (# highly expressed phosphorylated proteins)/(total phosphorylated proteins)
What is the probability that valve A fails given that B has failed? (# times A & B fail)/(total observations where B fails)
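The counting definitions of joint and conditional probability can be illustrated on the card questions by enumerating every ordered two-card draw. This is a minimal sketch; the helper `prob` and the deck encoding are illustrative choices, and suits are ignored because only ranks matter here.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [rank for rank in RANKS for _ in range(4)]  # 52 cards, four of each rank
draws = list(permutations(range(52), 2))  # every ordered pair of distinct cards


def prob(event, given=lambda first, second: True):
    """(# draws where event and condition hold) / (# draws where condition holds)."""
    kept = [(deck[i], deck[j]) for i, j in draws if given(deck[i], deck[j])]
    hits = [pair for pair in kept if event(*pair)]
    return Fraction(len(hits), len(kept))


# Joint: ace first AND jack second
p_joint = prob(lambda first, second: first == "A" and second == "J")

# Conditional: ace second GIVEN a jack was just drawn
p_conditional = prob(lambda first, second: second == "A",
                     given=lambda first, second: first == "J")

print(p_joint, p_conditional)  # prints "4/663 4/51"
```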
3. Marginalization: $P(A) = \sum_{B} P(A, B) = \sum_{B} P(A \mid B)\,P(B)$, summing over every possible value of B
What is the probability of drawing an ace given that you just drew one other card from a deck of 52 cards?
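For the card question, marginalizing means summing over what the unknown first card could have been: either an ace or not an ace. A small Python sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# Probabilities for the unknown first card
p_first_ace = Fraction(4, 52)
p_first_other = Fraction(48, 52)

# Conditional probabilities of an ace on the second draw
p_ace_given_first_ace = Fraction(3, 51)    # one ace already gone
p_ace_given_first_other = Fraction(4, 51)  # all four aces remain

# Marginalize over the first card
p_second_ace = (p_ace_given_first_ace * p_first_ace
                + p_ace_given_first_other * p_first_other)
print(p_second_ace)  # prints "1/13"
```

The result equals 4/52, the same as drawing an ace from a full deck: knowing only that some card was removed carries no information about the second draw.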
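Probability Algebra
In general: $P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
If independent: $P(A, B) = P(A)\,P(B)$
Bayes' Rule: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$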
We want the following: P(normal | 3 or fewer positive results from 10 tests)
Bayes' Rule:
$$P(\text{normal} \mid \le 3 \text{ of 10 positive}) = \frac{P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})}{P(\le 3 \text{ of 10 positive})}$$
P(≤3 of 10 positive | normal): from the binomial distribution
P(normal): the prior
P(≤3 of 10 positive): found by marginalizing
P(3 or fewer positive results from 10 tests | normal) = 0.9998
P(normal): from prior observations, what are the odds of getting a batch of normal feed? Previous data found normal feed in 19 of 25 samples, so a first approximation could be 19/25 = 0.76.
$$P(\text{normal} \mid \le 3 \text{ of 10 positive}) = \frac{P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})}{P(\le 3 \text{ of 10 positive})}$$
P(3 or fewer positive results from 10 tests): found by marginalizing over runny and normal:
$$P(\le 3 \text{ of 10 positive}) = P(\le 3 \text{ of 10 positive} \mid \text{runny})\,P(\text{runny}) + P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})$$
P(≤3 of 10 positive | runny): since P(+ test | runny) = 98/100, a runny sample will yield ≤3 positives ~0% of the time.
P(runny) = 1 - P(normal) = 0.24
$$P(\text{normal} \mid \le 3 \text{ of 10 positive}) = \frac{P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})}{P(\le 3 \text{ of 10 positive})}$$
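The marginalization step can be sketched numerically as below. This is an illustrative Python calculation using the rates and prior quoted in the slides; the helper name `binomial_cdf` and the variable names are assumptions.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


P_NORMAL = 19 / 25       # prior from previous data
P_RUNNY = 1 - P_NORMAL   # 0.24

p_le3_given_runny = binomial_cdf(3, 10, 0.98)   # effectively zero
p_le3_given_normal = binomial_cdf(3, 10, 0.03)

# Marginalize over the two feed types
p_le3 = p_le3_given_runny * P_RUNNY + p_le3_given_normal * P_NORMAL
print(p_le3_given_runny, p_le3)
```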
P(3 or fewer positive results from 10 tests): found by marginalizing over runny and normal:
$$P(\le 3 \text{ of 10 positive}) = P(\le 3 \text{ of 10 positive} \mid \text{runny})\,P(\text{runny}) + P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal}) = (0)(0.24) + (0.9998)(0.76) \approx 0.76$$
$$P(\text{normal} \mid \le 3 \text{ of 10 positive}) = \frac{P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})}{P(\le 3 \text{ of 10 positive})} = \frac{(0.9998)(0.76)}{(0)(0.24) + (0.9998)(0.76)} \approx 1$$
This acceptance sampling criterion will screen out runny feeds essentially 100% of the time, which is more than the 99.99% required. May be too strict!
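Putting the pieces together, Bayes' rule for the "3 or fewer positives" criterion can be sketched in Python. The `posterior` helper is a generic illustration (not from the slides) that normalizes likelihood × prior over the two hypotheses.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def posterior(likelihoods, priors):
    """Bayes' rule: likelihood * prior for each hypothesis, divided by the marginal."""
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


priors = {"normal": 19 / 25, "runny": 6 / 25}
likelihoods = {
    "normal": binomial_cdf(3, 10, 0.03),  # P(<=3 of 10 positive | normal)
    "runny": binomial_cdf(3, 10, 0.98),   # P(<=3 of 10 positive | runny)
}

post = posterior(likelihoods, priors)
print(post)
```

Reporting the runny posterior directly, rather than 1 minus the normal posterior, keeps the tiny probability readable.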
Test different acceptance sampling criteria:
Remember:
0 in 10 positive: very likely normal
10 in 10 positive: very likely runny
0 to 10 positive: no information
--> 0 to 6 positive: likely normal
Accepting on 0 to 6 positives, the acceptance sampling criterion will identify normal feeds >99.99% of the time.
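One way to test different criteria is to scan every threshold m ("accept if m or fewer of 10 tests are positive") and compute P(normal | ≤m positive) for each. The sketch below is illustrative Python, with assumed helper names and the 99.99% target taken from the slides.

```python
from math import comb

P_NORMAL, P_RUNNY = 19 / 25, 6 / 25
TARGET = 0.9999  # required confidence that an accepted feed is normal


def binomial_cdf(k, n, p):
    """Probability of k or fewer events in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def posterior_normal(max_positives, n=10):
    """P(normal | max_positives or fewer positive results from n tests)."""
    like_normal = binomial_cdf(max_positives, n, 0.03)
    like_runny = binomial_cdf(max_positives, n, 0.98)
    evidence = like_normal * P_NORMAL + like_runny * P_RUNNY
    return like_normal * P_NORMAL / evidence


for m in range(11):
    print(m, posterior_normal(m))

# Loosest criterion that still meets the target
loosest = max(m for m in range(11) if posterior_normal(m) >= TARGET)
print("loosest acceptable threshold:", loosest)
```

At m = 10 every result is accepted, so the posterior falls back to the prior of 0.76, which is the "no information" case.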
Runnyfeedometer™
Analysis result: If ≤6 of 10 tests report positive, then I am >99.99% sure the feed is normal.
Acceptance criterion: If ≤6 of 10 tests are positive, use the feed; otherwise reject the feed.
Q: What are the odds of rejecting normal feed?
$$P(\text{normal} \mid \ge 7 \text{ of 10 positive}) = \frac{P(\ge 7 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})}{P(\ge 7 \text{ of 10 positive})}$$
Very rarely.
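The rejection question can be read two ways, and a short Python sketch (illustrative, with assumed helper names) can compute both: the chance that a normal feed gets rejected, P(≥7 positive | normal), and the chance that a rejected feed was actually normal, P(normal | ≥7 positive), which is the expression written above.

```python
from math import comb

P_NORMAL, P_RUNNY = 19 / 25, 6 / 25


def binomial_upper_tail(k, n, p):
    """Probability of k or more events in n trials (summed directly to avoid 1 - cdf rounding)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_ge7_given_normal = binomial_upper_tail(7, 10, 0.03)  # normal feed gets rejected
p_ge7_given_runny = binomial_upper_tail(7, 10, 0.98)   # runny feed gets rejected

# Bayes' rule: probability that a rejected feed was in fact normal
p_ge7 = p_ge7_given_normal * P_NORMAL + p_ge7_given_runny * P_RUNNY
p_normal_given_ge7 = p_ge7_given_normal * P_NORMAL / p_ge7

print(p_ge7_given_normal, p_normal_given_ge7)
```

Both numbers come out on the order of one in a hundred million or smaller, consistent with "very rarely".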
Take Home Messages
Acceptance sampling provides an easy-to-implement way to eliminate variation.
Basic probability rules like Bayes' Rule help you rearrange your expressions into things you can solve.