Discrete Structures for Computer Science
Presented By: Andrew F. Conn November 16th, 2016 Lecture #21: Conditional Probability
Conditional Probability
So far we have looked at independent probabilities, but sometimes events are dependent on one another. For example, the probability that you will die in a plane crash depends on whether or not you fly. The probability of the Steelers winning on any given Sunday depends on whether or not Ben Roethlisberger plays. These situations require us to consider new ways to compute probability.
Motivating Example We have two boxes. The first contains two green balls and seven red balls. The second contains four green balls and three red balls. Bob selects a ball by first choosing a box at random. He then selects one of the balls from that box at random.
Motivating Example Cont.
What is the probability that Bob chooses a red ball from box 1? Let E = "Bob chooses a red ball", so E^c = "Bob chooses a green ball". Let F = "Bob chooses box 1", so F^c = "Bob chooses box 2". Then the probability that Bob chooses a red ball from box 1 is: p(F) · p(E|F) = 1/2 · 7/9 = 7/18. What is p(E|F)? You can read this as the probability that E will happen given that F has already happened.
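The two-stage pick can be checked by exhaustive enumeration. A minimal sketch (the `boxes` table and `p_outcome` helper are our own illustrative names, not from the lecture), using Python's `fractions` module for exact arithmetic:

```python
from fractions import Fraction

# Bob's experiment: each box is chosen with probability 1/2, then a ball
# is drawn uniformly at random from the chosen box.
boxes = {1: ["G"] * 2 + ["R"] * 7,   # box 1: two green, seven red
         2: ["G"] * 4 + ["R"] * 3}   # box 2: four green, three red

def p_outcome(box, color):
    """Probability of choosing `box` and then drawing a ball of `color`."""
    balls = boxes[box]
    return Fraction(1, 2) * Fraction(balls.count(color), len(balls))

print(p_outcome(1, "R"))  # 7/18, matching the slide
```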
A definition for conditional probability
What is the probability that Bob chooses a green ball from box 2? p(E^c ∩ F^c) = p(F^c) · p(E^c|F^c) = 1/2 · 4/7 = 2/7. We can use the above relationship to derive a general formula for p(E|F). Definition: Let E and F be events with p(F) > 0. The conditional probability of E given F, denoted p(E|F), is defined as: p(E|F) = p(E ∩ F) / p(F)
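The definition can be wrapped in a small helper for the examples that follow (the name `cond_prob` is our own, not from the lecture):

```python
from fractions import Fraction

def cond_prob(p_e_and_f, p_f):
    """p(E|F) = p(E ∩ F) / p(F), defined only when p(F) > 0."""
    if p_f <= 0:
        raise ValueError("p(F) must be positive")
    return Fraction(p_e_and_f) / Fraction(p_f)

# Green ball from box 2: p(E^c ∩ F^c) = 2/7 and p(F^c) = 1/2, so
# conditioning recovers the within-box draw probability 4/7.
print(cond_prob(Fraction(2, 7), Fraction(1, 2)))  # 4/7
```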
Bob makes it tougher… What is the probability that Bob picked box 1, if Bob only knows he picked a red ball? We know p(F ∩ E) = 7/18, but is that what this question is asking? No — we need to compute p(F|E). From our definition we know that p(F|E) = p(F ∩ E)/p(E) = p(E ∩ F)/p(E). How can we compute p(E)? Note that it is not just the overall fraction of red balls, since the probability of picking a red ball is 7/9 of the time from box 1 and 3/7 of the time from box 2. That gives us an idea: p(E) = p(F) · p(E|F) + p(F^c) · p(E|F^c). We will explore this identity in more depth later.
Answering Bob We need to solve p(F|E) = (7/18) / p(E).
We now strongly believe p(E) = p(F) · p(E|F) + p(F^c) · p(E|F^c). We can easily compute: p(F) = 1/2, p(F^c) = 1/2, p(E|F) = 7/9, p(E|F^c) = 3/7. Then p(E) = 1/2 · 7/9 + 1/2 · 3/7 = 38/63. So finally we can determine that p(F|E) = (7/18)/(38/63) = 49/76.
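The slide's computation can be replayed exactly, step by step, with rational arithmetic:

```python
from fractions import Fraction

# Values from the two-box example.
p_F      = Fraction(1, 2)   # choose box 1
p_notF   = Fraction(1, 2)   # choose box 2
p_E_F    = Fraction(7, 9)   # red | box 1
p_E_notF = Fraction(3, 7)   # red | box 2

p_E = p_F * p_E_F + p_notF * p_E_notF   # law of total probability
p_F_E = (p_F * p_E_F) / p_E             # p(F|E) = p(F ∩ E) / p(E)
print(p_E, p_F_E)  # 38/63 49/76
```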
Bayes' Theorem It turns out that Bob's question is not that unusual.
Bayes' Theorem allows us to relate the conditional and marginal probabilities of two random events. In English: Bayes' Theorem will help us assess the probability that an event occurred given only partial evidence. Doesn't our formula for conditional probability do this already? Yes, but look at all the work we did!
Another Motivating Example
Suppose that a certain opium test correctly identifies a person who uses opiates as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. If a company suspects that 0.5% of its employees are opium users, what is the probability that an employee who tests positive for this drug is actually a user? Question: Can we use our simple conditional probability formula, p(E|F) = p(E ∩ F)/p(F)? (Here E = "X is a user" and F = "X tested positive".)
The reasoning that we used in the last problem essentially derives Bayes' Theorem for us!
Bayes' Theorem: Suppose that E and F are events from some sample space S such that p(E) > 0 and p(F) > 0. Then:
p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)]
Proof: p(F|E) = p(F ∩ E)/p(E), by definition. p(E|F) = p(E ∩ F)/p(F), by definition. Therefore p(F|E)p(E) = p(E|F)p(F), so p(F|E) = p(E|F)p(F)/p(E). Notice that this gives us the numerator…
Proof (continued) Note: To finish, we must prove p(E) = p(E|F)p(F) + p(E|F^c)p(F^c). We used "intuition" to solve this earlier… Observe that E = E ∩ S = E ∩ (F ∪ F^c) = (E ∩ F) ∪ (E ∩ F^c). Note also that E ∩ F and E ∩ F^c are disjoint. This means that p(E) = p(E ∩ F) + p(E ∩ F^c). We have already shown that p(E ∩ F) = p(E|F)p(F). Furthermore, note that p(E ∩ F^c) = p(E|F^c)p(F^c). So p(E) = p(E ∩ F) + p(E ∩ F^c) = p(E|F)p(F) + p(E|F^c)p(F^c). Putting everything together, we get:
p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)]
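With the proof complete, the theorem fits in a one-line helper that the later slides all instantiate (the function name and signature here are our own sketch, not from the lecture):

```python
from fractions import Fraction

def bayes(p_e_given_f, p_f, p_e_given_not_f):
    """p(F|E) via Bayes' theorem, expanding p(E) by total probability:
    p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)]."""
    numer = p_e_given_f * p_f
    return numer / (numer + p_e_given_not_f * (1 - p_f))

# Replaying Bob's question: p(box 1 | red ball) should be 49/76.
print(bayes(Fraction(7, 9), Fraction(1, 2), Fraction(3, 7)))  # 49/76
```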
Probability that X is an opium user given a positive test
The 1,000 foot view… In situations like the drug testing problem, Bayes' theorem can help! Essentially, Bayes' theorem will allow us to calculate p(E|F) assuming that we know (or can derive): the probability that X is a user, p(E); the probability that the test yields a true positive, p(F|E); and the probability that the test yields a false positive, p(F|E^c). Returning to our earlier example: Let E = "Person X is an opium user". Let F = "Person X tested positive for opium". It looks like Bayes' Theorem could help in this case…
And why is this useful? In a nutshell, Bayes' Theorem is useful if you want to find p(F|E), but you don't know p(E ∩ F) or p(E).
Example: Pants and Skirts
Suppose there is a co-ed school having 60% boys and 40% girls as students. The girl students wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all they can see is that this student is wearing trousers. What is the probability this student is a girl? Step 1: Set up events. E = "X is wearing pants", E^c = "X is wearing a skirt", F = "X is a girl", F^c = "X is a boy". Step 2: Extract probabilities from the problem definition: p(F) = 0.4, p(F^c) = 0.6, p(E|F) = p(E^c|F) = 0.5, p(E|F^c) = 1. (Note: since p(E ∩ F) = 0.5 × 0.4 = 0.2 and p(E) ≤ 1, the answer is at least a 20% chance.)
Pants and Skirts (continued)
Step 3: Plug in to Bayes' Theorem:
p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)] = (0.5 × 0.4)/(0.5 × 0.4 + 1 × 0.6) = 0.2/0.8 = 1/4
Conclusion: There is a 25% chance that the person seen was a girl, given that they were wearing pants.
Recall: p(F) = 0.4, p(F^c) = 0.6, p(E|F) = p(E^c|F) = 0.5, p(E|F^c) = 1
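As a sanity check on the plug-in step, the same arithmetic done exactly (a sketch with our own variable names):

```python
from fractions import Fraction

# Values from the pants-and-skirts problem.
p_F      = Fraction(2, 5)    # p(girl) = 0.4
p_E_F    = Fraction(1, 2)    # p(pants | girl) = 0.5
p_E_notF = Fraction(1, 1)    # p(pants | boy) = 1

numer = p_E_F * p_F
posterior = numer / (numer + p_E_notF * (1 - p_F))   # Bayes' theorem
print(posterior)  # 1/4
```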
Drug screening, revisited
Suppose that a certain opium test correctly identifies a person who uses opiates as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. If a company suspects that 0.5% of its employees are opium users, what is the probability that an employee who tests positive for this drug is actually a user? Step 1: Set up events. F = "X is an opium user", F^c = "X is not an opium user", E = "X tests positive for opiates", E^c = "X tests negative for opiates". Step 2: Extract probabilities from the problem definition: p(F) = 0.005, p(F^c) = 0.995, p(E|F) = 0.99, p(E|F^c) = 0.01.
Drug screening (continued)
Step 3: Plug in to Bayes' Theorem:
p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)] = (0.99 × 0.005)/(0.99 × 0.005 + 0.01 × 0.995) ≈ 0.3322
Conclusion: If an employee tests positive for opiate use, there is only a 33% chance that they are actually an opium user!
Recall: p(F) = 0.005, p(F^c) = 0.995, p(E|F) = 0.99, p(E|F^c) = 0.01
The 1% of the 99.5% of clean employees who are misclassified is MUCH greater than the 99% of the 0.5% of employees who are users.
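Done with exact rationals, the surprising answer is exactly 99/298 (a sketch; variable names are our own):

```python
from fractions import Fraction

# Values from the drug-screening problem.
p_F      = Fraction(5, 1000)   # p(user) = 0.005
p_E_F    = Fraction(99, 100)   # p(positive | user) = 0.99
p_E_notF = Fraction(1, 100)    # p(positive | non-user) = 0.01

numer = p_E_F * p_F
posterior = numer / (numer + p_E_notF * (1 - p_F))   # Bayes' theorem
print(posterior, round(float(posterior), 4))  # 99/298 0.3322
```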
Group Work! Suppose that 1 person in 100,000 has a particular rare disease. A diagnostic test is correct 99% of the time when given to someone with the disease, and is correct 99.5% of the time when given to someone without the disease.
Problem 1: Calculate the probability that someone who tests positive for the disease actually has it.
Problem 2: Calculate the probability that someone who tests negative for the disease does not have the disease.
F = has disease, F^c = no disease; E = tests positive, E^c = tests negative
p(F) = 1/100,000 = 0.00001, p(F^c) = 0.99999, p(E|F) = 0.99, p(E|F^c) = 0.005
1: We want p(F|E) = p(E|F)p(F)/[p(E|F)p(F) + p(E|F^c)p(F^c)] = 0.99 × 0.00001/[0.99 × 0.00001 + 0.005 × 0.99999] ≈ 0.002
2: We want p(F^c|E^c) = p(E^c|F^c)p(F^c)/[p(E^c|F^c)p(F^c) + p(E^c|F)p(F)] = 0.995 × 0.99999/[0.995 × 0.99999 + 0.01 × 0.00001] ≈ 0.9999999
Conclusion: good for weeding people out, not so good for telling that people are sick
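Both group-work answers can be checked numerically (a sketch; floats are adequate at this precision):

```python
# Rare-disease screening: p(F) = 1/100,000, sensitivity 99%, specificity 99.5%.
p_F, p_notF = 1e-5, 1 - 1e-5
p_E_F, p_E_notF = 0.99, 0.005

# Problem 1: p(F | E) — has the disease, given a positive test.
p1 = p_E_F * p_F / (p_E_F * p_F + p_E_notF * p_notF)

# Problem 2: p(F^c | E^c) — disease-free, given a negative test.
p2 = (1 - p_E_notF) * p_notF / ((1 - p_E_notF) * p_notF + (1 - p_E_F) * p_F)

print(p1, p2)
```

Note how lopsided the two answers are: a negative test is near-certain evidence of health, while a positive test still leaves the disease very unlikely.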
Application: Spam filtering
Definition: Spam is unsolicited ("I didn't ask for it, I probably don't want it") bulk ("sent to lots of people…") email. In recent years, spam has become increasingly problematic. For example, in 2015, spam accounted for ~50% of all email messages sent. To combat this problem, people have developed spam filters based on Bayes' theorem!
How does a Bayesian spam filter work?
Essentially, these filters determine the probability that a message is spam (F), given that it contains a certain questionable keyword (E):
p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)]
In the above equation:
p(E|F) = probability that our keyword occurs in spam messages
p(E|F^c) = probability that our keyword occurs in legitimate messages
p(F) = probability that an arbitrary message is spam
p(F^c) = probability that an arbitrary message is legitimate
If p(F|E) is above a certain threshold, toss the message.
Question: How do we derive these parameters?
We can learn these parameters by examining historical email traces
Imagine that we have a corpus of messages… We can ask a few intelligent questions to learn the parameters of our Bayesian filter: How many of these messages do we consider spam? p(F) In the spam messages, how often does our keyword appear? p(E|F) In the good messages, how often does our keyword appear? p(E|F^c) Aside: This is what happens every time you click the "mark as spam" button in your client! Given this information, we can apply Bayes' theorem!
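The counting described above can be sketched in a few lines. Everything here is illustrative — the tiny `corpus` of `(text, is_spam)` pairs and the `learn` helper are hypothetical stand-ins for a real labeled mail archive:

```python
# Hypothetical labeled corpus: (message text, is_spam) pairs.
corpus = [
    ("buy a rolex now",   True),
    ("cheap rolex deals", True),
    ("meeting at noon",   False),
    ("lunch tomorrow?",   False),
]

def learn(corpus, keyword):
    """Estimate p(F), p(E|F), p(E|F^c) for one keyword by counting."""
    spam = [text for text, is_spam in corpus if is_spam]
    good = [text for text, is_spam in corpus if not is_spam]
    p_F      = len(spam) / len(corpus)                       # p(F)
    p_E_F    = sum(keyword in t for t in spam) / len(spam)   # p(E|F)
    p_E_notF = sum(keyword in t for t in good) / len(good)   # p(E|F^c)
    return p_F, p_E_F, p_E_notF

print(learn(corpus, "rolex"))  # (0.5, 1.0, 0.0)
```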
Filtering spam using a single keyword
Suppose that the keyword "Rolex" occurs in 250 of 2000 known spam messages, and in 5 of 1000 known good messages. Estimate the probability that an incoming message containing the word "Rolex" is spam, assuming that it is equally likely that an incoming message is spam or not spam. If our threshold for classifying a message as spam is 0.9, will we reject this message? Step 1: Define events. F = "message is spam", F^c = "message is good", E = "message contains the keyword 'Rolex'", E^c = "message does not contain the keyword 'Rolex'". Step 2: Gather probabilities from the problem statement: p(F) = p(F^c) = 0.5, p(E|F) = 250/2000 = 0.125, p(E|F^c) = 5/1000 = 0.005.
Spam Rolexes (continued)
Step 3: Plug in to Bayes' Theorem:
p(F|E) = p(E|F)p(F) / [p(E|F)p(F) + p(E|F^c)p(F^c)] = (0.125 × 0.5)/(0.125 × 0.5 + 0.005 × 0.5) = 0.125/0.13 ≈ 0.962
Conclusion: Since the probability that our message is spam given that it contains the string "Rolex" is approximately 0.962 > 0.9, we will discard the message.
Recall: p(F) = p(F^c) = 0.5, p(E|F) = 0.125, p(E|F^c) = 0.005
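The same estimate, computed exactly from the raw counts (a sketch; variable names are our own):

```python
from fractions import Fraction

# Counts from the "Rolex" example; 50/50 prior on spam vs. not spam.
p_E_F    = Fraction(250, 2000)   # "Rolex" in spam
p_E_notF = Fraction(5, 1000)     # "Rolex" in good mail
p_F      = Fraction(1, 2)

numer = p_E_F * p_F
posterior = numer / (numer + p_E_notF * (1 - p_F))   # Bayes' theorem
print(posterior, float(posterior) > 0.9)  # 25/26 True
```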
Problems with this simple filter
How would you choose a single keyword/phrase to use? "All natural"? "Nigeria"? "Click here"? … Users get upset if false positives occur, i.e., if legitimate messages are incorrectly classified as spam. (When was the last time you checked your spam folder?) How can we fix this? Choose keywords such that p(spam | keyword) is very high or very low, and filter based on multiple keywords.
Specifically, we want to develop a Bayesian filter that tells us p(F | E1 ∩ E2)
First, some assumptions:
1. The events E1 and E2 are independent
2. The events E1|F and E2|F are independent
3. p(F) = p(F^c) = 0.5
(These assumptions may cause errors, but we'll assume that they're small.)
Now, let's derive a formula for p(F | E1 ∩ E2). By Bayes' theorem:
p(F|E1 ∩ E2) = p(E1 ∩ E2|F)p(F) / [p(E1 ∩ E2|F)p(F) + p(E1 ∩ E2|F^c)p(F^c)]
= p(E1 ∩ E2|F) / [p(E1 ∩ E2|F) + p(E1 ∩ E2|F^c)]   (by assumption 3)
= p(E1|F)p(E2|F) / [p(E1|F)p(E2|F) + p(E1|F^c)p(E2|F^c)]   (by assumptions 1 and 2)
Spam filtering on two keywords
Suppose that we train a Bayesian spam filter on a set of 2000 spam messages and 1000 messages that are not spam. The word "stock" appears in 400 spam messages and 60 good messages, and the word "undervalued" appears in 200 spam messages and 25 good messages. Estimate the probability that a message containing the words "stock" and "undervalued" is spam. Will we reject this message if our spam threshold is set at 0.9? Step 1: Set up events. F = "message is spam", F^c = "message is good", E1 = "message contains the word 'stock'", E2 = "message contains the word 'undervalued'". Step 2: Identify probabilities: p(E1|F) = 400/2000 = 0.2, p(E1|F^c) = 60/1000 = 0.06, p(E2|F) = 200/2000 = 0.1, p(E2|F^c) = 25/1000 = 0.025.
Two keywords (continued)
Step 3: Plug in to Bayes' Theorem:
p(F|E1 ∩ E2) = p(E1|F)p(E2|F) / [p(E1|F)p(E2|F) + p(E1|F^c)p(E2|F^c)] = (0.2 × 0.1)/(0.2 × 0.1 + 0.06 × 0.025) = 0.02/0.0215 ≈ 0.930
Conclusion: Since the probability that our message is spam given that it contains the strings "stock" and "undervalued" is ≈ 0.930 > 0.9, we will reject this message.
Recall: p(E1|F) = 0.2, p(E1|F^c) = 0.06, p(E2|F) = 0.1, p(E2|F^c) = 0.025
(Even so, Bayesian filters are not a panacea.)
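Checking the two-keyword computation directly from the training counts (a sketch; the answer is exactly 40/43):

```python
from fractions import Fraction

# Keyword probabilities estimated from the 2000-spam / 1000-good corpus.
p_E1_F, p_E1_notF = Fraction(400, 2000), Fraction(60, 1000)   # "stock"
p_E2_F, p_E2_notF = Fraction(200, 2000), Fraction(25, 1000)   # "undervalued"

numer = p_E1_F * p_E2_F
posterior = numer / (numer + p_E1_notF * p_E2_notF)
print(float(posterior))  # ≈ 0.930, above the 0.9 threshold
```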
Consider:
p(F|E1) = p(E1|F)p(F) / [p(E1|F)p(F) + p(E1|F^c)p(F^c)] = (0.2 × 0.5)/(0.2 × 0.5 + 0.06 × 0.5) ≈ 0.769
p(F|E2) = p(E2|F)p(F) / [p(E2|F)p(F) + p(E2|F^c)p(F^c)] = (0.1 × 0.5)/(0.1 × 0.5 + 0.025 × 0.5) = 0.8
So there is a 77% chance that a message containing "stock" is spam and an 80% chance that a message containing "undervalued" is spam. Neither word alone is enough to categorize the message as spam!
Final Thoughts Conditional probability is very useful. Bayes' theorem helps us assess conditional probabilities and has a range of important applications. Next time: Relations!