Discrete Structures for Computer Science


Discrete Structures for Computer Science
Presented By: Andrew F. Conn
November 16th, 2016
Lecture #21: Conditional Probability

Conditional Probability
So far we have looked at independent probabilities, but sometimes events are dependent on one another. For example, the probability that you will die in a plane crash depends on whether or not you fly. The probability of the Steelers winning on any given Sunday depends on whether or not Ben Roethlisberger plays. These situations require us to consider new ways to compute probability.

Motivating Example
We have two boxes. The first contains two green balls and seven red balls. The second contains four green balls and three red balls. Bob selects a ball by first choosing a box at random. He then selects one of the balls from that box at random.

Motivating Example Cont.
What is the probability that Bob chooses a red ball from box 1?
Let $E$ = "Bob chooses a red ball", so $\overline{E}$ = "Bob chooses a green ball".
Let $F$ = "Bob chooses box 1", so $\overline{F}$ = "Bob chooses box 2".
Then the probability that Bob chooses a red ball from box 1 is:
$$p(F) \cdot p(E \mid F) = \frac{1}{2} \cdot \frac{7}{9} = \frac{7}{18}$$
What is $p(E \mid F)$? You can read this as the probability that $E$ will happen given that $F$ has already happened.

A definition for conditional probability
What is the probability that Bob chooses a green ball from box 2?
$$p(\overline{E} \cap \overline{F}) = p(\overline{F}) \cdot p(\overline{E} \mid \overline{F}) = \frac{1}{2} \cdot \frac{4}{7} = \frac{4}{14}$$
We can adapt the above relationship to derive a general formula for $p(E \mid F)$.
Definition: Let $E$ and $F$ be events with $p(F) > 0$. The conditional probability of $E$ given $F$, denoted $p(E \mid F)$, is defined as:
$$p(E \mid F) = \frac{p(E \cap F)}{p(F)}$$

Bob makes it tougher…
What is the probability that Bob picked box 1 if Bob only knows he picked a red ball?
We know $p(F \cap E) = \frac{7}{18}$, but is that what this question is asking? No, we need to compute $p(F \mid E)$.
From our definition we know that
$$p(F \mid E) = \frac{p(F \cap E)}{p(E)} = \frac{p(E \cap F)}{p(E)}$$
How can we compute $p(E)$? Note that it is not simply the proportion of red balls among all sixteen balls, since the probability of picking a red ball is $\frac{7}{9}$ when box 1 is chosen (50% of the time) and $\frac{3}{7}$ when box 2 is chosen (the other 50% of the time). There is an idea!
$$p(E) = p(F) \cdot p(E \mid F) + p(\overline{F}) \cdot p(E \mid \overline{F})$$
We will explore this identity in more depth later.

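Answering Bob
We need to solve
$$p(F \mid E) = \frac{7/18}{p(E)}$$
We now strongly believe that $p(E) = p(F) \cdot p(E \mid F) + p(\overline{F}) \cdot p(E \mid \overline{F})$.
We can easily compute: $p(F) = \frac{1}{2}$, $p(\overline{F}) = \frac{1}{2}$, $p(E \mid F) = \frac{7}{9}$, $p(E \mid \overline{F}) = \frac{3}{7}$.
Then
$$p(E) = \frac{1}{2} \cdot \frac{7}{9} + \frac{1}{2} \cdot \frac{3}{7} = \frac{38}{63}$$
So finally we can determine that
$$p(F \mid E) = \frac{7/18}{38/63} = \frac{49}{76}$$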
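The arithmetic on this slide can be double-checked with exact fractions. The following is a minimal sketch using Python's standard `fractions` module; the variable names are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Probabilities read off the two-box example.
p_box1 = Fraction(1, 2)            # p(F): Bob chooses box 1
p_box2 = 1 - p_box1                # p(not F): Bob chooses box 2
p_red_given_box1 = Fraction(7, 9)  # p(E | F): 7 red balls out of 9
p_red_given_box2 = Fraction(3, 7)  # p(E | not F): 3 red balls out of 7

# Total probability of drawing a red ball.
p_red = p_box1 * p_red_given_box1 + p_box2 * p_red_given_box2

# Definition of conditional probability: p(F | E) = p(E and F) / p(E).
p_box1_given_red = (p_box1 * p_red_given_box1) / p_red

print(p_red)             # 38/63
print(p_box1_given_red)  # 49/76
```

Working with `Fraction` rather than floats keeps the result in the same form as the slide, so the two can be compared directly.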

Bayes’ Theorem
It turns out that Bob’s question is not that unusual. Bayes’ Theorem allows us to relate the conditional and marginal probabilities of two random events.
In English: Bayes’ Theorem will help us assess the probability that an event occurred given only partial evidence.
Doesn't our formula for conditional probability do this already? Yes, but look at all the work we did!

Another Motivating Example
Suppose that a certain opium test correctly identifies a person who uses opiates as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. If a company suspects that 0.5% of its employees are opium users, what is the probability that an employee who tests positive for this drug is actually a user?
Question: Can we use our simple conditional probability formula? What is the probability
$$p(E \mid F) = \frac{p(E \cap F)}{p(F)}$$
where $E$ = "X is a user" and $F$ = "X tested positive"?

The reasoning that we used in the last problem essentially derives Bayes’ Theorem for us!
Bayes’ Theorem: Suppose that $E$ and $F$ are events from some sample space $S$ such that $p(E) > 0$ and $p(F) > 0$. Then:
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})}$$
Proof:
$p(F \mid E) = \frac{p(F \cap E)}{p(E)}$, by definition
$p(E \mid F) = \frac{p(E \cap F)}{p(F)}$, by definition
$\Rightarrow p(F \mid E)\,p(E) = p(E \mid F)\,p(F)$
$\Rightarrow p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E)}$
Notice that this gives us the numerator…

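Proof (continued)
Note: To finish, we must prove $p(E) = p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})$. We used “intuition” to solve this earlier…
Observe that $E = E \cap S = E \cap (F \cup \overline{F}) = (E \cap F) \cup (E \cap \overline{F})$.
Note also that $E \cap F$ and $E \cap \overline{F}$ are disjoint. This means that $p(E) = p(E \cap F) + p(E \cap \overline{F})$.
We already have shown that $p(E \cap F) = p(E \mid F)\,p(F)$.
Furthermore, note that $p(E \cap \overline{F}) = p(E \mid \overline{F})\,p(\overline{F})$.
So $p(E) = p(E \cap F) + p(E \cap \overline{F}) = p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})$.
Putting everything together, we get:
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})}$$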
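The theorem translates almost directly into a small function. The sketch below is one possible rendering, not code from the lecture; the function name `bayes` and its argument names are hypothetical, and requiring $p(F)$ to lie strictly between 0 and 1 is an added assumption so that both $F$ and $\overline{F}$ have positive probability.

```python
from fractions import Fraction

def bayes(p_e_given_f, p_f, p_e_given_not_f):
    """Return p(F | E) given p(E | F), p(F), and p(E | not F)."""
    # Assumption: both F and its complement must be possible.
    if not 0 < p_f < 1:
        raise ValueError("p(F) must be strictly between 0 and 1")
    p_not_f = 1 - p_f
    # Denominator: p(E) = p(E | F) p(F) + p(E | not F) p(not F).
    p_e = p_e_given_f * p_f + p_e_given_not_f * p_not_f
    if p_e == 0:
        raise ValueError("p(E) must be positive")
    return p_e_given_f * p_f / p_e

# Bob's two boxes: F = "box 1", E = "red ball".
print(bayes(Fraction(7, 9), Fraction(1, 2), Fraction(3, 7)))  # 49/76
```

The function works with either `Fraction` or `float` inputs, since it only uses arithmetic that both types support.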

The 1,000 foot view…
In situations like the drug testing problem, Bayes’ theorem can help!
Essentially, Bayes’ theorem will allow us to calculate $p(E \mid F)$, the probability that X is an opium user given a positive test, assuming that we know (or can derive):
The probability that X is a user: $p(E)$
The probability that the test yields a true positive: $p(F \mid E)$
The probability that the test yields a false positive: $p(F \mid \overline{E})$
Returning to our earlier example:
Let $E$ = “Person X is an opium user”
Let $F$ = “Person X tested positive for opium”
It looks like Bayes’ Theorem could help in this case…

And why is this useful?
In a nutshell, Bayes’ Theorem is useful if you want to find $p(F \mid E)$, but you don’t know $p(E \cap F)$ or $p(E)$.

Example: Pants and Skirts
Suppose there is a co-ed school having 60% boys and 40% girls as students. The girl students wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all they can see is that this student is wearing trousers. What is the probability this student is a girl?
Step 1: Set up events
$E$ = “X is wearing pants”, $\overline{E}$ = “X is wearing a skirt”
$F$ = “X is a girl”, $\overline{F}$ = “X is a boy”
Step 2: Extract probabilities from problem definition
$p(F) = 0.4$, $p(\overline{F}) = 0.6$
$p(E \mid F) = p(\overline{E} \mid F) = 0.5$
$p(E \mid \overline{F}) = 1$
Note: the answer must be at least 20%, since $p(E \cap F) = 0.5 \times 0.4 = 0.2$ and $p(E) \le 1$.

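Pants and Skirts (continued)
Recall: $p(F) = 0.4$, $p(\overline{F}) = 0.6$, $p(E \mid F) = p(\overline{E} \mid F) = 0.5$, $p(E \mid \overline{F}) = 1$
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})}$$
Step 3: Plug in to Bayes’ Theorem
$$p(F \mid E) = \frac{0.5 \times 0.4}{0.5 \times 0.4 + 1 \times 0.6} = \frac{0.2}{0.8} = \frac{1}{4}$$
Conclusion: There is a 25% chance that the person seen was a girl, given that they were wearing pants.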
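A quick simulation offers an independent sanity check on the 25% figure. This is only a sketch: the function name `simulate_school`, the sample size, and the fixed seed are arbitrary choices made here for illustration, and a Monte Carlo estimate will only land near the exact answer rather than on it.

```python
import random

def simulate_school(num_students=100_000, seed=0):
    """Estimate p(girl | wearing pants) by sampling random students."""
    rng = random.Random(seed)
    pants = 0
    girls_in_pants = 0
    for _ in range(num_students):
        is_girl = rng.random() < 0.4          # p(F) = 0.4
        if is_girl:
            wears_pants = rng.random() < 0.5  # p(E | F) = 0.5
        else:
            wears_pants = True                # p(E | not F) = 1
        if wears_pants:
            pants += 1
            girls_in_pants += is_girl         # True counts as 1
    return girls_in_pants / pants

estimate = simulate_school()
print(f"estimated p(F | E) = {estimate:.3f}")  # should land close to 0.25
```

Only students seen in pants are counted in the denominator, which is exactly what conditioning on $E$ means.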

Drug screening, revisited
Suppose that a certain opium test correctly identifies a person who uses opiates as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. If a company suspects that 0.5% of its employees are opium users, what is the probability that an employee who tests positive for this drug is actually a user?
Step 1: Set up events
$F$ = “X is an opium user”, $\overline{F}$ = “X is not an opium user”
$E$ = “X tests positive for opiates”, $\overline{E}$ = “X tests negative for opiates”
Step 2: Extract probabilities from problem definition
$p(F) = 0.005$, $p(\overline{F}) = 0.995$
$p(E \mid F) = 0.99$, $p(E \mid \overline{F}) = 0.01$

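Drug screening (continued)
Recall: $p(F) = 0.005$, $p(\overline{F}) = 0.995$, $p(E \mid F) = 0.99$, $p(E \mid \overline{F}) = 0.01$
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})}$$
Step 3: Plug in to Bayes’ Theorem
$$p(F \mid E) = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \approx 0.3322$$
Conclusion: If an employee tests positive for opiate use, there is only a 33% chance that they are actually an opium user!
Why so low? The 1% of the 99.5% of employees who are clean but get misclassified (0.995% of all employees) is much larger than the 99% of the 0.5% of employees who are users and test positive (0.495% of all employees).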
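Counting people instead of multiplying probabilities can make this result easier to see. The sketch below assumes a hypothetical company of 100,000 employees (a number chosen here only so that every count comes out whole) and applies the rates from the problem statement.

```python
population = 100_000                       # hypothetical company size

users = population * 5 // 1000             # 0.5% are users
non_users = population - users             # everyone else

true_positives = users * 99 // 100         # 99% of users test positive
false_positives = non_users * 1 // 100     # 1% of non-users test positive

# Among everyone who tests positive, what fraction are users?
p_user_given_positive = true_positives / (true_positives + false_positives)

print(true_positives, false_positives)     # 495 995
print(round(p_user_given_positive, 4))     # 0.3322
```

There are roughly twice as many false positives as true positives, which is why a positive result is right only about a third of the time.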

Group Work!
Suppose that 1 person in 100,000 has a particular rare disease. A diagnostic test is correct 99% of the time when given to someone with the disease, and is correct 99.5% of the time when given to someone without the disease.
Problem 1: Calculate the probability that someone who tests positive for the disease actually has it.
Problem 2: Calculate the probability that someone who tests negative for the disease does not have the disease.
Setup: $F$ = has disease, $\overline{F}$ = no disease, $E$ = test positive, $\overline{E}$ = test negative
$p(F) = 1/100{,}000 = 0.00001$, $p(\overline{F}) = 0.99999$, $p(E \mid F) = 0.99$, $p(E \mid \overline{F}) = 0.005$
Problem 1: we want
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})} = \frac{0.99 \times 0.00001}{0.99 \times 0.00001 + 0.005 \times 0.99999} \approx 0.002$$
Problem 2: we want
$$p(\overline{F} \mid \overline{E}) = \frac{p(\overline{E} \mid \overline{F})\,p(\overline{F})}{p(\overline{E} \mid \overline{F})\,p(\overline{F}) + p(\overline{E} \mid F)\,p(F)} = \frac{0.995 \times 0.99999}{0.995 \times 0.99999 + 0.01 \times 0.00001} \approx 0.9999999$$
Conclusion: The test is good for weeding people out, but not so good for telling that people are sick.
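Both group-work answers can be reproduced in a few lines. This is a sketch under the problem's stated numbers; the names `sensitivity` and `specificity` are standard terms for the two accuracy rates but are introduced here and do not appear on the slide.

```python
p_disease = 1 / 100_000     # p(F)
p_healthy = 1 - p_disease   # p(not F)
sensitivity = 0.99          # p(E | F): positive test given disease
specificity = 0.995         # p(not E | not F): negative test given no disease

# Problem 1: p(F | E), disease given a positive test.
p_pos = sensitivity * p_disease + (1 - specificity) * p_healthy
p_disease_given_pos = sensitivity * p_disease / p_pos

# Problem 2: p(not F | not E), no disease given a negative test.
p_neg = specificity * p_healthy + (1 - sensitivity) * p_disease
p_healthy_given_neg = specificity * p_healthy / p_neg

print(f"p(disease | positive)    = {p_disease_given_pos:.6f}")
print(f"p(no disease | negative) = {p_healthy_given_neg:.9f}")
```

The first value comes out at roughly 0.002 and the second is just below 1, matching the slide's conclusion that a negative result is far more informative than a positive one.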

Application: Spam filtering
Definition: Spam is unsolicited bulk email.
Unsolicited: I didn’t ask for it, I probably don’t want it.
Bulk: Sent to lots of people…
In recent years, spam has become increasingly problematic. For example, in 2015, spam accounted for ~50% of all email messages sent.
To combat this problem, people have developed spam filters based on Bayes’ theorem!

How does a Bayesian spam filter work?
Essentially, these filters determine the probability that a message is spam, given that it contains certain keywords:
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})}$$
Here $F$ = “message is spam” and $E$ = “message contains questionable keyword”. If $p(F \mid E)$ is above a certain threshold, toss the message.
In the above equation:
$p(E \mid F)$ = probability that our keyword occurs in spam messages
$p(E \mid \overline{F})$ = probability that our keyword occurs in legitimate messages
$p(F)$ = probability that an arbitrary message is spam
$p(\overline{F})$ = probability that an arbitrary message is legitimate
Question: How do we derive these parameters?

We can learn these parameters by examining historical email traces
Imagine that we have a corpus of email messages… We can ask a few intelligent questions to learn the parameters of our Bayesian filter:
How many of these messages do we consider spam? This gives $p(F)$.
In the spam messages, how often does our keyword appear? This gives $p(E \mid F)$.
In the good messages, how often does our keyword appear? This gives $p(E \mid \overline{F})$.
Aside: This is what happens every time you click the “mark as spam” button in your email client!
Given this information, we can apply Bayes’ theorem!
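The three questions above amount to simple counting over a labeled corpus. The sketch below shows one way this could look; the function `estimate_parameters`, the substring-based keyword match, and the five-message toy corpus are all invented here for illustration and are not taken from the lecture.

```python
def estimate_parameters(messages, keyword):
    """Estimate p(F), p(E | F), and p(E | not F) from labeled messages.

    `messages` is a list of (text, is_spam) pairs. A message "contains"
    the keyword if it appears as a case-insensitive substring.
    """
    spam = [text for text, is_spam in messages if is_spam]
    good = [text for text, is_spam in messages if not is_spam]
    if not spam or not good:
        raise ValueError("need at least one spam and one good message")

    def fraction_containing(texts):
        hits = sum(keyword.lower() in text.lower() for text in texts)
        return hits / len(texts)

    p_spam = len(spam) / len(messages)
    return p_spam, fraction_containing(spam), fraction_containing(good)

# Hypothetical toy corpus, made up purely for this example.
corpus = [
    ("Cheap Rolex watches, click here", True),
    ("All natural pills, act now", True),
    ("Genuine Rolex replica sale", True),
    ("Meeting moved to 3pm", False),
    ("Lecture notes attached", False),
]
p_f, p_e_given_f, p_e_given_not_f = estimate_parameters(corpus, "rolex")
print(p_f, p_e_given_f, p_e_given_not_f)
```

With a corpus this small the estimates are crude (here the keyword never appears in a good message, so $p(E \mid \overline{F})$ is estimated as 0); a real filter would need far more data.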

Filtering spam using a single keyword
Suppose that the keyword “Rolex” occurs in 250 of 2000 known spam messages, and in 5 of 1000 known good messages. Estimate the probability that an incoming message containing the word “Rolex” is spam, assuming that it is equally likely that an incoming message is spam or not spam. If our threshold for classifying a message as spam is 0.9, will we reject this message?
Step 1: Define events
$F$ = “message is spam”, $\overline{F}$ = “message is good”
$E$ = “message contains the keyword ‘Rolex’”, $\overline{E}$ = “message does not contain the keyword ‘Rolex’”
Step 2: Gather probabilities from the problem statement
$p(F) = p(\overline{F}) = 0.5$
$p(E \mid F) = \frac{250}{2000} = 0.125$
$p(E \mid \overline{F}) = \frac{5}{1000} = 0.005$

Spam Rolexes (continued)
Recall: $p(F) = p(\overline{F}) = 0.5$, $p(E \mid F) = 0.125$, $p(E \mid \overline{F}) = 0.005$
$$p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})}$$
Step 3: Plug in to Bayes’ Theorem
$$p(F \mid E) = \frac{0.125 \times 0.5}{0.125 \times 0.5 + 0.005 \times 0.5} = \frac{0.125}{0.125 + 0.005} \approx 0.962$$
Conclusion: Since the probability that our message is spam given that it contains the string “Rolex” is approximately 0.962 > 0.9, we will discard the message.
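The whole single-keyword decision, from raw counts to the threshold test, fits in one short function. This is a sketch: `spam_probability` and `THRESHOLD` are names made up here, and the function assumes the keyword appears in at least one training message so that the denominator is nonzero.

```python
def spam_probability(spam_hits, spam_total, good_hits, good_total, p_spam=0.5):
    """p(F | E) for one keyword, estimated from training counts."""
    p_e_given_f = spam_hits / spam_total        # p(E | F)
    p_e_given_not_f = good_hits / good_total    # p(E | not F)
    numerator = p_e_given_f * p_spam
    return numerator / (numerator + p_e_given_not_f * (1 - p_spam))

THRESHOLD = 0.9

p = spam_probability(250, 2000, 5, 1000)
print(round(p, 3), p > THRESHOLD)  # 0.962 True
```

Passing a different `p_spam` shows how much the equal-prior assumption matters; for instance, a lower prior probability of spam pulls the result down.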

Problems with this simple filter
How would you choose a single keyword/phrase to use? “All natural”, “Nigeria”, “Click here”, …
Users get upset if false positives occur, i.e., if legitimate messages are incorrectly classified as spam. When was the last time you checked your spam folder?
How can we fix this?
Choose keywords such that p(spam | keyword) is very high or very low.
Filter based on multiple keywords.

Specifically, we want to develop a Bayesian filter that tells us $p(F \mid E_1 \cap E_2)$.
First, some assumptions:
1. Events $E_1$ and $E_2$ are independent.
2. The events $E_1 \mid F$ and $E_2 \mid F$ are independent.
3. $p(F) = p(\overline{F}) = 0.5$
These assumptions may cause errors, but we’ll assume that they’re small.
Now, let’s derive a formula for $p(F \mid E_1 \cap E_2)$.
By Bayes’ theorem:
$$p(F \mid E_1 \cap E_2) = \frac{p(E_1 \cap E_2 \mid F)\,p(F)}{p(E_1 \cap E_2 \mid F)\,p(F) + p(E_1 \cap E_2 \mid \overline{F})\,p(\overline{F})}$$
By Assumption 3:
$$= \frac{p(E_1 \cap E_2 \mid F)}{p(E_1 \cap E_2 \mid F) + p(E_1 \cap E_2 \mid \overline{F})}$$
By Assumptions 1 and 2:
$$= \frac{p(E_1 \mid F)\,p(E_2 \mid F)}{p(E_1 \mid F)\,p(E_2 \mid F) + p(E_1 \mid \overline{F})\,p(E_2 \mid \overline{F})}$$

Spam filtering on two keywords
Suppose that we train a Bayesian spam filter on a set of 2000 spam messages and 1000 messages that are not spam. The word “stock” appears in 400 spam messages and 60 good messages, and the word “undervalued” appears in 200 spam messages and 25 good messages. Estimate the probability that a message containing the words “stock” and “undervalued” is spam. Will we reject this message if our spam threshold is set at 0.9?
Step 1: Set up events
$F$ = “message is spam”, $\overline{F}$ = “message is good”
$E_1$ = “message contains the word ‘stock’”
$E_2$ = “message contains the word ‘undervalued’”
Step 2: Identify probabilities
$p(E_1 \mid F) = \frac{400}{2000} = 0.2$, $p(E_1 \mid \overline{F}) = \frac{60}{1000} = 0.06$
$p(E_2 \mid F) = \frac{200}{2000} = 0.1$, $p(E_2 \mid \overline{F}) = \frac{25}{1000} = 0.025$

Two keywords (continued)
Recall: $p(E_1 \mid F) = 0.2$, $p(E_1 \mid \overline{F}) = 0.06$, $p(E_2 \mid F) = 0.1$, $p(E_2 \mid \overline{F}) = 0.025$
$$p(F \mid E_1 \cap E_2) = \frac{p(E_1 \mid F)\,p(E_2 \mid F)}{p(E_1 \mid F)\,p(E_2 \mid F) + p(E_1 \mid \overline{F})\,p(E_2 \mid \overline{F})}$$
Step 3: Plug in to Bayes’ Theorem
$$p(F \mid E_1 \cap E_2) = \frac{0.2 \times 0.1}{0.2 \times 0.1 + 0.06 \times 0.025} = \frac{0.02}{0.02 + 0.0015} \approx 0.9302$$
Conclusion: Since the probability that our message is spam given that it contains the strings “stock” and “undervalued” is ≈ 0.9302 > 0.9, we will reject this message.
Note: Bayesian filters are not a panacea.

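Consider:
$$p(F \mid E_1) = \frac{p(E_1 \mid F)\,p(F)}{p(E_1 \mid F)\,p(F) + p(E_1 \mid \overline{F})\,p(\overline{F})} = \frac{0.2 \times 0.5}{0.2 \times 0.5 + 0.06 \times 0.5} = 0.76923\ldots$$
$$p(F \mid E_2) = \frac{p(E_2 \mid F)\,p(F)}{p(E_2 \mid F)\,p(F) + p(E_2 \mid \overline{F})\,p(\overline{F})} = \frac{0.1 \times 0.5}{0.1 \times 0.5 + 0.025 \times 0.5} = 0.8$$
So there is a 77% chance that a message containing “stock” is spam and an 80% chance that a message containing “undervalued” is spam. Neither word alone is enough to categorize the message as spam!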
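Putting the single-keyword and two-keyword results side by side shows how the evidence combines. The sketch below uses the counts from the “stock” / “undervalued” example and assumes the lecture's independence and equal-prior assumptions; the function names are hypothetical.

```python
def spam_given_one_keyword(p_e_f, p_e_not_f):
    """p(F | E) with p(F) = p(not F) = 0.5, so the priors cancel."""
    return p_e_f / (p_e_f + p_e_not_f)

def spam_given_two_keywords(p_e1_f, p_e2_f, p_e1_not_f, p_e2_not_f):
    """p(F | E1 and E2) under the independence and equal-prior assumptions."""
    spam_term = p_e1_f * p_e2_f
    good_term = p_e1_not_f * p_e2_not_f
    return spam_term / (spam_term + good_term)

THRESHOLD = 0.9

p_stock = spam_given_one_keyword(400 / 2000, 60 / 1000)
p_undervalued = spam_given_one_keyword(200 / 2000, 25 / 1000)
p_both = spam_given_two_keywords(400 / 2000, 200 / 2000, 60 / 1000, 25 / 1000)

for label, p in [("stock", p_stock), ("undervalued", p_undervalued), ("both", p_both)]:
    verdict = "reject" if p > THRESHOLD else "keep"
    print(f"{label:12s} {p:.4f} {verdict}")
```

Each keyword on its own stays under the 0.9 threshold, while the pair together crosses it.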

Final Thoughts
Conditional probability is very useful.
Bayes’ theorem helps us assess conditional probabilities and has a range of important applications.
Next time: Relations!