COMP61011 : Machine Learning Probabilistic Models + Bayes’ Theorem

Probabilistic Models
- one of the most active areas of ML research in the last 15 years
- the foundation of numerous new technologies
- enables decision-making under uncertainty
Tough. Don’t expect to get this immediately. It takes time.

I have four snooker balls in a bag – 2 black, 2 white. I reach in with my eyes closed. What is the probability of picking a black ball? (2 of the 4 balls are black, so 2/4 = 1/2.) I give this variable a name, “A”.

Picking a black ball, then replacing it, then picking black again? (1/2 × 1/2 = 1/4.) Why? Because with replacement, the second pick is unaffected by the first.

Picking two black balls in sequence (i.e. no replacing)? (1/2 × 1/3 = 1/6: once one black ball is gone, only 1 of the remaining 3 balls is black.)
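
As a sanity check, here is a minimal Python simulation (my addition, not from the lecture) estimating both of these probabilities:

import random

BAG = ["black", "black", "white", "white"]
TRIALS = 100_000

def draw_with_replacement():
    # Pick twice, putting the first ball back before the second pick.
    return random.choice(BAG), random.choice(BAG)

def draw_without_replacement():
    balls = list(BAG)
    first = balls.pop(random.randrange(len(balls)))
    second = balls.pop(random.randrange(len(balls)))
    return first, second

both_replace = sum(draw_with_replacement() == ("black", "black")
                   for _ in range(TRIALS)) / TRIALS
both_no_replace = sum(draw_without_replacement() == ("black", "black")
                      for _ in range(TRIALS)) / TRIALS

print(f"with replacement:    {both_replace:.3f}  (theory 1/4 = 0.250)")
print(f"without replacement: {both_no_replace:.3f}  (theory 1/6 = 0.167)")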

Probabilities and Conditional Probabilities
Events: A, B, C, etc. – “random variables”.
e.g. A is the random “event” of picking the first ball; B is the random “event” of picking the second ball; we write A=1 to mean the ball picked was black.

Rules of Probability Theory
Probability that both balls are black = probability that the first is black × probability that the second is black, given that the first was black:
p(A=1, B=1) = p(A=1) × p(B=1 | A=1)

Shorthand notation: p(A, B) = p(A) p(B | A)
This means that the rule holds for all possible assignments of values to A and B.

If two events A, B are dependent: p(A, B) = p(A) p(B | A) – e.g. the black/white balls example.
If two events A, B are independent: p(A, B) = p(A) p(B) – e.g. two consecutive rolls of a die.
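
A quick numerical check of the independent case (again a sketch of mine, not from the slides): two consecutive die rolls should satisfy p(A, B) = p(A) p(B).

import random

TRIALS = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(TRIALS)]

p_a = sum(a == 6 for a, _ in rolls) / TRIALS     # p(first roll is a 6)
p_b = sum(b == 6 for _, b in rolls) / TRIALS     # p(second roll is a 6)
p_ab = sum(r == (6, 6) for r in rolls) / TRIALS  # joint probability

# For independent events, p(A,B) matches p(A) * p(B) (about 1/36 = 0.028).
print(f"p(A,B) = {p_ab:.4f}   p(A)*p(B) = {p_a * p_b:.4f}")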

The chances of the wind being strong, among all days.

Day   Outlook    Temperature   Humidity   Wind     Tennis?
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

6 of the 14 days have strong wind, so p(W = strong) = 6/14.

The chances of a strong wind day, given that the person enjoyed tennis.
(Same table of 14 days as above; now restrict attention to the days where Tennis = Yes.)

The chances of a strong wind day, given that the person enjoyed tennis.

Day   Outlook    Temperature   Humidity   Wind     Tennis?
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D7    Overcast   Cool          Normal     Strong   Yes
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes

Of the 9 days where Tennis = Yes, 3 have strong wind, so p(W = strong | T = yes) = 3/9.

The chances of the person enjoying tennis, given that it is a strong wind day.
(The full table of 14 days again; now restrict attention to the days where Wind = Strong.)

The chances of the person enjoying tennis, given that it is a strong wind day.

Day   Outlook    Temperature   Humidity   Wind     Tennis?
D2    Sunny      Hot           High       Strong   No
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D14   Rain       Mild          High       Strong   No

Of the 6 days with strong wind, 3 have Tennis = Yes, so p(T = yes | W = strong) = 3/6.

To recap, from the full table of 14 days: p(W = strong) = 6/14, p(W = strong | T = yes) = 3/9, and p(T = yes | W = strong) = 3/6.
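
These counts are easy to reproduce in code. The sketch below (my own illustration, not part of the slides) computes all three probabilities from the 14-day table:

# Each row: (day, outlook, temperature, humidity, wind, tennis)
data = [
    ("D1", "Sunny", "Hot", "High", "Weak", "No"),
    ("D2", "Sunny", "Hot", "High", "Strong", "No"),
    ("D3", "Overcast", "Hot", "High", "Weak", "Yes"),
    ("D4", "Rain", "Mild", "High", "Weak", "Yes"),
    ("D5", "Rain", "Cool", "Normal", "Weak", "Yes"),
    ("D6", "Rain", "Cool", "Normal", "Strong", "No"),
    ("D7", "Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("D8", "Sunny", "Mild", "High", "Weak", "No"),
    ("D9", "Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("D10", "Rain", "Mild", "Normal", "Weak", "Yes"),
    ("D11", "Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("D12", "Overcast", "Mild", "High", "Strong", "Yes"),
    ("D13", "Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("D14", "Rain", "Mild", "High", "Strong", "No"),
]

strong = [r for r in data if r[4] == "Strong"]
yes = [r for r in data if r[5] == "Yes"]

p_strong = len(strong) / len(data)                                     # p(W=strong) = 6/14
p_strong_given_yes = sum(r[4] == "Strong" for r in yes) / len(yes)     # p(W=strong | T=yes) = 3/9
p_yes_given_strong = sum(r[5] == "Yes" for r in strong) / len(strong)  # p(T=yes | W=strong) = 3/6

print(p_strong, p_strong_given_yes, p_yes_given_strong)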

What’s the use of all this?
We can calculate these numbers from data, and it leads to an elegant theorem we can make use of.

A problem to solve:
• 1% of the population get cancer
• 80% of people with cancer get a positive test
• 9.6% of people without cancer also get a positive test
The question: a person has a test for cancer that comes back positive. What is the probability that they actually have cancer?
Quick guess:
a) less than 1%
b) somewhere between 1% and 70%
c) between 70% and 80%
d) more than 80%

Write down the probabilities of everything…
Define variables: let C = 1 if the person has cancer, and E = 1 if the test comes back positive.
The prior probability of cancer in the population is 1%, so p(C=1) = 0.01.
The probability of a positive test given there is cancer: p(E=1 | C=1) = 0.8.
If there is no cancer, we still have p(E=1 | C=0) = 0.096.
The question is: what is p(C=1 | E=1) ?

Working with Concrete Numbers
Imagine 10,000 patients.
- p(C=1) = 0.01, so 100 have cancer; p(C=0) = 0.99, so 9,900 do not.
- Of the 100 with cancer: p(E=1 | C=1) = 0.8, so 80 get a positive test and 20 a negative test.
- Of the 9,900 without cancer: p(E=1 | C=0) = 0.096, so 950.4 get a positive test and 8,949.6 a negative test.
How many people out of 10,000 get E=1? And how many of those have C=1?

Working with Concrete Numbers
In total 80 + 950.4 = 1,030.4 people get a positive test, and only 80 of them actually have cancer: 80 / 1,030.4 ≈ 7.76%.
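
The same frequency argument in a few lines of Python (an illustrative sketch; the variable names are mine):

population = 10_000
p_cancer = 0.01
p_pos_given_cancer = 0.8
p_pos_given_healthy = 0.096

cancer = population * p_cancer                   # 100 people
healthy = population - cancer                    # 9,900 people

true_positives = cancer * p_pos_given_cancer     # 80
false_positives = healthy * p_pos_given_healthy  # 950.4

print(true_positives / (true_positives + false_positives))  # 0.0776, i.e. about 7.76%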

Surprising result! Do you trust your doctor?
Although the probability of a positive test given cancer is 80%, the probability of cancer given a positive test is only about 7.8%. 8 out of 10 doctors would have said c) between 70% and 80% … WRONG!
Common mistake: “the probability that a person with a positive test has cancer” is not the same as “the probability that a person with cancer has a positive test”. One must also consider:
- the background chances (prior) of having cancer,
- the chances of receiving a false alarm from the test.

Solving the same problem, via “Bayes’ Theorem”
The general statement of the product rule is: p(E, C) = p(E | C) p(C).
And since the statement “E and C” is equivalent to “C and E”: p(E, C) = p(C, E) = p(C | E) p(E).

Solving the same problem, via “Bayes’ Theorem”
So p(C | E) p(E) = p(E | C) p(C). Now rearrange…
p(C | E) = p(E | C) p(C) / p(E)

Rev. Thomas Bayes, 1702–1761.
Bayes’ Theorem forms the backbone of the past 20 years of ML research into probabilistic models. Think of E as “effect” and C as “cause”. But, warning: sometimes thinking this way will be very non-intuitive.

Another rule of probability theory: “marginalizing”.
p(E=1) = p(E=1 | C=1) p(C=1) + p(E=1 | C=0) p(C=0)
In Bayes’ theorem we want p(C=1 | E=1); we know p(E=1 | C=1) and p(C=1), and with this rule we can calculate the remaining term, the denominator p(E=1). Think of it as: “given all possible things that can happen with C, what is the probability of E=1?”

p(C=1 | E=1) = p(E=1 | C=1) p(C=1) / [ p(E=1 | C=1) p(C=1) + p(E=1 | C=0) p(C=0) ]
Notice the denominator now contains the same term as the numerator. We only need to know two quantities: p(E=1 | C=1) p(C=1) and p(E=1 | C=0) p(C=0).
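
Putting the theorem and the marginalization together: a small generic helper in Python (my own sketch, not lecture code), applied to the cancer example:

def posterior(lik_pos, prior, lik_neg):
    """p(C=1 | E=1) by Bayes' theorem, marginalizing over C for the denominator.

    lik_pos = p(E=1 | C=1), prior = p(C=1), lik_neg = p(E=1 | C=0).
    """
    numerator = lik_pos * prior
    evidence = numerator + lik_neg * (1 - prior)  # p(E=1)
    return numerator / evidence

print(posterior(0.8, 0.01, 0.096))  # ~0.0776, matching the 7.76% above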

Bayes’ theorem… Talk to your neighbours – 5 mins or so.

Another Example… what year is it?
You jump in a time machine. It takes you somewhere, but you don’t know what year it has taken you to. You know it is one of 1885, 1955, 1985, or 2015.

What year is it? You look out the window… and see a STEAM train. What are the chances of seeing this in the year 2015? Let’s guess…

What year is it? What are the chances of seeing this in the other years? And remember…

What year is it? Bayes’ Theorem to the rescue…
p(year | train) = p(train | year) p(year) / p(train), and we can calculate the denominator by marginalizing:
p(train) = p(train | 1885) p(1885) + p(train | 1955) p(1955) + p(train | 1985) p(1985) + p(train | 2015) p(2015)

What year is it? Bayes’ Theorem to the rescue…

What year is it? Bayes’ Theorem to the rescue… For the other years…

What year is it? Then you look out the window… and see someone wearing Nike-branded trainers.

What year is it? But now our belief over what year it is has changed, because of the train… Bayes’ Theorem can just use this updated belief as the new prior, plugging it back into the same equation…

What year is it? Combining the observation with our prior belief: we believe we are in 1985, with p = 0.945.
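
The likelihood values on the original slides are not preserved in this transcript, so the numbers below are purely assumed for illustration; the sketch only demonstrates the sequential-update mechanics, where the posterior after the steam train becomes the prior before the trainers:

years = [1885, 1955, 1985, 2015]
prior = {y: 0.25 for y in years}  # uniform prior over the four candidate years

# ASSUMED likelihoods, not the lecture's actual numbers.
p_steam = {1885: 0.5, 1955: 0.3, 1985: 0.1, 2015: 0.001}
p_nike = {1885: 0.0001, 1955: 0.005, 1985: 0.6, 2015: 0.9}

def update(belief, likelihood):
    """One Bayesian update: posterior proportional to likelihood * prior."""
    unnorm = {y: likelihood[y] * belief[y] for y in belief}
    z = sum(unnorm.values())  # p(observation), by marginalizing over years
    return {y: v / z for y, v in unnorm.items()}

belief = update(prior, p_steam)  # after seeing the steam train
belief = update(belief, p_nike)  # after seeing the Nike trainers
print(belief)                    # with these assumed numbers, 1985 dominates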

Bayes’ theorem, done. Take a 15 minute break.

More Problems Solved with Probabilities
Your car is making a noise. What are the chances that the tank is empty?
- The chances of the car making a noise, if the tank really is empty: p(noise | empty).
- The chances of the car making a noise, if the tank is not empty: p(noise | not empty).
- The chances of the tank being empty, regardless of anything else: p(empty).

Bayes’ Theorem:
p(empty | noise) = p(noise | empty) p(empty) / [ p(noise | empty) p(empty) + p(noise | not empty) p(not empty) ]

Another Problem to Solve…
A person tests positive for a certain medical disease. What are the chances that they really do have the disease?
- The chances of the test being positive, if the person really is ill: p(positive | ill).
- The chances of the test being positive, if the person is in fact well: p(positive | well).
- The chances of the condition, in the general population: p(ill).

Bayes’ Theorem:
p(ill | positive) = p(positive | ill) p(ill) / [ p(positive | ill) p(ill) + p(positive | well) p(well) ]

Another Problem to Solve…
Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn’t rain, he incorrectly forecasts rain 20% of the time. What are the chances it will rain on the day of Marie’s wedding?
- The chances of the forecast saying rain, if it really does rain: p(forecast | rain) = 0.9.
- The chances of the forecast saying rain, if it will be fine: p(forecast | fine) = 0.2.
- The chances of rain, in the general case: p(rain) = 5/365.
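
Worked through in Python with the numbers above (an illustrative sketch, not lecture code):

p_forecast_given_rain = 0.9  # weatherman correct when it rains
p_forecast_given_fine = 0.2  # false alarms when it doesn't rain
p_rain = 5 / 365             # prior: it rains 5 days a year

numerator = p_forecast_given_rain * p_rain
evidence = numerator + p_forecast_given_fine * (1 - p_rain)
print(numerator / evidence)  # ~0.059: despite the forecast, rain is still unlikely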