CS 484 – Artificial Intelligence (Slide 1)
Announcements
- Homework 8 due today, November 13
- ½- to 1-page description of final project due Thursday, November 15
- Current events: Christian – now; Jeff – Thursday
- Research paper due Tuesday, November 20

Probabilistic Reasoning Lecture 15

CS 484 – Artificial Intelligence (Slide 3)
Probabilistic Reasoning
- Logic deals with certainties: A → B
- Probabilities are expressed in a notation similar to that of predicates in first-order predicate calculus:
  P(R) = 0.7
  P(S) = 0.1
  P(¬(A Λ B) V C) = …
- 1 = certain; 0 = certainly not

CS 484 – Artificial Intelligence (Slide 4)
What is the probability that either A is true or B is true?
- P(A V B) = P(A) + P(B) – P(A Λ B)
- (Venn diagram of A, B, and their overlap A Λ B)

CS 484 – Artificial Intelligence (Slide 5)
Conditional Probability
- Conditional probability is the probability of one thing given that we already know another to be true:
  P(B|A) = P(A Λ B) / P(A)
- This states the probability of B, given A.
- (Venn diagram of A, B, and their overlap A Λ B)

CS 484 – Artificial Intelligence (Slide 6)
Calculate P(R|S), given that the probability of rain is 0.7, the probability of sun is 0.1, and the probability of rain and sun is 0.01.
- P(R|S) = P(R Λ S) / P(S) = 0.01 / 0.1 = 0.1
- Note: P(A|B) ≠ P(B|A)

CS 484 – Artificial Intelligence (Slide 7)
Joint Probability Distributions
- A joint probability distribution represents the combined probabilities of two or more variables.
- The table shows, for example, that P(A Λ B) = 0.11 and P(¬A Λ B) = 0.09:

        A      ¬A
  B     0.11   0.09
  ¬B    0.63   0.17

- Using this, we can calculate P(A):
  P(A) = P(A Λ B) + P(A Λ ¬B) = 0.11 + 0.63 = 0.74
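As a quick illustration, here is a minimal Python sketch (not from the slides) that stores this joint distribution as a dictionary and recovers the marginal P(A) by summing over B; the bottom-row values are the ones implied by the slide's numbers.

joint = {
    (True, True): 0.11,    # P(A Λ B)
    (False, True): 0.09,   # P(¬A Λ B)
    (True, False): 0.63,   # P(A Λ ¬B), from 0.74 - 0.11
    (False, False): 0.17,  # P(¬A Λ ¬B), the remainder so the table sums to 1
}

# Marginal P(A): sum the joint probability over every value of B.
p_a = sum(p for (a, b), p in joint.items() if a)
print(p_a)  # 0.74 (up to floating point), matching the slide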

CS 484 – Artificial Intelligence (Slide 8)
Bayes’ Theorem
- Bayes’ theorem lets us calculate a conditional probability:
  P(B|A) = P(A|B) P(B) / P(A)
- P(B) is the prior probability of B.
- P(B|A) is the posterior probability of B.

CS 484 – Artificial Intelligence (Slide 9)
Bayes' Theorem Deduction
- Recall the product rule:
  P(A Λ B) = P(A|B) P(B)   and   P(A Λ B) = P(B|A) P(A)
- Setting the two right-hand sides equal and dividing by P(A) gives:
  P(B|A) = P(A|B) P(B) / P(A)

CS 484 – Artificial Intelligence (Slide 10)
Medical Diagnosis Data
- 80% of the time you have a cold, you also have a high temperature.
- At any one time, 1 in every 10,000 people has a cold.
- 1 in every 1,000 people has a high temperature.
- Suppose you have a high temperature. What is the likelihood that you have a cold?
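Worked through in Python (a small sketch using only the numbers on this slide), Bayes' theorem gives the answer directly:

p_ht_given_c = 0.8      # P(HT | C): chance of a high temperature given a cold
p_c = 1 / 10_000        # P(C): prior probability of having a cold
p_ht = 1 / 1_000        # P(HT): prior probability of a high temperature

# Bayes' theorem: P(C | HT) = P(HT | C) * P(C) / P(HT)
p_c_given_ht = p_ht_given_c * p_c / p_ht
print(p_c_given_ht)     # 0.08, i.e. an 8% chance the high temperature means a cold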

CS 484 – Artificial Intelligence (Slide 11)
Witness Reliability
- A hit-and-run incident has been reported, and an eyewitness has stated she is certain that the car was a white taxi. How likely is it that she is right?
- Facts:
  The yellow taxi company has 90 cars.
  The white taxi company has 10 cars.
  An expert says that, given the foggy weather, the witness has a 75% chance of correctly identifying the taxi.

CS 484 – Artificial Intelligence (Slide 12)
Witness Reliability – Prior Probability
- Imagine the witness is shown a sequence of 1,000 cars; we expect 900 to be yellow and 100 to be white.
- Given 75% accuracy, how many will she say are white, and how many yellow?
  Of the 900 yellow cars, she says yellow for 675 and white for 225.
  Of the 100 white cars, she says white for 75 and yellow for 25.
- What is the probability that the witness says white?
- How likely is she right?
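A small Python sketch of the frequency argument, answering the two questions above:

yellow, white = 900, 100     # expected cars in a sequence of 1000
accuracy = 0.75              # chance the witness identifies a taxi correctly

says_white_is_white = white * accuracy           # 75 white cars called white
says_white_is_yellow = yellow * (1 - accuracy)   # 225 yellow cars called white

p_says_white = (says_white_is_white + says_white_is_yellow) / (yellow + white)
p_right_given_says_white = says_white_is_white / (says_white_is_white + says_white_is_yellow)
print(p_says_white)              # 0.3
print(p_right_given_says_white)  # 0.25: she is right only a quarter of the time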

CS 484 – Artificial Intelligence (Slide 13)
Comparing Conditional Probabilities
- Medical diagnosis:
  Probability of a high temperature given a cold (C): P(HT|C) = 0.8
  Probability of a high temperature given plague (P): P(HT|P) = 0.99
- Relative likelihood of cold and plague:
  P(C|HT) / P(P|HT) = (P(HT|C) P(C)) / (P(HT|P) P(P)), since P(HT) cancels.
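The slides do not give a prior for plague, so the sketch below plugs in a hypothetical value purely to show how the ratio is computed; only the cold figures come from the earlier slide.

p_ht_given_c, p_c = 0.8, 1 / 10_000      # cold figures from the earlier slide
p_ht_given_p, p_p = 0.99, 1 / 1_000_000  # HYPOTHETICAL plague prior, for illustration only

# P(HT) cancels in the ratio, so only the numerators matter:
# P(C|HT) / P(P|HT) = (P(HT|C) * P(C)) / (P(HT|P) * P(P))
ratio = (p_ht_given_c * p_c) / (p_ht_given_p * p_p)
print(ratio)   # about 80.8: a cold is far more likely than plague under these numbers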

CS 484 – Artificial Intelligence (Slide 14)
Simple Bayesian Concept Learning (1)
- P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
- Suppose we have a set of hypotheses H_1 … H_n. For each H_i:
  P(H_i|E) = P(E|H_i) P(H_i) / P(E)
- Hence, given a piece of evidence, a learner can determine the most likely explanation by finding the hypothesis that has the highest posterior probability.

CS 484 – Artificial Intelligence (Slide 15)
Simple Bayesian Concept Learning (2)
- In fact, this can be simplified. Since P(E) is independent of H_i, it has the same value for each hypothesis. Hence, it can be ignored, and we can find the hypothesis with the highest value of:
  P(E|H_i) P(H_i)
- We can simplify further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|H_i). This is the likelihood of E given H_i.
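A minimal Python sketch of this selection rule (the MAP hypothesis); the hypotheses and numbers are invented for illustration:

priors = {"H1": 0.3, "H2": 0.5, "H3": 0.2}        # P(H_i), made-up values
likelihoods = {"H1": 0.9, "H2": 0.1, "H3": 0.6}   # P(E | H_i), made-up values

# MAP hypothesis: the H_i maximising P(E | H_i) * P(H_i)
map_hypothesis = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(map_hypothesis)   # "H1": 0.9 * 0.3 = 0.27 beats 0.05 and 0.12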

CS 484 – Artificial Intelligence (Slide 16)
Bayesian Belief Networks (1)
- A belief network shows the dependencies between a group of variables.
- Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
- C and D are dependent on A; D and E are dependent on B.
- The Bayesian belief network has probabilities associated with each link, e.g. P(C|A) = 0.2, P(C|¬A) = 0.4.

CS 484 – Artificial Intelligence (Slide 17)
Bayesian Belief Networks (2)
- A complete set of probabilities for this belief network might be:
  P(A) = 0.1
  P(B) = 0.7
  P(C|A) = 0.2
  P(C|¬A) = 0.4
  P(D|A Λ B) = 0.5
  P(D|A Λ ¬B) = 0.4
  P(D|¬A Λ B) = 0.2
  P(D|¬A Λ ¬B) = …
  P(E|B) = 0.2
  P(E|¬B) = 0.1

CS 484 – Artificial Intelligence (Slide 18)
Bayesian Belief Networks (3)
- We can now calculate conditional probabilities. By the chain rule:
  P(A Λ B Λ C Λ D Λ E) = P(A) P(B|A) P(C|A Λ B) P(D|A Λ B Λ C) P(E|A Λ B Λ C Λ D)
- In fact, we can simplify this, since there are no dependencies between certain pairs of variables – between E and A, for example. Hence:
  P(A Λ B Λ C Λ D Λ E) = P(A) P(B) P(C|A) P(D|A Λ B) P(E|B)

CS 484 – Artificial Intelligence (Slide 19)
College Life Example
- C = that you will go to college
- S = that you will study
- P = that you will party
- E = that you will be successful in your exams
- F = that you will have fun
- (Network diagram: C is the parent of S and P; S and P are the parents of E; P is the parent of F.)

CS 484 – Artificial Intelligence (Slide 20)
College Life Example

  P(C) = 0.2

  C       P(S|C)
  true    0.8
  false   0.2

  C       P(P|C)
  true    0.6
  false   0.5

  S       P       P(E|S,P)
  true    true    0.6
  true    false   0.9
  false   true    0.1
  false   false   0.2

  P       P(F|P)
  true    0.9
  false   0.7

CS 484 – Artificial Intelligence (Slide 21)
College Example
- Use the tables to solve problems such as:
  P(C = true, S = true, P = false, E = true, F = false) = P(C Λ S Λ ¬P Λ E Λ ¬F)
- General solution: the joint probability is the product, over all variables, of each variable's probability conditioned on its parents:
  P(x_1, …, x_n) = Π_i P(x_i | parents(X_i))
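A small Python sketch of this particular query, using the tables from the previous slide:

p_c = 0.2
p_s_given_c = {True: 0.8, False: 0.2}
p_p_given_c = {True: 0.6, False: 0.5}
p_e_given_sp = {(True, True): 0.6, (True, False): 0.9,
                (False, True): 0.1, (False, False): 0.2}
p_f_given_p = {True: 0.9, False: 0.7}

# P(C, S, ¬P, E, ¬F) = P(C) P(S|C) P(¬P|C) P(E|S,¬P) P(¬F|¬P)
joint = (p_c
         * p_s_given_c[True]            # P(S | C)
         * (1 - p_p_given_c[True])      # P(¬P | C)
         * p_e_given_sp[(True, False)]  # P(E | S, ¬P)
         * (1 - p_f_given_p[False]))    # P(¬F | ¬P)
print(joint)   # 0.2 * 0.8 * 0.4 * 0.9 * 0.3 = 0.01728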

CS 484 – Artificial Intelligence (Slide 22)
Noisy-V Function
- We would like to assume we know all the causes of a possible event.
- E.g. the medical diagnosis system: P(HT|C) = 0.8, P(HT|P) = 0.99
- Assume P(HT|C V P) = 1 (?); this assumption is clearly not true.
- Leak node O represents all other causes: P(HT|O) = 0.9
- Define noise parameters – conditional probabilities for ¬HT:
  P(¬HT|C) = 1 – P(HT|C) = 0.2
  P(¬HT|P) = 1 – P(HT|P) = 0.01
  P(¬HT|O) = 1 – P(HT|O) = 0.1
- Further assumption: the causes of a high temperature are independent of each other, and the noise parameters are independent.

CS 484 – Artificial Intelligence (Slide 23)
Noisy V-Function
- Using the noisy V-function:
  If cold, plague, and other are all false, P(¬HT) = 1.
  Otherwise, P(¬HT) equals the product of the noise parameters for all the variables that are true.
- E.g. if plague and other are true and cold is false:
  P(HT) = 1 – (0.01 * 0.1) = 0.999
- Benefit: we don't need to store as many values as the full Bayesian belief network would require.
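A sketch of the noisy-V rule as stated on this slide (the function and variable names are mine):

noise = {"cold": 0.2, "plague": 0.01, "other": 0.1}   # P(¬HT | cause), from the previous slide

def p_high_temp(causes):
    """causes maps each cause name to True/False; returns P(HT) under the noisy-V rule."""
    active = [name for name, present in causes.items() if present]
    if not active:
        return 0.0          # all causes false: P(¬HT) = 1, so P(HT) = 0
    p_not_ht = 1.0
    for name in active:
        p_not_ht *= noise[name]   # multiply the noise parameters of the true causes
    return 1 - p_not_ht

print(p_high_temp({"cold": False, "plague": True, "other": True}))   # 0.999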

CS 484 – Artificial Intelligence (Slide 24)
Bayes’ Optimal Classifier
- A system that uses Bayes’ theory to classify data.
- We have a piece of data y, and are seeking the correct hypothesis from H_1 … H_5, each of which assigns a classification to y.
- The probability that y should be classified as c_j is:
  P(c_j | x_1, …, x_n) = Σ_i P(c_j | H_i) P(H_i | x_1, …, x_n)
  where x_1 to x_n are the training data and the sum runs over the m hypotheses.
- This method provides the best possible classification for a piece of data.
- Example: given some data, classify it as true or false.

  Hypothesis   P(H_i | x_1,…,x_n)   P(false | H_i)   P(true | H_i)
  H_1          0.2                  0                1
  H_2          0.3                  0                1
  H_3          0.1                  1                0
  H_4          0.25                 0                1
  H_5          0.15                 1                0

  P(true | x_1,…,x_n) = 0.2 + 0.3 + 0.25 = 0.75
  P(false | x_1,…,x_n) = 0.1 + 0.15 = 0.25
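A short Python sketch of this weighted vote over hypotheses, using the table on this slide:

p_h_given_data = {"H1": 0.2, "H2": 0.3, "H3": 0.1, "H4": 0.25, "H5": 0.15}  # P(H_i | x_1..x_n)
p_true_given_h = {"H1": 1, "H2": 1, "H3": 0, "H4": 1, "H5": 0}              # P(true | H_i)

# P(c_j | x_1..x_n) = sum over hypotheses of P(c_j | H_i) * P(H_i | x_1..x_n)
p_true = sum(p_true_given_h[h] * p for h, p in p_h_given_data.items())
p_false = sum((1 - p_true_given_h[h]) * p for h, p in p_h_given_data.items())
print(p_true, p_false)   # 0.75 0.25, so the optimal classification is true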

CS 484 – Artificial Intelligence (Slide 25)
The Naïve Bayes Classifier (1)
- A vector of data is assigned a single classification: P(c_i | d_1, …, d_n)
- The classification with the highest posterior probability is chosen.
- The hypothesis with the highest posterior probability is the maximum a posteriori, or MAP, hypothesis. In this case, we are looking for the MAP classification.
- Bayes’ theorem is used to find the posterior probability:
  P(c_i | d_1, …, d_n) = P(d_1, …, d_n | c_i) P(c_i) / P(d_1, …, d_n)

CS 484 – Artificial Intelligence (Slide 26)
The Naïve Bayes Classifier (2)
- Since P(d_1, …, d_n) is a constant, independent of c_i, we can eliminate it, and simply aim to find the classification c_i for which the following is maximised:
  P(d_1, …, d_n | c_i) P(c_i)
- We now assume that all the attributes d_1, …, d_n are independent, so P(d_1, …, d_n | c_i) can be rewritten as:
  P(d_1, …, d_n | c_i) = Π_j P(d_j | c_i)
- The classification for which P(c_i) Π_j P(d_j | c_i) is highest is chosen to classify the data.

CS 484 – Artificial Intelligence (Slide 27)
Classifier Example
- Training data:

  x  y  z  Classification
  2  3  2  A
  4  1  4  B
  1  3  2  A
  2  4  3  A
  4  2  4  B
  2  1  3  C
  1  2  4  A
  2  3  3  B
  2  2  4  A
  3  3  3  C
  3  2  1  A
  1  2  1  B
  2  1  4  A
  4  3  4  C
  2  2  4  A

- New piece of data to classify: (x = 2, y = 3, z = 4)
- We want P(c_i | x=2, y=3, z=4), so we compare:
  P(A) * P(x=2|A) * P(y=3|A) * P(z=4|A)
  P(B) * P(x=2|B) * P(y=3|B) * P(z=4|B)
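A small Python sketch of the naïve Bayes computation on this training set, estimating each probability by relative frequency:

from collections import Counter

data = [
    (2, 3, 2, "A"), (4, 1, 4, "B"), (1, 3, 2, "A"), (2, 4, 3, "A"),
    (4, 2, 4, "B"), (2, 1, 3, "C"), (1, 2, 4, "A"), (2, 3, 3, "B"),
    (2, 2, 4, "A"), (3, 3, 3, "C"), (3, 2, 1, "A"), (1, 2, 1, "B"),
    (2, 1, 4, "A"), (4, 3, 4, "C"), (2, 2, 4, "A"),
]
new = (2, 3, 4)   # the point to classify

class_counts = Counter(row[3] for row in data)
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)                           # P(c)
    for attr in range(3):                             # P(x|c), P(y|c), P(z|c)
        matches = sum(1 for row in data
                      if row[3] == c and row[attr] == new[attr])
        score *= matches / n_c
    scores[c] = score

print(scores)                        # unnormalised posterior for each class
print(max(scores, key=scores.get))   # the chosen classification: A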

CS 484 – Artificial Intelligence (Slide 28)
M-estimate
- Problem with too little training data, e.g. classifying (x=1, y=2, z=2):
  P(x=1 | B) = 1/4
  P(y=2 | B) = 2/4
  P(z=2 | B) = 0
- Avoid the problem by using the m-estimate, which pads the computation with additional virtual samples:
  conditional probability = (a + m*p) / (b + m)
  m = 5 (equivalent sample size)
  p = 1 / number of values for the attribute (1/4 for x)
  a = number of training examples with the attribute value and the classification (x=1 and B: 1)
  b = number of training examples with the classification (B: 4)
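Applying the m-estimate to the worst case above, P(z=2 | B), a quick sketch:

m = 5        # equivalent sample size
p = 1 / 4    # prior estimate: one over the number of values the attribute can take
a = 0        # training examples with z = 2 and class B
b = 4        # training examples with class B

p_z2_given_b = (a + m * p) / (b + m)
print(p_z2_given_b)   # (0 + 1.25) / 9, roughly 0.139, no longer zero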

CS 484 – Artificial Intelligence (Slide 29)
Collaborative Filtering
- A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
- If we know that Anne and Bob both like A, B and C, and that Anne likes D, then we guess that Bob would also like D.
- P(Bob likes Z | Bob likes A, Bob likes B, …, Bob likes Y)
- Can be calculated using decision trees.
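The slide suggests decision trees; as a simpler, purely frequency-based sketch of the same conditional probability, with made-up users and items:

likes = {
    "Anne":  {"A", "B", "C", "D"},
    "Bob":   {"A", "B", "C"},
    "Carol": {"A", "B", "C", "D"},
    "Dave":  {"A", "C"},
}

# Estimate P(likes D | likes A, B, C) for Bob from the other users who also like A, B and C.
others = [items for name, items in likes.items() if name != "Bob"]
evidence = [items for items in others if {"A", "B", "C"} <= items]
p_d = sum("D" in items for items in evidence) / len(evidence)
print(p_d)   # 1.0 with these made-up users, so D is recommended to Bob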