More probability CS151 David Kauchak Fall 2010 Some material borrowed from: Sara Owsley Sood and others.

Admin Assignment 4 out – work in partners; Part 1 due by Thursday at the beginning of class. Midterm exam time – review next Tuesday.

Joint distributions For an expression with n boolean variables, e.g. P(X1, X2, …, Xn), how many entries will be in the probability table? 2^n. Does this always have to be the case?
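As a quick sanity check on the 2^n count (a small Python sketch, not from the slides), we can enumerate every assignment of n boolean variables:

from itertools import product

n = 4
assignments = list(product([True, False], repeat=n))
print(len(assignments))  # 16 entries for n = 4, i.e. 2**n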

Independence Two variables are independent if one has nothing whatever to do with the other. For two independent variables, knowing the value of one does not change the probability distribution of the other variable (or the probability of any individual event) –the result of the toss of a coin is independent of the roll of a die –the price of tea in England is independent of whether or not you pass AI

Independent or Dependent? Catching a cold and having a cat allergy Miles per gallon and driving habits Height and longevity

Independent variables How does independence affect our probability equations/properties? If A and B are independent (written …) –P(A,B) = P(A)P(B) –P(A|B) = P(A) –P(B|A) = P(B)

Independent variables If A and B are independent –P(A,B) = P(A)P(B) –P(A|B) = P(A) –P(B|A) = P(B) Reduces the storage requirement for the distributions
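A rough illustration of the savings (hypothetical numbers, not from the slides): with n independent boolean variables we can store one number per variable instead of a full joint table.

n = 10
joint_entries = 2 ** n   # 1024 entries in the full joint table
independent_params = n   # just 10 numbers P(X_i = True) if all variables are independent
print(joint_entries, independent_params)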

Conditional Independence Dependent events can become independent given certain other events. Examples: –height and length of life –“correlation” studies: size of your lawn and length of life If A and B are conditionally independent given C –P(A,B|C) = P(A|C)P(B|C) –P(A|B,C) = P(A|C) –P(B|A,C) = P(B|C) –but, in general, P(A,B) ≠ P(A)P(B)
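A tiny numeric sketch (the numbers are made up for illustration): A and B are conditionally independent given C by construction, but they are not marginally independent.

# Toy numbers (an assumption, not data from the lecture)
p_c = 0.5
p_a_given_c = {True: 0.9, False: 0.1}   # P(A=1 | C)
p_b_given_c = {True: 0.9, False: 0.1}   # P(B=1 | C)

# P(A=1, B=1) by summing out C; P(A,B|C) = P(A|C)P(B|C) holds by construction
p_ab = sum((p_c if c else 1 - p_c) * p_a_given_c[c] * p_b_given_c[c]
           for c in (True, False))
p_a = sum((p_c if c else 1 - p_c) * p_a_given_c[c] for c in (True, False))
p_b = p_a  # symmetric by construction

print(p_ab)        # 0.41
print(p_a * p_b)   # 0.25 -> P(A,B) != P(A)P(B), even though A and B are independent given C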

Cavities What independences are encoded (both unconditional and conditional)?

Bayes nets Bayes nets are a way of representing joint distributions –Directed, acyclic graphs –Nodes represent random variables –Directed edges represent dependence –Associated with each node is a conditional probability distribution P(X | parents(X)) –They encode dependences/independences

Cavities Weather P(Cavity) P(Catch | Cavity) P(Toothache | Cavity) P(Weather) What independences are encoded?

Cavities Weather P(Cavity) P(Catch | Cavity) P(Toothache | Cavity) P(Weather) Weather is independent of all the other variables. Toothache and Catch are conditionally independent GIVEN Cavity. Does this help us in storing the distribution?

Why all the fuss about independences? For the Weather, Cavity, Toothache, Catch network: the basic joint distribution has 2^4 = 16 entries. –With the independences: 2 + 2 + 4 + 4 = 12 entries (P(Weather), P(Cavity), P(Toothache | Cavity), P(Catch | Cavity)) –If we’re sneaky (each distribution sums to 1, so the last entry is redundant): 1 + 1 + 2 + 2 = 6 –The savings can be much more significant as the number of variables increases!

Cavities Weather P(Cavity) P(Catch | Cavity) P(Toothache | Cavity) P(Weather) Independences?

Cavities Weather P(Cavity) P(Catch | Cavity) P(Toothache | Cavity) P(Weather) The graph allows us to figure out the dependencies, and it encodes the same information as the joint distribution.

Another Example Question: Is the family next door out? Variables that give information about this question: DO: is the dog outside? FO: is the family out (away from home)? LO: are the lights on? BP: does the dog have a bowel problem? HB: can you hear the dog bark?

Exploit Conditional Independence Variables that give information about this question: DO: is the dog outside? FO: is the family out (away from home)? LO: are the lights on? BP: does the dog have a bowel problem? HB: can you hear the dog bark? Which variables are directly dependent? Are LO and DO independent? What if you know that the family is away? Are HB and FO independent? What if you know that the dog is outside?

Some options lights on (LO) depends on family out (FO); dog out (DO) depends on family out (FO); barking heard (HB) depends on dog out (DO); dog out (DO) depends on bowel problem (BP). What would the network look like?

Bayesian Network Example The graph structure represents direct influences between variables (you can think of it as causality, but it doesn't have to be): nodes are random variables, directed edges are direct probabilistic influences, and together with the conditional probability tables they define the joint probability distribution.
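One way to sketch this structure in plain Python (no Bayes-net library; the probability numbers below are made up for illustration):

# Parents of each node in the family-out / dog-out example
parents = {
    "FO": [],            # family out
    "BP": [],            # bowel problem
    "LO": ["FO"],        # lights on depends on family out
    "DO": ["FO", "BP"],  # dog out depends on family out and bowel problem
    "HB": ["DO"],        # hear barking depends on dog out
}

# One conditional table per node: P(node = True | parent values); numbers are invented
cpt = {
    "FO": {(): 0.15},
    "BP": {(): 0.01},
    "LO": {(True,): 0.6, (False,): 0.05},
    "DO": {(True, True): 0.99, (True, False): 0.9,
           (False, True): 0.97, (False, False): 0.3},
    "HB": {(True,): 0.7, (False,): 0.01},
}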

Learning from Data

Learning As an agent interacts with the world through its sensors and actuators, it should learn about its environment.

We’ve already seen one example… Estimate a probability as: (number of times an event occurs in the data) / (total number of times the experiment was run, i.e. the total amount of data collected). P(rock) = 4/10 = 0.4 P(rock | scissors) = 2/4 = 0.5 P(rock | scissors, scissors, scissors) = 1/1 = 1.0
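In code this is just counting (the sequence below is made up, not the data from lecture):

throws = ["rock", "paper", "scissors", "rock", "scissors",
          "paper", "rock", "scissors", "rock", "scissors"]
p_rock = throws.count("rock") / len(throws)
print(p_rock)  # 4/10 = 0.4 for this made-up sequence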

Lots of different learning problems Unsupervised learning: put these into groups No explicit labels/categories specified

Lots of learning problems Supervised learning: given labeled data (e.g. example images labeled APPLES and BANANAS)

Lots of learning problems Supervised learning: given labeled examples, learn to label unlabeled examples – is this new item an APPLE or a BANANA?

Lots of learning problems Many others –semi-supervised learning: some labeled data and some unlabeled data –active learning: unlabeled data, but we can pick some examples to be labeled –reinforcement learning: maximize a cumulative reward (e.g. learn to drive a car, reward = not crashing) and variations –online vs. offline learning: do we have access to all of the data, or do we have to learn as we go? –classification vs. regression: are we predicting among a finite set of labels, or predicting a score/value?

Supervised learning: training Labeled data (data + label) is used to train a predictive model: a classifier.

Training For example: e-mails labeled spam / not spam are used to train a predictive model (a classifier).

Supervised learning: testing/classifying Unlabeled data is given to the classifier, which predicts the labels.

Testing/classifying For example: unlabeled e-mails are given to the classifier, which predicts spam / not spam for each one.

Some examples image classification –does the image contain a person? apple? banana? text classification –is this a good/bad review? –is this article about sports or politics? –is this spam? character recognition –is this set of scribbles an ‘a’, ‘b’, ‘c’, … credit card transactions –fraud or not? audio classification –hit or not? –jazz, pop, blues, rap, … Tons of problems!!!

Features We’re given “raw data”, e.g. text documents, images, audio, … We need to extract “features” from these (or, to think of it another way, we somehow need to represent these things). What might be features for text, images, audio? Pipeline: raw data → extract features → f1, f2, f3, …, fn
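For text, one simple choice (a sketch, not the only option) is a binary "does this word appear in the document" feature for each word in a vocabulary:

def extract_features(text, vocabulary):
    # binary bag-of-words: one True/False feature per vocabulary word
    words = set(text.lower().split())
    return {w: (w in words) for w in vocabulary}

vocab = ["buy", "viagra", "the", "now", "enlargement"]
print(extract_features("Buy now and get the enlargement you want", vocab))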

Examples Terminology: An example is a particular instantiation of the features f1, f2, f3, …, fn (generally derived from the raw data). A labeled example has an associated label, while an unlabeled example does not.

Feature-based classification Training or learning phase: training examples (raw data + label) → extract features → f1, f2, f3, …, fn + label → train a predictive model (classifier).

Feature-based classification Testing or classification phase: testing examples (raw data) → extract features → f1, f2, f3, …, fn → classifier → predicted labels.

Bayesian Classification We represent a data item based on the features f1, f2, …, fn. Training: for each label/class, learn a probability distribution based on the features (one distribution per label, e.g. one for label a and one for label b).

Bayesian Classification We represent a data item based on the features f1, f2, …, fn. Classifying: for each label/class, we have learned a probability distribution based on the features. How do we use this to classify a new example?

Bayesian Classification We represent a data item based on the features f1, f2, …, fn. Classifying: given a new example, classify it as the label with the largest conditional probability.
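The decision rule, as a minimal sketch (the scoring function here is a placeholder for whatever probability model we learn):

def classify(example, labels, score):
    # score(label, example) should return p(label | example), or anything
    # proportional to it (e.g. p(example | label) * p(label))
    return max(labels, key=lambda label: score(label, example))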

Training a Bayesian Classifier The features f1, f2, f3, …, fn plus the label are used to train a predictive model (classifier). How are we going to learn these probability distributions?

Bayes rule for classification A first attempt: estimate p(label | features) as (number of items with these features that have this label) / (total number of items with these features). Is this ok?

Bayes rule for classification Estimating p(label | features) as (number of items with these features that have this label) / (total number of items with these features) is very sparse! We likely won’t have many items with any particular set of features.

Training a Bayesian Classifier

Bayes rule for classification p(label | features) = p(features | label) p(label) / p(features), where p(label) is the prior probability and p(label | features) is the conditional (posterior) probability.

Bayes rule for classification How are we going to learn these?

Bayes rule for classification Estimate p(features | label) as (number of items with this label that have these features) / (total number of items with this label). Is this ok?

Bayes rule for classification Estimating p(features | label) as (number of items with this label that have these features) / (total number of items with this label) is better (at least the denominator won’t be sparse), but we’re still unlikely to see any given feature combination.

The Naive Bayes Classifier Example network: class node Flu with feature nodes x1 (fever), x2 (sinus), x3 (cough), x4 (runny nose), x5 (muscle-ache). Conditional Independence Assumption: the features are independent of each other given the class: P(x1, x2, …, x5 | Flu) = P(x1 | Flu) P(x2 | Flu) … P(x5 | Flu)

Estimating parameters How do we estimate these, i.e. the prior p(label) and the conditionals p(xi | label)?

Maximum likelihood estimates p(label) = (number of items with the label) / (total number of items) p(feature | label) = (number of items with the label that have the feature) / (number of items with the label) Any problems with this approach?
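A minimal sketch of these counts in Python, assuming each training example is a (collection of features, label) pair:

from collections import Counter, defaultdict

def mle_estimates(examples):
    # examples: list of (features, label) pairs, where features is a set/list
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    for features, label in examples:
        feature_counts[label].update(features)

    total = len(examples)
    p_label = {lab: n / total for lab, n in label_counts.items()}
    # note: a feature never seen with a label gets probability 0 here --
    # this is the kind of problem the slide's question is hinting at
    p_feature_given_label = {
        lab: {f: n / label_counts[lab] for f, n in counts.items()}
        for lab, counts in feature_counts.items()
    }
    return p_label, p_feature_given_label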

Naïve Bayes Text Classification How can we classify text using a Naïve Bayes classifier?

Naïve Bayes Text Classification Features: whether a word occurs in the document (though others could be used…). Example: class node spam with word feature nodes buy, viagra, the, now, enlargement.

Naïve Bayes Text Classification Does the Naïve Bayes assumption hold? Are word occurrences (buy, viagra, the, now, enlargement, …) independent given the label (spam)?

Naïve Bayes Text Classification We’ll look at a few applications for this homework –sentiment analysis: positive vs. negative reviews –category classification

Text classification: training? Estimate p(word | class) as: (number of times the word occurred in documents in that class) / (number of items in that text class).
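A sketch of that training step (normalizing by the total number of word occurrences in the class is one common choice; the slide's exact normalization isn't shown, and smoothing is omitted):

from collections import Counter, defaultdict

def train_text_nb(documents):
    # documents: list of (text, label) pairs
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in documents:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())

    p_class = {c: n / len(documents) for c, n in class_counts.items()}
    p_word_given_class = {
        c: {w: n / sum(counts.values()) for w, n in counts.items()}
        for c, counts in word_counts.items()
    }
    return p_class, p_word_given_class

docs = [("buy viagra now", "spam"), ("meeting agenda for today", "not spam")]
print(train_text_nb(docs)[1]["spam"]["viagra"])  # 1/3 on this tiny made-up corpus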