CSCI 5582 Artificial Intelligence (Fall 2006), Lecture 15, Jim Martin

Today 10/19: Review, Belief Net Computing, Sequential Belief Nets

Review: Normalization, Belief Net Semantics

Normalization What do I know about P(~A|something) and P(A|same something)? They sum to 1.

Normalization What if I have this… P(A,Y)/P(Y) and P(~A,Y)/P(Y), and I can compute the numerators but not the denominator? Ignore it, compute what you have, then normalize: P(A|Y) = P(A,Y)/(P(A,Y)+P(~A,Y)) and P(~A|Y) = P(~A,Y)/(P(A,Y)+P(~A,Y))

Normalization In vector form: <P(A|Y), P(~A|Y)> = alpha * <P(A,Y), P(~A,Y)>, where alpha = 1/(P(A,Y)+P(~A,Y))
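A minimal sketch of this normalization trick in Python; the helper and the numbers are illustrative, not from the slides:

```python
def normalize(unnormalized):
    """Turn unnormalized joint probabilities into a conditional distribution.

    unnormalized maps each value of the query variable to P(value, evidence);
    dividing by the sum is the same as multiplying by alpha = 1/P(evidence).
    """
    total = sum(unnormalized.values())
    return {value: p / total for value, p in unnormalized.items()}

# Assumed numbers for illustration: P(A,Y) = 0.02 and P(~A,Y) = 0.08.
print(normalize({"A": 0.02, "~A": 0.08}))  # roughly {'A': 0.2, '~A': 0.8}
```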

Bayesian Belief Nets A compact notation for representing conditional independence assumptions and hence a compact way of representing a joint distribution. Syntax: –A directed acyclic graph, one node per variable –Each node augmented with local conditional probability tables

Bayesian Belief Nets Nodes with no incoming arcs (root nodes) simply have priors associated with them. Nodes with incoming arcs have tables enumerating –P(Node | Conjunction of Parents) –Where a parent is the node at the other end of an incoming arc

Bayesian Belief Nets: Semantics The full joint distribution for the N variables in a Belief Net can be recovered from the information in the tables.
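Written out (the standard belief-net semantics, consistent with the product used on the next slide): P(X1, X2, …, XN) = P(X1|Parents(X1)) * P(X2|Parents(X2)) * … * P(XN|Parents(XN))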

Alarm Example

Alarm Example P(J^M^A^~B^~E) = P(J|A) * P(M|A) * P(A|~B^~E) * P(~B) * P(~E) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998. In other words, the probability of an atomic event can be read right off the network as the product of the appropriate table entries for each variable.
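A short Python sketch of this computation. The five numbers from the slide appear below; the remaining CPT entries are the usual Russell & Norvig values, assumed here so the example is self-contained:

```python
# CPTs for the classic burglary/earthquake/alarm network.
# The slide's numbers (0.9, 0.7, 0.001, 0.999, 0.998) appear here; the rest
# are the standard textbook values, assumed for completeness.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}
P_J_given_A = {True: 0.90, False: 0.05}
P_M_given_A = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m): the product of one table entry per variable."""
    p_a = P_A_given_BE[(b, e)] if a else 1 - P_A_given_BE[(b, e)]
    p_j = P_J_given_A[a] if j else 1 - P_J_given_A[a]
    p_m = P_M_given_A[a] if m else 1 - P_M_given_A[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# The atomic event on the slide: J, M, A true; B, E false.
print(joint(b=False, e=False, a=True, j=True, m=True))  # ~0.000628
```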

Events P(M^J^E^B^A) + P(M^J^E^B^~A) + P(M^J^E^~B^A) + …

Chain Rule Basis
P(B,E,A,J,M) = P(M|B,E,A,J) P(B,E,A,J)
P(B,E,A,J) = P(J|B,E,A) P(B,E,A)
P(B,E,A) = P(A|B,E) P(B,E)
P(B,E) = P(B|E) P(E)

Chain Rule Basis
P(B,E,A,J,M) = P(M|B,E,A,J) P(J|B,E,A) P(A|B,E) P(B|E) P(E)
= P(M|A) P(J|A) P(A|B,E) P(B) P(E), using the conditional independences encoded by the network

Alarm Example

Details Where do the graphs come from? –Initially, the intuitions of domain experts Where do the numbers come from? –Hopefully, from hard data –Sometimes from experts' intuitions How can we compute things efficiently? –Exactly, by not redoing things unnecessarily –By approximating things

Computing with BBNs Normal scenario –You have a belief net consisting of a bunch of variables Some whose values you know (evidence) Some you’re asking about (query) Some you haven’t specified (hidden)

Example Probability that there’s a burglary given that John and Mary are calling P(B|J,M) –B is the query variable –J and M are evidence variables –A and E are hidden variables

Example Probability that there’s a burglary given that John and Mary are calling: P(B|J,M) = alpha * P(B,J,M) = alpha * [P(B,J,M,A,E) + P(B,J,M,~A,E) + P(B,J,M,A,~E) + P(B,J,M,~A,~E)]
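A sketch of this enumeration step in Python, reusing the joint() and normalize() helpers sketched above, so the structure rather than the exact numbers is the point:

```python
from itertools import product

def prob_burglary_given_calls():
    """P(B | J=true, M=true): sum the joint over the hidden variables A and E,
    then normalize away the unknown P(J, M)."""
    unnormalized = {}
    for b in (True, False):
        unnormalized[b] = sum(joint(b=b, e=e, a=a, j=True, m=True)
                              for a, e in product((True, False), repeat=2))
    return normalize(unnormalized)

print(prob_burglary_given_calls())  # with the assumed CPTs, P(B=True|J,M) is about 0.28
```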

From the Network

Expression Tree

Speedups Don’t recompute things. –Dynamic programming Don’t compute some things at all –Ignore variables that can’t affect the outcome.

Example John calls given burglary P(J|B)

Variable Elimination Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query –Operationally… You can eliminate any leaf node that isn’t a query or evidence variable. That may produce new leaves; keep going.
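A minimal sketch of that pruning loop, assuming the network is given as a dict mapping each node to the list of its parents (the encoding and the node names are illustrative):

```python
def prune_irrelevant(parents, query, evidence):
    """Repeatedly drop leaf nodes that are neither query nor evidence variables.

    parents: dict mapping node -> list of its parent nodes.
    Returns the set of nodes still relevant to the query.
    """
    keep = set(parents)
    protected = set(query) | set(evidence)
    while True:
        # A leaf is a node that no other remaining node lists as a parent.
        leaves = {n for n in keep
                  if not any(n in parents[m] for m in keep if m != n)}
        removable = leaves - protected
        if not removable:
            return keep
        keep -= removable

alarm_parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(prune_irrelevant(alarm_parents, query={"J"}, evidence={"B"}))
# {'B', 'E', 'A', 'J'} -- M is a leaf that is neither query nor evidence, so it goes.
```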

Alarm Example

Break Questions?

Chain Rule Basis
P(B,E,A,J,M) = P(M|B,E,A,J) P(B,E,A,J)
P(B,E,A,J) = P(J|B,E,A) P(B,E,A)
P(B,E,A) = P(A|B,E) P(B,E)
P(B,E) = P(B|E) P(E)

Chain Rule
P(E1,E2,E3,E4,E5) = P(E5|E1,E2,E3,E4) P(E1,E2,E3,E4)
P(E1,E2,E3,E4) = P(E4|E1,E2,E3) P(E1,E2,E3)
P(E1,E2,E3) = P(E3|E1,E2) P(E1,E2)
P(E1,E2) = P(E2|E1) P(E1)

Chain Rule Rewriting, that’s just P(E1) P(E2|E1) P(E3|E1,E2) P(E4|E1,E2,E3) P(E5|E1,E2,E3,E4). The probability of a sequence of events is just the product of the conditional probability of each event given its predecessors (parents/causes in belief net terms).

Markov Assumption This is just a sequence-based independence assumption, just like with belief nets. –Not all the parents matter Remember P(toothache|catch, cavity) = P(toothache|cavity) Now, with a K-th order assumption, P(Event_N | Event_1, …, Event_N-1) = P(Event_N | Event_N-K, …, Event_N-1)

First Order Markov Assumption
P(E1) P(E2|E1) P(E3|E1,E2) P(E4|E1,E2,E3) P(E5|E1,E2,E3,E4)
becomes
P(E1) P(E2|E1) P(E3|E2) P(E4|E3) P(E5|E4)

Markov Models As with all our models, let’s assume some fixed inventory of possible events that can occur in time. Let’s assume for now that at any given point in time, all events are possible, although not equally likely.

Markov Models You can view simple Markov assumptions as arising from underlying probabilistic state machines. In the simplest case (first order), events correspond to states and the event probabilities are governed by the transition probabilities of the machine.

Weather Let’s say we’re tracking the weather and there are 4 possible events (one per day) –Sun, clouds, rain, snow

Example [figure: a state-machine diagram over the four weather states Rain, Snow, Sun, and Clouds]

Belief Net Version [figure: the weather variable, with values Rain, Snow, Sun, and Clouds, repeated as a chain of nodes along a Time axis]

Example In this case we need a 4x4 matrix of transition probabilities –For example P(Rain|Clouds) or P(Sun|Sun), etc. And we need a set of initial probabilities, such as P(Rain). That’s just an array of 4 numbers.

Example So to get the probability of a sequence like –Rain rain rain snow –You just march through the state machine –P(Rain) P(Rain|Rain) P(Rain|Rain) P(Snow|Rain)
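A small Python sketch of that march through the state machine; the initial and transition probabilities below are made-up numbers, purely for illustration:

```python
# Made-up initial and transition probabilities over the four weather states.
initial = {"Rain": 0.25, "Snow": 0.15, "Sun": 0.35, "Clouds": 0.25}
transition = {  # transition[prev][nxt] = P(nxt | prev); each row sums to 1
    "Rain":   {"Rain": 0.50, "Snow": 0.10, "Sun": 0.20, "Clouds": 0.20},
    "Snow":   {"Rain": 0.20, "Snow": 0.40, "Sun": 0.10, "Clouds": 0.30},
    "Sun":    {"Rain": 0.10, "Snow": 0.05, "Sun": 0.60, "Clouds": 0.25},
    "Clouds": {"Rain": 0.30, "Snow": 0.10, "Sun": 0.30, "Clouds": 0.30},
}

def sequence_probability(states):
    """P(s1, ..., sn) = P(s1) * product over t of P(s_t | s_{t-1})."""
    p = initial[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= transition[prev][nxt]
    return p

print(sequence_probability(["Rain", "Rain", "Rain", "Snow"]))  # 0.25 * 0.5 * 0.5 * 0.1
```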

Belief Net Version [figure: the same chain of weather nodes unrolled over four time steps]

Example Say that I tell you that –Rain rain rain snow has happened –How would you answer: what’s the most likely thing to happen next?

Belief Net Version [figure: the chain of weather nodes over Time, with a Max taken over the values of the next, not-yet-observed node]
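Answering that prediction question only needs the last observed state and one row of the transition matrix; a one-function sketch, reusing the illustrative tables from the previous snippet:

```python
def most_likely_next(states):
    """Argmax over P(next | last observed state)."""
    last = states[-1]
    return max(transition[last], key=transition[last].get)

print(most_likely_next(["Rain", "Rain", "Rain", "Snow"]))  # 'Snow', with these made-up numbers
```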

Weird Example What if you couldn’t actually see the weather? –You’re a security guard who lives and works in a secure facility underground. –You watch people coming and going with various things (snow boots, umbrellas, ice cream cones) –Can you figure out the weather?

Hidden Markov Models Add an output to the states; that is, when a state is entered it outputs a symbol. You can view the outputs, but not the states directly. –States can output different symbols at different times –The same symbol can come from many states.

Hidden Markov Models The point –The observable sequence of symbols does not uniquely determine a sequence of states. Can we nevertheless reason about the underlying model, given the observations?

Hidden Markov Model Assumptions Now we’re going to make two independence assumptions –The state we’re in depends probabilistically only on the state we were last in (first order Markov assumption) –The symbol we’re seeing depends probabilistically only on the state we’re in

Hidden Markov Models Now the model needs –The initial state priors P(State_i) –The transition probabilities (as before) P(State_j | State_k) –The output probabilities P(Observation_i | State_k)

HMMs The joint probability of a state sequence and an observation sequence is…
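Under the two assumptions on the previous slide, this works out to the standard HMM factorization: P(S1,…,ST, O1,…,OT) = P(S1) P(O1|S1) * P(S2|S1) P(O2|S2) * … * P(ST|ST-1) P(OT|ST)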

Noisy Channel Applications The hidden model represents an original signal (sequence of words, letters, etc.). This signal is corrupted probabilistically. Use an HMM to recover the original signal. Examples: speech, OCR, language translation, spelling correction, …

Three Problems
1. The probability of an observation sequence given a model –Forward algorithm –Prediction falls out from this
2. The most likely path through a model given an observed sequence –Viterbi algorithm –Sometimes called decoding
3. Finding the most likely model (parameters) given an observed sequence –EM Algorithm
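As a concrete instance of the first problem, here is a compact sketch of the forward algorithm; the two-state model, the output symbols, and all probabilities are invented for illustration:

```python
def forward(observations, states, initial, transition, emission):
    """P(observation sequence | model), summing over all hidden state paths.

    After processing t observations, alpha[s] holds P(o_1..o_t, S_t = s);
    summing the final alphas gives the probability of the whole sequence.
    """
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Toy model: hidden weather, observed items carried by people in the facility.
states = ["Rain", "Sun"]
initial = {"Rain": 0.4, "Sun": 0.6}
transition = {"Rain": {"Rain": 0.7, "Sun": 0.3},
              "Sun":  {"Rain": 0.2, "Sun": 0.8}}
emission = {"Rain": {"umbrella": 0.9, "ice cream": 0.1},
            "Sun":  {"umbrella": 0.2, "ice cream": 0.8}}

print(forward(["umbrella", "umbrella", "ice cream"], states, initial, transition, emission))
```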