
Bayesian Networks

Graphical Models: Bayesian networks, conditional random fields, etc.

Naive Bayes: [Diagram: class node Topic=Course with the word features "Grading", "Instructor", "Syllabus", "Vapnik", "Exams" as its children]

Bayesian networks: The idea is to represent the dependencies (or causal relations) among all the variables so that space and computation-time requirements are minimized. [Diagram: network with nodes Topic=Russian Mathematicians, Topic=Course, Vapnik, Syllabus]

Conditional probability tables for each node. [Diagram: the network over Topic=Course, Topic=Russian Mathematicians, Vapnik, Syllabus with a CPT at each node; the transcript preserves only P(Course = true) = 0.01 and P(RussianMath = true) = 0.2.]

Semantics of Bayesian networks: If the network is correct, the full joint probability distribution can be calculated from the network:

P(x_1, ..., x_n) = Π_i P(x_i | parents(X_i))

where parents(X_i) denotes the specific values of the parents of X_i. Compare with Naïve Bayes.
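To make the factorization concrete, here is a minimal Python sketch for the four-node example network. Only the priors P(Course = true) = 0.01 and P(RussianMath = true) = 0.2 come from the slides; the remaining CPT entries below are made-up illustrative values.

# Joint probability as a product of P(x_i | parents(X_i)).
# The priors 0.01 and 0.2 are from the slides; all other CPT values are assumed.
p_course = 0.01                                     # P(Course = true)
p_rmath = 0.20                                      # P(RussianMath = true)
p_syllabus = {True: 0.80, False: 0.10}              # P(Syllabus = true | Course), assumed
p_vapnik = {(True, True): 0.90, (True, False): 0.60,
            (False, True): 0.50, (False, False): 0.01}   # P(Vapnik = true | Course, RussianMath), assumed

def bern(p_true, value):
    # P(X = value) when P(X = true) = p_true
    return p_true if value else 1.0 - p_true

def joint(course, rmath, vapnik, syllabus):
    # Chain rule for this network: product of each node given its parents.
    return (bern(p_course, course) *
            bern(p_rmath, rmath) *
            bern(p_vapnik[(course, rmath)], vapnik) *
            bern(p_syllabus[course], syllabus))

print(joint(True, False, True, True))   # one entry of the full joint distribution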

Example: Calculate [the joint probability expression shown on the slide is not preserved in the transcript]

Examples: What is the unconditional (marginal) probability that Vapnik is true?

What is the unconditional (marginal) probability that Russian Mathematicians is true?
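A marginal is obtained by summing the joint over all values of the other variables. A short continuation of the sketch above (reusing joint() and its partly assumed CPTs): P(RussianMath = true) is simply its prior 0.2, since the node has no parents, while P(Vapnik = true) needs a sum.

from itertools import product

# P(Vapnik = true): sum the joint over Course, RussianMath, and Syllabus.
p_vapnik_true = sum(joint(c, r, True, s)
                    for c, r, s in product([True, False], repeat=3))
print(p_vapnik_true)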

Different types of inference in Bayesian networks. Causal inference: the evidence is a cause, and we infer the probability of an effect. Example: instantiate the evidence Course = true. What is P(Syllabus | Course)?

Diagnostic inference: the evidence is an effect, and we infer the probability of a cause. Example: instantiate the evidence Syllabus = true. What is P(Course | Syllabus)?

Example: What is P(Course|Vapnik)?

Inter-causal inference ("explaining away"): observing one cause of an effect changes our belief in the other possible causes. Example: What is P(Course | Vapnik, RussianMath)? Why is P(Course | Vapnik, RussianMath) < P(Course | Vapnik)?
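These conditional queries can be answered by enumeration: sum the joint over the hidden variables, keep only assignments consistent with the evidence, and normalize. Continuing the same sketch (with its partly assumed CPT values), the snippet below compares P(Course | Vapnik) with P(Course | Vapnik, RussianMath); with these illustrative numbers the second probability is indeed smaller, because RussianMath already explains the Vapnik observation.

from itertools import product

def p_course_given(vapnik=True, rmath=None):
    # P(Course = true | Vapnik = vapnik [, RussianMath = rmath]) by enumeration.
    weights = {True: 0.0, False: 0.0}
    for c, r, s in product([True, False], repeat=3):
        if rmath is not None and r != rmath:
            continue                     # skip assignments inconsistent with the evidence
        weights[c] += joint(c, r, vapnik, s)
    return weights[True] / (weights[True] + weights[False])   # normalize

print(p_course_given(vapnik=True))                 # P(Course | Vapnik)
print(p_course_given(vapnik=True, rmath=True))     # P(Course | Vapnik, RussianMath)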

Complexity of Bayesian networks. For n random Boolean variables:
– Full joint probability distribution: 2^n entries
– Bayesian network with at most k parents per node:
– Each conditional probability table: at most 2^k entries
– Entire network: at most n * 2^k entries
For example, with n = 30 and k = 5, the full joint has 2^30 (over a billion) entries, while the network needs at most 30 * 2^5 = 960 CPT entries.

What are the advantages of Bayesian networks?
– Intuitive, concise representation of the joint probability distribution (i.e., the conditional dependencies) of a set of random variables.
– Represents "beliefs and knowledge" about a particular class of situations.
– Efficient (?) (approximate) inference algorithms.
– Efficient, effective learning algorithms.

Issues in Bayesian networks:
– Building / learning the network topology
– Assigning / learning the conditional probability tables
– Approximate inference via sampling

In general, however, exact inference in Bayesian networks is too expensive (it is NP-hard).

Approximate inference in Bayesian networks: Instead of enumerating all possibilities, sample to estimate probabilities. [Diagram: chain of nodes X_1, X_2, X_3, ..., X_n]

General question: What is P(X | e)? Notation convention: upper-case letters refer to random variables; lower-case letters refer to specific values of those variables.

Direct sampling. Suppose we have no evidence, but we want to determine P(C, S, R, W) for all C, S, R, W. Direct sampling:
– Sample each variable in topological order, conditioned on the values of its parents.
– I.e., always sample from P(X_i | parents(X_i)).

Example
1. Sample from P(Cloudy). Suppose it returns true.
2. Sample from P(Sprinkler | Cloudy = true). Suppose it returns false.
3. Sample from P(Rain | Cloudy = true). Suppose it returns true.
4. Sample from P(WetGrass | Sprinkler = false, Rain = true). Suppose it returns true.
Here is the sampled event: [true, false, true, true]

Suppose there are N total samples, and let N_S(x_1, ..., x_n) be the observed frequency of the specific event x_1, ..., x_n. Then the joint probability is estimated as P(x_1, ..., x_n) ≈ N_S(x_1, ..., x_n) / N. With N samples and n nodes, the complexity is O(Nn).
Problem 1: We need a lot of samples to get good probability estimates.
Problem 2: Many samples are not realistic; they have low likelihood.
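Here is a sketch of direct (prior) sampling for the sprinkler network in Python. The slides do not give the CPT values, so the numbers below are illustrative (close to the usual textbook values); the last line shows how the count N_S of a specific event over N samples estimates its joint probability.

import random
from collections import Counter

# Illustrative CPTs for the sprinkler network (assumed values).
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}                  # P(Sprinkler = true | Cloudy)
P_RAIN = {True: 0.8, False: 0.2}                       # P(Rain = true | Cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.90,      # P(WetGrass = true | Sprinkler, Rain)
         (False, True): 0.90, (False, False): 0.0}

def flip(p):
    return random.random() < p

def prior_sample():
    # Sample each variable in topological order from P(X_i | parents(X_i)).
    c = flip(P_CLOUDY)
    s = flip(P_SPRINKLER[c])
    r = flip(P_RAIN[c])
    w = flip(P_WET[(s, r)])
    return (c, s, r, w)

N = 100_000
counts = Counter(prior_sample() for _ in range(N))
# Estimate P(Cloudy, not Sprinkler, Rain, WetGrass) as N_S(event) / N.
print(counts[(True, False, True, True)] / N)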

Markov Chain Monte Carlo (MCMC) sampling: One of the most common methods used in real applications. It uses the idea of the "Markov blanket" of a variable X_i:
– its parents, its children, and its children's parents.

Illustration of the "Markov blanket" of a node X. [Diagram: X with its parents, children, and children's other parents highlighted]

Recall that, by construction of a Bayesian network, a node is conditionally independent of its non-descendants, given its parents. Proposition: A node X_i is conditionally independent of all other nodes in the network, given its Markov blanket.

Markov Chain Monte Carlo sampling algorithm: Start with a random sample of the variables, (x_1, ..., x_n); this is the current "state" of the algorithm. Next state: randomly sample a value for one non-evidence variable X_i, conditioned on the current values of the variables in the Markov blanket of X_i.

Example. Query: What is P(Rain | Sprinkler = true, WetGrass = true)? MCMC:
– Random sample, with evidence variables fixed: [true, true, false, true]
– Repeat:
1. Sample Cloudy, given current values of its Markov blanket: Sprinkler = true, Rain = false. Suppose result is false. New state: [false, true, false, true]
2. Sample Rain, given current values of its Markov blanket: Cloudy = false, Sprinkler = true, WetGrass = true. Suppose result is true. New state: [false, true, true, true]

Each sample contributes to the estimate for the query P(Rain | Sprinkler = true, WetGrass = true). Suppose we perform 100 such samples, 20 with Rain = true and 80 with Rain = false. Then the answer to the query is Normalize(⟨20, 80⟩) = ⟨0.20, 0.80⟩.
Claim: "The sampling process settles into a dynamic equilibrium in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability, given the evidence." That is: for every variable X_i, the probability that the value x_i of X_i appears in a sample is equal to P(x_i | e).

Claim (again): MCMC settles into behavior in which each state is sampled exactly according to its posterior probability, given the evidence.
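Below is a sketch of a Gibbs-style MCMC sampler for the query P(Rain | Sprinkler = true, WetGrass = true), reusing the illustrative sprinkler CPTs from the direct-sampling sketch above. Each step resamples one non-evidence variable (Cloudy or Rain) from its distribution conditioned on its Markov blanket, and the answer is the fraction of states in which Rain is true.

import random

def sample_cloudy(rain, sprinkler):
    # Resample Cloudy given its Markov blanket {Sprinkler, Rain}:
    # P(Cloudy | s, r) is proportional to P(Cloudy) * P(s | Cloudy) * P(r | Cloudy).
    w = {}
    for c in (True, False):
        prior = P_CLOUDY if c else 1.0 - P_CLOUDY
        like_s = P_SPRINKLER[c] if sprinkler else 1.0 - P_SPRINKLER[c]
        like_r = P_RAIN[c] if rain else 1.0 - P_RAIN[c]
        w[c] = prior * like_s * like_r
    return random.random() < w[True] / (w[True] + w[False])

def sample_rain(cloudy, sprinkler, wet):
    # Resample Rain given its Markov blanket {Cloudy, Sprinkler, WetGrass}:
    # P(Rain | c, s, w) is proportional to P(Rain | c) * P(w | s, Rain).
    w = {}
    for r in (True, False):
        prior = P_RAIN[cloudy] if r else 1.0 - P_RAIN[cloudy]
        like_w = P_WET[(sprinkler, r)] if wet else 1.0 - P_WET[(sprinkler, r)]
        w[r] = prior * like_w
    return random.random() < w[True] / (w[True] + w[False])

def gibbs_rain_given_evidence(n_steps=100_000):
    sprinkler, wet = True, True                        # evidence, held fixed
    cloudy = random.choice([True, False])              # non-evidence variables start at random
    rain = random.choice([True, False])
    rain_true = 0
    for _ in range(n_steps):
        cloudy = sample_cloudy(rain, sprinkler)
        rain = sample_rain(cloudy, sprinkler, wet)
        rain_true += rain
    return rain_true / n_steps                         # estimate of P(Rain | evidence)

print(gibbs_rain_given_evidence())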