
1 Bayesian Networks

2 Graphical Models Bayesian networks, conditional random fields, etc.

3 Naive Bayes [Figure: a naive Bayes model with class node Topic=Course and word-feature children "Grading", "Instructor", "Syllabus", "Vapnik", "Exams".]

4 Bayesian networks The idea is to represent the dependencies (or causal relations) among all the variables so that space and computation-time requirements are minimized. [Figure: a network in which Topic=Course is a parent of both Syllabus and Vapnik, and Topic=Russian Mathematicians is a second parent of Vapnik.]

5 Conditional probability tables for each node [Figure: the network from the previous slide, annotated with its CPTs.]

P(Course):
  Course=true: 0.01    Course=false: 0.99

P(Russian Math):
  R.M.=true: 0.2       R.M.=false: 0.8

P(Syllabus | Course):
  Course=true:   Syllabus=true: 0.9    Syllabus=false: 0.1
  Course=false:  Syllabus=true: 0.2    Syllabus=false: 0.8

P(Vapnik | Course, R.M.):
  Course=true,  R.M.=true:   Vapnik=true: 0.95   Vapnik=false: 0.05
  Course=true,  R.M.=false:  Vapnik=true: 0.8    Vapnik=false: 0.2
  Course=false, R.M.=true:   Vapnik=true: 0.6    Vapnik=false: 0.4
  Course=false, R.M.=false:  Vapnik=true: 0.05   Vapnik=false: 0.95

6 Semantics of Bayesian networks If the network is correct, we can calculate the full joint probability distribution from it:

P(x_1, ..., x_n) = ∏_i P(x_i | parents(X_i))

where parents(X_i) denotes specific values of the parents of X_i. Compare with Naïve Bayes.

7 Example Calculate the joint probability of a complete assignment to the variables, using the factorization above.
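One worked instance using the CPTs on slide 5 (the particular assignment is chosen here for illustration):

P(Course=true, R.M.=false, Vapnik=true, Syllabus=true)
  = P(Course=true) · P(R.M.=false) · P(Vapnik=true | Course=true, R.M.=false) · P(Syllabus=true | Course=true)
  = 0.01 · 0.8 · 0.8 · 0.9 = 0.00576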

8 Examples What is the unconditional (marginal) probability that Vapnik is true?
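A worked sketch, summing out the parents with the slide-5 CPTs:

P(Vapnik=true) = Σ over c, r of P(Vapnik=true | c, r) P(Course=c) P(R.M.=r)
  = 0.95·0.01·0.2 + 0.8·0.01·0.8 + 0.6·0.99·0.2 + 0.05·0.99·0.8
  = 0.0019 + 0.0064 + 0.1188 + 0.0396 ≈ 0.1667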

9 What is the unconditional (marginal) probability that Russian Mathematicians is true?
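Since Topic=Russian Mathematicians is a root node with no parents, its marginal is read directly from its table: P(R.M.=true) = 0.2.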

10 Different types of inference in Bayesian networks: Causal inference. The evidence is a cause; the inference is the probability of an effect. Example: instantiate the evidence Course = true. What is P(Syllabus | Course)?
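Here the answer is read directly off the CPT for Syllabus: P(Syllabus=true | Course=true) = 0.9.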

11 Diagnostic inference The evidence is an effect; the inference is the probability of a cause. Example: instantiate the evidence Syllabus = true. What is P(Course | Syllabus)?
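By Bayes' rule, with the slide-5 numbers (a worked sketch):

P(Course=true | Syllabus=true) = P(S=true | C=true) P(C=true) / P(S=true)
  = (0.9 · 0.01) / (0.9·0.01 + 0.2·0.99) = 0.009 / 0.207 ≈ 0.043

so observing the syllabus raises the belief in Course from the prior 0.01 to about 0.043.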

12 Example: What is P(Course|Vapnik)?
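Reusing the marginal P(Vapnik=true) ≈ 0.1667 computed on slide 8 (a worked sketch):

P(Course=true | Vapnik=true) = P(Vapnik=true, Course=true) / P(Vapnik=true)
  = (0.0019 + 0.0064) / 0.1667 ≈ 0.050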

13 Inter-causal inference ("explaining away") One observed cause of an effect makes the other possible causes less likely. Example: What is P(Course | Vapnik, RussianMath)? Why is P(Course | Vapnik, RussianMath) < P(Course | Vapnik)?
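Working this out with the slide-5 CPTs (illustrative):

P(Course=true | Vapnik=true, R.M.=true)
  = P(V=t | C=t, R=t) P(C=t) P(R=t) / [P(V=t | C=t, R=t) P(C=t) P(R=t) + P(V=t | C=f, R=t) P(C=f) P(R=t)]
  = 0.0019 / (0.0019 + 0.1188) ≈ 0.016

which is indeed smaller than P(Course=true | Vapnik=true) ≈ 0.050: the Russian-mathematicians topic already explains the mention of "Vapnik", so the Course cause is "explained away".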

14 Complexity of Bayesian networks For n random Boolean variables: Full joint probability distribution: 2^n entries. Bayesian network with at most k parents per node: –Each conditional probability table: at most 2^k entries –Entire network: at most n · 2^k entries
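To make the gap concrete (numbers chosen only for illustration): with n = 30 Boolean variables and at most k = 5 parents per node, the full joint distribution needs 2^30 ≈ 10^9 entries, while the network needs at most 30 · 2^5 = 960.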

15 What are the advantages of Bayesian networks? Intuitive, concise representation of the joint probability distribution (i.e., of the conditional dependencies) of a set of random variables. Represents “beliefs and knowledge” about a particular class of situations. Efficient (?) (approximate) inference algorithms. Efficient, effective learning algorithms.

16 Issues in Bayesian Networks Building / learning network topology Assigning / learning conditional probability tables Approximate inference via sampling

17 In general, however, exact inference in Bayesian networks is too expensive: in the worst case it is intractable (exact inference is NP-hard).

18 Approximate inference in Bayesian networks Instead of enumerating all possibilities, sample to estimate probabilities. [Figure: a network over nodes X_1, X_2, X_3, ..., X_n.]

19 General question: What is P(X|e)? Notation convention: upper-case letters refer to random variables; lower-case letters refer to specific values of those variables

20 Direct Sampling Suppose we have no evidence, but we want to determine P(C,S,R,W) for all values of C, S, R, W (Cloudy, Sprinkler, Rain, WetGrass in the example network that follows). Direct sampling: –Sample each variable in topological order, conditioned on the values of its parents. –That is, always sample from P(X_i | parents(X_i)).

21 Example 1. Sample from P(Cloudy). Suppose it returns true. 2. Sample from P(Sprinkler | Cloudy = true). Suppose it returns false. 3. Sample from P(Rain | Cloudy = true). Suppose it returns true. 4. Sample from P(WetGrass | Sprinkler = false, Rain = true). Suppose it returns true. The sampled event is: [true, false, true, true]
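A minimal runnable Python sketch of direct sampling for this network. The slides do not give the CPT values for Cloudy, Sprinkler, Rain, and WetGrass, so the standard textbook numbers are assumed here, and the function names are our own; the frequency estimate at the end anticipates the next slide.

import random

# CPTs for the Cloudy -> {Sprinkler, Rain} -> WetGrass network.
# ASSUMPTION: the slides omit the numbers; these are the usual
# textbook values, used only for illustration.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.10, False: 0.50}   # P(Sprinkler=true | Cloudy)
P_RAIN = {True: 0.80, False: 0.20}        # P(Rain=true | Cloudy)
P_WETGRASS = {(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.00}  # P(WetGrass=true | Sprinkler, Rain)

def direct_sample():
    """Sample every variable in topological order, each conditioned on its parents."""
    cloudy = random.random() < P_CLOUDY
    sprinkler = random.random() < P_SPRINKLER[cloudy]
    rain = random.random() < P_RAIN[cloudy]
    wetgrass = random.random() < P_WETGRASS[(sprinkler, rain)]
    return (cloudy, sprinkler, rain, wetgrass)

# Estimate the probability of one full event by its observed frequency N_S / N.
N = 100_000
event = (True, False, True, True)  # the event sampled on this slide
frequency = sum(direct_sample() == event for _ in range(N)) / N
print(frequency)  # approaches 0.5 * 0.9 * 0.8 * 0.9 = 0.324 under the assumed CPTs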

22 Suppose there are N total samples, and let N_S(x_1, ..., x_n) be the observed frequency of the specific event x_1, ..., x_n. Then the event's probability is estimated by its frequency:

P(x_1, ..., x_n) ≈ N_S(x_1, ..., x_n) / N

With N samples and n nodes, the complexity is O(Nn). Problem 1: we need a lot of samples to get good probability estimates. Problem 2: many of the sampled events are unrealistic (they have low likelihood), so rare events are estimated poorly.

23 Markov Chain Monte Carlo Sampling One of the most common methods used in real applications. Uses the idea of the “Markov blanket” of a variable X_i: its parents, children, and children's parents.

24 Illustration of “Markov blanket” [Figure: a node X with its parents, children, and children's parents highlighted.]

25 Recall that, by construction of a Bayesian network, a node is conditionally independent of its non-descendants, given its parents. Proposition: a node X_i is conditionally independent of all other nodes in the network, given its Markov blanket.

26 Markov Chain Monte Carlo Sampling Algorithm Start with a random sample of values for the variables: (x_1, ..., x_n). This is the current “state” of the algorithm. Next state: randomly sample a value for one non-evidence variable X_i, conditioned on the current values in the Markov blanket of X_i.

27 Example Query: What is P(Rain | Sprinkler = true, WetGrass = true)? MCMC: Start from a random sample with the evidence variables fixed: [true, true, false, true]. Then repeat: 1. Sample Cloudy, given the current values of its Markov blanket: Sprinkler = true, Rain = false. Suppose the result is false. New state: [false, true, false, true]. 2. Sample Rain, given the current values of its Markov blanket: Cloudy = false, Sprinkler = true, WetGrass = true. Suppose the result is true. New state: [false, true, true, true].
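A minimal Python sketch of this MCMC (Gibbs sampling) loop for the query above. As before, the CPT numbers are assumed textbook values and the helper names are our own; each per-variable conditional is derived from the Markov-blanket proposition on slide 25, and the Rain=true fraction is tallied as on the next slide.

import random

# ASSUMPTION: standard textbook CPTs (the slides do not give the numbers).
P_C = 0.5
P_S = {True: 0.10, False: 0.50}   # P(Sprinkler=true | Cloudy)
P_R = {True: 0.80, False: 0.20}   # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}  # P(WetGrass=true | Sprinkler, Rain)

def prob(p_true, value):
    """P(X = value), given P(X = true)."""
    return p_true if value else 1.0 - p_true

def gibbs(n_samples, sprinkler=True, wetgrass=True):
    """Estimate P(Rain=true | Sprinkler, WetGrass) by Gibbs sampling."""
    # Random initial state for the non-evidence variables Cloudy and Rain.
    cloudy, rain = random.random() < 0.5, random.random() < 0.5
    rain_count = 0
    for _ in range(n_samples):
        # Resample Cloudy given its Markov blanket {Sprinkler, Rain}:
        #   P(Cloudy=c | s, r) is proportional to P(c) P(s|c) P(r|c).
        w_true = P_C * prob(P_S[True], sprinkler) * prob(P_R[True], rain)
        w_false = (1 - P_C) * prob(P_S[False], sprinkler) * prob(P_R[False], rain)
        cloudy = random.random() < w_true / (w_true + w_false)
        # Resample Rain given its Markov blanket {Cloudy, Sprinkler, WetGrass}:
        #   P(Rain=r | c, s, w) is proportional to P(r|c) P(w|s,r).
        w_true = P_R[cloudy] * prob(P_W[(sprinkler, True)], wetgrass)
        w_false = (1 - P_R[cloudy]) * prob(P_W[(sprinkler, False)], wetgrass)
        rain = random.random() < w_true / (w_true + w_false)
        rain_count += rain
    return rain_count / n_samples

print(gibbs(100_000))  # converges to about 0.32 under the assumed CPTs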

28 Each sample contributes to the estimate for the query P(Rain | Sprinkler = true, WetGrass = true). Suppose we perform 100 such samples, 20 with Rain = true and 80 with Rain = false. Then the answer to the query is Normalize(⟨20, 80⟩) = ⟨0.20, 0.80⟩. Claim: “The sampling process settles into a dynamic equilibrium in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability, given the evidence.” That is, for every variable X_i, the probability of a value x_i of X_i appearing in a sample is equal to P(x_i | e).

30 Claim (again): MCMC settles into a dynamic equilibrium in which each state is sampled in proportion to its posterior probability, given the evidence.

