
Bayesian Belief Network

The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI. This can work well even when the independence assumption is not strictly true!

Naive Bayes assumption: P(a_1, …, a_n | v) = ∏_i P(a_i | v), which gives the classifier v_NB = argmax_{v ∈ V} P(v) ∏_i P(a_i | v)
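As a concrete illustration, here is a minimal sketch of that decision rule in Python; the priors and per-attribute likelihoods are made-up toy numbers, not values from the lecture.

```python
from math import prod

# Hypothetical toy numbers, only to exercise the rule
# v_NB = argmax_v P(v) * prod_i P(a_i | v)
priors = {"yes": 0.6, "no": 0.4}     # P(v)
likelihoods = {                      # P(a_i | v) for two observed attributes
    "yes": [0.9, 0.7],
    "no":  [0.2, 0.5],
}

# Pick the class maximizing P(v) * prod_i P(a_i | v)
v_nb = max(priors, key=lambda v: priors[v] * prod(likelihoods[v]))
print(v_nb)  # -> yes  (0.6*0.9*0.7 = 0.378  vs  0.4*0.2*0.5 = 0.040)
```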

Bayesian networks
Conditional Independence
Inference in Bayesian Networks
Irrelevant variables
Constructing Bayesian Networks
Learning Bayesian Networks
Examples and Exercises

The naive Bayes assumption of conditional independence is too restrictive, but inference is intractable without some such assumptions. Bayesian belief networks describe conditional independence among subsets of variables, which allows combining prior knowledge about (in)dependencies among variables with observed training data.

Bayesian networks: a simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions. Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ "directly influences")
a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.

Bayesian Networks. A Bayesian belief network allows subsets of the variables to be conditionally independent. It is a graphical model of causal relationships: it represents dependencies among the variables and gives a specification of the joint probability distribution. Nodes are random variables; links denote dependency. In the four-node example with X, Y, Z, P: X and Y are the parents of Z, and Y is the parent of P; there is no direct dependency between Z and P; the graph has no loops or cycles.

Conditional Independence. Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of a toothache: P(Catch | Toothache, Cavity) = P(Catch | Cavity). Recall that a is independent of b given c when P(a | b, c) = P(a | c).

Example. The topology of the network encodes conditional independence assertions: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity.

Bayesian Belief Network: An Example. Variables: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea. The conditional probability table for the variable LungCancer has a row for each value, LC and ~LC, and a column for each possible combination of its parents' values: (FH, S), (FH, ~S), (~FH, S), (~FH, ~S). It shows the conditional probability of LungCancer for each possible combination of its parents.

Example. I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. Network topology reflects "causal" knowledge: a burglar can set the alarm off; an earthquake can set the alarm off; the alarm can cause Mary to call; the alarm can cause John to call.

Belief Networks. The burglary network with its CPTs:

Burglary: P(B) = .001        Earthquake: P(E) = .002

Alarm:   B  E  P(A)
         t  t  .95
         t  f  .94
         f  t  .29
         f  f  .001

JohnCalls:  A  P(J)          MaryCalls:  A  P(M)
            t  .90                       t  .70
            f  .05                       f  .01
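To make the example concrete, here is one way to encode this network in Python as plain data; the dictionary layout and the helper name prob are our own choices, not anything from the lecture.

```python
# variable -> (list of parents, CPT); the CPT maps a tuple of parent values
# to P(variable = true | parent values). Numbers are the CPT entries above.
burglary_net = {
    "B": ([], {(): 0.001}),                          # P(Burglary)
    "E": ([], {(): 0.002}),                          # P(Earthquake)
    "A": (["B", "E"], {(True, True): 0.95,           # P(Alarm | B, E)
                       (True, False): 0.94,
                       (False, True): 0.29,
                       (False, False): 0.001}),
    "J": (["A"], {(True,): 0.90, (False,): 0.05}),   # P(JohnCalls | A)
    "M": (["A"], {(True,): 0.70, (False,): 0.01}),   # P(MaryCalls | A)
}

def prob(net, var, value, assignment):
    """P(var = value | parents(var)), reading parent values from assignment."""
    parents, cpt = net[var]
    p_true = cpt[tuple(assignment[p] for p in parents)]
    return p_true if value else 1.0 - p_true
```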

Full Joint Distribution. The network represents the full joint distribution as the product of the local conditional distributions: P(x_1, …, x_n) = ∏_{i=1}^n P(x_i | parents(X_i)).

Compactness. A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for X_i = true (the number for X_i = false is just 1 - p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31).
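The parameter count is easy to check mechanically; a small sketch (the variable names are ours):

```python
# Each Boolean node with k Boolean parents needs 2**k independent numbers.
num_parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}

print(sum(2 ** k for k in num_parents.values()))  # 1+1+4+2+2 = 10
print(2 ** len(num_parents) - 1)                  # full joint: 2**5 - 1 = 31
```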

Inference in Bayesian Networks. How can one infer the (probabilities of) values of one or more network variables, given observed values of others? The Bayes net contains all the information needed for this inference. If only one variable has an unknown value, it is easy to infer. In the general case, the problem is NP-hard.

Example. In the burglary network, we might observe the event in which JohnCalls=true and MaryCalls=true. We could then ask for the probability that a burglary has occurred: P(Burglary | JohnCalls=true, MaryCalls=true).

Remember the joint distribution and normalization: P(X | e) = α P(X, e) = α Σ_y P(X, e, y), where X is the query variable, E = e are the evidence variables with their observed values, Y are the remaining unobserved (hidden) variables, and α = 1/P(e) is a normalization constant. The summation ranges over all possible values y of the unobserved variables Y.

P(Burglary | JohnCalls=true, MaryCalls=true). The hidden variables of the query are Earthquake and Alarm. For Burglary=true in the Bayesian network: P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a).

To compute this we had to add four terms, each computed by multiplying five numbers. In the worst case, where we have to sum out almost all variables, the complexity of enumeration for a network with n Boolean variables is O(n 2^n).

P(b) is constant and can be moved outside the summations, and the P(e) term can be moved outside the summation over a: P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a). Given JohnCalls=true and MaryCalls=true, the probability that a burglary has occurred is about 28%.

Computation for Burglary=true
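A minimal sketch of inference by enumeration, reusing the burglary_net dictionary and prob() helper from the earlier sketch; it transcribes the nested-sum formula above directly.

```python
def enumerate_joint(net, order, assignment):
    """Sum the product of CPT entries over all variables not in assignment.

    order must list parents before children (here: B, E, A, J, M)."""
    if not order:
        return 1.0
    var, rest = order[0], order[1:]
    if var in assignment:
        return prob(net, var, assignment[var], assignment) * \
               enumerate_joint(net, rest, assignment)
    return sum(prob(net, var, v, {**assignment, var: v}) *
               enumerate_joint(net, rest, {**assignment, var: v})
               for v in (True, False))

order = ["B", "E", "A", "J", "M"]
evidence = {"J": True, "M": True}

unnormalized = {b: enumerate_joint(burglary_net, order, {**evidence, "B": b})
                for b in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
print(alpha * unnormalized[True])   # ~0.284, the ~28% quoted above
```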

Variable elimination algorithm: eliminate the repeated calculation by dynamic programming, storing intermediate results (factors) and summing variables out one at a time.
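To show the flavor of the idea (again reusing burglary_net and prob() from above), this hard-coded sketch computes P(j | a) P(m | a) once as a factor over A and sums A out once per (b, e), instead of recomputing it in every branch of the enumeration tree; a general variable elimination algorithm manipulates arbitrary factors the same way.

```python
# Factor over A: f_jm(a) = P(j | a) * P(m | a), computed once and cached.
f_jm = {a: prob(burglary_net, "J", True, {"A": a}) *
           prob(burglary_net, "M", True, {"A": a})
        for a in (True, False)}

def sum_out_a(b, e):
    # Sum A out against the cached factor: sum_a P(a | b, e) * f_jm(a)
    return sum(prob(burglary_net, "A", a, {"B": b, "E": e}) * f_jm[a]
               for a in (True, False))

unnorm = {b: prob(burglary_net, "B", b, {}) *
             sum(prob(burglary_net, "E", e, {}) * sum_out_a(b, e)
                 for e in (True, False))
          for b in (True, False)}
print(unnorm[True] / (unnorm[True] + unnorm[False]))  # ~0.284 again
```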

Irrelevant variables (X the query variable, E the evidence variables): every variable that is not an ancestor of the query variable or of an evidence variable is irrelevant to the query and can be removed before inference.

Complexity of exact inference. The burglary network belongs to a family of networks in which there is at most one undirected path between any two nodes in the network. These are called singly connected networks, or polytrees. The time and space complexity of exact inference in polytrees is linear in the size of the network, where size is defined as the number of CPT entries. If the number of parents of each node is bounded by a constant, then the complexity is also linear in the number of nodes.

For multiply connected networks, variable elimination can have exponential time and space complexity.

Constructing Bayesian Networks. A Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the ordering, given its parents. For example: P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm).

Conditional independence relations in Bayesian networks. The topological semantics is given by either of two equivalent specifications: DESCENDANTS (each node is conditionally independent of its non-descendants, given its parents) or MARKOV BLANKET (each node is conditionally independent of all other nodes, given its parents, children, and children's parents).

Local semantics: each node is conditionally independent of its non-descendants, given its parents.

Example. JohnCalls is independent of Burglary and Earthquake, given the value of Alarm.

Example. Burglary is independent of JohnCalls and MaryCalls, given Alarm and Earthquake.

Constructing Bayesian networks:
1. Choose an ordering of variables X_1, …, X_n
2. For i = 1 to n:
   add X_i to the network
   select parents from X_1, …, X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i-1})
This choice of parents guarantees:
P(X_1, …, X_n) = ∏_{i=1}^n P(X_i | X_1, …, X_{i-1})   (chain rule)
             = ∏_{i=1}^n P(X_i | Parents(X_i))   (by construction)
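A sketch of this construction loop in Python; is_independent stands in for the modeller's judgment "X_i is independent of its remaining predecessors given this candidate parent set" and is a hypothetical oracle, not part of the lecture.

```python
from itertools import combinations

def build_network(variables, is_independent):
    """Return a parent set for each variable under the given ordering."""
    parents = {}
    for i, x in enumerate(variables):
        preds = variables[:i]
        # choose a smallest predecessor subset that screens x off from the rest
        for r in range(len(preds) + 1):
            candidate = next((set(s) for s in combinations(preds, r)
                              if is_independent(x, set(s), set(preds))), None)
            if candidate is not None:
                parents[x] = candidate
                break
    return parents  # with a causal ordering: {"B": set(), "E": set(), "A": {"B", "E"}, ...}
```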

The compactness of Bayesian networks is an example of a locally structured system: each subcomponent interacts directly with only a bounded number of other components. Constructing such networks is difficult: each variable should be directly influenced by only a few others, and the network topology should reflect these direct influences.

Example. Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)?

Example (contd.). Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?

Example (contd.). Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No. P(E | B, A, J, M) = P(E | A)? P(E | B, A, J, M) = P(E | A, B)?

Example (contd.). Suppose we choose the ordering M, J, A, B, E. P(J | M) = P(J)? No. P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No. P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No. P(E | B, A, J, M) = P(E | A)? No. P(E | B, A, J, M) = P(E | A, B)? Yes.

Example contd. Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) The network is also less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers are needed. And some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm.

Learning Bayesian Networks. How to fill in the entries of a conditional probability table. Case 1: if the structure of the Bayesian network is known and all variables can be observed in the training set, then entry (i, j) is estimated using the values observed in the training set (relative frequencies). Case 2: if the structure of the Bayesian network is known but some of the variables cannot be observed in the training set, then a gradient-ascent method is used.

Example, Case 1:

Person  FH   S    E    LC   PXRay  D
P1      Yes  Yes  No   Yes  +      Yes
P2      Yes  No   No   Yes  -      Yes
P3      Yes  No   Yes  No   +      No
P4      No   Yes  Yes  Yes  -      Yes
P5      No   Yes  No   No   +      No
P6      Yes  Yes  ?    ?    ?      ?

The CPT for LungCancer has columns (FH, S), (FH, ~S), (~FH, S), (~FH, ~S) and rows LC, ~LC; for example, P(LC = Yes | FH = Yes, S = Yes) = 0.5. (Figure: network fragment with FamilyHistory, Smoker, LungCancer, Emphysema.)
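A minimal sketch of Case 1, maximum-likelihood estimation by counting; the records below are hypothetical stand-ins covering only (FH, S, LC), not the lecture's table.

```python
from collections import Counter

records = [  # hypothetical (FH, S, LC) observations
    ("yes", "yes", "yes"), ("yes", "yes", "no"),
    ("yes", "no", "yes"), ("no", "yes", "no"), ("no", "no", "no"),
]

joint = Counter((fh, s, lc) for fh, s, lc in records)   # counts of (FH, S, LC)
parent = Counter((fh, s) for fh, s, _ in records)       # counts of (FH, S)

def p_lc(lc, fh, s):
    """Relative-frequency estimate of P(LC = lc | FH = fh, S = s)."""
    return joint[(fh, s, lc)] / parent[(fh, s)]

print(p_lc("yes", "yes", "yes"))   # 1/2 = 0.5 on this toy data
```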

Example, Case 2. Suppose the structure is known and the variables are only partially observable. This is similar to training a neural network with hidden units: in fact, the network's conditional probability tables can be learned using gradient ascent.

Person  FH   S    E    LC   PXRay  D
P1      ---  Yes  ---  Yes  +      Yes
P2      ---  No   ---  Yes  -      Yes
P3      ---  No   ---  No   +      No
P4      ---  Yes  ---  Yes  -      Yes
P5      ---  Yes  ---  No   +      No
P6      Yes  Yes  ?    ?    ?      ?
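Following Mitchell's Machine Learning (Ch. 6), which this slide appears to draw on, one gradient-ascent step updates each CPT entry w_ijk = P(Y_i = y_ij | U_i = u_ik) by w_ijk += η Σ_d P(y_ij, u_ik | d) / w_ijk and then renormalizes the rows. In the sketch below, posterior is a hypothetical inference routine returning P(y_ij, u_ik | d) for a training example d; it is not something the lecture provides.

```python
from collections import defaultdict

def gradient_step(w, data, posterior, eta=0.01):
    """One ascent step on CPT entries w[(i, j, k)] = P(Y_i = y_ij | U_i = u_ik)."""
    for key in w:
        i, j, k = key
        # Mitchell's gradient of the data log-likelihood w.r.t. w_ijk
        w[key] += eta * sum(posterior(i, j, k, d) for d in data) / w[key]
    # renormalize each CPT row: entries sharing (i, k) must sum to 1 over j
    totals = defaultdict(float)
    for (i, j, k), v in w.items():
        totals[(i, k)] += v
    for (i, j, k) in w:
        w[(i, j, k)] /= totals[(i, k)]
```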

Summary. Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + CPTs = a compact representation of the joint distribution. They are generally easy for domain experts to construct.

P(d | a, b, c) = P(d | a, c) = 0.66

Bayesian networks
Conditional Independence
Inference in Bayesian Networks
Irrelevant variables
Constructing Bayesian Networks
Learning Bayesian Networks
Examples and Exercises

Decision trees: ID3