Bayesian networks practice

Semantics
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e) = …
Suppose we have the variables X1, …, Xn. The probability of them taking the values x1, …, xn respectively is P(xn, …, x1). P(xn, …, x1) is short for P(Xn = xn, …, X1 = x1). We order the variables according to the topological order of the given Bayes net, so that each variable is conditioned only on its parents.
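
To make the product concrete, here is a minimal Python sketch of this chain-rule computation. The CPT values are the usual textbook numbers for the burglary network, not taken from the slides' figure (which is not reproduced in this transcript), so treat them as an assumption:

```python
p_b = 0.001               # P(burglary) -- assumed textbook value
p_e = 0.002               # P(earthquake)
p_a_given_nb_ne = 0.001   # P(alarm | no burglary, no earthquake)
p_j_given_a = 0.90        # P(johncalls | alarm)
p_m_given_a = 0.70        # P(marycalls | alarm)

# P(j and m and a and not-b and not-e) = P(j|a) P(m|a) P(a|not-b,not-e) P(not-b) P(not-e)
joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * (1 - p_b) * (1 - p_e)
print(joint)   # approx. 0.000628
```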

Inference in Bayesian Networks The basic task is to compute the posterior probability of a query variable, given some observed event –that is, some assignment of values to a set of evidence variables. Notation: –X denotes the query variable. –E denotes the set of evidence variables E1, …, Em, and e is a particular event, i.e. an assignment of values to the variables in E. –Y denotes the set of remaining variables (hidden variables). A typical query asks for the posterior probability P(x | e1, …, em). E.g. we could ask: what is the probability of a burglary if both Mary and John call, P(burglary | johncalls, marycalls)?

Classification For classification we compute and compare the posterior probabilities of the class values given the evidence. However, how do we compute these posteriors from the network? And what about the hidden variables Y1, …, Yk, which are neither the class nor part of the evidence?

Inference by enumeration Example: P(burglary | johncalls, marycalls)? (Abbrev. P(b | j, m))
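
A small Python sketch of enumeration for this query, again assuming the usual textbook CPTs for the burglary network rather than anything shown on the slides: it sums the joint over the hidden variables Earthquake and Alarm for each value of Burglary, then normalizes:

```python
from itertools import product

# Assumed textbook CPTs for the burglary network (not taken from the slides' figure).
P_B = {True: 0.001, False: 0.999}          # P(Burglary)
P_E = {True: 0.002, False: 0.998}          # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}            # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}            # P(MaryCalls=true | Alarm)

def joint(b, e, a, j=True, m=True):
    """Full joint P(b, e, a, j, m) via the chain rule along the topological order."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# Sum out the hidden variables Earthquake and Alarm for each value of Burglary.
unnormalized = {b: sum(joint(b, e, a) for e, a in product([True, False], repeat=2))
                for b in [True, False]}
alpha = 1 / sum(unnormalized.values())
print(alpha * unnormalized[True])   # P(burglary | johncalls, marycalls) approx. 0.284
```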

Another example Once the right topology has been found, the probability table associated with each node is determined. Estimating such probabilities is fairly straightforward and is similar to the approach used by naïve Bayes classifiers.

High Blood Pressure Suppose we learn that the new patient has high blood pressure. What is the probability that he has heart disease given this condition?
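
The network's CPTs live in the slide's figure, which is not reproduced in this transcript, so the numbers below are hypothetical placeholders; the sketch only illustrates the shape of the computation: sum the joint over the unobserved parents Exercise and Diet, then normalize over heart disease.

```python
from itertools import product

# Hypothetical CPT values -- placeholders only, NOT taken from the slide's figure.
P_EXERCISE = 0.7                         # P(exercise = yes)
P_DIET = 0.25                            # P(diet = healthy)
P_HD = {(True, True): 0.25, (True, False): 0.45,   # P(heart disease | exercise, diet)
        (False, True): 0.55, (False, False): 0.75}
P_BP_HIGH = {True: 0.85, False: 0.20}    # P(blood pressure = high | heart disease)

def prior(p, value):
    return p if value else 1 - p

# P(HD | BP=high) is proportional to the sum over exercise and diet of
#   P(exercise) * P(diet) * P(HD | exercise, diet) * P(BP=high | HD)
unnormalized = {}
for hd in [True, False]:
    unnormalized[hd] = sum(
        prior(P_EXERCISE, ex) * prior(P_DIET, d) *
        (P_HD[(ex, d)] if hd else 1 - P_HD[(ex, d)]) * P_BP_HIGH[hd]
        for ex, d in product([True, False], repeat=2))

alpha = 1 / sum(unnormalized.values())
print(alpha * unnormalized[True])   # P(heart disease | high blood pressure), placeholder numbers
```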

High Blood Pressure (Cont’d)

High Blood Pressure (Cont’d)

High Blood Pressure, Healthy Diet, and Regular Exercise

High Blood Pressure, Healthy Diet, and Regular Exercise (Cont’d)

The model therefore suggests that eating healthily and exercising regularly may reduce a person's risk of getting heart disease.

Weather data What is the Bayesian Network corresponding to Naïve Bayes?

“Effects” and “Causes” vs. “Evidence” and “Class” Why does Naïve Bayes have this graph? Because when we compute, in Naïve Bayes, P(play=yes | E) = P(Outlook=Sunny | play=yes) * P(Temp=Cool | play=yes) * P(Humidity=High | play=yes) * P(Windy=True | play=yes) * P(play=yes) / P(E), we are interested in computing P(… | play=yes), i.e. the probabilities of our evidence “observations” given the class. Of course, “play” isn’t a cause of “outlook”, “temperature”, “humidity”, or “windy”. However, “play” is the class, and knowing that it has a certain value influences the probabilities of the observed evidence. For example, if play=yes and we know that the playing happens indoors, then it is more probable (than without this class information) that the outlook will be observed to be “rainy.”
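
As an illustration, a small Python sketch of this naïve Bayes product. The per-attribute probabilities are the plain relative-frequency estimates from the standard 14-instance weather dataset (an assumption here, and without the Laplace correction applied on the later slides):

```python
# Relative-frequency estimates from the standard weather data (no Laplace correction);
# the later slides use Laplace-corrected values, so these numbers differ slightly.
p_yes = 9/14
p_no = 5/14
likelihood_yes = (2/9) * (3/9) * (3/9) * (3/9)   # sunny, cool, high, windy=true | yes
likelihood_no = (3/5) * (1/5) * (4/5) * (3/5)    # sunny, cool, high, windy=true | no

score_yes = likelihood_yes * p_yes
score_no = likelihood_no * p_no
# Dividing by P(E) is the same as normalizing the two scores.
print(score_yes / (score_yes + score_no))   # approx. 0.205
print(score_no / (score_yes + score_no))    # approx. 0.795
```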

Right or Wrong Topology? In general, there is no right or wrong graph topology. –Of course the calculated probabilities (from the data) will be different for different graphs. –Some graphs will induce better classifiers than others. –If you reverse the arrows in the previous figure, then you get a pure causal graph, whose induced classifier might have a better or worse estimated error (through cross-validation) than the Naïve Bayes one (depending on the data). If the topology is constructed manually, we (humans) tend to prefer the causal direction. –In domains such as medicine the graphs are usually less complex in the causal direction.

Weka suggestion How does Weka find the shape of the graph? It fixes an order of the attributes (variables) and then adds and removes arcs until it gets the smallest estimated error (through cross-validation). By default it starts with a Naïve Bayes network. Also, it maintains a score of graph complexity, trying to keep the complexity low.

You can change it to 2, for example. If you do, then the maximum number of parents for a node will be 2. Weka will start with a Naïve Bayes graph and then try to add/remove arcs. Laplace correction: it is better to change it to 1, to be compatible with the counter initialization in Naïve Bayes.

Play probability table Based on the data…
P(play=yes) = 9/14
P(play=no) = 5/14
Let’s correct with Laplace:
P(play=yes) = (9+1)/(14+2) = .625
P(play=no) = (5+1)/(14+2) = .375
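
A tiny sketch of the Laplace correction used here and in the following tables: add 1 to each count and add the number of possible attribute values to the denominator.

```python
def laplace(count, total, num_values):
    """Laplace-corrected probability estimate: (count + 1) / (total + num_values)."""
    return (count + 1) / (total + num_values)

print(laplace(9, 14, 2))   # P(play=yes) = 10/16 = 0.625
print(laplace(5, 14, 2))   # P(play=no)  =  6/16 = 0.375
print(laplace(2, 9, 3))    # P(outlook=sunny | play=yes) = 3/12 = 0.25
```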

Outlook probability table Based on the data…
P(outlook=sunny | play=yes) = (2+1)/(9+3) = .25
P(outlook=overcast | play=yes) = (4+1)/(9+3) = .417
P(outlook=rainy | play=yes) = (3+1)/(9+3) = .333
P(outlook=sunny | play=no) = (3+1)/(5+3) = .5
P(outlook=overcast | play=no) = (0+1)/(5+3) = .125
P(outlook=rainy | play=no) = (2+1)/(5+3) = .375

Windy probability table Based on the data… let’s find the conditional probabilities for “windy”, e.g.:
P(windy=true | play=yes, outlook=sunny) = (1+1)/(2+2) = .5

Windy probability table Based on the data…
P(windy=true | play=yes, outlook=sunny) = (1+1)/(2+2) = .5
P(windy=true | play=yes, outlook=overcast) = 0.5
P(windy=true | play=yes, outlook=rainy) = 0.2
P(windy=true | play=no, outlook=sunny) = 0.4
P(windy=true | play=no, outlook=overcast) = 0.5
P(windy=true | play=no, outlook=rainy) = 0.75

Final figure Classify it

Classification I Classify it:
P(play=yes | outlook=sunny, temp=cool, humidity=high, windy=true)
= α * P(play=yes) * P(outlook=sunny | play=yes) * P(temp=cool | play=yes, outlook=sunny) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=sunny)
= α * 0.625 * 0.25 * 0.4 * 0.2 * 0.5
= α * 0.00625

Classification II Classify it:
P(play=no | outlook=sunny, temp=cool, humidity=high, windy=true)
= α * P(play=no) * P(outlook=sunny | play=no) * P(temp=cool | play=no, outlook=sunny) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=sunny)
= α * 0.375 * 0.5 * 0.167 * 0.333 * 0.4
= α * 0.00417

Classification III Classify it:
P(play=yes | outlook=sunny, temp=cool, humidity=high, windy=true) = α * 0.00625
P(play=no | outlook=sunny, temp=cool, humidity=high, windy=true) = α * 0.00417
α = 1/(0.00625 + 0.00417) ≈ 96.0
P(play=yes | outlook=sunny, temp=cool, humidity=high, windy=true) ≈ 96.0 * 0.00625 ≈ 0.60
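
The same computation as a quick Python sketch, using the Laplace-corrected probabilities from the tables above:

```python
# Unnormalized scores for the instance (sunny, cool, high humidity, windy=true).
score_yes = 0.625 * 0.25 * 0.4 * 0.2 * 0.5     # = 0.00625
score_no = 0.375 * 0.5 * 0.167 * 0.333 * 0.4   # approx. 0.00417

alpha = 1 / (score_yes + score_no)             # approx. 96.0
print(alpha * score_yes)   # P(play=yes | evidence) approx. 0.60
print(alpha * score_no)    # P(play=no  | evidence) approx. 0.40
```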

Classification IV (missing values or hidden variables)
P(play=yes | temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=yes) * P(outlook | play=yes) * P(temp=cool | play=yes, outlook) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook)
= … (next slide)

Classification V (missing values or hidden variables)
P(play=yes | temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=yes) * P(outlook | play=yes) * P(temp=cool | play=yes, outlook) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook)
= α * [ P(play=yes) * P(outlook=sunny | play=yes) * P(temp=cool | play=yes, outlook=sunny) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=sunny)
+ P(play=yes) * P(outlook=overcast | play=yes) * P(temp=cool | play=yes, outlook=overcast) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=overcast)
+ P(play=yes) * P(outlook=rainy | play=yes) * P(temp=cool | play=yes, outlook=rainy) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=rainy) ]
= α * [ 0.625*0.25*0.4*0.2*0.5 + 0.625*0.417*0.286*0.2*0.5 + 0.625*0.33*0.333*0.2*0.2 ]
= α * 0.0165

Classification VI (missing values or hidden variables)
P(play=no | temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=no) * P(outlook | play=no) * P(temp=cool | play=no, outlook) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook)
= α * [ P(play=no) * P(outlook=sunny | play=no) * P(temp=cool | play=no, outlook=sunny) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=sunny)
+ P(play=no) * P(outlook=overcast | play=no) * P(temp=cool | play=no, outlook=overcast) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=overcast)
+ P(play=no) * P(outlook=rainy | play=no) * P(temp=cool | play=no, outlook=rainy) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=rainy) ]
= α * [ 0.375*0.5*0.167*0.333*0.4 + 0.375*0.125*0.333*0.333*0.5 + 0.375*0.375*0.4*0.333*0.75 ]
= α * 0.0208

Classification VII (missing values or hidden variables)
P(play=yes | temp=cool, humidity=high, windy=true) = α * 0.0165
P(play=no | temp=cool, humidity=high, windy=true) = α * 0.0208
α = 1/(0.0165 + 0.0208) ≈ 26.8
P(play=yes | temp=cool, humidity=high, windy=true) ≈ 26.8 * 0.0165 ≈ 0.44
P(play=no | temp=cool, humidity=high, windy=true) ≈ 26.8 * 0.0208 ≈ 0.56
I.e. P(play=yes | temp=cool, humidity=high, windy=true) is 44% and P(play=no | temp=cool, humidity=high, windy=true) is 56%. So, we predict ‘play=no.’
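
Finally, a compact Python sketch of this hidden-variable case: it sums over the three outlook values for each class and normalizes, using the Laplace-corrected probabilities from the tables above:

```python
# Laplace-corrected probabilities from the tables above, indexed by outlook.
outlooks = ["sunny", "overcast", "rainy"]
p_outlook = {"yes": {"sunny": 0.25, "overcast": 0.417, "rainy": 0.333},
             "no":  {"sunny": 0.5,  "overcast": 0.125, "rainy": 0.375}}
p_cool =    {"yes": {"sunny": 0.4,  "overcast": 0.286, "rainy": 0.333},
             "no":  {"sunny": 0.167, "overcast": 0.333, "rainy": 0.4}}
p_windy =   {"yes": {"sunny": 0.5,  "overcast": 0.5,   "rainy": 0.2},
             "no":  {"sunny": 0.4,  "overcast": 0.5,   "rainy": 0.75}}
p_play = {"yes": 0.625, "no": 0.375}
p_humid_high = {"yes": 0.2, "no": 0.333}   # P(humidity=high | play, temp=cool)

# Sum out the unobserved 'outlook' for each class value, then normalize.
scores = {c: sum(p_play[c] * p_outlook[c][o] * p_cool[c][o] * p_humid_high[c] * p_windy[c][o]
                 for o in outlooks)
          for c in ["yes", "no"]}
alpha = 1 / sum(scores.values())
print(alpha * scores["yes"], alpha * scores["no"])   # approx. 0.44, 0.56
```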