Bayesian networks


Bayesian networks

Weather data What is the Bayesian Network corresponding to Naïve Bayes?
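For reference, the table on this slide is the standard 14-instance nominal weather dataset (Witten & Frank); all of the counts used below are read off it:

outlook   temp  humidity  windy  play
sunny     hot   high      false  no
sunny     hot   high      true   no
overcast  hot   high      false  yes
rainy     mild  high      false  yes
rainy     cool  normal    false  yes
rainy     cool  normal    true   no
overcast  cool  normal    true   yes
sunny     mild  high      false  no
sunny     cool  normal    false  yes
rainy     mild  normal    false  yes
sunny     mild  normal    true   yes
overcast  mild  high      true   yes
overcast  hot   normal    false  yes
rainy     mild  high      true   no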

“Effects” and “Causes” vs. “Evidence” and “Class”
Why does Naïve Bayes have this graph? Because when we compute, in Naïve Bayes,
P(play=yes | E) = P(Outlook=Sunny | play=yes) * P(Temp=Cool | play=yes) * P(Humidity=High | play=yes) * P(Windy=True | play=yes) * P(play=yes) / P(E)
we are interested in computing P(… | play=yes), i.e., the probabilities of our evidence “observations” given the class. Clearly, “play” is not the cause of “outlook”, “temperature”, “humidity”, and “windy”. However, “play” is the class, and knowing that it takes a particular value influences the probabilities of the evidence. E.g., if play=yes and we know that the playing happens indoors, then it is more probable that the outlook is “rainy”.
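As a concrete illustration, here is a minimal Python sketch of this computation (mine, not from the slides). It uses the unsmoothed relative-frequency estimates from the weather data above, and normalizing over the two classes plays the role of dividing by P(E):

```python
# Minimal Naive Bayes posterior for the weather data.
# Counts are relative frequencies from the 14 instances (no Laplace correction yet).
priors = {"yes": 9/14, "no": 5/14}
likelihoods = {
    "yes": {"outlook=sunny": 2/9, "temp=cool": 3/9,
            "humidity=high": 3/9, "windy=true": 3/9},
    "no":  {"outlook=sunny": 3/5, "temp=cool": 1/5,
            "humidity=high": 4/5, "windy=true": 3/5},
}

def posterior(evidence):
    """P(class | evidence) for both classes; normalizing over the classes
    stands in for the division by P(E)."""
    scores = {}
    for c in priors:
        p = priors[c]
        for attr_val in evidence:
            p *= likelihoods[c][attr_val]
        scores[c] = p
    total = sum(scores.values())
    return {c: p / total for c, p in scores.items()}

print(posterior(["outlook=sunny", "temp=cool", "humidity=high", "windy=true"]))
# -> roughly {'yes': 0.205, 'no': 0.795}
```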

Causal BN
Causality is not a well understood concept:
– No widely accepted definition.
– No consensus on whether it is a property of the world or a concept in our minds.
Sometimes causal relations are obvious:
– Alarm causes people to leave the building.
– Lung cancer causes a mass on a chest X-ray.
At other times, they are not that clear. Doctors believe smoking causes lung cancer, but the tobacco industry has a different story:
S → C (Surgeon General, 1964) vs. C → S (Tobacco Industry)

Right or Wrong Topology?
In general, there is no right or wrong graph topology.
– Of course, the calculated probabilities (from the data) will be different for different graphs.
– Some graphs will induce better classifiers than others.
– If you reverse the arrows in the previous figure, you get a pure causal graph, whose induced classifier might have an estimated error (through cross-validation) better or worse than the Naïve Bayes one (depending on the data).
If the topology is constructed manually, we (humans) tend to prefer the causal direction.

Play probability table
Based on the data…
P(play=yes) = 9/14
P(play=no) = 5/14
Let’s correct with Laplace:
P(play=yes) = (9+1)/(14+2) = .625
P(play=no) = (5+1)/(14+2) = .375
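A tiny helper (a sketch, not from the slides) makes the correction mechanical: add 1 to each count and add the number of possible values to the denominator.

```python
def laplace(count, total, num_values):
    """Laplace-corrected estimate: (count + 1) / (total + num_values)."""
    return (count + 1) / (total + num_values)

print(laplace(9, 14, 2))  # P(play=yes) = 10/16 = 0.625
print(laplace(5, 14, 2))  # P(play=no)  =  6/16 = 0.375
```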

Outlook probability table
Based on the data…
P(outlook=sunny | play=yes) = (2+1)/(9+3) = .25
P(outlook=overcast | play=yes) = (4+1)/(9+3) = .417
P(outlook=rainy | play=yes) = (3+1)/(9+3) = .333
P(outlook=sunny | play=no) = (3+1)/(5+3) = .5
P(outlook=overcast | play=no) = (0+1)/(5+3) = .125
P(outlook=rainy | play=no) = (2+1)/(5+3) = .375

Windy probability table
Based on the data… let’s find the conditional probabilities for “windy”:
P(windy=true | play=yes, outlook=sunny) = (1+1)/(2+2) = .5

Windy probability table
Based on the data…
P(windy=true | play=yes, outlook=sunny) = (1+1)/(2+2) = .5
P(windy=true | play=yes, outlook=overcast) = 0.5
P(windy=true | play=yes, outlook=rainy) = 0.2
P(windy=true | play=no, outlook=sunny) = 0.4
P(windy=true | play=no, outlook=overcast) = 0.5
P(windy=true | play=no, outlook=rainy) = 0.75
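The whole table can be derived mechanically from raw counts. The following sketch (mine, reusing the laplace() helper and with the counts read off the weather data) reproduces the six numbers:

```python
def laplace(count, total, num_values):
    return (count + 1) / (total + num_values)

# (count of windy=true, total instances) for each (play, outlook) pair,
# read off the 14-instance weather data:
counts = {
    ("yes", "sunny"):    (1, 2),
    ("yes", "overcast"): (2, 4),
    ("yes", "rainy"):    (0, 3),
    ("no",  "sunny"):    (1, 3),
    ("no",  "overcast"): (0, 0),  # no such instances; Laplace still gives 0.5
    ("no",  "rainy"):    (2, 2),
}

for (play, outlook), (true_count, total) in counts.items():
    p = laplace(true_count, total, 2)  # windy has 2 values: true/false
    print(f"P(windy=true | play={play}, outlook={outlook}) = {p:.3f}")
# Prints 0.500, 0.500, 0.200, 0.400, 0.500, 0.750 -- matching the table.
```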

Final figure
[figure: the final Bayesian network with all its probability tables]
Classify it: (outlook=sunny, temp=cool, humidity=high, windy=true)

P(play=yes | outlook=sunny, temp=cool, humidity=high, windy=true)
= α * P(play=yes)
    * P(outlook=sunny | play=yes)
    * P(temp=cool | play=yes, outlook=sunny)
    * P(humidity=high | play=yes, temp=cool)
    * P(windy=true | play=yes, outlook=sunny)
= α * 0.625 * 0.25 * 0.4 * 0.2 * 0.5
= α * 0.00625

Classify it
P(play=no | outlook=sunny, temp=cool, humidity=high, windy=true)
= α * P(play=no)
    * P(outlook=sunny | play=no)
    * P(temp=cool | play=no, outlook=sunny)
    * P(humidity=high | play=no, temp=cool)
    * P(windy=true | play=no, outlook=sunny)
= α * 0.375 * 0.5 * 0.167 * 0.333 * 0.4
= α * 0.00417

Classify it
P(play=yes | outlook=sunny, temp=cool, humidity=high, windy=true) = α * 0.00625
P(play=no | outlook=sunny, temp=cool, humidity=high, windy=true) = α * 0.00417
α = 1/(0.00625 + 0.00417) ≈ 96
P(play=yes | outlook=sunny, temp=cool, humidity=high, windy=true) ≈ 96 * 0.00625 ≈ 0.60
P(play=no | outlook=sunny, temp=cool, humidity=high, windy=true) ≈ 96 * 0.00417 ≈ 0.40
So we predict play=yes.
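In code, the normalization step looks like this (a sketch using the two unnormalized scores just computed):

```python
score_yes = 0.625 * 0.25 * 0.4 * 0.2 * 0.5      # ~ 0.00625
score_no  = 0.375 * 0.5 * 0.167 * 0.333 * 0.4   # ~ 0.00417

alpha = 1 / (score_yes + score_no)              # ~ 96
print(round(alpha * score_yes, 2))              # P(play=yes | E) ~ 0.6
print(round(alpha * score_no, 2))               # P(play=no  | E) ~ 0.4
```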

Hidden variables
Suppose “outlook” is not observed. Then we sum it out:
P(play=yes | temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=yes)
    * P(outlook | play=yes)
    * P(temp=cool | play=yes, outlook)
    * P(humidity=high | play=yes, temp=cool)
    * P(windy=true | play=yes, outlook)
= … (next slide)

P(play=yes | temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=yes) * P(outlook | play=yes) * P(temp=cool | play=yes, outlook) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook)
= α * [ P(play=yes) * P(outlook=sunny | play=yes) * P(temp=cool | play=yes, outlook=sunny) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=sunny)
      + P(play=yes) * P(outlook=overcast | play=yes) * P(temp=cool | play=yes, outlook=overcast) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=overcast)
      + P(play=yes) * P(outlook=rainy | play=yes) * P(temp=cool | play=yes, outlook=rainy) * P(humidity=high | play=yes, temp=cool) * P(windy=true | play=yes, outlook=rainy) ]
= α * [ 0.625*0.25*0.4*0.2*0.5 + 0.625*0.417*0.286*0.2*0.5 + 0.625*0.33*0.333*0.2*0.2 ]
= α * 0.0164

Hidden variables
P(play=no | temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=no) * P(outlook | play=no) * P(temp=cool | play=no, outlook) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook)
= α * [ P(play=no) * P(outlook=sunny | play=no) * P(temp=cool | play=no, outlook=sunny) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=sunny)
      + P(play=no) * P(outlook=overcast | play=no) * P(temp=cool | play=no, outlook=overcast) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=overcast)
      + P(play=no) * P(outlook=rainy | play=no) * P(temp=cool | play=no, outlook=rainy) * P(humidity=high | play=no, temp=cool) * P(windy=true | play=no, outlook=rainy) ]
= α * [ 0.375*0.5*0.167*0.333*0.4 + 0.375*0.125*0.333*0.333*0.5 + 0.375*0.375*0.4*0.333*0.75 ]
= α * 0.0208

Classification (missing values or hidden variables)
P(play=yes | temp=cool, humidity=high, windy=true) = α * 0.0164
P(play=no | temp=cool, humidity=high, windy=true) = α * 0.0208
α = 1/(0.0164 + 0.0208) ≈ 26.9
P(play=yes | temp=cool, humidity=high, windy=true) ≈ 26.9 * 0.0164 ≈ 0.44
P(play=no | temp=cool, humidity=high, windy=true) ≈ 26.9 * 0.0208 ≈ 0.56
I.e., P(play=yes | temp=cool, humidity=high, windy=true) is 44% and P(play=no | temp=cool, humidity=high, windy=true) is 56%. So, we predict play=no.
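The same marginalization as a Python sketch (my own summary of the last three slides): for each class, outlook is summed out, while the factors that do not depend on outlook are simply multiplied into every term.

```python
# Per-class factors. For each hidden outlook value we store
# (P(outlook | play), P(temp=cool | play, outlook), P(windy=true | play, outlook));
# P(play) and P(humidity=high | play, temp=cool) do not depend on outlook.
factors = {
    "yes": {"prior": 0.625, "humidity": 0.2,
            "outlook": [(0.25, 0.4, 0.5),       # sunny
                        (0.417, 0.286, 0.5),    # overcast
                        (0.333, 0.333, 0.2)]},  # rainy
    "no":  {"prior": 0.375, "humidity": 0.333,
            "outlook": [(0.5, 0.167, 0.4),      # sunny
                        (0.125, 0.333, 0.5),    # overcast
                        (0.375, 0.4, 0.75)]},   # rainy
}

def score(c):
    """Unnormalized P(play=c, temp=cool, humidity=high, windy=true),
    with outlook summed out."""
    f = factors[c]
    return sum(f["prior"] * p_o * p_t * f["humidity"] * p_w
               for p_o, p_t, p_w in f["outlook"])

s_yes, s_no = score("yes"), score("no")
alpha = 1 / (s_yes + s_no)
print(round(alpha * s_yes, 2), round(alpha * s_no, 2))  # 0.44 0.56 -> predict play=no
```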

How does Weka find the shape of the graph? It fixes an order of the attributes (variables) and then adds and removes arcs until it gets the smallest estimated error (through cross-validation). By default it starts with a Naïve Bayes network. It also maintains a score of graph complexity, trying to keep the complexity low.
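The following is only an illustrative sketch of such score-based greedy search, not Weka's actual implementation: cv_error is a hypothetical function that trains a network with the given arcs and returns its cross-validated error, the penalty term stands in for the graph-complexity score, and arc removal is omitted for brevity.

```python
def greedy_structure_search(variables, cv_error, penalty=0.01):
    """Illustrative greedy arc search over a fixed variable ordering.
    Repeatedly adds the arc that most reduces
    (cross-validated error + complexity penalty); stops when nothing helps.
    cv_error(arcs) is assumed to train the network and return its CV error."""
    # Candidate arcs only go from earlier to later variables in the order,
    # which guarantees the graph stays acyclic.
    candidates = {(a, b) for i, a in enumerate(variables)
                  for b in variables[i + 1:]}
    arcs = set()  # Weka would start from the Naive Bayes arcs instead
    best = cv_error(arcs)
    while True:
        best_arc = None
        for arc in candidates - arcs:
            s = cv_error(arcs | {arc}) + penalty * (len(arcs) + 1)
            if s < best:
                best, best_arc = s, arc
        if best_arc is None:
            return arcs
        arcs.add(best_arc)
```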

[screenshot: Weka’s Bayes net options]
The maximum number of parents: you can change it to 2, for example; if you do, then the maximum number of parents for a node will be 2. Weka is going to start with a Naïve Bayes graph and then try to add/remove arcs. The Laplace correction: better change it to 1, to be compatible with the counter initialization in Naïve Bayes.