Bayesian networks practice (Weka)

Weather data What is the Bayesian network corresponding to Naïve Bayes? (In that network, the class node “play” is the sole parent of each attribute node: outlook, temperature, humidity, and windy.)

“Effects” and “Causes” vs. “Evidence” and “Class” Why does Naïve Bayes have this graph? Because when we compute, in Naïve Bayes,
P(play=yes | E) = P(Outlook=Sunny | play=yes) * P(Temp=Cool | play=yes) * P(Humidity=High | play=yes) * P(Windy=True | play=yes) * P(play=yes) / P(E)
we are interested in computing P(… | play=yes), the probabilities of our evidence “observations” given the class. Of course, “play” is not a cause of “outlook”, “temperature”, “humidity”, or “windy”. However, “play” is the class, and knowing that it has a certain value influences the probabilities of the observational evidence. For example, if play=yes and we know that the playing happens indoors, then it is more probable (than without this class information) that the outlook will be observed to be “rainy.”
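To make this concrete, here is a minimal Python sketch of the Naïve Bayes computation on the 14-instance weather data (the standard weather.nominal dataset from Witten & Frank). It uses raw relative frequencies, without the Laplace correction introduced later, so its numbers differ slightly from the corrected tables below.

    # The 14-instance weather data: (outlook, temp, humidity, windy, play)
    data = [
        ("sunny", "hot", "high", "false", "no"),       ("sunny", "hot", "high", "true", "no"),
        ("overcast", "hot", "high", "false", "yes"),   ("rainy", "mild", "high", "false", "yes"),
        ("rainy", "cool", "normal", "false", "yes"),   ("rainy", "cool", "normal", "true", "no"),
        ("overcast", "cool", "normal", "true", "yes"), ("sunny", "mild", "high", "false", "no"),
        ("sunny", "cool", "normal", "false", "yes"),   ("rainy", "mild", "normal", "false", "yes"),
        ("sunny", "mild", "normal", "true", "yes"),    ("overcast", "mild", "high", "true", "yes"),
        ("overcast", "hot", "normal", "false", "yes"), ("rainy", "mild", "high", "true", "no"),
    ]

    def nb_score(evidence, cls):
        """Unnormalized P(play=cls) * prod_i P(attr_i = evidence_i | play=cls)."""
        rows = [r for r in data if r[-1] == cls]
        score = len(rows) / len(data)                          # P(play=cls)
        for i, v in enumerate(evidence):
            score *= sum(r[i] == v for r in rows) / len(rows)  # P(attr_i=v | play=cls)
        return score

    evidence = ("sunny", "cool", "high", "true")
    scores = {c: nb_score(evidence, c) for c in ("yes", "no")}
    total = sum(scores.values())                               # 1/alpha: the normalizer
    print({c: round(s / total, 3) for c, s in scores.items()}) # posterior for each class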

Right or Wrong Topology? In general, there is no right or wrong graph topology.
– Of course, the probabilities calculated from the data will differ for different graphs.
– Some graphs will induce better classifiers than others.
– If you reverse the arrows in the previous figure, you get a purely causal graph, whose induced classifier might have a better or worse estimated error (through cross-validation) than the Naïve Bayes one, depending on the data.
If the topology is constructed manually, we (humans) tend to prefer the causal direction.
– In domains such as medicine, the graphs are usually less complex in the causal direction.

Weka suggestion How does Weka find the shape of the graph? It fixes an order of the attributes (variables) and then adds and removes arcs until it gets the smallest estimated error (through cross-validation). By default it starts with a Naïve Bayes network. It also maintains a score of graph complexity, trying to keep the complexity low.
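Weka's exact procedure depends on the configured search algorithm and scoring, so the following is only a generic greedy hill-climbing sketch of the add/remove-arcs idea, not Weka's implementation. The function hill_climb_structure and its score_fn argument are hypothetical stand-ins; score_fn would play the role of the cross-validated-error/complexity score.

    def hill_climb_structure(variables, score_fn, max_parents=1):
        """Greedy add/remove-arc search over a fixed variable order, starting
        from the Naive Bayes topology (class -> each attribute)."""
        cls, attrs = variables[0], variables[1:]
        arcs = {(cls, a) for a in attrs}                 # Naive Bayes starting graph
        best = score_fn(arcs)
        improved = True
        while improved:
            improved = False
            candidates = [arcs - {e} for e in arcs]      # arc removals
            for i, u in enumerate(variables):            # arc additions u -> v,
                for v in variables[i + 1:]:              # respecting the fixed order
                    parents_of_v = sum(1 for (_, c) in arcs if c == v)
                    if (u, v) not in arcs and parents_of_v < max_parents:
                        candidates.append(arcs | {(u, v)})
            for cand in candidates:                      # keep the best single move
                if score_fn(cand) > best:
                    arcs, best, improved = cand, score_fn(cand), True
        return arcs

    # Toy usage with a made-up score that just prefers fewer arcs (illustration only):
    vars_ = ["play", "outlook", "temp", "humidity", "windy"]
    print(hill_climb_structure(vars_, score_fn=lambda arcs: -len(arcs), max_parents=2))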

In Weka's search-algorithm options, the maximum number of parents per node defaults to 1; you can change it to 2, for example. If you do, then the maximum number of parents for a node will be 2. The search still starts with a Naïve Bayes graph and then tries to add/remove arcs. The estimator options include the Laplace correction; better change it to 1, to be compatible with the counter initialization in Naïve Bayes.

Play probability table Based on the data: P(play=yes) = 9/14 and P(play=no) = 5/14. Let's correct with Laplace: P(play=yes) = (9+1)/(14+2) = .625 and P(play=no) = (5+1)/(14+2) = .375
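As a quick sanity check, the corrected priors in a couple of lines of Python:

    n_yes, n_no = 9, 5               # class counts in the 14 instances
    k = 2                            # number of class values (yes, no)
    print((n_yes + 1) / (14 + k))    # 0.625
    print((n_no + 1) / (14 + k))     # 0.375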

Outlook probability table Based on the data (with the Laplace correction):
P(outlook=sunny|play=yes) = (2+1)/(9+3) = .25
P(outlook=overcast|play=yes) = (4+1)/(9+3) = .417
P(outlook=rainy|play=yes) = (3+1)/(9+3) = .333
P(outlook=sunny|play=no) = (3+1)/(5+3) = .5
P(outlook=overcast|play=no) = (0+1)/(5+3) = .125
P(outlook=rainy|play=no) = (2+1)/(5+3) = .375

Windy probability table Based on the data (with the Laplace correction), let's find the conditional probabilities for “windy”:
P(windy=true|play=yes,outlook=sunny) = (1+1)/(2+2) = .5
P(windy=true|play=yes,outlook=overcast) = 0.5
P(windy=true|play=yes,outlook=rainy) = 0.2
P(windy=true|play=no,outlook=sunny) = 0.4
P(windy=true|play=no,outlook=overcast) = 0.5
P(windy=true|play=no,outlook=rainy) = 0.75
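Each entry follows the same pattern: add 1 to the count of the value and add the number of values of the child variable (here 2, for windy) to the count of its parent configuration. A small sketch, with the raw counts read off the 14 weather instances:

    # Raw counts per (play, outlook) configuration: (windy=true count, total rows)
    counts = {
        ("yes", "sunny"): (1, 2), ("yes", "overcast"): (2, 4), ("yes", "rainy"): (0, 3),
        ("no", "sunny"): (1, 3),  ("no", "overcast"): (0, 0),  ("no", "rainy"): (2, 2),
    }
    k = 2  # windy takes two values: true, false
    for cfg, (n_true, n_total) in counts.items():
        print(cfg, (n_true + 1) / (n_total + k))  # Laplace-corrected P(windy=true | cfg)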

Final figure The complete network, with all the probability tables attached. Classify the instance (outlook=sunny, temp=cool, humidity=high, windy=true).

Classification I Classify (outlook=sunny, temp=cool, humidity=high, windy=true):
P(play=yes|outlook=sunny, temp=cool, humidity=high, windy=true)
= α * P(play=yes)
* P(outlook=sunny|play=yes)
* P(temp=cool|play=yes, outlook=sunny)
* P(humidity=high|play=yes, temp=cool)
* P(windy=true|play=yes, outlook=sunny)
= α * 0.625 * 0.25 * 0.4 * 0.2 * 0.5
= α * 0.00625

Classification II Classify the same instance for play=no:
P(play=no|outlook=sunny, temp=cool, humidity=high, windy=true)
= α * P(play=no)
* P(outlook=sunny|play=no)
* P(temp=cool|play=no, outlook=sunny)
* P(humidity=high|play=no, temp=cool)
* P(windy=true|play=no, outlook=sunny)
= α * 0.375 * 0.5 * 0.167 * 0.333 * 0.4
= α * 0.00417

Classification III Combining the two:
P(play=yes|outlook=sunny, temp=cool, humidity=high, windy=true) = α * 0.00625
P(play=no|outlook=sunny, temp=cool, humidity=high, windy=true) = α * 0.00417
α = 1/(0.00625 + 0.00417) ≈ 95.97
P(play=yes|outlook=sunny, temp=cool, humidity=high, windy=true) ≈ 95.97 * 0.00625 ≈ 0.60
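These three steps are just products of table lookups followed by normalization; a few lines of Python reproduce the result:

    # Factors for (outlook=sunny, temp=cool, humidity=high, windy=true),
    # read directly from the probability tables above.
    score_yes = 0.625 * 0.25 * 0.4 * 0.2 * 0.5     # = 0.00625
    score_no  = 0.375 * 0.5 * 0.167 * 0.333 * 0.4  # ~= 0.00417
    alpha = 1 / (score_yes + score_no)             # ~= 95.97
    print(alpha * score_yes, alpha * score_no)     # ~= 0.60, ~= 0.40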

Classification IV (missing values or hidden variables)
P(play=yes|temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=yes)
* P(outlook|play=yes)
* P(temp=cool|play=yes, outlook)
* P(humidity=high|play=yes, temp=cool)
* P(windy=true|play=yes, outlook)
= … (next slide)

Classification V (missing values or hidden variables)
P(play=yes|temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=yes) * P(outlook|play=yes) * P(temp=cool|play=yes,outlook) * P(humidity=high|play=yes,temp=cool) * P(windy=true|play=yes,outlook)
= α * [ P(play=yes) * P(outlook=sunny|play=yes) * P(temp=cool|play=yes,outlook=sunny) * P(humidity=high|play=yes,temp=cool) * P(windy=true|play=yes,outlook=sunny)
+ P(play=yes) * P(outlook=overcast|play=yes) * P(temp=cool|play=yes,outlook=overcast) * P(humidity=high|play=yes,temp=cool) * P(windy=true|play=yes,outlook=overcast)
+ P(play=yes) * P(outlook=rainy|play=yes) * P(temp=cool|play=yes,outlook=rainy) * P(humidity=high|play=yes,temp=cool) * P(windy=true|play=yes,outlook=rainy) ]
= α * [ 0.625*0.25*0.4*0.2*0.5 + 0.625*0.417*0.286*0.2*0.5 + 0.625*0.333*0.333*0.2*0.2 ]
= α * 0.0165

Classification VI (missing values or hidden variables)
P(play=no|temp=cool, humidity=high, windy=true)
= α * Σ_outlook P(play=no) * P(outlook|play=no) * P(temp=cool|play=no,outlook) * P(humidity=high|play=no,temp=cool) * P(windy=true|play=no,outlook)
= α * [ P(play=no) * P(outlook=sunny|play=no) * P(temp=cool|play=no,outlook=sunny) * P(humidity=high|play=no,temp=cool) * P(windy=true|play=no,outlook=sunny)
+ P(play=no) * P(outlook=overcast|play=no) * P(temp=cool|play=no,outlook=overcast) * P(humidity=high|play=no,temp=cool) * P(windy=true|play=no,outlook=overcast)
+ P(play=no) * P(outlook=rainy|play=no) * P(temp=cool|play=no,outlook=rainy) * P(humidity=high|play=no,temp=cool) * P(windy=true|play=no,outlook=rainy) ]
= α * [ 0.375*0.5*0.167*0.333*0.4 + 0.375*0.125*0.333*0.333*0.5 + 0.375*0.375*0.4*0.333*0.75 ]
= α * 0.0208

Classification VII (missing values or hidden variables)
P(play=yes|temp=cool, humidity=high, windy=true) = α * 0.0165
P(play=no|temp=cool, humidity=high, windy=true) = α * 0.0208
α = 1/(0.0165 + 0.0208) ≈ 26.81
P(play=yes|temp=cool, humidity=high, windy=true) ≈ 26.81 * 0.0165 ≈ 0.44
P(play=no|temp=cool, humidity=high, windy=true) ≈ 26.81 * 0.0208 ≈ 0.56
I.e., P(play=yes|temp=cool, humidity=high, windy=true) is 44% and P(play=no|temp=cool, humidity=high, windy=true) is 56%. So, we predict play=no.
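The whole missing-value computation is a sum of the same products over the three outlook values; a compact sketch with the table entries hardcoded:

    # CPT entries needed for evidence (temp=cool, humidity=high, windy=true),
    # indexed by class and outlook value; read from the tables computed earlier.
    p_outlook = {"yes": {"sunny": 0.25,  "overcast": 0.417, "rainy": 0.333},
                 "no":  {"sunny": 0.5,   "overcast": 0.125, "rainy": 0.375}}
    p_cool    = {"yes": {"sunny": 0.4,   "overcast": 0.286, "rainy": 0.333},
                 "no":  {"sunny": 0.167, "overcast": 0.333, "rainy": 0.4}}
    p_high    = {"yes": 0.2, "no": 0.333}  # P(humidity=high | play, temp=cool)
    p_windy   = {"yes": {"sunny": 0.5,   "overcast": 0.5,   "rainy": 0.2},
                 "no":  {"sunny": 0.4,   "overcast": 0.5,   "rainy": 0.75}}
    prior     = {"yes": 0.625, "no": 0.375}

    score = {}
    for c in ("yes", "no"):
        score[c] = sum(prior[c] * p_outlook[c][o] * p_cool[c][o]
                       * p_high[c] * p_windy[c][o]
                       for o in ("sunny", "overcast", "rainy"))  # sum out outlook
    alpha = 1 / sum(score.values())
    print({c: round(alpha * s, 2) for c, s in score.items()})    # {'yes': 0.44, 'no': 0.56}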