Classification And Bayesian Learning
Supervisor: Prof. Dr. Mohamed Batouche. Presented by: Abdu Hassan Al-Gomai.
Contents
- Classification vs. Prediction
- Classification—A Two-Step Process
- Supervised vs. Unsupervised Learning
- Major Classification Models
- Evaluating Classification Methods
- Bayesian Classification
Classification vs. Prediction
What is the difference between classification and prediction? A decision tree is a classification model built from existing data; applied to new data whose class is unknown, it also yields a prediction of the class. Classification constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data.

Typical applications:
- Text classification
- Target marketing
- Medical diagnosis
- Treatment effectiveness analysis
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes.
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
- The set of tuples used for model construction is the training set.
- The model is represented as classification rules, decision trees, or mathematical formulae.

Model usage: classifying future or unknown objects (see the sketch below).
- Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification; the accuracy rate is the percentage of test-set samples correctly classified. The test set is independent of the training set.
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
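As a hedged illustration of the two steps, here is a minimal sketch assuming scikit-learn and its bundled iris dataset (neither is prescribed by the slides; they are illustrative choices):

```python
# Minimal sketch of the two-step process (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 1 - model construction: learn a classifier from the labeled training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 - model usage: estimate accuracy on an independent test set,
# then classify tuples whose class labels are unknown.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("prediction for an unseen tuple:", model.predict(X_test[:1]))
```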
Classification Process (1): Model Construction
[Figure: Training Data → Classification Algorithms → Classifier (Model). The learned model here is the rule: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’.]
Classification Process (2): Use the Model in Prediction
[Figure: Testing Data → Classifier → Unseen Data. The model is checked against the testing data, then applied to an unseen tuple such as (Jeff, Professor, 4): Tenured?]
Supervised vs. Unsupervised Learning
Supervised learning (classification):
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations (the teacher presents input-output pairs).
- New data are classified based on the training set.

Unsupervised learning (clustering):
- The class labels of the training data are unknown.
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
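Below is a minimal sketch contrasting the two settings, assuming scikit-learn (an illustrative choice; the slides name no library):

```python
# Supervised vs. unsupervised learning, side by side.
from sklearn.tree import DecisionTreeClassifier  # supervised: classification
from sklearn.cluster import KMeans               # unsupervised: clustering

X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [0, 0, 1, 1]  # class labels supplied by the "teacher"

clf = DecisionTreeClassifier().fit(X, y)       # learns from input-output pairs
print("classified as:", clf.predict([[0.15, 0.85]]))

km = KMeans(n_clusters=2, n_init=10).fit(X)    # no labels: discovers clusters
print("cluster assignments:", km.labels_)
```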
Major Classification Models
- Bayesian classification
- Decision tree induction
- Neural networks
- Support vector machines (SVM)
- Classification based on associations
- Other classification methods: k-NN, boosting, bagging, …
Evaluating Classification Methods
- Predictive accuracy.
- Speed: time to construct the model; time to use the model.
- Robustness: handling noise and missing values.
- Scalability: efficiency with respect to large data.
- Interpretability: understanding and insight provided by the model.
- Goodness of rules: compactness of classification rules.
Bayesian Classification
Here we learn Bayesian classification: e.g., how to decide whether a patient is ill or healthy, based on
- a probabilistic model of the observed data, and
- prior knowledge.
Classification problem
Training data: examples of the form $(d, h(d))$, where $d$ is a data object to classify (the input) and $h(d) \in \{1, \ldots, K\}$ is the correct class for $d$.

Goal: given $d_{\mathrm{new}}$, provide $h(d_{\mathrm{new}})$.
Why Bayesian?
- Provides practical learning algorithms, e.g. Naïve Bayes.
- Prior knowledge and observed data can be combined.
- It is a generative (model-based) approach, which offers a useful conceptual framework: any kind of object can be classified, based on a probabilistic model specification (e.g., sequences).
Bayes’ Rule

$$P(h \mid d) = \frac{P(d \mid h)\, P(h)}{P(d)}$$

Who is who in Bayes’ rule:
- $P(h)$: the prior probability of hypothesis $h$
- $P(d \mid h)$: the likelihood of observing data $d$ given $h$
- $P(d)$: the evidence, the marginal probability of the data
- $P(h \mid d)$: the posterior probability of $h$ given the data
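To make the roles concrete, here is a worked example with invented numbers (illustrative only, not from the original slides): suppose 1% of patients have a disease, a test detects it with probability 0.98, and it false-alarms on healthy patients with probability 0.05. Then

$$P(\text{ill} \mid +) = \frac{P(+ \mid \text{ill})\,P(\text{ill})}{P(+)} = \frac{0.98 \times 0.01}{0.98 \times 0.01 + 0.05 \times 0.99} \approx 0.165,$$

so even after a positive test the patient is more likely healthy than ill, because the prior $P(\text{ill})$ is small.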
Naïve Bayes Classifier
What can we do if our data $d$ has several attributes?
- Naïve Bayes assumption: the attributes that describe data instances are conditionally independent given the classification hypothesis. This is a simplifying assumption, and it may obviously be violated in reality; in spite of that, it works well in practice.
- The Bayesian classifier that uses the Naïve Bayes assumption and computes the maximum a posteriori (MAP) hypothesis is called the Naïve Bayes classifier.
- One of the most practical learning methods.
- Successful applications: medical diagnosis, text classification.
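In symbols (a standard formulation; the notation is ours and does not appear on the original slides): for attribute values $a_1, \ldots, a_n$ and hypothesis $h$,

$$h_{\mathrm{MAP}} = \arg\max_{h} P(h \mid a_1, \ldots, a_n) = \arg\max_{h} P(h) \prod_{i=1}^{n} P(a_i \mid h),$$

where the factorized product follows from the conditional-independence assumption, and the evidence $P(a_1, \ldots, a_n)$ is dropped because it does not depend on $h$.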
Naïve Bayesian Classifier: Example1
The evidence E relates all attributes, without exceptions.

  Outlook   Temp.   Humidity   Windy   Play
  Sunny     Cool    High       True    ?

Given evidence E, compute the probability of class “yes”.
Counts from the 14 training examples, with the corresponding relative frequencies:

  Outlook:      Sunny     yes 2 (2/9)   no 3 (3/5)
                Overcast  yes 4 (4/9)   no 0 (0/5)
                Rainy     yes 3 (3/9)   no 2 (2/5)
  Temperature:  Hot       yes 2 (2/9)   no 2 (2/5)
                Mild      yes 4 (4/9)   no 2 (2/5)
                Cool      yes 3 (3/9)   no 1 (1/5)
  Humidity:     High      yes 3 (3/9)   no 4 (4/5)
                Normal    yes 6 (6/9)   no 1 (1/5)
  Windy:        False     yes 6 (6/9)   no 2 (2/5)
                True      yes 3 (3/9)   no 3 (3/5)
  Play:                   yes 9 (9/14)  no 5 (5/14)

[The slide also shows the underlying weather training data: 14 tuples with attributes Outlook, Temp, Humidity, Windy and class Play.]
Compute the Prediction for a New Day
To compute the prediction for the new day:

  Outlook   Temp.   Humidity   Windy   Play
  Sunny     Cool    High       True    ?

Likelihood of the two classes:
  For “yes” = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
  For “no”  = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into probabilities by normalization:
  P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
  P(“no”)  = 0.0206 / (0.0053 + 0.0206) = 0.795
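A minimal Python sketch (ours, not part of the original slides) that reproduces this arithmetic:

```python
# Reproduces the computation for the new day (Sunny, Cool, High, Windy=True).
# The fractions are the relative frequencies from the counts table above.
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)  # P(E|yes) * P(yes)
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)  # P(E|no)  * P(no)

total = like_yes + like_no                         # normalizer P(E)
print(f'P("yes"|E) = {like_yes / total:.3f}')      # 0.205
print(f'P("no"|E)  = {like_no / total:.3f}')       # 0.795
```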
Naïve Bayesian Classifier: Example2
Training dataset with two classes:
  C1: buys_computer = “yes”
  C2: buys_computer = “no”

Data sample to classify:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Compute P(X|Ci) for each class:
  P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
  P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
  P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
  P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
  P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
  P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
  P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
  P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):
  P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X | buys_computer = “no”)  = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X|Ci) · P(Ci), with priors P(buys_computer = “yes”) = 9/14 = 0.643 and P(buys_computer = “no”) = 5/14 = 0.357:
  P(X | buys_computer = “yes”) · P(buys_computer = “yes”) = 0.044 × 0.643 = 0.028
  P(X | buys_computer = “no”)  · P(buys_computer = “no”)  = 0.019 × 0.357 = 0.007

Therefore X belongs to class “buys_computer = yes”.
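The same computation, wrapped in a small reusable function (a sketch under our own naming; the slides give only the numbers):

```python
# Generic Naive Bayes prediction from per-class priors and
# conditional probability tables.

def naive_bayes_predict(priors, cond_probs, x):
    """Return (best_class, posteriors) for sample x.

    priors:     {class: P(class)}
    cond_probs: {class: {attribute: {value: P(value | class)}}}
    x:          {attribute: value}
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in x.items():
            score *= cond_probs[c][attr][value]  # conditional independence
        scores[c] = score
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Numbers from the buys_computer example above.
priors = {"yes": 9 / 14, "no": 5 / 14}
cond_probs = {
    "yes": {"age": {"<=30": 2 / 9}, "income": {"medium": 4 / 9},
            "student": {"yes": 6 / 9}, "credit": {"fair": 6 / 9}},
    "no":  {"age": {"<=30": 3 / 5}, "income": {"medium": 2 / 5},
            "student": {"yes": 1 / 5}, "credit": {"fair": 2 / 5}},
}
x = {"age": "<=30", "income": "medium", "student": "yes", "credit": "fair"}
print(naive_bayes_predict(priors, cond_probs, x))  # -> ('yes', {...})
```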
Naïve Bayesian Classifier: Advantages and Disadvantages
Advantages:
- Easy to implement.
- Good results obtained in most cases.

Disadvantages:
- Assumption of class-conditional independence, and therefore loss of accuracy.
- Practically, dependencies exist among variables. E.g., in hospitals, patients have a profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by the Naïve Bayesian classifier.

How to deal with these dependencies? Bayesian belief networks.