Classification And Bayesian Learning
Supervisor: Prof. Dr. Mohamed Batouche
Presented by: Abdu Hassan AL-Gomai
Contents
- Classification vs. Prediction
- Classification: A Two-Step Process
- Supervised vs. Unsupervised Learning
- Major Classification Models
- Evaluating Classification Methods
- Bayesian Classification
Classification vs. Prediction
What is the difference between classification and prediction? "The decision tree is a classification model, applied to existing data. If you apply it to new data, for which the class is unknown, you also get a prediction of the class." [From http://www.kdnuggets.com/faq/classification-vs-prediction.html]
Classification constructs a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data.
Typical applications:
- Text classification
- Target marketing
- Medical diagnosis
- Treatment effectiveness analysis
Classification: A Two-Step Process
1. Model construction: describing a set of predetermined classes.
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
- The set of tuples used for model construction is the training set.
- The model is represented as classification rules, decision trees, or mathematical formulae.
2. Model usage: classifying future or unknown objects.
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model, and the accuracy rate is the percentage of test-set samples correctly classified. The test set must be independent of the training set.
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
Classification Process (1): Model Construction
[Figure: a classification algorithm is run on the training data to construct the classifier (model); the example model learned here is the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.]
Classification Process (2): Use the Model in Prediction
[Figure: the classifier is applied first to the testing data, then to unseen data such as the tuple (Jeff, Professor, 4) to answer: tenured?]
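To make the two steps concrete, here is a minimal sketch in Python of the rule from the figures above, applied to the unseen tuple. The test tuples are hypothetical examples, not data from the slides:

```python
# A rule-based classifier matching the figure in part (1):
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
def classify(rank, years):
    return 'yes' if rank == 'professor' or years > 6 else 'no'

# Step 2a: estimate accuracy on labeled test tuples (hypothetical examples).
test_set = [('professor', 7, 'yes'), ('assistant prof', 3, 'no')]
correct = sum(classify(r, y) == label for r, y, label in test_set)
print('accuracy:', correct / len(test_set))       # 1.0 on these two tuples

# Step 2b: classify unseen data, e.g. (Jeff, Professor, 4).
print('Jeff tenured?', classify('professor', 4))  # 'yes' (rank matches the rule)
```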
Supervised vs. Unsupervised Learning
Supervised learning (classification):
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation (a teacher presents input-output pairs).
- New data is classified based on the training set.
Unsupervised learning (clustering):
- The class labels of the training data are unknown.
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
Major Classification Models
- Bayesian classification
- Decision tree induction
- Neural networks
- Support vector machines (SVM)
- Classification based on associations
- Other classification methods: k-nearest neighbors (KNN), boosting, bagging, …
Evaluating Classification Methods
- Predictive accuracy.
- Speed and scalability: time to construct the model; time to use the model.
- Robustness: handling noise and missing values.
- Scalability: efficiency with respect to large data.
- Interpretability: understanding and insight provided by the model.
- Goodness of rules: compactness of classification rules.
Bayesian Classification
Here we learn Bayesian classification, e.g. how to decide whether a patient is ill or healthy, based on:
- a probabilistic model of the observed data, and
- prior knowledge.
Classification Problem
- Training data: examples of the form (d, h(d)), where d is a data object to classify (the input) and h(d) ∈ {1, …, K} is the correct class label for d.
- Goal: given a new object d_new, predict h(d_new).
Why Bayesian?
- Provides practical learning algorithms, e.g. Naïve Bayes.
- Prior knowledge and observed data can be combined.
- It is a generative (model-based) approach, which offers a useful conceptual framework: any kind of object, e.g. sequences, can be classified, based on a probabilistic model specification.
Bayes' Rule
P(h | d) = P(d | h) P(h) / P(d)
Who is who in Bayes' rule:
- P(h): prior probability of hypothesis h (prior knowledge).
- P(d | h): likelihood of observing the data d if h holds.
- P(d): evidence, the marginal probability of the data.
- P(h | d): posterior probability of h after seeing d.
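To make the rule concrete, here is a minimal sketch for the ill-vs-healthy decision mentioned in the introduction. All of the probabilities are hypothetical numbers chosen for illustration:

```python
# A minimal sketch of Bayes' rule for deciding ill vs. healthy.
# All numbers below are hypothetical assumptions for illustration.
prior_ill = 0.01                 # P(ill): prior knowledge, e.g. prevalence
p_pos_given_ill = 0.95           # P(positive test | ill): likelihood
p_pos_given_healthy = 0.05       # P(positive test | healthy)

# Evidence P(positive) via the law of total probability.
p_pos = p_pos_given_ill * prior_ill + p_pos_given_healthy * (1 - prior_ill)

# Posterior P(ill | positive) by Bayes' rule.
posterior_ill = p_pos_given_ill * prior_ill / p_pos
print(round(posterior_ill, 3))   # ~0.161: still unlikely despite the positive test
```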
Naïve Bayes Classifier
What can we do if our data d has several attributes?
Naïve Bayes assumption: the attributes that describe data instances are conditionally independent given the classification hypothesis:
P(a_1, …, a_n | h) = P(a_1 | h) × … × P(a_n | h)
- It is a simplifying assumption, and obviously it may be violated in reality.
- In spite of that, it works well in practice.
The Bayesian classifier that uses the Naïve Bayes assumption and picks the maximum a posteriori hypothesis is called the Naïve Bayes classifier. It is one of the most practical learning methods, with successful applications in medical diagnosis and text classification.
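As a concrete reference, here is a minimal sketch of that decision rule in Python. The function name and the shape of the probability tables are illustrative assumptions, not part of the original slides; Example 1 below supplies real numbers for such tables:

```python
# A minimal sketch of the Naive Bayes decision rule: pick the class h that
# maximizes P(h) * prod_i P(a_i | h). In a real classifier the tables
# priors and cond_prob are estimated from training-set counts.
from math import prod

def naive_bayes_predict(attr_values, priors, cond_prob):
    """attr_values: observed values; priors: P(h); cond_prob[h][value]: P(value | h)."""
    scores = {h: priors[h] * prod(cond_prob[h][v] for v in attr_values)
              for h in priors}
    return max(scores, key=scores.get)
```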
Naïve Bayesian Classifier: Example 1
The evidence E combines all of the attribute values of the new instance (no attribute is left out):

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Probability of class "yes":
P(yes | E) = P(Sunny | yes) × P(Cool | yes) × P(High | yes) × P(True | yes) × P(yes) / P(E)
Counts and relative frequencies from the 14-day weather training data (9 "yes" days, 5 "no" days):

Attribute    Value     Yes  No   P(value|yes)  P(value|no)
Outlook      Sunny      2    3      2/9           3/5
             Overcast   4    0      4/9           0/5
             Rainy      3    2      3/9           2/5
Temperature  Hot        2    2      2/9           2/5
             Mild       4    2      4/9           2/5
             Cool       3    1      3/9           1/5
Humidity     High       3    4      3/9           4/5
             Normal     6    1      6/9           1/5
Windy        False      6    2      6/9           2/5
             True       3    3      3/9           3/5
Play (class)            9    5      9/14          5/14
Compute Prediction for a New Day
For the new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Likelihood of the two classes:
For "yes": 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no":  3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into probabilities by normalization:
P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795
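The same arithmetic can be checked in a few lines of Python. This sketch only reproduces the fractions above (using Fraction to avoid rounding); it is not a full classifier:

```python
# Reproducing the weather-data calculation with exact fractions.
from fractions import Fraction as F

like_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
like_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)

p_yes = like_yes / (like_yes + like_no)
print(float(like_yes), float(like_no))   # 0.00529..., 0.02057...
print(float(p_yes), float(1 - p_yes))    # 0.205, 0.795
```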
Naïve Bayesian Classifier: Example 2
Training dataset with two classes:
- C1: buys_computer = 'yes'
- C2: buys_computer = 'no'
Of the 14 training tuples, 9 belong to C1 and 5 to C2, so P(C1) = 9/14 = 0.643 and P(C2) = 5/14 = 0.357.
Data sample to classify:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayesian Classifier: Example 2 (continued)
Compute P(X|Ci) for each class:
- P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
- P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
- P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
- P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
- P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
- P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
- P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
- P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = "no")  = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|Ci) × P(Ci):
P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.044 × 0.643 = 0.028
P(X | buys_computer = "no")  × P(buys_computer = "no")  = 0.019 × 0.357 = 0.007
Therefore X belongs to class buys_computer = "yes".
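As a quick check, the following sketch reproduces this calculation from the conditional probabilities and priors above; the variable names are illustrative:

```python
# Reproducing Example 2 with the conditional probabilities computed above.
# Priors come from the class counts: P(yes) = 9/14, P(no) = 5/14.
cond = {
    'yes': [2/9, 4/9, 6/9, 6/9],  # age<=30, income=medium, student=yes, credit=fair
    'no':  [3/5, 2/5, 1/5, 2/5],
}
priors = {'yes': 9/14, 'no': 5/14}

for c in ('yes', 'no'):
    score = priors[c]
    for p in cond[c]:
        score *= p
    print(c, round(score, 3))  # yes: 0.028, no: 0.007 -> predict "yes"
```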
Naïve Bayesian Classifier: Advantages and Disadvantages
Advantages:
- Easy to implement.
- Good results obtained in most cases.
Disadvantages:
- Relies on the class-conditional independence assumption, which costs accuracy when it fails.
- In practice, dependencies do exist among variables. E.g., in hospitals, a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.) are interdependent, and such dependencies cannot be modeled by a Naïve Bayesian classifier.
How to deal with these dependencies? Bayesian belief networks.
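A small numeric sketch of why violated independence hurts: if a duplicated (perfectly correlated) attribute is treated as independent evidence, Naïve Bayes becomes overconfident. All numbers here are hypothetical:

```python
# Sketch: Naive Bayes double-counts perfectly correlated evidence.
# Hypothetical setup: a symptom that is duplicated as two "attributes".
p_sym_given_ill, p_sym_given_healthy = 0.9, 0.2
prior_ill = 0.1

def posterior_ill(n_copies):
    """Posterior of 'ill' when the same symptom is counted n_copies times."""
    like_ill = prior_ill * p_sym_given_ill ** n_copies
    like_healthy = (1 - prior_ill) * p_sym_given_healthy ** n_copies
    return like_ill / (like_ill + like_healthy)

print(round(posterior_ill(1), 3))  # 0.333: one symptom, correct treatment
print(round(posterior_ill(2), 3))  # 0.692: duplicated attribute, overconfident
```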
References
- Software, NB for classifying text: http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
- Useful reading for those interested in learning more about NB classification, beyond the scope of this module: http://www-2.cs.cmu.edu/~tom/NewChapters.html
- http://www.cs.unc.edu/Courses/comp790-090-s08/Lecturenotes
- Introduction to Bayesian Learning, School of Computer Science, University of Birmingham, A.Kaban@cs.bham.ac.uk