Classification And Bayesian Learning

Supervisor: Prof. Dr. Mohamed Batouche
Presented by: Abdu Hassan AL-Gomai

Contents
- Classification vs. prediction
- Classification: a two-step process
- Supervised vs. unsupervised learning
- Major classification models
- Evaluating classification methods
- Bayesian classification

Classification vs. Prediction
What is the difference between classification and prediction? A decision tree is a classification model when it is applied to existing data. If you apply it to new data, for which the class is unknown, you also get a prediction of the class (source: http://www.kdnuggets.com/faq/classification-vs-prediction.html).
Classification constructs a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data.
Typical applications: text classification, target marketing, medical diagnosis, treatment effectiveness analysis.

Classification: A Two-Step Process
1. Model construction: describing a set of predetermined classes.
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
- The set of tuples used for model construction is the training set.
- The model is represented as classification rules, decision trees, or mathematical formulae.
2. Model usage: classifying future or unknown objects.
- First estimate the accuracy of the model. The known label of each test sample is compared with the class the model assigns to it. The accuracy rate is the percentage of test-set samples that the model classifies correctly. The test set is independent of the training set.
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
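The following minimal Python sketch shows the accuracy estimate from step 2. It counts how many samples of an independent test set a model labels correctly. The function name accuracy_rate, the always_yes model, and the toy test set are hypothetical and serve only as illustration.

```python
from typing import Callable, Hashable, Sequence, Tuple

# A labelled test sample: (attribute tuple, known class label).
Sample = Tuple[tuple, Hashable]


def accuracy_rate(model: Callable[[tuple], Hashable],
                  test_set: Sequence[Sample]) -> float:
    """Percentage of test samples whose known label equals the model's output.

    The test set is assumed to be independent of the training set.
    """
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for x, label in test_set if model(x) == label)
    return 100.0 * correct / len(test_set)


# Hypothetical model and test set, for illustration only.
def always_yes(x: tuple) -> str:
    return "yes"


toy_test_set = [(("a",), "yes"), (("b",), "no"), (("c",), "yes"), (("d",), "yes")]
print(accuracy_rate(always_yes, toy_test_set))  # 75.0
```

The model is passed in as a plain callable, so any classifier can be scored the same way, whether it is a rule, a tree, or a Naïve Bayes model. Whether 75% counts as "acceptable" is a judgment made outside this function.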

Classification Process (1): Model Construction
A classification algorithm is applied to the training data to build the classifier (model), for example:
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

Classification Process (2): Use the Model in Prediction
The classifier is checked against testing data and then applied to unseen data. For example, given the tuple (Jeff, Professor, 4), is Jeff tenured?
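Applying the learned rule to Jeff's tuple can be sketched in Python. The function name tenured is hypothetical. The sketch also makes two assumptions: tuples that the rule does not cover are classified as 'no', and the rank comparison ignores case, because the rule says ‘professor’ while the tuple says Professor.

```python
def tenured(rank: str, years: int) -> str:
    """Rule from step 1: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

    Assumption: every tuple not covered by the rule is labelled 'no'.
    """
    # Case-insensitive, since the unseen tuple writes 'Professor'.
    if rank.lower() == "professor" or years > 6:
        return "yes"
    return "no"


# Step 2: apply the model to the unseen tuple (Jeff, Professor, 4).
name, rank, years = ("Jeff", "Professor", 4)
print(name, "tenured?", tenured(rank, years))  # Jeff tenured? yes
```

Jeff has only 4 years of service, but the OR in the rule means his rank alone is enough to predict tenured = 'yes'.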

Supervised vs. Unsupervised Learning
Supervised learning (classification):
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation (a teacher presents input-output pairs).
- New data is classified based on the training set.
Unsupervised learning (clustering):
- The class labels of the training data are unknown.
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Major Classification Models
- Bayesian classification
- Decision tree induction
- Neural networks
- Support vector machines (SVM)
- Classification based on associations
- Other classification methods: KNN, boosting, bagging, …

Evaluating Classification Methods
- Predictive accuracy.
- Speed: time to construct the model and time to use the model.
- Robustness: handling noise and missing values.
- Scalability: efficiency with respect to large data.
- Interpretability: understanding and insight provided by the model.
- Goodness of rules: compactness of classification rules.

Bayesian Classification
Here we learn Bayesian classification. For example: how to decide whether a patient is ill or healthy, based on a probabilistic model of the observed data and on prior knowledge.

Classification Problem
Training data: examples of the form (d, h(d)), where d is a data object to classify (the input) and h(d) ∈ {1, …, K} is its correct class.
Goal: given a new object d_new, provide h(d_new).

Why Bayesian?
- It provides practical learning algorithms, e.g. Naïve Bayes.
- Prior knowledge and observed data can be combined.
- It is a generative (model-based) approach, which offers a useful conceptual framework. Sequences, and in fact any kind of object, can be classified based on a probabilistic model specification.

Bayes’ Rule Who is who in Bayes’ rule
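The formula on this slide did not survive extraction. Below is the standard statement of Bayes' rule, written with d for the observed data and h for a class hypothesis, as in the classification problem. Each term is labelled with its usual name. Because P(d) does not depend on h, the maximum a posteriori (MAP) class can be found without computing it.

```latex
\underbrace{P(h \mid d)}_{\text{posterior}}
  = \frac{\overbrace{P(d \mid h)}^{\text{likelihood}}\;\overbrace{P(h)}^{\text{prior}}}
         {\underbrace{P(d)}_{\text{evidence}}},
\qquad
h_{\mathrm{MAP}} = \arg\max_{h \in \{1,\dots,K\}} P(h \mid d)
                 = \arg\max_{h \in \{1,\dots,K\}} P(d \mid h)\, P(h)
```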

Naïve Bayes Classifier
What can we do if our data d has several attributes?
Naïve Bayes assumption: the attributes that describe data instances are conditionally independent given the classification hypothesis.
- It is a simplifying assumption, and it may obviously be violated in reality.
- In spite of that, it works well in practice.
The Bayesian classifier that uses the Naïve Bayes assumption and computes the maximum a posteriori (MAP) hypothesis is called the Naïve Bayes classifier. It is one of the most practical learning methods, with successful applications in medical diagnosis and text classification.
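The assumption can be written in symbols. Here an instance d is written as its n attribute values (a_1, …, a_n); this notation is assumed, not taken from the slide. Under the assumption, the joint likelihood factorizes into per-attribute terms. The classifier then picks the class with the largest prior-times-product score:

```latex
P(a_1, \dots, a_n \mid h) = \prod_{j=1}^{n} P(a_j \mid h)
\quad\Longrightarrow\quad
h_{\mathrm{NB}} = \arg\max_{h \in \{1,\dots,K\}} P(h) \prod_{j=1}^{n} P(a_j \mid h)
```

Each factor P(a_j | h) is estimated separately by counting, which is what makes the method easy to train.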

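Naïve Bayesian Classifier: Example 1
The evidence E for a new day consists of all of its attribute values, without exception:
Outlook = Sunny, Temp. = Cool, Humidity = High, Windy = True, Play = ?
The probability of class “yes” given E is
P(yes | E) = P(Outlook = Sunny | yes) × P(Temp. = Cool | yes) × P(Humidity = High | yes) × P(Windy = True | yes) × P(yes) / P(E)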

Weather training data: counts per class and relative frequencies P(value | class)
Attribute / value   Yes  No   P(value | yes)  P(value | no)
Outlook
  Sunny              2    3        2/9            3/5
  Overcast           4    0        4/9            0/5
  Rainy              3    2        3/9            2/5
Temperature
  Hot                2    2        2/9            2/5
  Mild               4    2        4/9            2/5
  Cool               3    1        3/9            1/5
Humidity
  High               3    4        3/9            4/5
  Normal             6    1        6/9            1/5
Windy
  False              6    2        6/9            2/5
  True               3    3        3/9            3/5
Play                 9    5        9/14           5/14
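The relative frequencies in this table come from simple counting. Below is a minimal Python sketch of that counting step. It assumes the slide uses the standard 14-day weather dataset from Witten and Frank's data mining textbook, which reproduces these counts. The names WEATHER, ATTRIBUTES, and frequency_tables are hypothetical. Exact Fraction values keep the output directly comparable with the table.

```python
from collections import Counter
from fractions import Fraction

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")

# 14-day weather training set: (outlook, temperature, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def frequency_tables(rows):
    """Return priors P(c) and conditionals P(attribute = value | c) as Fractions.

    Value/class pairs never observed (e.g. overcast with 'no') are simply
    absent from the conditionals dict, i.e. their frequency is 0.
    """
    class_counts = Counter(row[-1] for row in rows)
    # Count every (attribute, value, class) triple in the data.
    pair_counts = Counter(
        (attr, value, row[-1])
        for row in rows
        for attr, value in zip(ATTRIBUTES, row[:-1])
    )
    priors = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}
    conditionals = {
        key: Fraction(n, class_counts[key[2]]) for key, n in pair_counts.items()
    }
    return priors, conditionals


priors, cond = frequency_tables(WEATHER)
print(priors["yes"], priors["no"])                                          # 9/14 5/14
print(cond[("outlook", "sunny", "yes")], cond[("outlook", "sunny", "no")])  # 2/9 3/5
```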

Compute the Prediction for a New Day
To predict the new day (Outlook = Sunny, Temp. = Cool, Humidity = High, Windy = True, Play = ?), multiply the relevant relative frequencies and the class prior to get the likelihood of each class:
For “yes”: 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For “no”: 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into a probability by normalization:
P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
P(“no”) = 0.0206 / (0.0053 + 0.0206) = 0.795
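The arithmetic on this slide can be checked with exact fractions. The sketch below hard-codes the relative frequencies for Sunny, Cool, High, and True from the weather table. It then normalizes the two likelihoods so that they sum to 1, which is the same as dividing by P(E).

```python
from fractions import Fraction as F

# Relative frequencies for the new day (Sunny, Cool, High, True) times the class prior.
likelihood_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
likelihood_no = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)

# Normalize so the two class probabilities sum to 1 (equivalent to dividing by P(E)).
total = likelihood_yes + likelihood_no
p_yes = likelihood_yes / total
p_no = likelihood_no / total

print(f"likelihood yes = {float(likelihood_yes):.4f}, no = {float(likelihood_no):.4f}")
# likelihood yes = 0.0053, no = 0.0206
print(f"P(yes | E) = {float(p_yes):.3f}, P(no | E) = {float(p_no):.3f}")
# P(yes | E) = 0.205, P(no | E) = 0.795
```

Working with Fraction avoids rounding until the final print. Exactly, P(yes | E) = 125/611.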

Naïve Bayesian Classifier: Example 2
Training dataset: [table not preserved]
Classes: C1: buys_computer = “yes”; C2: buys_computer = “no”.
Data sample: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Naïve Bayesian Classifier: Example 2
Compute P(X|Ci) for each class:
P(age <= 30 | buys_computer = “yes”) = 2/9 = 0.222
P(age <= 30 | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
Multiplying by the class priors, P(X|Ci) × P(Ci):
P(X | buys_computer = “yes”) × P(buys_computer = “yes”) = 0.028
P(X | buys_computer = “no”) × P(buys_computer = “no”) = 0.007
Since 0.028 > 0.007, X belongs to class buys_computer = “yes”.
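The same decision can be sketched in Python from the conditional probabilities listed on this slide. The class priors 9/14 and 5/14 are an inference, not a stated value. They are implied by the denominators 9 and 5 in the conditional estimates (9 “yes” and 5 “no” tuples, 14 in total), and they reproduce the slide's products 0.028 and 0.007.

```python
import math

# P(attribute value | class) for X = (age<=30, income=medium, student=yes, credit_rating=fair).
cond = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],
    "no": [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}
# Priors inferred from the denominators: 9 "yes" and 5 "no" tuples out of 14.
prior = {"yes": 9 / 14, "no": 5 / 14}

scores = {}
for c, probs in cond.items():
    likelihood = math.prod(probs)       # P(X | Ci) under the independence assumption
    scores[c] = likelihood * prior[c]   # P(X | Ci) * P(Ci)
    print(f"P(X|{c}) = {likelihood:.3f}, P(X|{c})*P({c}) = {scores[c]:.3f}")

print("X belongs to buys_computer =", max(scores, key=scores.get))
```

The sketch uses math.prod, which needs Python 3.8 or later. The unnormalized scores are enough here because only their comparison matters for the decision.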

Naïve Bayesian Classifier: Advantages and Disadvantages
Advantages:
- Easy to implement.
- Good results obtained in most cases.
Disadvantages:
- The class-conditional independence assumption can cause a loss of accuracy, because in practice dependencies exist among variables. For example, in hospital data, patients have a profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by a Naïve Bayesian classifier.
How can we deal with these dependencies? With Bayesian belief networks.

References
- Software: NB for classifying text: http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
- Further reading on NB classification, for those interested in learning more beyond the scope of this module: http://www-2.cs.cmu.edu/~tom/NewChapters.html
- http://www.cs.unc.edu/Courses/comp790-090-s08/Lecturenotes
- Introduction to Bayesian Learning, School of Computer Science, University of Birmingham, A.Kaban@cs.bham.ac.uk