Prepared by: Mahmoud Rafeek Al-Farra College of Science & Technology Dep. Of Computer Science & IT BCs of Information Technology Data Mining Chapter 4: Classification Methods (Definition) Prepared by: Mahmoud Rafeek Al-Farra 2013 www.cst.ps/staff/mfarra
Course’s Out Lines Introduction Data Preparation and Preprocessing Data Representation Classification Methods Evaluation Clustering Methods Mid Exam Association Rules Knowledge Representation Special Case study : Document clustering Discussion of Case studies by students
Out Lines Definition of Classification Learning Supervised vs Unsupervised Classification vs. Prediction How does the classification work ?!
Definition of Classification Classification is (Techniques used to predict group membership for data instances). For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”.
Definition of Classification Classification is a classic data mining task, with roots in machine learning. A typical application is : "Given past records of customers who switched to another supplier, predict which current customers are likely to do the same."
Classes are pre-defined Learning This type of learning called supervised learning Example: Classification model may be built to categorize bank loan applications as either safe or risky. Classes are pre-defined
Supervised vs Unsupervised Supervised learning (classification) The set of possible classes is known in advance. New data is classified based on the training set Unsupervised learning (clustering) Set of possible classes is not known. After classification we can try to assign a name to that class. Unsupervised classification is called clustering.
Classification vs. Prediction predicts categorical class labels (discrete or nominal) classifies data based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data Prediction: models continuous-valued functions, i.e., predicts unknown or missing values
How ?! Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class. Find a model for class attribute as a function of the values of other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
How ?! Example: Process 1 Model Construction Classification Algorithms Training Data Classifier (Model) IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
How ?! Tenured? Process 2 Using the Model in Prediction Classifier Testing Data Unseen Data (Jeff, Professor, 4) Tenured?
How ?! 12
Next … Machine learning techniques Decision Trees Neural Networks k-Nearest Neighbors Naïve Bayesian Classifiers
Thanks