Published by Bernadette Hill. Modified over 9 years ago.
1
Data Mining: Classification & Prediction
Hosam Al-Samarraie, PhD.
Centre for Instructional Technology & Multimedia, Universiti Sains Malaysia
2
What Does Data Mining Do?
Extract patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship among data items.
Types of patterns
– Association
– Classification & Prediction
– Cluster (segmentation)
3
Knowledge Discovery Steps in a Knowledge Discovery process
4
Supervised vs. Unsupervised Learning
Supervised learning (classification)
– Supervision: the training data (observations, constructs, variables, eye-movement parameters, etc.) come with labels indicating the class of each observation (output, dependent variable, known class, etc.). The result is a model to be tested.
Unsupervised learning (clustering & association)
– Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
5
Classification vs. Prediction
Classification:
– predicts categorical class labels
– constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
Prediction (Regression):
– similar to classification, but identifies unknown or missing values
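To make the contrast concrete, here is a minimal Python sketch. The dataset (years of experience against a numeric score), the threshold, and the function names are all hypothetical and are not taken from the slides. The classifier returns a category, while the least-squares line returns a number.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single attribute: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify(years):
    """Classification: the output is a categorical label (hypothetical rule)."""
    return "yes" if years > 3 else "no"


# Hypothetical training data: years of experience and a numeric score.
years = [1, 2, 3, 4, 5]
score = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, score)
predicted_score = slope * 6 + intercept   # prediction: a continuous value
predicted_class = classify(6)             # classification: a class label
```

Both functions answer a question about the same new case (6 years), but only the second produces a class label.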
6
Classification
[Figure labels: My DV (dependent variable), My IV (independent variable)]
7
Classification: A Two-Step Process
Model construction: describing a set of predetermined classes
– Each case/instance is assumed to belong to a predefined class, as determined by the class label attribute (DV)
– The set of cases used for model construction is called the training set
Model usage: classifying future or unknown objects
– Estimate the accuracy of the model
– The known label of each test sample is compared with the classified result from the model
– The accuracy rate is the percentage of test set samples that are correctly classified by the model
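The accuracy estimate described above can be sketched in a few lines of Python. The function name and the example labels are hypothetical; only the definition (percentage of test samples whose classified result matches the known label) comes from the slide.

```python
def accuracy_rate(known_labels, classified_labels):
    """Percentage of test samples whose classified result equals the known label."""
    if len(known_labels) != len(classified_labels):
        raise ValueError("both label lists must have the same length")
    if not known_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for known, got in zip(known_labels, classified_labels) if known == got)
    return 100.0 * correct / len(known_labels)


# Hypothetical test set: known labels and what a model returned for them.
known = ["yes", "no", "yes", "yes"]
classified = ["yes", "no", "no", "yes"]
rate = accuracy_rate(known, classified)
```

Here three of the four test samples are classified correctly, so `rate` is 75.0.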
8
Classification Process (1): Model Construction
Training Data → Classification Algorithms → Classifier (Model)
IF Hosam = ‘Senior lecturer’ OR years > 3 THEN tenured = ‘yes’
9
Classification Process (2): Use the Model in Prediction
Classifier
Testing Data
Unseen Data: (Anwer, Associate, 4) Bonus?
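As a rough illustration, the rule from the model-construction slide can be written as a Python function and applied to the unseen tuple. This is a sketch under assumptions: the slide names the first attribute `Hosam`, and the code assumes it stands for the person's rank. The parameter names are hypothetical. The rule's target is `tenured`, so that is what the function returns.

```python
def tenured(rank, years):
    """Rule-based classifier from the slide.

    `rank` is an assumed name for the attribute the slide calls `Hosam`.
    """
    if rank == "Senior lecturer" or years > 3:
        return "yes"
    return "no"


# Unseen tuple from the slide: (name, rank, years).
name, rank, years = ("Anwer", "Associate", 4)
result = tenured(rank, years)
```

For this tuple the rank does not match, but `years > 3` holds, so the rule yields "yes".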
10
Learning and using a model
Learning
– Learning algorithm takes instances of a concept as input
– Produces a structural description (model) as output
Input: concept to learn → Learning algorithm → Model
Prediction
– Model takes a new instance as input
– Outputs a prediction
Input → Model → Prediction
11
Other Classification Techniques
Decision tree analysis, J48 (most popular)
Neural networks
Support vector machines (most popular)
Naïve Bayes (most popular)
12
Classification by Decision Tree Induction
Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
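The structure described above maps naturally onto nested data. The following Python sketch uses a hand-written, hypothetical tree over two attributes from the animal example in these slides (Hoof and Tail). It is not a tree induced by J48 or any other algorithm; it only shows how internal nodes, branches, and leaves are traversed.

```python
# Internal node: (attribute, {outcome: subtree}). Leaf: a class label string.
# This tree is written by hand for illustration, not learned from data.
tree = ("Hoof", {
    "Yes": ("Tail", {"Yes": "Horse", "No": "Sheep"}),
    "No": "Rabbit",
})


def classify(node, instance):
    """Follow the branch matching each attribute test until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node


label = classify(tree, {"Hoof": "Yes", "Tail": "No"})
```

Each loop iteration performs one attribute test and follows one branch, so the number of tests equals the depth of the leaf that is reached.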
13
Accuracy Measures
Most accuracy measures are derived from the classification matrix (also called the confusion matrix). This matrix summarizes the correct and incorrect classifications that a classifier produced for a certain dataset. Rows and columns of the confusion matrix correspond to the true and predicted classes, respectively.
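A minimal Python sketch of such a matrix follows, with rows as true classes and columns as predicted classes, as stated above. The labels and the function name are hypothetical examples, not data from the slides.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(true, pred)] for pred in classes] for true in classes]


# Hypothetical results for a two-class problem.
true_labels = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]
classes = ["yes", "no"]

matrix = confusion_matrix(true_labels, predicted, classes)

# Accuracy is derived from the matrix: the diagonal holds the correct classifications.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(true_labels)
```

With these labels the matrix is `[[2, 1], [1, 2]]`: four of six samples lie on the diagonal.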
14
ROC Curves
– Receiver operator characteristic
– Summarize and present the performance of any binary classification model
– Show the model's ability to distinguish between false and true positives
15
ROC Curves (cont.)
Receiver Operator Characteristic (ROC) curves are commonly used to show how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples.
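One way to see where the points of such a curve come from is to lower a decision threshold one example at a time. The Python sketch below does this with hypothetical scores and labels. It assumes all scores are distinct (ties are not handled) and reports rates rather than raw counts.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    `labels` uses 1 for positive and 0 for negative. Examples are visited from
    the highest score down, which corresponds to lowering the threshold.
    Assumes distinct scores; tied scores would need to be grouped.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, label in ranked:
        if label == 1:
            true_pos += 1    # a positive example classified as positive
        else:
            false_pos += 1   # a negative example classified as positive
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Hypothetical classifier scores and true labels.
points = roc_points([0.9, 0.8, 0.7, 0.4], [1, 0, 1, 0])
```

The curve always starts at (0, 0) and ends at (1, 1); a better classifier rises toward the top-left corner sooner.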
16
ROC vs Precision & Recall (PR)
18
Classification?
I use a classifier to identify the characteristics of each animal, to be used later for testing the prediction model.
Tail | Hoof | Rib | Dewlap | Stirrup | Reins | Twist | Animal
Yes | Yes | No | | Yes | | No | Horse
Yes | Yes | No | | Yes | | No | Horse
No | Yes | No | Yes | No | | Yes | Sheep
Yes | No | Yes | No | | | | Rabbit
Yes | No | Yes | No | | | | Rabbit
No | Yes | No | Yes | No | | Yes | Sheep
Yes | Yes | No | | Yes | | No | Horse
19
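Prediction?
We have the characteristics but do not know which animal they belong to!
Tail | Hoof | Rib | Dewlap | Stirrup | Reins | Twist | Animal
Yes | Yes | No | | Yes | | No | ?
Yes | Yes | No | | Yes | | No | ?
No | Yes | No | Yes | No | | Yes | ?
Yes | No | Yes | No | | | | ?
Yes | No | Yes | No | | | | ?
No | Yes | No | Yes | No | | Yes | ?
Yes | Yes | No | | Yes | | No | ?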
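The two slides above can be tied together in a small Python sketch: learn from the labeled rows, then fill in the "?" column. To keep it short, it uses only the first three attributes (Tail, Hoof, Rib), with values lowercased. The learner is a deliberately simple lookup with majority vote, chosen here for illustration. It is not the method used in the slides or in Weka, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Labeled rows from the classification slide: (Tail, Hoof, Rib) -> Animal.
training = [
    (("yes", "yes", "no"), "Horse"),
    (("yes", "yes", "no"), "Horse"),
    (("no", "yes", "no"), "Sheep"),
    (("yes", "no", "yes"), "Rabbit"),
    (("yes", "no", "yes"), "Rabbit"),
    (("no", "yes", "no"), "Sheep"),
    (("yes", "yes", "no"), "Horse"),
]


def train(rows):
    """Step 1 (training): remember the most common label for each feature tuple."""
    votes = defaultdict(Counter)
    for features, label in rows:
        votes[features][label] += 1
    return {features: counter.most_common(1)[0][0] for features, counter in votes.items()}


def predict(model, features):
    """Step 2 (using): look up the label, or 'unknown' for unseen combinations."""
    return model.get(features, "unknown")


model = train(training)

# Rows from the prediction slide: same characteristics, Animal unknown.
unlabeled = [features for features, _ in training]
predictions = [predict(model, features) for features in unlabeled]
```

A lookup like this cannot generalize to feature combinations it has never seen, which is one reason to prefer learners such as decision trees.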
20
Summary
Classification predicts class labels
Numeric prediction models continuous-valued functions
Two steps of classification: 1) Training 2) Testing and using
21
Now let's check it out using Weka