1
Classification and Prediction
2
Classification, Regression, and Prediction
Classification:
  Predicts categorical class labels
  Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data
Regression:
  Models continuous-valued functions, i.e., predicts unknown or missing values
Prediction: Classification + Regression
  Sometimes refers only to regression (e.g., in the textbook)
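To make the contrast concrete, here is a small Python sketch (illustrative only; the function names and the salary figures are hypothetical, not taken from the slides). The classifier returns a categorical label using the tenure rule that appears later in these slides, while the regression part fits a straight line by ordinary least squares and returns a continuous value.

```python
from statistics import mean

def classify_tenure(rank: str, years: int) -> str:
    """Classification: the output is a categorical class label."""
    return "yes" if rank == "professor" or years > 6 else "no"

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Regression: ordinary least squares fit of y = a + b*x (continuous target)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: years of service vs. salary (in thousands).
years = [1, 3, 5, 7]
salary = [50.0, 60.0, 70.0, 80.0]
a, b = fit_line(years, salary)

print(classify_tenure("assistant prof", 3))  # categorical label: no
print(a + b * 6)  # continuous prediction for years = 6: 75.0
```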
3
Classification—A Two-Step Process
Step 1. Model construction: describing a set of predetermined classes
  Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  The set of tuples used for model construction is the training set
  The model is represented as classification rules, decision trees, or mathematical formulae
  Example rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
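A minimal sketch of Step 1 in Python, assuming the model is restricted to rules of the form IF rank = 'professor' OR years > t THEN tenured = 'yes', and that the "algorithm" simply picks the threshold t with the fewest errors on the training set. The training tuples and the helper names are hypothetical; real induction methods such as decision trees search a much larger space of models.

```python
# Hypothetical type: (rank, years, tenured)
TrainingTuple = tuple[str, int, str]

def rule_predict(rank: str, years: int, t: int) -> str:
    """Apply the rule IF rank = 'professor' OR years > t THEN 'yes' ELSE 'no'."""
    return "yes" if rank == "professor" or years > t else "no"

def learn_threshold(training_set: list[TrainingTuple]) -> int:
    """Return the observed 'years' value t with the fewest training errors
    (the smallest such t if several tie)."""
    candidates = sorted({years for _, years, _ in training_set})

    def errors(t: int) -> int:
        return sum(rule_predict(r, y, t) != label for r, y, label in training_set)

    return min(candidates, key=errors)  # min keeps the first (smallest) tie

# Hypothetical training tuples, invented for illustration only.
training_set = [
    ("assistant prof", 3, "no"),
    ("assistant prof", 7, "yes"),
    ("professor", 2, "yes"),
    ("associate prof", 7, "yes"),
    ("assistant prof", 6, "no"),
    ("associate prof", 3, "no"),
]

t = learn_threshold(training_set)
print(f"IF rank = 'professor' OR years > {t} THEN tenured = 'yes'")
```

On this toy training set the search settles on t = 6, i.e. the rule shown on the slide.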
4
Classification—A Two-Step Process
Step 2. Model usage: classifying future or unknown objects
  Estimate the predictive accuracy of the model:
    The known label of each test sample is compared with the model's classification result
    Accuracy rate is the percentage of test set samples that are correctly classified by the model
    The test set must be independent of the training set; otherwise the accuracy estimate is overly optimistic and over-fitting goes undetected
  If the accuracy is acceptable, use the model to classify new data, e.g., the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes' classifies (Jeff, Professor, 4) as tenured = 'yes'
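A sketch of Step 2 in Python, assuming the rule from Step 1 as the model. The test tuples and function names are hypothetical; one test tuple is deliberately one the rule gets wrong, so the accuracy rate comes out below 100%.

```python
from typing import Callable

def tenured(rank: str, years: int) -> str:
    """The Step 1 model: IF rank = 'professor' OR years > 6 THEN 'yes', else 'no'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

def accuracy_rate(model: Callable[[str, int], str],
                  test_set: list[tuple[str, int, str]]) -> float:
    """Percentage of test tuples whose known label matches the model's output."""
    correct = sum(model(rank, years) == label for rank, years, label in test_set)
    return 100.0 * correct / len(test_set)

# Hypothetical test set (rank, years, known label), kept separate from the training data.
test_set = [
    ("Assistant Prof", 2, "no"),
    ("Associate Prof", 7, "no"),   # the rule predicts 'yes' here: one error
    ("Professor", 5, "yes"),
    ("Assistant Prof", 7, "yes"),
]

print(accuracy_rate(tenured, test_set))  # 75.0
# If the accuracy is acceptable, classify unseen data such as (Jeff, Professor, 4).
print(tenured("Professor", 4))  # yes
```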
5
Classification Process (1): Model Construction
[Figure: Training Data is fed to the classification Algorithms, which produce the Classifier (Model): IF rank = 'professor' OR years > 6 THEN tenured = 'yes']
6
Classification Process (2): Use Model in Prediction
[Figure: the Classifier (Model), IF rank = 'professor' OR years > 6 THEN tenured = 'yes', is checked against Test Data and then applied to Unseen Data (Jeff, Professor, 4). Tenured? Yes]
7
Supervised versus Unsupervised Learning
Supervised learning (classification)
  Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  New data is classified based on the training set
Unsupervised learning (clustering)
  The class labels of the training data are unknown
  Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
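A toy contrast on hypothetical one-dimensional measurements. The supervised part classifies a new value from labeled training data using 1-nearest-neighbour; the unsupervised part has no labels and discovers two groups with a minimal 1-D k-means (k = 2). Both techniques are chosen only for brevity; the slides do not prescribe them, and all names and values here are illustrative.

```python
def nearest_neighbor(train: list[tuple[float, str]], x: float) -> str:
    """Supervised: return the label of the labeled training point closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def two_means(points: list[float], iters: int = 10) -> tuple[list[float], list[float]]:
    """Unsupervised: split unlabeled points into two clusters (1-D k-means, k = 2).
    Assumes the points are not all equal."""
    c1, c2 = min(points), max(points)  # initial centroids
    for _ in range(iters):
        # Assign each point to the nearer centroid (ties go to the first cluster).
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Move each centroid to the mean of its cluster.
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2

# Hypothetical data.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
unlabeled = [1.0, 1.2, 0.8, 8.0, 8.5, 9.1]

print(nearest_neighbor(labeled, 2.0))  # low
print(two_means(unlabeled))            # ([1.0, 1.2, 0.8], [8.0, 8.5, 9.1])
```

The supervised routine can only answer "which known class?", while the clustering routine has to invent the grouping itself and cannot name it.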
8
Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification based on concepts from association rule mining
Other classification methods
Prediction
Classification accuracy
Summary
9
Issues (1): Data Preparation
Data cleaning
  Preprocess data in order to reduce noise (e.g., by smoothing) and handle missing values (e.g., replace them with the most commonly occurring value)
  Helps reduce confusion during learning
Relevance analysis (feature selection)
  Remove irrelevant or redundant attributes
Data transformation
  Generalize data (to higher-level concepts) and/or normalize it (scale values so that they fall within a specified range)
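Two of the preparation steps above, sketched in Python under simple assumptions: missing categorical values are filled with the most common value (mode imputation), and a numeric attribute is min-max normalized into [0, 1]. The sample values and function names are hypothetical.

```python
from collections import Counter
from typing import Optional

def fill_missing_with_mode(values: list[Optional[str]]) -> list[str]:
    """Data cleaning: replace None with the most commonly occurring non-missing value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def min_max_normalize(values: list[float],
                      new_min: float = 0.0, new_max: float = 1.0) -> list[float]:
    """Data transformation: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map every value to new_min
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

# Hypothetical attribute values.
ranks = ["professor", None, "assistant prof", "assistant prof", None]
years = [2.0, 7.0, 3.0, 7.0, 12.0]

print(fill_missing_with_mode(ranks))
# ['professor', 'assistant prof', 'assistant prof', 'assistant prof', 'assistant prof']
print(min_max_normalize(years))  # [0.0, 0.5, 0.1, 0.5, 1.0]
```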
10
Issues (2): Evaluating Classification Methods
Predictive accuracy
  Ability to correctly predict the class label of new data
Speed
  Time to construct the model
  Time to use the model
Robustness
  Ability to make correct predictions given noisy data or missing values
Scalability
  Ability to construct the model efficiently given large amounts of data
Interpretability
  Understanding and insight provided by the model
Goodness of rules
  Decision tree size
  Compactness of classification rules