Classification and Prediction


Presentation transcript:

Classification and Prediction

Classification, Regression, and Prediction
- Classification: predicts categorical class labels. It constructs a model from a training set and the values (class labels) of a classifying attribute, then uses that model to classify new data.
- Regression: models continuous-valued functions, i.e., predicts unknown or missing numeric values.
- Prediction: covers both classification and regression. The term sometimes refers only to regression (e.g., in the textbook).
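To make the distinction concrete, here is a minimal sketch in Python. The function names, the threshold, and the salary figures are assumptions made for illustration; none of them come from the slides. The classifier returns a categorical label, while the regression fits a least-squares line and returns a number.

```python
def classify_tenure(years):
    """Classification: the output is a categorical label ('yes' or 'no')."""
    # Hypothetical threshold, chosen only for illustration.
    return "yes" if years > 6 else "no"


def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b * x, returning (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / variance
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: years of service and salary (in thousands).
years = [1, 2, 3, 4]
salary = [50.0, 55.0, 60.0, 65.0]

a, b = fit_line(years, salary)
predicted_salary = a + b * 5          # continuous-valued prediction
predicted_label = classify_tenure(7)  # categorical prediction
```

The two outputs differ in kind: predicted_label is one of a fixed set of classes, whereas predicted_salary can take any real value.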

Classification: A Two-Step Process
Step 1. Model construction: describing a set of predetermined classes
- The set of tuples used for model construction is the training set.
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
- The model is represented as classification rules, decision trees, or mathematical formulae.
- Example rule: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
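As a rough sketch of what "the model is represented as classification rules" can look like in code, the example rule is written below as a Python function and checked against a small training set. The training tuples and the names TRAINING_SET and predict_tenured are hypothetical; the slides do not list the training data. This only shows the learned model and a consistency check, not the learning algorithm that would derive the rule.

```python
# Hypothetical training tuples: (name, rank, years, tenured).
# The last field is the class label attribute.
TRAINING_SET = [
    ("Mike", "assistant professor", 3, "no"),
    ("Mary", "assistant professor", 7, "yes"),
    ("Bill", "professor", 2, "yes"),
    ("Jim", "associate professor", 7, "yes"),
    ("Dave", "assistant professor", 6, "no"),
    ("Anne", "associate professor", 3, "no"),
]


def predict_tenured(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes', else 'no'."""
    return "yes" if rank == "professor" or years > 6 else "no"


# A rule that describes the predetermined classes should reproduce
# the labels of the tuples it was built from.
misclassified = [
    name
    for name, rank, years, label in TRAINING_SET
    if predict_tenured(rank, years) != label
]
```

With these illustrative tuples the rule is consistent with every training label, so misclassified is empty.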

Classification: A Two-Step Process
Step 2. Model usage: classifying future or unknown objects
- Estimate the predictive accuracy of the model before using it.
- The known label of each test sample is compared with the model's classification of that sample.
- The accuracy rate is the percentage of test set samples that the model classifies correctly.
- The test set must be independent of the training set; otherwise the accuracy estimate is optimistically biased and overfitting goes undetected.
- Example: the rule IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’ applied to the unseen tuple (Jeff, Professor, 4)
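The accuracy estimate described above can be sketched as follows. The test tuples and the names TEST_SET, accuracy, and predict_tenured are assumptions for illustration, not data from the slides; only the rule and the (Jeff, Professor, 4) tuple appear in the original.

```python
def predict_tenured(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes', else 'no'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Hypothetical test tuples, kept separate from any training data:
# (name, rank, years, known label).
TEST_SET = [
    ("Tom", "assistant professor", 2, "no"),
    ("Merlisa", "associate professor", 7, "no"),
    ("George", "professor", 5, "yes"),
    ("Joseph", "assistant professor", 7, "yes"),
]


def accuracy(rows):
    """Fraction of rows whose known label matches the model's prediction."""
    correct = sum(
        predict_tenured(rank, years) == label
        for _, rank, years, label in rows
    )
    return correct / len(rows)


test_accuracy = accuracy(TEST_SET)

# Once the accuracy is judged acceptable, classify an unseen tuple.
jeff_prediction = predict_tenured("Professor", 4)
```

On these illustrative rows the rule misclassifies one of four test tuples (Merlisa), so the estimated accuracy is 0.75, and the unseen tuple for Jeff is classified as tenured.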

Classification Process (1): Model Construction
[Figure: training data is fed to classification algorithms, which produce the classifier (model), e.g., IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’]

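Classification Process (2): Use Model in Prediction
[Figure: the classifier (model) is checked against test data and then applied to unseen data; for the tuple (Jeff, Professor, 4), the question "Tenured?" is answered "Yes" by the rule IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’]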

Supervised versus Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
  - New data are classified based on the training set.
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown.
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
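The contrast can be illustrated with a small sketch on one-dimensional data. The nearest-centroid classifier and the two-cluster k-means loop below are stand-ins chosen for brevity, not methods named in the slides, and the data values are made up. The supervised function needs the labels; the unsupervised one sees only the values.

```python
from statistics import mean


def fit_centroids(values, labels):
    """Supervised: labels are known, so compute one centroid per class."""
    classes = sorted(set(labels))
    return {
        c: mean(v for v, label in zip(values, labels) if label == c)
        for c in classes
    }


def predict(centroids, value):
    """Assign a new value to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(centroids[c] - value))


def two_means(values, iterations=10):
    """Unsupervised: no labels; look for two clusters in the data."""
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        left = [v for v in values if abs(v - low) <= abs(v - high)]
        right = [v for v in values if abs(v - low) > abs(v - high)]
        if not right:  # all values identical: only one cluster exists
            break
        low, high = mean(left), mean(right)
    return low, high


# Hypothetical measurements and their class labels.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(values, labels)
new_label = predict(centroids, 4.1)

cluster_low, cluster_high = two_means(values)  # labels never used
```

Both approaches end up with centers near 1 and 5 on this data, but only the supervised one can name the resulting groups, because only it was given class labels.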

Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary

Issues (1): Data Preparation
- Data cleaning
  - Preprocess the data to reduce noise (e.g., by smoothing) and handle missing values (e.g., by substituting the most commonly occurring value).
  - This helps to reduce confusion during learning.
- Relevance analysis (feature selection)
  - Remove irrelevant or redundant attributes.
- Data transformation
  - Generalize the data (to higher-level concepts) and/or normalize it (scale values so that they fall within a specified range).
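Two of these preparation steps are simple enough to sketch directly: replacing missing values with the most commonly occurring value, and min-max normalization into a specified range. The function names and sample values are illustrative assumptions, and None is assumed to mark a missing value.

```python
from collections import Counter


def fill_missing_with_mode(values, missing=None):
    """Data cleaning: replace missing entries with the most common value."""
    present = [v for v in values if v is not missing]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: scale values linearly into [new_min, new_max]."""
    low, high = min(values), max(values)
    if high == low:  # constant attribute: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (high - low)
    return [new_min + (v - low) * scale for v in values]


# Hypothetical attribute columns.
ranks = ["professor", None, "assistant professor", "professor"]
years = [2, 4, 6, 10]

filled_ranks = fill_missing_with_mode(ranks)
scaled_years = min_max_normalize(years)
```

Here the missing rank is filled with "professor", the most frequent value, and the years are mapped onto the range 0 to 1.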

Issues (2): Evaluating Classification Methods
- Predictive accuracy: how well the model predicts the class label
- Speed
  - Time to construct the model
  - Time to use the model
- Robustness: making correct predictions given noise and missing values
- Scalability: constructing the model efficiently as the data size grows
- Interpretability: the understanding and insight provided by the model
- Goodness of rules
  - Decision tree size
  - Compactness of classification rules
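A minimal harness for two of these criteria, predictive accuracy and speed, might look like the sketch below. The evaluate function, the majority-class learner, and the data rows are all hypothetical; a real comparison would swap in actual learners and far larger data sets.

```python
import time
from collections import Counter


def evaluate(train_fn, predict_fn, train_rows, test_rows):
    """Measure construction time, usage time, and test-set accuracy."""
    start = time.perf_counter()
    model = train_fn(train_rows)
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_fn(model, features) for features, _ in test_rows]
    predict_seconds = time.perf_counter() - start

    correct = sum(
        predicted == label
        for predicted, (_, label) in zip(predictions, test_rows)
    )
    return {
        "accuracy": correct / len(test_rows),
        "build_seconds": build_seconds,
        "predict_seconds": predict_seconds,
    }


def train_majority(rows):
    """Trivial baseline learner: remember the most common class label."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def predict_majority(model, features):
    """Always predict the remembered majority class."""
    return model


# Hypothetical rows: (feature tuple, class label).
train_rows = [((3,), "no"), ((7,), "yes"), ((2,), "no")]
test_rows = [((5,), "no"), ((8,), "yes")]

report = evaluate(train_majority, predict_majority, train_rows, test_rows)
```

The majority-class baseline is useful as a floor: any classifier worth its construction time should beat the accuracy it reports.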