Data Mining: Classification & Prediction
Hosam Al-Samarraie, PhD.
Centre for Instructional Technology & Multimedia, Universiti Sains Malaysia

What Does Data Mining Do?
Extract patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship among data items.
Types of patterns
– Association
– Classification & Prediction
– Cluster (segmentation)

Knowledge Discovery
Steps in a knowledge discovery process

Supervised vs. Unsupervised Learning
Supervised learning (classification)
– Supervision: the training data (observations, constructs, variables, eye-movement parameters, etc.) are accompanied by labels indicating the class of the observations (output, dependent variable, known class, etc.). The result is a model to be tested.
Unsupervised learning (clustering & association)
– The class labels are not given: we have a set of measurements, observations, etc., and the aim is to establish the existence of classes or clusters in the data.

Classification vs. Prediction
Classification:
– predicts categorical class labels
– constructs a model from the training set and the values (class labels) of a classifying attribute, then uses it to classify new data
Prediction (regression):
– similar to classification, but models continuous-valued functions, i.e. it estimates unknown or missing numeric values

Classification
[Figure labels: My DV, My IV]

Classification: A Two-Step Process
Model construction: describing a set of predetermined classes
– Each case/instance is assumed to belong to a predefined class, as determined by the class label attribute (DV)
– The set of cases used for model construction is called the training set
Model usage: classifying future or unknown objects
– Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification
– The accuracy rate is the percentage of test set samples that the model classifies correctly
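
The accuracy rate described in the second step can be sketched in a few lines of Python. This is only an illustration and is not part of the original slides: the function name `accuracy_rate` and the sample labels are made up for the example.

```python
def accuracy_rate(true_labels, predicted_labels):
    """Return the percentage of test samples whose predicted label matches the known label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


# Hypothetical test set: known labels versus the labels a model produced.
known = ["yes", "no", "yes", "yes", "no"]
classified = ["yes", "no", "no", "yes", "no"]
print(accuracy_rate(known, classified))  # 4 of the 5 samples match
```

The test samples should be kept separate from the training set; otherwise the estimate tends to be too optimistic.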

Classification Process (1): Model Construction
Training Data → Classification Algorithms → Classifier (Model)
Example of a learned rule: IF Hosam = ‘Senior lecturer’ OR years > 3 THEN tenured = ‘yes’

Classification Process (2): Use the Model in Prediction
Testing Data → Classifier
Unseen Data → Classifier: (Anwer, Associate, 4) → Bonus?
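
Using a model on unseen data can be sketched by writing the rule from the model-construction slide as a Python function. This is a hedged illustration, not the author's code: the attribute name `rank` is an assumption (the slide writes the test as "Hosam ="), and the function returns the rule's own label, tenured, even though this slide asks about a bonus.

```python
def tenured(rank, years):
    """Rule from the model-construction slide: 'Senior lecturer' rank OR more than 3 years."""
    return "yes" if rank == "Senior lecturer" or years > 3 else "no"


# Unseen tuple from the slide, read here as (name, rank, years).
name, rank, years = ("Anwer", "Associate", 4)
print(name, tenured(rank, years))
```

The name plays no part in the decision; only the two attributes tested by the rule matter.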

Learning and using a model
Learning
– Learning algorithm takes instances of the concept as input
– Produces a structural description (model) as output
– Input: concept to learn → Learning algorithm → Model
Prediction
– Model takes a new instance as input
– Outputs a prediction
– Input → Model → Prediction
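
The two stages can be sketched with a deliberately tiny learner in Python. It is an assumption made for illustration, not one of the algorithms named in the slides: `learn` maps each value of a single attribute to its most common class, and the returned `model` function performs the prediction step. The training instances are hypothetical.

```python
from collections import Counter, defaultdict


def learn(instances, attribute, label):
    """Learning step: map each value of one attribute to its most common class."""
    counts = defaultdict(Counter)
    for inst in instances:
        counts[inst[attribute]][inst[label]] += 1
    table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority class for attribute values never seen in training.
    default = Counter(inst[label] for inst in instances).most_common(1)[0][0]

    def model(new_instance):
        """Prediction step: look up the class for the new instance's attribute value."""
        return table.get(new_instance[attribute], default)

    return model


# Hypothetical training instances (not taken from the slides).
training = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
]
model = learn(training, attribute="outlook", label="play")
print(model({"outlook": "overcast"}))
print(model({"outlook": "sunny"}))
```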

Other Classification Techniques
– Decision tree analysis, J48 (most popular)
– Neural networks
– Support vector machines (most popular)
– Naïve Bayes (most popular)

Classification by Decision Tree Induction
Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
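
The structure described above can be sketched in Python as nested dictionaries: an internal node names the attribute it tests, each branch is keyed by a test outcome, and a leaf is simply a class label. The tree below is hand-written for illustration and was not induced from any data; the attribute names are borrowed from the animal example later in the deck, and the `classify` helper is hypothetical.

```python
# A node is either a class label (leaf) or a dict holding the tested attribute
# and one subtree per outcome of that test.
tree = {
    "attribute": "stirrup",
    "branches": {
        "yes": "Horse",
        "no": {
            "attribute": "hoof",
            "branches": {"yes": "Sheep", "no": "Rabbit"},
        },
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, following the branch that matches each test outcome."""
    while isinstance(node, dict):
        outcome = instance[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(classify(tree, {"stirrup": "no", "hoof": "yes"}))
```

Decision tree induction is the process of building such a tree automatically from training data; the sketch only shows how a finished tree is used.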

Accuracy Measures
Most accuracy measures are derived from the classification matrix (also called the confusion matrix). This matrix summarizes the correct and incorrect classifications that a classifier produced for a certain dataset. Rows and columns of the confusion matrix correspond to the true and predicted classes respectively.
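
A confusion matrix with rows for true classes and columns for predicted classes can be built as in the following Python sketch. The function names and the label lists are assumptions made for the example.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return a nested list: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Correct classifications sit on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical true and predicted labels for eight test samples.
true_labels = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
matrix = confusion_matrix(true_labels, predicted, classes=["pos", "neg"])
for row in matrix:
    print(row)
print(accuracy(matrix))
```

Off-diagonal cells show which classes are confused with which, something a single accuracy figure hides.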

ROC Curves
– Receiver operator characteristic
– Summarize and present the performance of any binary classification model
– Show a model's ability to distinguish between false and true positives

Cont…. Receiver Operator Characteristic (ROC) curves are commonly used to show how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples.
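
One way to trace such a curve is to lower a decision threshold step by step and record the false positive rate and true positive rate each time. The Python sketch below assumes labels coded as 1 (positive) and 0 (negative) and a numeric score per example; the scores are hypothetical, and the helper names `roc_points` and `auc` are made up for the illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, from the strictest threshold to the loosest."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is classified positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores; both classes must be present.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```

A curve that hugs the top-left corner, and so has an area close to 1, indicates a model that ranks positives above negatives.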

ROC vs Precision & Recall (PR)
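
The difference between the two views can be sketched in Python with standard definitions: ROC curves use the false positive rate, which depends on the true negatives, while precision and recall ignore true negatives. The counts below are hypothetical and chosen to show an imbalanced dataset.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN), the x-axis of a ROC curve."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts: 10 positives and 990 negatives.
tp, fn, fp, tn = 8, 2, 40, 950
precision, recall = precision_recall(tp, fp, fn)
print(recall, false_positive_rate(fp, tn), precision)
```

With these counts the recall is high and the false positive rate is low, so the ROC point looks good, yet most predicted positives are wrong, which only the precision reveals.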

Classification?
I use a classifier to identify the characteristics of each animal; these are used later for testing the prediction model.
Columns: Tail, Hoof, Rib, Dewlap, Stirrup, Reins, Twist, Animal
yes Yes No  Yes  No  Horse
yes Yes No  Yes  No  Horse
no Yes No Yes No  Yes  Sheep
yes No Yes No  Rabbit
yes No Yes No  Rabbit
no Yes No Yes No  Yes  Sheep
yes Yes No  Yes  No  Horse

Prediction?
We have the characteristics but do not know which animal they belong to.
Columns: Tail, Hoof, Rib, Dewlap, Stirrup, Reins, Twist, Animal
yes Yes No  Yes  No  ?
yes Yes No  Yes  No  ?
no Yes No Yes No  Yes  ?
yes No Yes No  ?
yes No Yes No  ?
no Yes No Yes No  Yes  ?
yes Yes No  Yes  No  ?

Summary
– Classification predicts class labels
– Numeric prediction models continuous-valued functions
– Two steps of classification: 1) training, 2) testing and using

Now let's check it out using Weka