CISC 4631 Data Mining Lecture 03: Introduction to classification Linear classifier Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook.

Slides:



Advertisements
Similar presentations
The Software Infrastructure for Electronic Commerce Databases and Data Mining Lecture 4: An Introduction To Data Mining (II) Johannes Gehrke
Advertisements

Algorithms and applications
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Part I Introduction to Data Mining by Tan,
Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of the attributes is the class.
Statistics 202: Statistical Aspects of Data Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Classification: Definition l Given a collection of records (training set) l Find a model.
Decision Tree.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines.
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Support Vector Machines
What is Statistical Modeling
Data Mining Classification: Naïve Bayes Classifier
Lecture Notes for Chapter 4 Introduction to Data Mining
Classification: Decision Trees, and Naïve Bayes etc. March 17, 2010 Adapted from Chapters 4 and 5 of the book Introduction to Data Mining by Tan, Steinbach,
Online Algorithms – II Amrinder Arora Permalink:
Classification Dr Eamonn Keogh Computer Science & Engineering Department University of California - Riverside Riverside,CA Who.
Ensemble Learning: An Introduction
Data Mining with Decision Trees Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island.
Lecture 5 (Classification with Decision Trees)
Classification II.
Dear SIR, I am Mr. John Coleman and my sister is Miss Rose Colemen, we are the children of late Chief Paul Colemen from Sierra Leone. I am writing you.
CISC 4631 Data Mining Lecture 06: Bayes Theorem Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside)
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
Bayesian Networks. Male brain wiring Female brain wiring.
1 Data Mining Lecture 3: Decision Trees. 2 Classification: Definition l Given a collection of records (training set ) –Each record contains a set of attributes,
Chapter 4 Classification. 2 Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of.
Lecture 7. Outline 1. Overview of Classification and Decision Tree 2. Algorithm to build Decision Tree 3. Formula to measure information 4. Weka, data.
Modul 6: Classification. 2 Classification: Definition  Given a collection of records (training set ) Each record contains a set of attributes, one of.
Decision Trees Jyh-Shing Roger Jang ( 張智星 ) CSIE Dept, National Taiwan University.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
1 Learning Chapter 18 and Parts of Chapter 20 AI systems are complex and may have many parameters. It is impractical and often impossible to encode all.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach,
Linear Discrimination Reading: Chapter 2 of textbook.
Support Vector Machines and Gene Function Prediction Brown et al PNAS. CS 466 Saurabh Sinha.
Machine Learning Tutorial-2. Recall, Precision, F-measure, Accuracy Ch. 5.
Lecture Notes for Chapter 4 Introduction to Data Mining
1 Illustration of the Classification Task: Learning Algorithm Model.
Big Data Analysis and Mining Qinpei Zhao 赵钦佩 2015 Fall Decision Tree.
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining By Tan, Steinbach,
1 Systematic Data Selection to Mine Concept-Drifting Data Streams Wei Fan Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery.
Classification Cheng Lei Department of Electrical and Computer Engineering University of Victoria April 24, 2015.
1 Learning Bias & Clustering Louis Oliphant CS based on slides by Burr H. Settles.
BAYESIAN LEARNING. 2 Bayesian Classifiers Bayesian classifiers are statistical classifiers, and are based on Bayes theorem They can calculate the probability.
Introduction to Classifiers Fujinaga. Bayes (optimal) Classifier (1) A priori probabilities: and Decision rule: given and decide if and probability of.
SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.
Introduction to Data Mining Clustering & Classification Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
Data Mining – Clustering and Classification 1.  Review Questions ◦ Question 1: Clustering and Classification  Algorithm Questions ◦ Question 2: K-Means.
Data Mining Classification and Clustering Techniques Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining.
Data Mining Introduction to Classification using Linear Classifiers
2- Syllable Classification
The Classification Problem
Introduction to Machine Learning
Erich Smith Coleman Platt
*Currently, self driving cars do a bit of both.
CS 235 Decision Tree Classification
EECS 647: Introduction to Database Systems
Lecture Notes for Chapter 4 Introduction to Data Mining
Figure 1.1 Rules for the contact lens data.
*Currently, self driving cars do a bit of both.
Introduction to Data Mining, 2nd Edition by
Classifiers Fujinaga.
Prepared by: Mahmoud Rafeek Al-Farra
Classification and Prediction
CSCI N317 Computation for Scientific Applications Unit Weka
Classifiers Fujinaga.
Machine Learning: Methodology Chapter
COSC 4368 Intro Supervised Learning Organization
Presentation transcript:

CISC 4631 Data Mining Lecture 03: Introduction to classification Linear classifier Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside) 1

Classification: Definition Given a collection of records (training set ) – Each record contains a set of attributes, one of the attributes is the class. Find a model for class attribute as a function of the values of other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it. 2

Illustrating Classification Task 3

Examples of Classification Task Predicting tumor cells as benign or malignant Classifying credit card transactions as legitimate or fraudulent Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil Categorizing news stories as finance, weather, entertainment, sports, etc 4

Classification Techniques Decision Tree based Methods Rule-based Methods Memory based reasoning Neural Networks Naïve Bayes and Bayesian Belief Networks Support Vector Machines We will start with a simple linear classifier 5

Grasshoppers Katydids The Classification Problem (informal definition) Given a collection of annotated data. In this case 5 instances Katydids of and five of Grasshoppers, decide what type of insect the unlabeled example is. Katydid or Grasshopper? 6

Thorax Length AbdomenLength AntennaeLength MandibleSize Spiracle Diameter Leg Length For any domain of interest, we can measure features Color {Green, Brown, Gray, Other} Has Wings? 7

Insect ID AbdomenLengthAntennaeLength Insect Class Grasshopper Katydid Grasshopper Grasshopper Katydid Grasshopper Katydid Grasshopper Katydid Katydids ??????? We can store features in a database. My_Collection The classification problem can now be expressed as: Given a training database (My_Collection), predict the class label of a previously unseen instance previously unseen instance = 8

Antenna Length Grasshoppers Katydids Abdomen Length 9

Antenna Length Grasshoppers Katydids Abdomen Length Each of these data objects are called… exemplars (training) examples instances tuples 10

We will return to the previous slide in two minutes. In the meantime, we are going to play a quick game. 11

Examples of class A Examples of class B Problem 1 12

Examples of class A Examples of class B What class is this object? What about this one, A or B? Problem 1 13

Examples of class A Examples of class B Problem 2 Oh! This ones hard! 14

Examples of class A Examples of class B Problem 3 This one is really hard! What is this, A or B? This one is really hard! What is this, A or B? 15

Why did we spend so much time with this game? Because we wanted to show that almost all classification problems have a geometric interpretation, check out the next 3 slides… 16

Examples of class A Examples of class B Problem 1 Here is the rule again. If the left bar is smaller than the right bar, it is an A, otherwise it is a B. Here is the rule again. If the left bar is smaller than the right bar, it is an A, otherwise it is a B. Left Bar Right Bar 17

Examples of class A Examples of class B Problem 2 Left Bar Right Bar Let me look it up… here it is.. the rule is, if the two bars are equal sizes, it is an A. Otherwise it is a B. 18

Examples of class A Examples of class B Problem 3 Left Bar Right Bar The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A. Otherwise it is a B. The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A. Otherwise it is a B. 19

Antenna Length Grasshoppers Katydids Abdomen Length 20

Antenna Length Abdomen Length Katydids Grasshoppers We can “project” the previously unseen instance into the same space as the database. We have now abstracted away the details of our particular problem. It will be much easier to talk about points in space. We can “project” the previously unseen instance into the same space as the database. We have now abstracted away the details of our particular problem. It will be much easier to talk about points in space ??????? previously unseen instance = 21

Simple Linear Classifier If previously unseen instance above the line then class is Katydid else class is Grasshopper Katydids Grasshoppers R.A. Fisher

Classification Accuracy Predicted class Class = Katydid (1)Class = Grasshopper (0) Actual Class Class = Katydid (1)f 11 f 10 Class = Grasshopper (0)f 01 f 00 Number of correct predictions f 11 + f 00 Accuracy = = Total number of predictions f 11 + f 10 + f 01 + f 00 Number of wrong predictions f 10 + f 01 Error rate = = Total number of predictions f 11 + f 10 + f 01 + f 00 23

Confusion Matrix In a binary decision problem, a classifier labels examples as either positive or negative. Classifiers produce confusion/ contingency matrix, which shows four entities: TP (true positive), TN (true negative), FP (false positive), FN (false negative) Positive (+) Negative (-) Predicted positive (Y) TPFP Predicted negative (N) FNTN Confusion Matrix 24

The simple linear classifier is defined for higher dimensional spaces… 25

… we can visualize it as being an n-dimensional hyperplane 26

It is interesting to think about what would happen in this example if we did not have the 3 rd dimension… 27

We can no longer get perfect accuracy with the simple linear classifier… We could try to solve this problem by user a simple quadratic classifier or a simple cubic classifier.. However, as we will later see, this is probably a bad idea… 28

Which of the “Problems” can be solved by the Simple Linear Classifier? 1)Perfect 2)Useless 3)Pretty Good Problems that can be solved by a linear classifier are call linearly separable. 29

A Famous Problem R. A. Fisher’s Iris Dataset. 3 classes 50 of each class The task is to classify Iris plants into one of 3 varieties using the Petal Length and Petal Width. Iris SetosaIris VersicolorIris Virginica Setosa Versicolor Virginica 30

Setosa Versicolor Virginica We can generalize the piecewise linear classifier to N classes, by fitting N-1 lines. In this case we first learned the line to (perfectly) discriminate between Setosa and Virginica/Versicolor, then we learned to approximately discriminate between Virginica and Versicolor. If petal width > – (0.325 * petal length) then class = Virginica Elseif petal width… 31

Predictive accuracy Speed and scalability –time to construct the model –time to use the model –efficiency in disk-resident databases Robustness –handling noise, missing values and irrelevant features, streaming data Interpretability: –understanding and insight provided by the model We have now seen one classification algorithm, and we are about to see more. How should we compare them ? 32