Data Mining (and machine learning)


Data Mining (and machine learning): ROC curves, Rule Induction, CW3

Two classes is a common and special case

Medical applications: cancer, or not? Computer Vision applications: landmine, or not? Security applications: terrorist, or not? Biotech applications: gene, or not? …

                Predicted Y        Predicted N
Actually Y      True Positive      False Negative
Actually N      False Positive     True Negative

True Positive: these are ideal. E.g. we correctly detect cancer.
False Positive: to be minimised – causes a false alarm – can be better to be safe than sorry, but can be very costly.
False Negative: also to be minimised – missing a landmine / cancer is very bad in many applications.
True Negative? (the remaining cell – correctly predicted ‘No’ cases)
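The four cells of the confusion matrix above can be counted directly from paired labels. A minimal sketch (the function name and example data are my own, for illustration):

```python
def confusion_counts(actual, predicted, positive="Y"):
    """Count (TP, FN, FP, TN) for a two-class problem."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1              # actually Yes, predicted Yes
        elif a == positive:
            fn += 1              # actually Yes, predicted No
        elif p == positive:
            fp += 1              # actually No, predicted Yes
        else:
            tn += 1              # actually No, predicted No
    return tp, fn, fp, tn

actual    = ["Y", "Y", "N", "N", "Y", "N"]
predicted = ["Y", "N", "N", "Y", "Y", "N"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 2)
```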

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) – how many of the real ‘Yes’ cases are detected? How well can it detect the condition?
Specificity = TN/(TN+FP) – how many of the real ‘No’ cases are correctly classified? How well can it rule out the condition?
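In code the two formulas are one line each. The counts below are illustrative, chosen to match one of the threshold figures later in the deck (assuming 16 real-Yes and 12 real-No cases):

```python
def sensitivity(tp, fn):
    """TP/(TP+FN): fraction of the real 'Yes' cases that are detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN/(TN+FP): fraction of the real 'No' cases correctly classified."""
    return tn / (tn + fp)

# 13 of 16 real-Yes detected; 10 of 12 real-No correctly rejected
print(f"sensitivity = {sensitivity(13, 3):.3f}")   # 0.812
print(f"specificity = {specificity(10, 2):.3f}")   # 0.833
```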

[Figure sequence: YES/NO cases along a score axis, with the classification threshold moving in each frame]

Sensitivity: 100%    Specificity: 25%
Sensitivity: 93.8%   Specificity: 50%
Sensitivity: 81.3%   Specificity: 83.3%
Sensitivity: 56.3%   Specificity: 100%

[Figure: threshold position giving Sensitivity 100%, Specificity 25%] 100% Sensitivity means: detects all cancer cases (or whatever), but possibly with many false positives.

[Figure: threshold position giving Sensitivity 56.3%, Specificity 100%] 100% Specificity means: misses some cancer cases (or whatever), but no false positives.

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) – how many of the real TRUE cases are detected? How sensitive is the classifier to TRUE cases? A highly sensitive test for cancer: if it says “NO”, you can be sure it’s “NO”.

Specificity = TN/(TN+FP) – how sensitive is the classifier to the negative cases? A highly specific test for cancer: if it says “Y”, you can be sure it’s “Y”.

With many trained classifiers, you can ‘move the line’ in this way. E.g. with Naive Bayes, we could use a threshold indicating how much higher the log likelihood for Y should be than for N.

ROC curves David Corne, and Nick Taylor, Heriot-Watt University - dwcorne@gmail.com These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
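A ROC curve plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) as the decision threshold sweeps across the scores. A minimal sketch of computing the curve’s points (the scores and labels are made-up illustrations):

```python
def roc_points(scores, labels, positive="Y"):
    """One (FPR, TPR) point per case, sweeping the threshold high to low."""
    pos = sum(1 for lab in labels if lab == positive)
    neg = len(labels) - pos
    # Sort cases by score, highest first; the threshold passes each in turn
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp = [], 0, 0
    for score, label in ranked:
        if label == positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = ["Y", "Y", "N", "Y", "N", "N"]
print(roc_points(scores, labels))
```

Plotting these points (FPR on the x-axis, TPR on the y-axis) and joining them traces the ROC curve; in practice a library routine such as scikit-learn’s `roc_curve` would be used.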

Rule Induction Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible There are a number of different ways to ‘learn’ rules or rulesets. Before we go there, what is a rule / ruleset?

Rules IF Condition … Then Class Value is …

Rules are Rectangular

[Figure: YES/NO points on a 12×5 grid, with the rule’s rectangle drawn around a block of YES points]
IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES

Rules are Rectangular

[Figure: YES/NO points on a 12×5 grid, with the rule’s rectangle drawn around a strip of NO points]
IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO
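A rectangular rule like those above is just a conjunction of interval tests. A sketch of one rule as a predicate (my own encoding, not the coursework code):

```python
def make_rule(xmin, xmax, ymin, ymax, cls):
    """IF (X>xmin)&(X<xmax)&(Y>ymin)&(Y<ymax) THEN cls."""
    def rule(x, y):
        if xmin < x < xmax and ymin < y < ymax:
            return cls
        return None  # the rule does not fire for this point
    return rule

yes_rule = make_rule(0, 5, 0.5, 5, "YES")
print(yes_rule(2, 3))   # YES
print(yes_rule(8, 3))   # None
```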

A Ruleset IF Condition1 … Then Class = A IF Condition2 … Then Class = A IF Condition3 … Then Class = B IF Condition4 … Then Class = C …

What’s wrong with this ruleset? (two things)

[Figure: a candidate ruleset’s rectangles drawn over the YES/NO points]

What about this ruleset?

[Figure: an alternative ruleset’s rectangles drawn over the same YES/NO points]

Two ways to interpret a ruleset:

Two ways to interpret a ruleset: As a Decision List IF Condition1 … Then Class = A ELSE IF Condition2 … Then Class = A ELSE IF Condition3 … Then Class = B ELSE IF Condition4 … Then Class = C … ELSE … predict Background Majority Class
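The decision-list reading can be sketched directly: try the rules in order, and the first rule whose box contains the point wins. (The box encoding and example rules are my own illustrations.)

```python
# Each rule: ((xmin, xmax, ymin, ymax), class)
def classify_decision_list(rules, x, y, default="NO"):
    """Try rules in order; the first rule that fires wins."""
    for (xmin, xmax, ymin, ymax), cls in rules:
        if xmin < x < xmax and ymin < y < ymax:
            return cls
    return default  # ELSE predict the background majority class

rules = [((0, 5, 0.5, 5), "YES"), ((5, 11, 4.5, 5.1), "NO")]
print(classify_decision_list(rules, 2, 3))    # YES
print(classify_decision_list(rules, 20, 20))  # NO (default)
```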

Two ways to interpret a ruleset: As an unordered set IF Condition1 … Then Class = A IF Condition2 … Then Class = A IF Condition3 … Then Class = B IF Condition4 … Then Class = C Check each rule and gather votes for each class If no winner, predict background majority class
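The unordered-set reading differs in that every rule is checked and votes are tallied. A sketch under the same box encoding as above (my own illustration), falling back to the default class when no rule fires or the vote is tied:

```python
from collections import Counter

def classify_by_votes(rules, x, y, default="NO"):
    """Check every rule; each firing rule casts one vote for its class."""
    votes = Counter(cls for (x0, x1, y0, y1), cls in rules
                    if x0 < x < x1 and y0 < y < y1)
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return default   # no rule fired, or no clear winner: majority class
    return ranked[0][0]

rules = [((0, 5, 0, 5), "A"), ((1, 6, 1, 6), "A"), ((2, 7, 2, 7), "B")]
print(classify_by_votes(rules, 3, 3))    # A (two votes to one)
print(classify_by_votes(rules, 10, 10))  # NO (no rule fires)
```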

Three broad ways to learn rulesets

Three broad ways to learn rulesets 1. Just build a decision tree with ID3 (or something else) and you can translate the tree into rules!
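Translating a tree into rules just means reading off each root-to-leaf path as one rule. A toy sketch, with the tree encoded as nested tuples (my own encoding, not ID3 itself):

```python
def tree_to_rules(tree, conditions=()):
    """tree is either a class label (a leaf) or a tuple
    (feature, threshold, left_subtree, right_subtree)."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]   # leaf: one rule per path
    feature, threshold, left, right = tree
    rules = []
    rules += tree_to_rules(left,  conditions + ((feature, "<=", threshold),))
    rules += tree_to_rules(right, conditions + ((feature, ">",  threshold),))
    return rules

tree = ("X", 5, ("Y", 0.5, "NO", "YES"), "NO")
for conds, cls in tree_to_rules(tree):
    print("IF", " & ".join(f"{f}{op}{t}" for f, op, t in conds), "THEN", cls)
```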

Three broad ways to learn rulesets 2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common. You will do this in coursework 3 (CW3). This means simply guessing a ruleset at random, then trying mutations and variants, gradually improving them over time.
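A stochastic hill-climber in this spirit can be sketched in a few lines: guess a ruleset, nudge one bound at random, and keep the mutant if it scores no worse on the training data. This is a deliberately tiny illustration, not the CW3 expts program; the data, encoding, and mutation scheme are my own.

```python
import random

def accuracy(ruleset, data):
    """Fraction correct; ruleset is a decision list of boxes, default 'NO'."""
    correct = 0
    for x, y, cls in data:
        pred = next((c for (x0, x1, y0, y1), c in ruleset
                     if x0 < x < x1 and y0 < y < y1), "NO")
        correct += (pred == cls)
    return correct / len(data)

def mutate(ruleset):
    """Copy the ruleset and nudge one bound of one randomly chosen rule."""
    new = [(list(box), cls) for box, cls in ruleset]
    box, _ = random.choice(new)
    box[random.randrange(4)] += random.uniform(-1.0, 1.0)
    return [(tuple(box), cls) for box, cls in new]

random.seed(0)
data = [(1, 1, "YES"), (2, 3, "YES"), (8, 1, "NO"), (9, 4, "NO")]
ruleset = [((0.0, 1.0, 0.0, 1.0), "YES")]   # a poor initial guess
start = accuracy(ruleset, data)
for _ in range(200):
    candidate = mutate(ruleset)
    if accuracy(candidate, data) >= accuracy(ruleset, data):
        ruleset = candidate                 # keep mutants that score no worse
print(start, "->", accuracy(ruleset, data))
```

Because a mutant is only kept when it scores at least as well, training accuracy can never decrease; a real evolutionary algorithm keeps a population of rulesets rather than a single one.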

Three broad ways to learn rulesets 3. A number of ‘old’ AI algorithms exist that still work well, and/or can be engineered to work with an evolutionary algorithm. The basic idea is: iterated coverage

[Figure sequence: YES/NO points on a 12×5 grid, with a rectangle growing frame by frame]

Take each class in turn…
Pick a random member of that class in the training set.
Extend it as much as possible without including another class (the rectangle grows side by side until blocked).
Next class.
And so on…
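The rectangle-growing step above can be sketched as code. This is my own simplification: a greedy box-grower on 2-D points, with made-up example data and a fixed (rather than random) starting member for reproducibility.

```python
def covers(box, pt):
    """Is the point inside the box (inclusive)?"""
    x0, x1, y0, y1 = box
    x, y = pt
    return x0 <= x <= x1 and y0 <= y <= y1

def grow_rule(seed, other, bounds=(0.0, 12.0, 0.0, 5.0), step=0.5):
    """Start a box at one training point and repeatedly push each side
    outward by `step`, keeping a push only if the box still contains no
    other-class point and stays inside the data bounds."""
    x, y = seed
    box = [x, x, y, y]
    pushes = [(0, -step), (1, step), (2, -step), (3, step)]
    grew = True
    while grew:
        grew = False
        for i, d in pushes:
            trial = box[:]
            trial[i] += d
            inside = (bounds[0] <= trial[0] and trial[1] <= bounds[1]
                      and bounds[2] <= trial[2] and trial[3] <= bounds[3])
            if inside and not any(covers(trial, p) for p in other):
                box, grew = trial, True
    return tuple(box)

yes_pts = [(1, 1), (2, 3), (4, 2)]
no_pts  = [(8, 1), (9, 4), (11, 2)]
# "pick a random member of that class" – fixed here for reproducibility
print(grow_rule(yes_pts[0], no_pts))  # (0.0, 7.5, 0.0, 5.0)
```

The returned box stops just short of the nearest NO point; repeating this per class, and removing covered points each time, yields a ruleset by iterated coverage.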

CW3: Run the expts program, which evolves a ruleset. Try different sizes of training and test set. Observe ‘overfitting’ and report.