More Classifiers and Accuracy Measures of Classifiers

Presentation transcript:

More Classifiers and Accuracy Measures of Classifiers

Comparison of BPNN and SVM
- SVM learning involves quadratic programming, whereas a BPNN can easily be trained in an incremental fashion.
- The complexity of SVM training is characterized by the number of support vectors rather than by the dimensionality of the data. Thus an SVM with a small number of support vectors can generalize well even when the dimensionality of the data is high.
- A BPNN generalizes poorly when the number of samples per class is limited.
- An SVM is trained by solving a convex optimization problem, so it yields a unique solution. A BPNN tends to overfit and does not converge to a unique solution, since the result depends on the initial weights.
- The upper bound on the expected error rate of an SVM classifier is independent of the data dimensionality.

Associative Classification
- Major steps:
  - Mine the data to find strong associations between frequent patterns (conjunctions of attribute-value pairs) and class labels.
  - Association rules are generated in the form p1 ∧ p2 ∧ … ∧ pl → "Aclass = C" (conf, sup).
  - Organize the rules to form a rule-based classifier.
- Why effective? It explores highly confident associations among multiple attributes, which may overcome some constraints of decision-tree induction, which considers only one attribute at a time.
- Associative classification has often been found to be more accurate than some traditional classification methods, such as C4.5.

Typical Associative Classification Methods
- CBA (Classification Based on Associations: Liu, Hsu & Ma, KDD'98)
  - Mine possible association rules of the form Cond-set (a set of attribute-value pairs) → class label.
  - Build the classifier: organize rules by decreasing precedence, based on confidence and then support (see the sketch below).
- CMAR (Classification based on Multiple Association Rules: Li, Han, Pei, ICDM'01)
  - Classification: statistical analysis of multiple rules.
- CPAR (Classification based on Predictive Association Rules: Yin & Han, SDM'03)
  - Generates predictive rules (FOIL-like analysis) but allows covered rules to remain with reduced weight.
  - Prediction uses the best k rules.
  - High efficiency; accuracy similar to CMAR.
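
The CBA-style precedence ordering can be illustrated with a small sketch: rules are sorted by confidence and then support, and the first matching rule classifies a tuple. This is a minimal illustration, not the published CBA algorithm; the rules, attribute names, and confidence/support values are hypothetical.

```python
# Minimal sketch of CBA-style rule precedence: order class-association rules
# by confidence (descending), then support (descending). Rules and numbers
# are hypothetical, for illustration only.

rules = [
    # (condition set, predicted class, confidence, support)
    ({"age": "youth", "student": "yes"},  "buys_computer=yes", 0.93, 0.20),
    ({"income": "high"},                  "buys_computer=yes", 0.75, 0.35),
    ({"age": "senior", "credit": "fair"}, "buys_computer=no",  0.93, 0.25),
]

# Higher confidence first; ties are broken by higher support.
ranked = sorted(rules, key=lambda r: (r[2], r[3]), reverse=True)

def classify(tuple_, ranked_rules, default="buys_computer=yes"):
    """Return the class of the first (highest-precedence) rule whose
    condition set is satisfied by the tuple; fall back to a default class."""
    for cond, label, conf, sup in ranked_rules:
        if all(tuple_.get(attr) == val for attr, val in cond.items()):
            return label
    return default

print(classify({"age": "senior", "credit": "fair", "income": "low"}, ranked))
```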

Lazy vs. Eager Learning
- Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple.
- Eager learning (the methods discussed above): given a set of training tuples, constructs a classification model before receiving new (e.g., test) data to classify.
- Lazy: less time in training but more time in predicting.
- Accuracy:
  - A lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form an implicit global approximation to the target function.
  - An eager method must commit to a single hypothesis that covers the entire instance space.

Lazy Learner: Instance-Based Methods
Typical approaches:
- k-nearest neighbor
  - Instances are represented as points in Euclidean space; the nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2).
  - k-NN returns the most common value among the k training examples nearest to the test sample x_q.
  - Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes (remedies: stretch the axes or remove irrelevant attributes).
- Locally weighted regression
  - Weight the contribution of each of the k neighbors according to its distance to the query x_q, giving greater weight to closer neighbors.
- Case-based reasoning
  - Uses symbolic representations and knowledge-based inference.
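
A minimal distance-weighted k-NN sketch in Python, assuming NumPy; the toy data, labels, and choice of k are placeholders for illustration:

```python
import numpy as np

def knn_predict(X_train, y_train, x_q, k=3, weighted=True):
    """Classify query point x_q by majority (or distance-weighted) vote
    of its k nearest training examples under Euclidean distance."""
    dists = np.linalg.norm(X_train - x_q, axis=1)   # Euclidean dist(X1, X2)
    nearest = np.argsort(dists)[:k]                 # indices of the k nearest
    votes = {}
    for i in nearest:
        # Closer neighbors get larger weight; an unweighted vote uses weight 1.
        w = 1.0 / (dists[i] ** 2 + 1e-12) if weighted else 1.0
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

# Toy usage with made-up points and labels
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y = np.array(["A", "A", "B", "B"])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))   # -> "A"
```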

Other Classifiers
- Genetic Algorithm: similar to biological evolution.
  - Each rule is represented by a string of bits; e.g., "if A1 and ¬A2 then C2" can be encoded as 100.
  - The fitness of a rule is its classification accuracy on a set of training examples.
  - Offspring are generated by crossover and mutation.
- Rough Sets: a given class C is approximated by two sets: a lower approximation (tuples certain to be in C) and an upper approximation (tuples that cannot be described as not belonging to C).
- Fuzzy Set Approach: attribute values are converted to fuzzy values with degrees of membership; each applicable rule contributes a vote for membership in the categories.
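
To make the bit-string encoding concrete, here is a tiny hypothetical sketch: the first two bits stand for A1 and A2, the last bit for the class, so 100 encodes "if A1 and ¬A2 then C2". The training tuples, the fitness function (which falls back to predicting the other class when the rule does not fire), and the GA parameters are all made up for illustration.

```python
import random

random.seed(0)

# Hypothetical training tuples: (A1, A2, class), class bit 1 = C1, 0 = C2.
train = [(1, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]

def fitness(rule):
    """Fraction of training tuples the 3-bit rule classifies correctly.
    rule = [a1_bit, a2_bit, class_bit]: the rule fires when the tuple's
    attribute bits match; a firing rule predicts class_bit, otherwise the
    other class is predicted (a crude stand-in for a default rule)."""
    correct = 0
    for a1, a2, cls in train:
        pred = rule[2] if (a1 == rule[0] and a2 == rule[1]) else 1 - rule[2]
        correct += (pred == cls)
    return correct / len(train)

def crossover(p, q):
    cut = random.randint(1, 2)                    # single-point crossover
    return p[:cut] + q[cut:]

def mutate(rule, rate=0.1):
    return [1 - b if random.random() < rate else b for b in rule]

pop = [[random.randint(0, 1) for _ in range(3)] for _ in range(6)]
for _ in range(20):                               # a few generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:3]                             # keep the fittest rules
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(3)]
    pop = parents + children

print(pop[0], fitness(pop[0]))   # e.g. [1, 0, 0] -> "if A1 and not A2 then C2"
```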

Model Evaluation and Selection
- Evaluation metrics: how can we measure accuracy? What other metrics should we consider?
- Use a validation/test set of class-labeled tuples, rather than the training set, when assessing accuracy.
- Methods for estimating a classifier's accuracy:
  - Holdout method, random subsampling
  - Cross-validation
  - Bootstrap
- Comparing classifiers:
  - Confidence intervals
  - Cost-benefit analysis and ROC curves

Classifier Evaluation Metrics: Confusion Matrix

  Actual class \ Predicted class | C1                   | ¬C1
  C1                             | True Positives (TP)  | False Negatives (FN)
  ¬C1                            | False Positives (FP) | True Negatives (TN)

Example of a confusion matrix:

  Actual class \ Predicted class | buy_computer = yes | buy_computer = no | Total
  buy_computer = yes             |               6954 |                46 |  7000
  buy_computer = no              |                412 |              2588 |  3000
  Total                          |               7366 |              2634 | 10000

- Given m classes, the entry CM(i, j) in a confusion matrix indicates the number of tuples of class i that were labeled by the classifier as class j.
- The matrix may have extra rows/columns to provide totals.
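
The counts in such a matrix can be tallied directly from actual and predicted labels. A minimal sketch; the label lists here are hypothetical and illustrate only the structure, not the example figures above:

```python
from collections import Counter

# Hypothetical actual/predicted labels for a two-class problem
# ("yes", i.e. buy_computer = yes, is treated as the positive class).
actual    = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "yes", "no"]

# CM(i, j) = number of tuples of actual class i labeled as class j.
cm = Counter(zip(actual, predicted))

TP = cm[("yes", "yes")]   # positives correctly labeled positive
FN = cm[("yes", "no")]    # positives mislabeled negative
FP = cm[("no",  "yes")]   # negatives mislabeled positive
TN = cm[("no",  "no")]    # negatives correctly labeled negative

print(f"TP={TP} FN={FN} FP={FP} TN={TN}")   # TP=3 FN=1 FP=1 TN=3
```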

Classifier Evaluation Metrics: Accuracy, Error Rate, Sensitivity and Specificity

  Actual \ Predicted | C  | ¬C | Total
  C                  | TP | FN | P
  ¬C                 | FP | TN | N
  Total              | P' | N' | All

- Classifier accuracy (recognition rate): percentage of test-set tuples that are correctly classified. Accuracy = (TP + TN) / All
- Error rate: 1 − accuracy, i.e., Error rate = (FP + FN) / All
- Class imbalance problem: one class may be rare, e.g., fraud or HIV-positive, so the data contain a significant majority of the negative class and a minority of the positive class. In that case also consider:
  - Sensitivity (true positive recognition rate): Sensitivity = TP / P
  - Specificity (true negative recognition rate): Specificity = TN / N

Classifier Evaluation Metrics: Precision and Recall, and F-measures
- Precision (exactness): what percentage of tuples that the classifier labeled as positive are actually positive?
- Recall (completeness): what percentage of positive tuples did the classifier label as positive?
- A perfect score is 1.0; there is an inverse relationship between precision and recall.
- F measure (F1 or F-score): the harmonic mean of precision and recall.
- Fβ: a weighted measure of precision and recall that assigns β times as much weight to recall as to precision.
The formulas are written out below.
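
The corresponding formulas, written out here using the standard definitions since the slide's equation images are not reproduced:

```latex
\begin{align*}
\text{precision} &= \frac{TP}{TP + FP} &
\text{recall} &= \frac{TP}{TP + FN} \\[4pt]
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} &
F_\beta &= \frac{(1 + \beta^{2}) \cdot \text{precision} \cdot \text{recall}}{\beta^{2} \cdot \text{precision} + \text{recall}}
\end{align*}
```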

Classifier Evaluation Metrics: Example

  Actual class \ Predicted class | cancer = yes | cancer = no | Total | Recognition (%)
  cancer = yes                   |           90 |         210 |   300 | 30.00 (sensitivity)
  cancer = no                    |          140 |        9560 |  9700 | 98.56 (specificity)
  Total                          |          230 |        9770 | 10000 | 96.50 (accuracy)

- Precision = 90 / 230 = 39.13%
- Recall = 90 / 300 = 30.00%
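
A quick check of these figures from the confusion-matrix cells (TP = 90, FN = 210, FP = 140, TN = 9560):

```python
TP, FN, FP, TN = 90, 210, 140, 9560
P, N = TP + FN, FP + TN                  # 300 actual positives, 9700 negatives

sensitivity = TP / P                     # 90 / 300     = 0.3000
specificity = TN / N                     # 9560 / 9700  = 0.9856
accuracy    = (TP + TN) / (P + N)        # 9650 / 10000 = 0.9650
precision   = TP / (TP + FP)             # 90 / 230     = 0.3913
recall      = TP / P                     # same as sensitivity

print(f"sensitivity={sensitivity:.2%}  specificity={specificity:.2%}  "
      f"accuracy={accuracy:.2%}  precision={precision:.2%}  recall={recall:.2%}")
```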

Evaluating Classifier Accuracy: Holdout & Cross-Validation Methods
- Holdout method
  - The given data are randomly partitioned into two independent sets: a training set (e.g., 2/3) for model construction and a test set (e.g., 1/3) for accuracy estimation.
  - Random subsampling: a variation of holdout; repeat the holdout k times and take the accuracy as the average of the accuracies obtained.
- Cross-validation (k-fold, where k = 10 is most popular)
  - Randomly partition the data into k mutually exclusive subsets D1, …, Dk, each of approximately equal size.
  - At the i-th iteration, use Di as the test set and the remaining subsets as the training set.
  - Leave-one-out: k folds where k = number of tuples; useful for small data sets.
  - Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data.
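
A minimal sketch of stratified 10-fold cross-validation, assuming scikit-learn; the built-in breast-cancer data set and the decision-tree classifier are placeholders, and any classifier could be substituted:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Stratified folds keep each fold's class distribution close to the full data.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

print(f"per-fold accuracy: {scores.round(3)}")
print(f"estimated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```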

Evaluating Classifier Accuracy: Bootstrap
- Works well with small data sets.
- Samples the given training tuples uniformly with replacement, i.e., each time a tuple is selected it is equally likely to be selected again and re-added to the training set.
- There are several bootstrap methods; a common one is the .632 bootstrap:
  - A data set with d tuples is sampled d times, with replacement, resulting in a training set of d samples. The data tuples that did not make it into the training set form the test set.
  - About 63.2% of the original data end up in the bootstrap sample, and the remaining 36.8% form the test set (since (1 − 1/d)^d ≈ e^(−1) ≈ 0.368).
  - Repeat the sampling procedure k times; the overall accuracy of the model is a weighted average of the per-iteration test and training accuracies (see the formula and sketch below).
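
The accuracy estimate commonly used with the .632 bootstrap averages, over the k iterations, 0.632 times the accuracy on the held-out tuples plus 0.368 times the accuracy on the bootstrap training sample: Acc(M) = (1/k) Σ_i (0.632 · Acc(M_i)_test + 0.368 · Acc(M_i)_train). A minimal sketch assuming NumPy and scikit-learn; the data set and classifier are placeholders:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
d, k = len(y), 50                      # d tuples, k bootstrap iterations
acc = []

for _ in range(k):
    # Sample d tuples with replacement for training; the ~36.8% of tuples
    # never drawn form the test set for this iteration.
    train_idx = rng.integers(0, d, size=d)
    test_idx = np.setdiff1d(np.arange(d), train_idx)
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    acc_test = model.score(X[test_idx], y[test_idx])
    acc_train = model.score(X[train_idx], y[train_idx])
    acc.append(0.632 * acc_test + 0.368 * acc_train)

print(f".632 bootstrap accuracy estimate: {np.mean(acc):.3f}")
```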

Model Selection: ROC Curves
- ROC (Receiver Operating Characteristic) curves: for visual comparison of classification models; originated in signal detection theory.
- Show the trade-off between the true positive rate and the false positive rate: the vertical axis represents the true positive rate, the horizontal axis the false positive rate, and the plot also shows a diagonal line.
- Rank the test tuples in decreasing order: the tuple most likely to belong to the positive class appears at the top of the list.
- The area under the ROC curve is a measure of the accuracy of the model: a model with perfect accuracy has an area of 1.0, and the closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model.
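
A minimal sketch of computing an ROC curve and its area with scikit-learn; the data set and the logistic-regression model are placeholders, and any classifier that can rank test tuples by positive-class probability would work:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]        # ranks test tuples by P(positive)

# fpr (x-axis) and tpr (y-axis) at every threshold; an AUC near 1.0 means an
# accurate model, while 0.5 is no better than the diagonal (random guessing).
fpr, tpr, thresholds = roc_curve(y_te, scores)
print(f"AUC = {roc_auc_score(y_te, scores):.3f}")
```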

Summary
- Effective and advanced classification methods were covered, including a comparison of BPNN and Support Vector Machines (SVMs) for classifier selection.
- Comparing the different classification methods: no single method has been found to be superior over all others for all data sets; issues such as accuracy, training time, robustness, scalability, and interpretability must be considered and can involve trade-offs.
- Other classification methods: lazy learners (k-NN), genetic algorithms, and rough-set and fuzzy-set approaches.
- Evaluation metrics include accuracy, sensitivity, specificity, precision, recall, the F measure, and the Fβ measure.
- Stratified k-fold cross-validation is recommended for accuracy estimation. Bagging and boosting can increase overall accuracy by learning and combining a series of individual models.
- ROC curves are often useful for model selection.