Presentation transcript:

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Genetic Algorithms (in 1 Slide)
- GA: based on an analogy to biological evolution
- Each rule is represented by a string of bits
- An initial population is created consisting of randomly generated rules
- Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring
- The fitness of a rule is represented by its classification accuracy on a set of training examples
- Offspring are generated by crossover and mutation
- GAs are a general search/optimization method, not just a classification method; this can be contrasted with the other methods covered
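
A minimal sketch of the loop described above (not the textbook's code; the fitness function here is a stand-in that scores agreement with a hypothetical target bit string, where a real system would measure classification accuracy on training examples):

    import random

    def fitness(rule_bits, target):
        # Stand-in for "classification accuracy on training examples": here,
        # simply the fraction of bits matching a hypothetical target rule.
        return sum(a == b for a, b in zip(rule_bits, target)) / len(target)

    def crossover(p1, p2):
        point = random.randint(1, len(p1) - 1)        # single-point crossover
        return p1[:point] + p2[point:]

    def mutate(bits, rate=0.02):
        return [b ^ 1 if random.random() < rate else b for b in bits]

    def genetic_algorithm(n_bits=20, pop_size=50, generations=100):
        target = [random.randint(0, 1) for _ in range(n_bits)]   # hypothetical "ideal" rule
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda r: fitness(r, target), reverse=True)
            survivors = pop[:pop_size // 2]                       # survival of the fittest
            offspring = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                         for _ in range(pop_size - len(survivors))]
            pop = survivors + offspring
        return max(pop, key=lambda r: fitness(r, target))

    best_rule = genetic_algorithm()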

Ensemble Methods
- Construct a set of classifiers from the training data
- Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
- In Olympic ice skating, why do you have multiple judges?
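
A minimal sketch of the aggregation step, assuming classifiers is a list of already trained scikit-learn-style estimators (the helper name is illustrative, not from the slides):

    from collections import Counter

    def ensemble_predict(classifiers, x):
        # Each already-trained base classifier votes on x; return the majority label.
        votes = [clf.predict([x])[0] for clf in classifiers]
        return Counter(votes).most_common(1)[0][0]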

General Idea
[Figure: the general ensemble framework; create multiple training sets from the original data, build a base classifier on each, and combine the classifiers into a single prediction.]

Why does it work?
- Suppose there are 25 base classifiers
  - Each classifier has error rate ε = 0.35
  - Assume the classifiers are independent
  - Probability that the ensemble classifier (majority vote) makes a wrong prediction, i.e., that 13 or more of the 25 classifiers err:
      P(wrong) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25 − i) ≈ 0.06
  - Practice has shown that results are good even when the independence assumption does not hold
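
The 0.06 figure can be checked directly with a short Python computation:

    from math import comb

    eps, n = 0.35, 25
    # Majority vote is wrong when 13 or more of the 25 independent classifiers err.
    p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
    print(round(p_wrong, 3))   # ~0.06, versus 0.35 for a single classifier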

Methods for Generating Multiple Classifiers
- Manipulate the training data
  - Sample the data differently each time
  - Examples: Bagging and Boosting
- Manipulate the input features
  - Sample the features differently each time
    - Makes especially good sense if there is redundancy among the features
  - Example: Random Forest
- Manipulate the learning algorithm
  - Vary some parameter of the learning algorithm (e.g., amount of pruning, ANN network topology)
  - Use different learning algorithms
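
As a rough illustration (assuming scikit-learn is available; none of these calls appear in the slides), each strategy has a familiar library counterpart:

    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    # 1. Manipulate the training data: each tree sees a different bootstrap sample.
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25)

    # 2. Manipulate the input features: each split considers a random subset of features.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt")

    # 3. Manipulate the learning algorithm: same learner, different pruning parameters.
    shallow_tree = DecisionTreeClassifier(max_depth=3)
    deep_tree = DecisionTreeClassifier(max_depth=None)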

Background
- Classifier performance can be impacted by:
  - Bias: assumptions made to help with generalization
    - "Simpler is better" is a bias
  - Variance: a learning method will give different results based on small changes (e.g., in the training data)
    - When I run experiments and use random sampling with repeated runs, I get different results each time
  - Noise: measurements may have errors, or the class may be inherently probabilistic

How Ensembles Help
- Ensemble methods can help with both bias and variance
  - Averaging the results over multiple runs will reduce the variance
    - I observe this when I use 10 runs with random sampling and see that my learning curves are much smoother
  - Ensemble methods are especially helpful for unstable classifier algorithms
    - Decision trees are unstable, since small changes in the training data can greatly impact the structure of the learned decision tree
  - If you combine different classifier methods into an ensemble, then you are using methods with different biases
    - You are more likely to use a classifier with a bias that is a good match for the problem
    - You may even be able to identify the best methods and weight them more heavily

Examples of Ensemble Methods
- How to generate an ensemble of classifiers?
  - Bagging
  - Boosting
- These methods have been shown to be quite effective
- A technique ignored by the textbook is to combine classifiers built separately
  - By simple voting
  - By voting and factoring in the reliability of each classifier
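
One possible way to combine separately built classifiers by voting, sketched with scikit-learn's VotingClassifier (the three member classifiers and the weights are illustrative choices, not from the slides):

    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    members = [("lr", LogisticRegression(max_iter=1000)),
               ("nb", GaussianNB()),
               ("dt", DecisionTreeClassifier(max_depth=5))]

    # Simple voting: every classifier gets one vote.
    simple_vote = VotingClassifier(estimators=members, voting="hard")

    # Weighted voting: weights could reflect each classifier's estimated reliability,
    # e.g., its cross-validated accuracy (the weights below are placeholders).
    weighted_vote = VotingClassifier(estimators=members, voting="soft",
                                     weights=[0.5, 0.2, 0.3])

    # simple_vote.fit(X_train, y_train); weighted_vote.fit(X_train, y_train)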

Bagging
- Sampling with replacement
- Build a classifier on each bootstrap sample
- Each example has probability 1 − (1 − 1/n)^n of appearing in a given bootstrap sample (about 63% for large n)
  - Some examples will be picked more than once
- Combine the resulting classifiers, such as by majority voting
- Greatly reduces the variance when compared to a single base classifier
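
A from-scratch sketch of the procedure, assuming NumPy and scikit-learn and a hypothetical training set X, y stored as NumPy arrays:

    import numpy as np
    from collections import Counter
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_estimators=25, random_state=0):
        rng = np.random.default_rng(random_state)
        models = []
        n = len(y)
        for _ in range(n_estimators):
            idx = rng.integers(0, n, size=n)            # bootstrap: sample n rows with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, x):
        votes = [m.predict([x])[0] for m in models]     # majority vote over the base trees
        return Counter(votes).most_common(1)[0][0]

    # Sanity check on the 63% figure: for n = 1000, 1 - (1 - 1/n)**n is roughly 0.632.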

Boosting
- An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
  - Initially, all N records are assigned equal weights
  - Unlike bagging, the weights may change at the end of each boosting round

Boosting
- Records that are wrongly classified will have their weights increased
- Records that are classified correctly will have their weights decreased
- Example 4 is hard to classify: its weight is increased, so it is more likely to be chosen again in subsequent rounds
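
The weight update can be sketched as in AdaBoost (this is one common instantiation of boosting, not necessarily the exact scheme on the slide):

    import numpy as np

    def boosting_weight_update(weights, misclassified):
        # One AdaBoost-style reweighting step.
        # weights: current record weights, shape (N,), summing to 1
        # misclassified: boolean array, True where the current classifier erred
        error = weights[misclassified].sum()                  # weighted error of this round
        alpha = 0.5 * np.log((1 - error) / error)             # classifier importance
        # Misclassified records are scaled by exp(+alpha) (weight up),
        # correctly classified records by exp(-alpha) (weight down).
        new_w = weights * np.exp(np.where(misclassified, alpha, -alpha))
        return new_w / new_w.sum()                            # renormalize to a distribution

    # Example: 10 records with equal weights; records 3 and 4 misclassified this round.
    w = np.full(10, 0.1)
    err = np.zeros(10, dtype=bool)
    err[[3, 4]] = True
    print(boosting_weight_update(w, err))   # misclassified records rise to 0.25, the rest drop to 0.0625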

Class Imbalance
- Class imbalance occurs when the classes are very unevenly distributed
  - Examples include fraud prediction and identification of rare diseases
- If there is class imbalance, accuracy may be high even if the rare class is never predicted
  - This could be acceptable, but only if both classes are equally important
    - This is usually not the case; typically the rare class is more important
    - The cost of a false negative is usually much higher than the cost of a false positive
  - The rare class is designated the positive class
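
As a quick illustration with hypothetical numbers: given 990 negative and 10 positive records, a classifier that always predicts the negative class achieves 99% accuracy while never detecting a single positive, which is exactly the failure mode described above.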

Confusion Matrix
- The following abbreviations are standard:
  - FP: false positive
  - TP: true positive
  - FN: false negative
  - TN: true negative

                   Predicted +    Predicted −
    Actual +           TP             FN
    Actual −           FP             TN

Classifier Evaluation Metrics
- Accuracy = (TP + TN)/(TP + TN + FN + FP)
- Precision and Recall
  - Recall = TP/(TP + FN)
    - Recall measures the fraction of positive-class examples that are correctly identified
  - Precision = TP/(TP + FP)
    - Precision is essentially the accuracy of the examples that are classified as positive
  - We would like to maximize both precision and recall
    - They are usually competing goals
    - These measures are appropriate for class imbalance, since recall explicitly addresses the rare class
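
A small sketch computing these metrics from confusion-matrix counts (the counts are made up for illustration):

    tp, fn, fp, tn = 40, 60, 10, 890

    accuracy  = (tp + tn) / (tp + tn + fn + fp)   # 0.93, despite missing most positives
    recall    = tp / (tp + fn)                    # 0.40
    precision = tp / (tp + fp)                    # 0.80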

Classifier Evaluation Metrics (cont.)
- One issue with precision and recall is that they are two numbers, not one
  - That makes simple comparisons more difficult, and it is not clear how to determine the best classifier
  - Solution: combine the two
- The F-measure combines precision and recall
  - The F1 measure is defined as:
    F1 = (2 × recall × precision)/(recall + precision)
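
Continuing the hypothetical counts above (recall = 0.40, precision = 0.80): F1 = (2 × 0.40 × 0.80)/(0.40 + 0.80) ≈ 0.53. The harmonic mean stays close to the weaker of the two scores, so a classifier cannot hide a poor recall behind a high precision.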

Cost-Sensitive Learning
- Cost-sensitive learning factors in cost information
  - For example, you may be given the relative cost of a FN vs. a FP
  - You may be given the costs or utilities associated with each quadrant of the confusion matrix
- Cost-sensitive learning can be implemented by sampling the data to reflect the costs
  - If the cost of a FN is twice that of a FP, then you can increase the ratio of positive examples by a factor of 2 when constructing the training set
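
A minimal sketch of the sampling approach, assuming NumPy arrays X and y (the function name and the 2:1 cost ratio are illustrative):

    import numpy as np

    def cost_sensitive_resample(X, y, fn_fp_cost_ratio=2, positive_label=1):
        # Oversample the positive (rare) class in proportion to the FN/FP cost ratio,
        # so a cost-blind learner trained on the new sample behaves cost-sensitively.
        pos = np.flatnonzero(y == positive_label)
        extra = np.repeat(pos, fn_fp_cost_ratio - 1)    # (ratio - 1) extra copies of each positive
        idx = np.concatenate([np.arange(len(y)), extra])
        return X[idx], y[idx]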

More on Autonomous Vehicles
- DARPA Grand Challenges: $1,000,000 prize
  - Motivation: by 2015, 1/3 of ground military forces autonomous
  - 2004: 150-mile race through the Mojave desert. No one finished; CMU's car made it the farthest, at 7.3 miles
  - 2005: Same race. 22 of 23 surpassed the best distance from 2004, and five vehicles completed the course. Stanford finished first and CMU second; Sebastian Thrun led the Stanford team
  - 2005 Grand Challenge video

More DARPA Grand Challenge
- 2007 Urban Challenge
  - 60-mile urban-area course to be completed in 6 hours
    - Must obey all traffic laws and avoid other robot cars
  - Urban Challenge video

Google Autonomous Vehicle
- Google "commercial" video
- Second Google driverless car video
- Alternative future autonomous "vehicles" video