Data Mining – Classification (G. Dong, 9/03)

Slide 1: 3. Classification Methods
– Patterns and models
– Regression, NBC
– k-Nearest neighbors
– Decision trees and rules
– Handling large-size data

Slide 2: Models and Patterns
A model is a global description of data, or an abstract representation of a real-world process.
– Estimating the parameters of a model
– Data-driven model building
– Examples: regression, graphical models (BN), HMM
A pattern describes some local aspect of the data.
– Patterns in data matrices: predicates such as (age < 40) ∧ (income < 10)
– Patterns for strings (ASCII characters, DNA alphabet)
– Pattern discovery: rules

Slide 3: Performance Measures
– Generality: how many instances are covered?
– Applicability: is it useful? ("All husbands are male" is accurate but not useful.)
– Accuracy: is it always correct? If not, how often?
– Comprehensibility: is it easy to understand? (a subjective measure)

Slide 4: Forms of Knowledge
– Concepts: probabilistic, logical (propositional/predicate), functional
– Rules
– Taxonomies and hierarchies: dendrograms, decision trees
– Clusters
– Structures and weights/probabilities: ANN, BN

Slide 5: Induction from Data
Induction infers knowledge from data by generalization.
Supervised vs. unsupervised learning
– graphical illustrations of learning tasks (regression, classification, clustering)
– any other types of learning?
Compare with the task of deduction, which infers information or facts that are logical consequences of the facts in a database.
– Who is John's grandpa? (deduced from, e.g., "Mary is John's mother" and "Joe is Mary's father")
– Deductive databases extend the RDBMS.

Slide 6: The Classification Problem
From a set of labeled training data, build a system (a classifier) for predicting the class of future data instances (tuples).
A related problem: build a system from training data to predict the value of an attribute (feature) of future data instances.

Slide 7: What Is a Bad Classifier?
One of the simplest classifiers is table lookup. But what if x cannot be found in the training data? Do we give up!? Or we can ...
A simple classifier Cs can be built as a reference: if x can be found in the table (the training data), return its class; otherwise, what should it return?
A bad classifier is one that does worse than Cs.
Do we need to learn a classifier for data of one class?
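To make the reference concrete, here is a minimal sketch of Cs (an illustration, not from the slides), assuming we answer the open question by returning the majority class for unseen instances:

```python
from collections import Counter

def build_reference_classifier(X_train, y_train):
    """Reference classifier Cs: memorize the training table; for an
    unseen instance, fall back to the majority class (our assumption)."""
    table = {tuple(x): y for x, y in zip(X_train, y_train)}
    default = Counter(y_train).most_common(1)[0][0]   # majority class
    def classify(x):
        return table.get(tuple(x), default)
    return classify

# Usage: memorized instances get their stored class, anything else the default.
cs = build_reference_classifier([(1, 0), (0, 1), (1, 1)], ["yes", "no", "yes"])
print(cs((1, 0)), cs((9, 9)))   # -> yes yes
```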

Slide 8: Many Techniques
– Decision trees
– Linear regression
– Neural networks
– k-nearest neighbors
– Naïve Bayesian classifiers
– Support vector machines
– and many more ...

Slide 9: Regression for Numeric Prediction
Linear regression is a statistical technique for the case where the class and all the attributes are numeric:
y = α + βx, where α and β are the regression coefficients.
We use the training instances to find α and β by minimizing the sum of squared errors (least squares):
SSE = Σ_i (y_i − y_i′)² = Σ_i (y_i − α − βx_i)²
Extensions: multiple regression, piecewise linear regression, polynomial regression.
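A minimal sketch of the least-squares fit, using the closed-form solution that results from setting the derivatives of the SSE to zero (plain Python, no libraries assumed):

```python
def fit_simple_regression(xs, ys):
    """Closed-form least-squares fit of y = alpha + beta * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # beta = covariance(x, y) / variance(x); alpha puts the line through the means
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    alpha = mean_y - beta * mean_x
    return alpha, beta

print(fit_simple_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.0]))  # ~ (0.05, 2.0)
```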

Slide 10: Nearest Neighbor
Also called instance-based learning.
Algorithm: given a new instance x, find its nearest neighbor in the training data and return that neighbor's class y′ as the class of x.
Distance measures – normalization?!
Some interesting questions: What is its time complexity? Does it learn?

Slide 11: Nearest Neighbor (2)
Dealing with noise: k-nearest neighbors
– use more than 1 neighbor
– how many neighbors?
– weighted nearest neighbors
How to speed up? Storage is huge, so use representatives (a problem of instance selection):
– sampling
– grids
– clustering
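A minimal k-NN sketch, assuming numeric attributes that have already been normalized; Euclidean distance and an unweighted majority vote are one common choice:

```python
import math
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """k-nearest-neighbor classification with Euclidean distance and
    an unweighted majority vote among the k closest training instances."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbors = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x))[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

print(knn_classify([(0, 0), (0, 1), (5, 5)], ["a", "a", "b"], (1, 1)))  # -> a
```

Note that there is no training phase at all: the cost is paid at prediction time, which is what the "does it learn?" question on slide 10 is hinting at.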

Slide 12: Naïve Bayes Classification
A direct application of Bayes' rule:
P(C|x) = P(x|C)P(C)/P(x), where x is a vector (x_1, x_2, …, x_n).
That would be the best classifier you could ever build – you would not even need to select features; it takes care of that automatically.
But there are problems: we only have a limited number of instances, so how do we estimate P(x|C)?

Slide 13: NBC (2)
Assume conditional independence among the x_i's. We then have
P(C|x) ≈ P(x_1|C) P(x_2|C) … P(x_n|C) P(C)
How good is it in reality? Let's build an NBC for a very simple data set.
– Estimate the priors and conditional probabilities from the training data: P(C=1) = ? P(C=2) = ? P(x_1=1|C=1) = ? P(x_1=2|C=1) = ? …
– What is the class for x = (1,2,1)? Compare P(1|x) ≈ P(x_1=1|1) P(x_2=2|1) P(x_3=1|1) P(1) with P(2|x) ≈ P(x_1=1|2) P(x_2=2|2) P(x_3=1|2) P(2).
– What is the class for (1,2,2)?
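A minimal NBC sketch under the conditional-independence assumption, estimating every probability by simple counting (no smoothing; the data representation is our assumption, not from the slides):

```python
from collections import Counter, defaultdict

def train_nbc(X, y):
    """Naive Bayes: estimate P(C) and P(x_i = v | C) by counting, then
    score each class as P(x_1|C) * ... * P(x_n|C) * P(C)."""
    class_counts = Counter(y)
    priors = {c: m / len(y) for c, m in class_counts.items()}
    cond = defaultdict(Counter)            # (feature index, class) -> value counts
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond[(i, c)][v] += 1
    def classify(x):
        def score(c):
            s = priors[c]
            for i, v in enumerate(x):
                s *= cond[(i, c)][v] / class_counts[c]
            return s
        return max(priors, key=score)
    return classify

nbc = train_nbc([(1, 2, 1), (1, 1, 2), (2, 2, 1)], [1, 1, 2])
print(nbc((1, 2, 1)))   # -> 1
```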

Slide 14: Example of NBC
Counts from the 7 training instances (attributes A1, A2, A3; class C):
          C=1   C=2
  total     4     3
  A1=0      2     0
  A1=1      2     1
  A1=2      0     2
  A2=0/1/2, A3=1/2: [counts not recoverable from the transcript]

Slide 15: Golf Data
[Table of the golf data set; shown as an image in the original slides.]

Slide 16: Decision Trees
An example decision tree:
  Outlook?
  – sunny → Humidity?
      – high → NO
      – normal → YES
  – overcast → YES
  – rain → Wind?
      – strong → NO
      – weak → YES

Slide 17: How to "Grow" a Tree?
Randomly → random forests (Breiman, 2001).
What are the criteria for building a tree? It should be accurate and compact.
A straightforward way to grow one (sketched below):
– pick an attribute
– split the data according to its values
– recursively repeat the first two steps until no data are left or no features are left
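A minimal sketch of this recursive procedure, assuming instances are represented as attribute dictionaries paired with class labels; the split heuristic `choose` is left as a parameter (one candidate is sketched after slide 20):

```python
from collections import Counter

def grow_tree(data, features, choose):
    """Recursively grow a decision tree.
    data: list of (attribute_dict, class_label) pairs;
    features: list of attribute names still available;
    choose: heuristic that picks the attribute to split on."""
    labels = [c for _, c in data]
    if len(set(labels)) == 1 or not features:        # pure node, or no feature left
        return Counter(labels).most_common(1)[0][0]  # leaf: the majority class
    attr = choose(data, features)
    children = {}
    for v in {x[attr] for x, _ in data}:             # one branch per attribute value
        subset = [(x, c) for x, c in data if x[attr] == v]
        children[v] = grow_tree(subset, [f for f in features if f != attr], choose)
    return {"split": attr, "children": children}
```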

Slide 18: Discussion
There are many possible trees – let's try it on the golf data.
How do we find the most compact tree that is consistent with the data? Why the most compact? Occam's razor.
The issue of efficiency vs. optimality: one attribute at a time, or ...?

Slide 19: Grow a Good Tree Efficiently
The heuristic: find commonality between feature values and class values, in order to build a compact tree generalized from the data. That is, look for features and splits that lead to pure leaf nodes.
Is it a good heuristic? What do you think? How do we judge it? Is it really efficient? How would we implement it?

Slide 20: Measuring the Purity of a Data Set – Entropy
Entropy(S) = −Σ_c p_c log₂ p_c, where p_c is the fraction of instances in S with class c.
Information gain (see the brief review): choose the feature with the maximum gain.
Let's grow one: Outlook splits the (7,7) data into sunny (5), rain (5), and overcast (4).
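A sketch of entropy and information gain over the same data representation as the grow_tree sketch above; the last function supplies the `choose` heuristic that grow_tree left open:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum_c p_c * log2(p_c)."""
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())

def information_gain(data, attr):
    """Gain = entropy before the split - weighted entropy of the subsets."""
    before = entropy([c for _, c in data])
    after = 0.0
    for v in {x[attr] for x, _ in data}:
        subset = [c for x, c in data if x[attr] == v]
        after += len(subset) / len(data) * entropy(subset)
    return before - after

# The split heuristic for grow_tree: pick the attribute with maximum gain.
def choose(data, features):
    return max(features, key=lambda a: information_gain(data, a))
```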

Slide 21: Different Numbers of Values
Different attributes can have different numbers of values. Some treatments:
– remove useless attributes before learning
– binarization
– discretization
Gain ratio is another practical solution (sketched below):
– Gain = Info(root) − Info(attribute i)
– Split-Info = −Σ_i (|T_i|/|T|) log₂(|T_i|/|T|)
– Gain-ratio = Gain / Split-Info
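Gain ratio is then a small extension of the previous sketch (it reuses `information_gain` from above):

```python
import math
from collections import Counter

def gain_ratio(data, attr):
    """Information gain normalized by split information, which penalizes
    attributes with many values."""
    sizes = Counter(x[attr] for x, _ in data)
    n = len(data)
    split_info = -sum((m / n) * math.log2(m / n) for m in sizes.values())
    return information_gain(data, attr) / split_info if split_info else 0.0
```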

Slide 22: Another Kind of Problem
The XOR problem: a difficult problem. Why is it difficult? (No single attribute improves purity by itself, so the one-attribute-at-a-time heuristic has nothing to work with.) Similar problems: parity and majority.

Slide 23: Tree Pruning
Overfitting: the model fits the training data too well but does not work well on unseen data.
Pruning is an effective approach to avoid overfitting, and it yields a more compact tree (easier to understand).
Two general ways to prune:
– pre-pruning: stop splitting early, e.g. when there is no significant difference in classification accuracy before and after a division
– post-pruning: grow the full tree, then trim it back

Slide 24: Rules from Decision Trees
Two types of rule sets:
– order-sensitive (more compact, less efficient)
– order-insensitive
The most straightforward way: read one rule off each root-to-leaf path (see the sketch below).
Class-based method: group rules according to class; select the most general rules (or remove redundant ones).
Data-based method: select one rule at a time (keeping the most general one); work on the remaining data until all data are covered.
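For the straightforward path-based conversion, a small sketch that walks the dictionary trees produced by the grow_tree sketch above:

```python
def tree_to_rules(tree, conditions=()):
    """Emit one rule per root-to-leaf path of a grow_tree dictionary tree.
    A rule is ((attr, value), ...) -> class, read as
    IF attr1 = v1 AND attr2 = v2 ... THEN class."""
    if not isinstance(tree, dict):                   # reached a leaf
        return [(conditions, tree)]
    rules = []
    for v, child in tree["children"].items():
        rules.extend(tree_to_rules(child, conditions + ((tree["split"], v),)))
    return rules
```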

Slide 25: Variants of Decision Trees and Rules
Tree stumps.
Holte's 1R rules (1993):
– For each attribute A: sort by its values v and find the most frequent class value c for each v (break ties by coin flipping).
– Output the most accurate rule set of the form "if A = v then c".
– An example: the golf data.
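A minimal sketch of 1R over the same (attribute dictionary, class) representation; coin-flip tie-breaking is omitted for brevity:

```python
from collections import Counter, defaultdict

def one_r(data):
    """Holte's 1R: map each value of an attribute to its most frequent class;
    keep the attribute whose rule set makes the fewest training errors."""
    best, best_errors = None, None
    for a in data[0][0]:                             # attribute names
        by_value = defaultdict(Counter)
        for x, c in data:
            by_value[x[a]][c] += 1
        errors = sum(sum(cnt.values()) - cnt.most_common(1)[0][1]
                     for cnt in by_value.values())
        if best_errors is None or errors < best_errors:
            rules = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
            best, best_errors = (a, rules), errors
    return best                                      # (attribute, {value: class})
```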

Slide 26: Handling Large-Size Data
When the data simply cannot fit in memory ... is that a big problem?
Three representative approaches:
– smart data structures that avoid unnecessary recalculation: hash trees, SPRINT
– sufficient statistics: the AVC-set (Attribute-Value, Class label) summarizes the class distribution for each attribute; example: RainForest
– parallel processing: make the data parallelizable
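A sketch of the AVC-set idea (our illustration of the RainForest-style approach, not its actual implementation): one pass over a data stream keeps only per-(attribute, value) class counts in memory, from which split measures like those on slides 20-21 can be computed:

```python
from collections import Counter, defaultdict

def avc_sets(stream):
    """Build AVC-sets in one pass: for every (attribute, value) pair, the
    distribution of class labels. Only these counts stay in memory, not
    the data itself, so splits can be scored even when the data set does
    not fit in memory."""
    avc = defaultdict(Counter)              # (attribute, value) -> class counts
    for x, c in stream:                     # stream of (attribute_dict, class)
        for a, v in x.items():
            avc[(a, v)][c] += 1
    return avc
```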

Slide 27: Ensemble Methods
A group of classifiers:
– hybrid (stacking)
– a single type
Strong vs. weak learners.
A good ensemble needs both accuracy and diversity.
Some major approaches to forming ensembles: bagging and boosting (bagging is sketched below).
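A minimal bagging sketch; `train` is a hypothetical function of our own that fits any base learner on a sample and returns a classifier function:

```python
import random
from collections import Counter

def bagging(train, data, n_models=11):
    """Bagging: fit each model on a bootstrap sample of the data (drawn
    with replacement), then classify new instances by majority vote."""
    models = []
    for _ in range(n_models):
        sample = [random.choice(data) for _ in data]   # bootstrap replicate
        models.append(train(sample))                   # train returns a classifier
    def classify(x):
        return Counter(model(x) for model in models).most_common(1)[0][0]
    return classify
```

The bootstrap resampling is what supplies the diversity mentioned above: each model sees a slightly different view of the training data.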

Slide 28: Bibliography
– I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.
– M. Kantardzic. Data Mining: Concepts, Models, Methods, and Algorithms. IEEE Press.
– J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann.
– D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press.
– T. G. Dietterich. Ensemble Methods in Machine Learning. In J. Kittler and F. Roli (eds.), 1st Intl. Workshop on Multiple Classifier Systems, pp. 1–15. Springer-Verlag, 2000.