Handling Numeric Attributes in Hoeffding Trees Bernhard Pfahringer, Geoff Holmes and Richard Kirkby

Overview
Hoeffding trees are excellent for classification tasks on data streams.
Handling numeric attributes well is crucial to the performance of conventional decision trees (for example, the step from C4.5 to C4.8).
Does handling numeric attributes matter for streamed data?
We implement a range of methods and empirically evaluate their accuracy and costs.

Data Streams – a reminder
Data is provided from a continuous source:
– Examples are processed one at a time (inspected once)
– Memory is limited (!)
– Model construction must scale (at most N log N in the number of examples)
– The model must be ready to predict at any time
Because memory is limited, this constrains any numeric-handling method you might construct.
We only consider methods that work as the tree is built.

Main assumptions/limitations
We assume a stationary concept, i.e. no concept drift or change
– this may seem very limiting, but …
There is a three-way trade-off between:
– memory
– speed
– accuracy
Only artificial data sources are used.

Hoeffding Trees
Introduced by Domingos and Hulten (VFDT)
An "extension" of decision trees to streams
HT algorithm:
– Initialise tree T to a root node
– For each example from the stream:
   Find the leaf L for this example
   Update the counts in L with the example's attribute values and compute the split function (e.g. information gain, IG) for each attribute
   If IG(best attr) – IG(next best attr) > ε, then split L on the best attribute
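To make the split decision concrete, here is a minimal Python sketch of the Hoeffding-bound test implied by the last step. It is not the authors' implementation: the function names are illustrative, δ is an assumed default, and VFDT details such as grace periods, tie-breaking and node deactivation are omitted.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the observed mean of n
    values lying in a range of size value_range is within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n_classes, n_examples, delta=1e-7):
    """Split test at an active leaf: split when the gap between the two best
    attributes' information gains exceeds the Hoeffding bound.  Information
    gain is bounded above by log2(n_classes)."""
    eps = hoeffding_bound(math.log2(n_classes), delta, n_examples)
    return best_gain - second_gain > eps

# After 500 examples of a 2-class problem with gains 0.32 vs 0.20, the gap
# (0.12) is still just below the bound (about 0.127), so this prints False.
print(should_split(0.32, 0.20, n_classes=2, n_examples=500))
```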

Active leaf data structure
For each class value:
– for each nominal attribute, for each possible value: keep a sum of counts/weights
– for each numeric attribute: keep sufficient statistics to approximate the distribution
There are various possibilities; here we assume a normal distribution, so we estimate/record n, mean, variance, plus min/max.
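A per-class, per-attribute estimator of exactly these statistics can be maintained incrementally. The sketch below is a minimal Python version assuming Welford's online update and an optional example weight; the class name is hypothetical, not taken from the authors' code.

```python
class GaussianEstimator:
    """Incrementally tracks n, mean, variance, min and max of one numeric
    attribute for one class value, using Welford's online update."""

    def __init__(self):
        self.n = 0.0
        self.mean = 0.0
        self.m2 = 0.0                    # weighted sum of squared deviations
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, x, weight=1.0):
        self.n += weight
        delta = x - self.mean
        self.mean += weight * delta / self.n
        self.m2 += weight * delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self):
        return self.m2 / (self.n - 1.0) if self.n > 1.0 else 0.0
```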

Numeric Handling Methods
VFDT (VFML – Hulten & Domingos, 2003)
– Summarize the numeric distribution with a histogram of at most N bins (default 1000)
– Bin boundaries are determined by the first N unique values seen in the stream (see the sketch below)
– Issues: the method is sensitive to data order, and a good N must be chosen for each problem
Exhaustive Binary Tree (BINTREE – Gama et al., 2003)
– The closest implementation of a batch method
– Incrementally update a binary tree as data is observed
– Issues: high memory cost, high cost of split search, sensitivity to data order
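As a rough illustration of the VFML-style binning (not the VFML code itself; the class name and the policy for values arriving after the bins are fixed are assumptions), the first max_bins distinct values become bin boundaries and every later value is counted into the bin whose boundary is the largest one not exceeding it:

```python
import bisect

class VFMLBins:
    """Rough sketch of VFML-style numeric handling: the first max_bins distinct
    values become bin boundaries; later values are counted into the bin whose
    boundary is the largest one not exceeding them."""

    def __init__(self, max_bins=1000):
        self.max_bins = max_bins
        self.boundaries = []           # sorted distinct values seen first
        self.class_counts = {}         # boundary -> {class label: weight}

    def add(self, value, cls, weight=1.0):
        if value not in self.class_counts and len(self.boundaries) < self.max_bins:
            bisect.insort(self.boundaries, value)
            self.class_counts[value] = {}
        i = bisect.bisect_right(self.boundaries, value) - 1
        key = self.boundaries[max(i, 0)]
        counts = self.class_counts[key]
        counts[cls] = counts.get(cls, 0.0) + weight
```

Because the boundaries are fixed by the first N distinct values, the summary depends on the order in which data arrives, which matches the data-order issue noted above.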

Numeric Handling Methods (continued)
Quantile Summaries (GK – Greenwald and Khanna, 2001)
– The motivation comes from the database (VLDB) community
– Maintain a sample of values (quantiles) plus the range of possible ranks each sample can take (tuples)
– Extremely space efficient
– Issue: a maximum number of tuples per summary must be chosen
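For orientation, here is a heavily simplified Python sketch of a Greenwald-Khanna-style summary (insert plus periodic compression). It is an assumption-laden illustration, not the published algorithm: the band handling, compression schedule and query rule are simplified, and the class name is hypothetical.

```python
import bisect

class GKSummary:
    """Simplified epsilon-approximate quantile summary.  Each tuple (v, g, d)
    covers g observations; the true rank of v lies between the running sum of
    g values (rmin) and rmin + d."""

    def __init__(self, epsilon=0.01):
        self.eps = epsilon
        self.n = 0
        self.tuples = []                         # kept sorted by value

    def insert(self, v):
        values = [t[0] for t in self.tuples]
        i = bisect.bisect_left(values, v)
        if i == 0 or i == len(self.tuples):
            self.tuples.insert(i, [v, 1, 0])     # new minimum/maximum: exact rank
        else:
            self.tuples.insert(i, [v, 1, int(2 * self.eps * self.n)])
        self.n += 1
        if self.n % max(1, int(1.0 / (2.0 * self.eps))) == 0:
            self._compress()

    def _compress(self):
        """Merge neighbouring tuples while the merged rank uncertainty stays
        within 2 * eps * n, keeping the min and max tuples intact."""
        budget = 2.0 * self.eps * self.n
        i = len(self.tuples) - 2
        while i >= 1:
            _, g1, _ = self.tuples[i]
            v2, g2, d2 = self.tuples[i + 1]
            if g1 + g2 + d2 <= budget:
                self.tuples[i + 1] = [v2, g1 + g2, d2]
                del self.tuples[i]
            i -= 1

    def quantile(self, q):
        """Return a stored value whose rank is close to q * n."""
        target = q * self.n
        rmin = 0
        for v, g, d in self.tuples:
            rmin += g
            if rmin + d / 2.0 >= target:
                return v
        return self.tuples[-1][0]
```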

Numeric Handling Methods (continued)
Gaussian Approximation (GAUSS)
– Assume values conform to a normal distribution
– Maintain five numbers per class (weight, mean, variance, min, max)
– Note: not sensitive to data order
– Incrementally updateable
– Using the per-class min and max, split the range into N equal parts
– For each candidate split point, use the five numbers per class to compute the approximate class distribution on either side, and use this to compute the IG of that split (see the sketch below)
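The sketch below illustrates this evaluation under the slide's assumptions: per-class Gaussians with min/max, equal-width candidate thresholds over the observed range, and information gain computed from the normal CDF on each side of a threshold. The function names and the dictionary layout of class_stats are illustrative, not the authors' code.

```python
import math

def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ Normal(mean, std)."""
    if std == 0.0:
        return 1.0 if x >= mean else 0.0
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def entropy(weights):
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0.0)

def best_gauss_split(class_stats, n_parts=10):
    """class_stats maps class label -> (weight, mean, variance, min, max).
    Evaluates n_parts - 1 equal-width thresholds between the overall min and
    max and returns (best threshold, information gain)."""
    lo = min(s[3] for s in class_stats.values())
    hi = max(s[4] for s in class_stats.values())
    pre_weights = [s[0] for s in class_stats.values()]
    total = sum(pre_weights)
    pre_entropy = entropy(pre_weights)
    best_threshold, best_gain = None, -1.0
    for i in range(1, n_parts):
        t = lo + i * (hi - lo) / n_parts
        left, right = [], []
        for weight, mean, var, _, _ in class_stats.values():
            frac = normal_cdf(t, mean, math.sqrt(var))
            left.append(weight * frac)
            right.append(weight * (1.0 - frac))
        gain = (pre_entropy
                - (sum(left) / total) * entropy(left)
                - (sum(right) / total) * entropy(right))
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain

# Two well-separated classes: the best threshold lies near 4, with high gain.
stats = {"a": (100.0, 2.0, 1.0, -1.0, 5.0), "b": (100.0, 6.0, 1.0, 3.0, 9.0)}
print(best_gauss_split(stats))
```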

Gaussian approximation – 2-class problem (figure)

Gaussian approximation – 3-class problem (figure)

Gaussian approximation – 4-class problem (figure)

Empirical Evaluation
Use each numeric handling method (8 in total) to build a Hoeffding Tree (HTMC)
Vary the parameters of some methods (VFML10, 100, 1000; BT; GK100, 1000; GAUSS10, 100)
Train models for 10 hours, then test on one million (holdout) examples
Define three application scenarios:
– Sensor network (100 KB memory limit)
– Handheld (32 MB)
– Server (400 MB)

Data generators
Random tree (Domingos & Hulten):
– (RTS) 10 numeric and 10 nominal attributes with 5 values, 2 classes, leaves start at level 3, maximum level 5; plus a version with 10% noise added (RTSN)
– (RTC) 50 numeric and 50 nominal attributes with 5 values, 2 classes, leaves start at level 5, maximum level 10; plus a version with 10% noise added (RTCN)
Random RBF (Kirkby):
– (RRBFS) 10 numeric attributes, 100 centers, 2 classes
– (RRBFC) 50 numeric attributes, 1000 centers, 2 classes
Waveform (Aha):
– (Wave21) 21 noisy numeric attributes; (Wave40) adds 19 irrelevant numeric attributes; 3 classes
Agrawal et al. (GenF1–GenF10):
– hypothetical loan applications, 10 different rules over 6 numeric + 3 nominal attributes, 5% noise, 2 classes

Tree Measurements
– Accuracy (% correct)
– Number of training examples processed in 10 hours (in millions)
– Number of active leaves (in hundreds)
– Number of inactive leaves (in hundreds)
– Total nodes (in hundreds)
– Tree depth
– Training speed (% of generation speed)
– Prediction speed (% of generation speed)

Sensor Network (100 KB memory limit)
Columns: % correct, training examples (millions), active leaves (hundreds), inactive leaves (hundreds), total nodes (hundreds), average tree depth, training speed (%), prediction speed (%)
Methods compared: VFML10, VFML100, VFML1000, BT, GK100, GK1000, GAUSS10, GAUSS100

Handheld Environment (32 MB memory limit)
Columns: % correct, training examples (millions), active leaves (hundreds), inactive leaves (hundreds), total nodes (hundreds), average tree depth, training speed (%), prediction speed (%)
Methods compared: VFML10, VFML100, VFML1000, BT, GK100, GK1000, GAUSS10, GAUSS100

Server Environment (400 MB memory limit)
Columns: % correct, training examples (millions), active leaves (hundreds), inactive leaves (hundreds), total nodes (hundreds), average tree depth, training speed (%), prediction speed (%)
Methods compared: VFML10, VFML100, VFML1000, BT, GK100, GK1000, GAUSS10, GAUSS100

Overall results – comments
– VFML10 is superior on average in all environments, followed closely by GAUSS10
– The GK methods are generally competitive
– BINTREE is only competitive in the server setting
– The default setting of 1000 bins for VFML is a poor choice
– Crude binning frees up space, which leads to faster growth and better trees (more room to grow)
– Higher values for GAUSS lead to very deep trees (deeper than the number of attributes), suggesting repeated, too fine-grained splitting

Remarks – sensor network environment
– The number of training examples is low because learning stops when the last active leaf is deactivated (memory management freezes nodes; with few examples, the probability of splitting is low)
– The most accurate methods are VFML10 and GAUSS10

Remarks – handheld environment
– Generates smaller trees than the server environment and can therefore process more examples

Remarks – server environment

VFML10 vs GAUSS10 – closer analysis
Recall that VFML10 is superior on average.
Sensor (average 87.7 vs 86.2):
– GAUSS10 superior on 10 datasets
– VFML10 superior on 6 (2 with no difference)
Handheld (average 91.5 vs 91.4):
– GAUSS10 superior on 4
– VFML10 superior on 8 (6 with no difference)
Server (average 91.4 vs 91.2):
– GAUSS10 superior on 6
– VFML10 superior on 6 (6 with no difference)

Data order

Conclusion
– We have presented a method for handling numeric attributes in data streams that performs well in empirical studies
– The methods employing the most approximation were superior: they allow greater growth when memory is limited
– On a dataset-by-dataset analysis there is not much to choose between VFML10 and GAUSS10
– Gains made in handling numeric variables come at a cost in training and prediction speed; the cost is high in some environments

All algorithms available
All methods, and an environment for experimental evaluation of data streams, are available from the above URL; the system is called Massive Online Analysis (MOA).