1 Data Stream Management Systems: Checkpoint
CS240B Notes by Carlo Zaniolo, UCLA CSD
With slides from a KDD04 tutorial by Haixun Wang, Jian Pei & Philip Yu

2 Mining Data Streams: Challenges
- On-line response (NB), limited memory, most recent windows only
- Fast & light algorithms needed:
  - must minimize usage of memory and CPU
  - require only one (or a few) passes through the data
- Concept shift/drift: the statistics of the data change over time
  - previously learned models become inaccurate or invalid
  - robustness and adaptability: quickly recover/adjust after concept changes
- Popular machine learning algorithms are no longer effective:
  - neural nets: slow learners requiring many passes
  - Support Vector Machines (SVM): computationally expensive
  - Apriori: many passes and expensive (association rule mining is difficult on data streams)

3 The Decision Tree Classifier
- Learning (training):
  - input: a data set of pairs (a, b), where a is a feature vector and b a class label
  - output: a model (decision tree)
- Testing (see the sketch below):
  - input: a test sample (x, ?)
  - output: a class label prediction for x
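A minimal sketch of this train/predict workflow. The use of scikit-learn's DecisionTreeClassifier as the batch learner, and the toy data, are assumptions for illustration; the slides do not prescribe a library.

```python
# Train/predict sketch with a batch decision-tree learner (scikit-learn assumed).
from sklearn.tree import DecisionTreeClassifier

# Learning: a data set of (a, b) pairs -- feature vectors and class labels.
X_train = [[0, 1], [1, 1], [1, 0], [0, 0]]
y_train = ["pos", "pos", "neg", "neg"]
model = DecisionTreeClassifier().fit(X_train, y_train)

# Testing: a sample (x, ?) yields a class label prediction.
x = [[1, 1]]
print(model.predict(x))  # e.g. ['pos']
```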

4 Decision Tree Classifiers
- A divide-and-conquer approach
  - simple algorithm, intuitive model
- Typically a decision tree grows one level for each scan of the data
  - multiple scans are required
  - but if we can use small samples, this problem disappears
- But the data structure is not "stable"
  - subtle changes in the data can cause global changes in the tree structure

5 Stable Trees Using Samples
How many samples do we need to build, in constant time, a tree that is nearly identical to the one a batch learner (C4.5, SPRINT, ...) would build?
Nearly identical?
- Categorical attributes:
  - with high probability, the attribute we choose for the split is the same one a batch learner would choose
  - hence an identical decision tree
- Continuous attributes:
  - discretize them into categorical ones
...Forget concept changes for now.

6 Hoeffding Trees
- The Hoeffding bound is applied to the information gain
- The error bound decreases as n (the number of samples) increases
- At each node, we accumulate enough samples (n) before we make a split (see the split-test sketch below)
- Scales better than traditional decision-tree algorithms:
  - incremental: nodes are created incrementally as new samples stream in
  - sub-linear with sampling
  - small memory requirement
- Cons:
  - only considers the top 2 attributes
  - tie breaking takes time
  - growing a deep tree takes time
  - discrete attributes only
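A short sketch of the split test implied above, assuming the standard Hoeffding bound: with probability 1 - delta, the true mean of a variable with range R is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the mean of n observations, so a node may split once the gap between the two best attributes' gains exceeds epsilon. The delta value and the gain numbers below are hypothetical.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    # With probability 1 - delta, the true mean of a variable with range
    # `value_range` lies within epsilon of the mean of n observations.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def can_split(gain_best, gain_second, n, num_classes, delta=1e-7):
    # Information gain ranges over [0, log2(num_classes)].
    eps = hoeffding_epsilon(math.log2(num_classes), delta, n)
    # Split only when the observed top-2 gain gap is statistically reliable.
    return (gain_best - gain_second) > eps

# Hypothetical gains at a node after n = 5000 samples, two classes:
print(can_split(gain_best=0.30, gain_second=0.18, n=5000, num_classes=2))
```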

7 VFDT
- Very Fast Decision Tree [Domingos, Hulten 2000]
  - several improvements over Hoeffding trees: faster and less memory
- Concept changes? A naïve approach (sketched below):
  - place a sliding window on the stream
  - reapply C4.5 or VFDT whenever the window moves
  - time consuming!
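The naïve windowed retraining above, as a minimal sketch. The window size, the retrain-on-every-tuple policy, and the use of scikit-learn's tree as a stand-in for C4.5/VFDT are assumptions.

```python
from collections import deque
from sklearn.tree import DecisionTreeClassifier

def naive_window_learner(stream, window_size=1000):
    # Rebuild the whole tree every time the window slides: simple but costly.
    window = deque(maxlen=window_size)
    for x, y in stream:
        window.append((x, y))
        X = [xi for xi, _ in window]
        Y = [yi for _, yi in window]
        yield DecisionTreeClassifier().fit(X, Y)  # full retrain per slide
```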

8 CVFDT
- Concept-adapting VFDT [Hulten, Spencer, Domingos 2001]
- Goal: classifying concept-drifting data streams
- Approach:
  - make use of the Hoeffding bound
  - incorporate "windowing"
  - monitor changes in the information gain of the attributes; if a change reaches a threshold, grow an alternate subtree with the new "best" attribute, but keep it in the background
  - replace the old subtree if the alternate becomes more accurate (see the promotion sketch below)
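A highly simplified sketch of that promotion rule. Subtree growth and gain monitoring are abstracted away; the models are assumed to expose a scikit-learn-style predict, and all names are illustrative.

```python
def accuracy(model, examples):
    # Fraction of recent (x, y) examples the model labels correctly.
    return sum(model.predict([x])[0] == y for x, y in examples) / len(examples)

def maybe_promote(current, alternate, recent_examples):
    # CVFDT grows the alternate subtree in the background and promotes it
    # only once it beats the current subtree on recent examples.
    if alternate is None:
        return current
    if accuracy(alternate, recent_examples) > accuracy(current, recent_examples):
        return alternate
    return current
```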

9 Classifiers for Data Streams
- Fast and light classifiers:
  - Naïve Bayesian: one pass to count occurrences (sketched below)
  - sliding windows, tumbles and slides
  - Adaptive Nearest Neighbor Classification Algorithm (ANNCAD)
- Ensembles of classifiers (decision trees or others):
  - bagging ensembles
  - boosting ensembles
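One way to read "one pass to count occurrences": a count-based Naïve Bayes whose statistics are updated per tuple, so a single scan suffices. The dictionary layout and the add-one smoothing are assumptions of this sketch.

```python
from collections import defaultdict
import math

class StreamingNaiveBayes:
    """Naive Bayes over a stream: per-tuple count updates, one pass total."""
    def __init__(self):
        self.class_counts = defaultdict(int)    # class -> count
        self.feature_counts = defaultdict(int)  # (class, index, value) -> count
        self.n = 0

    def update(self, x, y):
        self.n += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1

    def predict(self, x):
        best, best_score = None, float("-inf")
        for c, cc in self.class_counts.items():
            # Log-space score with add-one smoothing (an assumption;
            # the denominator presumes binary feature values).
            score = math.log(cc / self.n)
            for i, v in enumerate(x):
                score += math.log((self.feature_counts[(c, i, v)] + 1) / (cc + 2))
            if score > best_score:
                best, best_score = c, score
        return best
```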

10 Basic Ideas
- The stream is partitioned into sequential chunks
- A classifier is trained from each chunk
- The accuracy of a voting ensemble is normally better than that of a single classifier
- Method 1: bagging (see the weighted-voting sketch below)
  - weighted voting: weights are assigned to classifiers based on their recent performance on the current test examples
  - only the top K classifiers are used
- Method 2: boosting
  - majority voting
  - classifiers are retired by age
  - boosting is used in training
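A minimal sketch of chunk-based training plus weighted voting. Treating the weights as given (e.g., recent accuracies) and the top-K cutoff value are assumptions; the slides do not fix the formulas.

```python
from sklearn.tree import DecisionTreeClassifier

def train_on_chunks(chunks):
    # One classifier per sequential chunk (X, y) of the stream.
    return [DecisionTreeClassifier().fit(X, y) for X, y in chunks]

def weighted_vote(classifiers, weights, x, top_k=5):
    # Keep only the top-K classifiers by weight (their recent accuracy),
    # then let each cast a vote weighted by that accuracy.
    ranked = sorted(zip(classifiers, weights),
                    key=lambda pair: pair[1], reverse=True)[:top_k]
    votes = {}
    for clf, w in ranked:
        label = clf.predict([x])[0]
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```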

11 Bagging Ensemble Method

12 Mining Streams with Concept Changes
- Changes are detected by a drop in accuracy or by other methods
  - build new classifiers on new windows
  - search among the old classifiers for those that have now become accurate again

13 Boosting Ensembles for Adaptive Mining of Data Streams
Andrea Fang Chu, Carlo Zaniolo [PAKDD 2004]

14 Mining Data Streams: Desiderata
- Fast learning (preferably in one pass over the data)
- Light requirements (low time complexity, low memory requirement)
- Adaptation (the model always reflects the time-changing concept)

15 Adaptive Boosting Ensembles
- The training stream is split into blocks (i.e., windows)
- Each individual classifier is learned from one block
- A boosting ensemble of 7-19 members is maintained over time
- Decisions are taken by simple majority
- As the (N+1)-th classifier is built, the weights of the tuples misclassified by the first N are boosted (sketched below)
- Change detection is used to achieve adaptation
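A sketch of the per-block boosting step described above: train on the current block with sample weights, boosting the tuples the existing ensemble misclassifies. Doubling the misclassified weights and the shallow-tree depth are illustrative assumptions, not the exact rule from the PAKDD 2004 paper.

```python
from collections import deque
from sklearn.tree import DecisionTreeClassifier

MAX_MEMBERS = 19                       # ensemble of 7-19 members, per the slide
ensemble = deque(maxlen=MAX_MEMBERS)   # oldest classifiers retire first

def majority_vote(members, x):
    votes = {}
    for clf in members:
        label = clf.predict([x])[0]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def train_next_block(X, y):
    # Boost the weight of tuples the current ensemble misclassifies.
    weights = [1.0] * len(X)
    if ensemble:
        for i, (xi, yi) in enumerate(zip(X, y)):
            if majority_vote(ensemble, xi) != yi:
                weights[i] *= 2.0  # doubling is an assumed scheme
    weak = DecisionTreeClassifier(max_depth=3)  # aggressively pruned tree
    ensemble.append(weak.fit(X, y, sample_weight=weights))
```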

16 Fast and Light
- Experiments show that boosting ensembles of "weak learners" provide accurate prediction
- Weak learners:
  - an aggressively pruned, e.g. shallow, decision tree (this means fast!)
  - trained on a small set of examples (this means light in memory requirements!)

17 Adaptation
- Detect changes that cause significant drops in ensemble performance:
  - gradual changes: concept drift
  - abrupt changes: concept shift

18 Adaptability
- The error rate is viewed as a random variable
- When it rises significantly above its recent average, the whole ensemble is dropped (see the detection sketch below)
- And a new one is quickly re-learned
- The cost/performance of boosting ensembles is better than that of bagging ensembles [KDD04]
- BUT ???
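A minimal sketch of that drop-and-relearn trigger, treating the per-block error rate as a random variable and flagging a change when it exceeds its recent mean by a few standard deviations. The 3-sigma rule and the history values are assumptions.

```python
from statistics import mean, stdev

def significant_error_jump(recent_errors, current_error, k=3.0):
    # Flag a concept change when the current block's error rate exceeds
    # the recent average by k standard deviations (k = 3 is an assumption).
    if len(recent_errors) < 2:
        return False
    mu, sigma = mean(recent_errors), stdev(recent_errors)
    return current_error > mu + k * max(sigma, 1e-9)

# Hypothetical per-block error rates: steady, then an abrupt jump.
history = [0.11, 0.10, 0.12, 0.11, 0.10]
print(significant_error_jump(history, 0.35))  # True: drop ensemble, relearn
```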

19 References
- Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han. Mining Concept-Drifting Data Streams Using Ensemble Classifiers. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
- Pedro Domingos, Geoff Hulten. Mining High-Speed Data Streams. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2000.
- Geoff Hulten, Laurie Spencer, Pedro Domingos. Mining Time-Changing Data Streams. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001.
- Wei Fan, Yi-an Huang, Haixun Wang, Philip S. Yu. Active Mining of Data Streams. SIAM International Conference on Data Mining (SDM), 2004.
- Fang Chu, Yizhou Wang, Carlo Zaniolo. An Adaptive Learning Approach for Noisy Data Streams. 4th IEEE International Conference on Data Mining (ICDM), 2004.
- Fang Chu, Carlo Zaniolo. Fast and Light Boosting for Adaptive Mining of Data Streams. PAKDD 2004.
- Yan-Nei Law, Carlo Zaniolo. An Adaptive Nearest Neighbor Classification Algorithm for Data Streams. ECML/PKDD 2005, Porto, Portugal, October 3-7, 2005.