Learning from Data Streams


Learning from Data Streams. CS240B Notes by Carlo Zaniolo, UCLA CSD, with slides from an ICDE 2005 tutorial by Haixun Wang, Jian Pei & Philip Yu.

What are the Challenges?
- Data Volume: it is impossible to mine the entire data at one time; we can only afford constant memory per data sample.
- Concept Drift: previously learned models become invalid.
- Cost of Learning: model updates can be costly; we can only afford constant time per data sample.

On-Line Learning
Learning (Training):
- Input: a data set of pairs (a, b), where a is a feature vector and b a class label.
- Output: a model (e.g., a decision tree).
Testing:
- Input: a test sample (x, ?).
- Output: a class label prediction for x.
When mining data streams, the two phases are often combined, since concept shift requires continuous training; see the test-then-train sketch below.
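
A minimal sketch of how the two phases interleave on a stream (the prequential, or test-then-train, protocol). The classifier here is a toy majority-class model, and the predict/learn method names are hypothetical, not from the slides:

```python
# Minimal sketch of the prequential (test-then-train) loop on a stream.
# The Classifier interface (predict/learn) is a hypothetical illustration.

class MajorityClassifier:
    """Toy incremental classifier: always predicts the majority label seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Most frequent label observed so far (None before any training).
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        # Constant time and memory per sample, as the stream setting requires.
        self.counts[y] = self.counts.get(y, 0) + 1

def prequential(stream, model):
    """Test on each sample first, then train on it; yields running accuracy."""
    correct = seen = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        seen += 1
        model.learn(x, y)
        yield correct / seen

# Example: a tiny synthetic stream of (features, label) pairs.
stream = [((0, 1), "a"), ((1, 0), "b"), ((1, 1), "a"), ((0, 0), "a")]
for acc in prequential(stream, MajorityClassifier()):
    print(f"running accuracy: {acc:.2f}")
```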

Mining Data Streams: Challenges
- On-line response (NB), limited memory, most recent windows only.
- Fast & light algorithms are needed that minimize memory and CPU usage and require only one (or a few) passes through the data.
- Concept shift/drift: changes in the statistics of the mined data render previously learned models inaccurate or invalid.
- Robustness and adaptability: quickly recover/adjust after concept changes.
- Popular machine learning algorithms are no longer effective: neural nets are slow learners requiring many passes; Support Vector Machines (SVMs) are computationally too expensive.

Classifier Algorithms: from Databases to Data Streams
- New algorithms have emerged, e.g., Bloom filters and ANNCAD.
- Existing algorithms have been adapted: the Naive Bayes classifier (NBC) survives with only minor changes, decision trees require significant adaptation, and classifier ensembles remain effective after significant changes.
- Popular algorithms are no longer effective: neural nets are slow learners requiring many passes; Support Vector Machines (SVMs) are computationally too expensive.

Decision Tree Classifiers
- A divide-and-conquer approach: simple algorithm, intuitive model.
- Typically a decision tree grows one level for each scan of the data, so multiple scans are required; but if small samples suffice, this problem disappears (see the batch split sketch below).
- The tree structure is not 'stable': subtle changes in the data can cause global changes in the structure.
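
To make the per-scan cost concrete, here is a minimal sketch of batch split selection by information gain; each node's split requires a pass over that node's data, so growing one more level means another scan. The dataset and attribute names are hypothetical illustrations:

```python
# Minimal sketch of batch split selection by information gain.
# Dataset layout and attribute names are hypothetical, not from the slides.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # One full pass over the node's data per candidate attribute.
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attrs):
    # A batch learner scans all data at this node before committing to a
    # split; growing the next level repeats this on each partition.
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

rows = [{"outlook": "sunny", "windy": False},
        {"outlook": "rain",  "windy": True},
        {"outlook": "sunny", "windy": True},
        {"outlook": "rain",  "windy": False}]
labels = ["no", "yes", "no", "yes"]
print(best_split(rows, labels, ["outlook", "windy"]))  # -> "outlook"
```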

Challenge #1
How many samples do we need to build, in constant time, a tree that is nearly identical to the tree built by a batch learner (C4.5, SPRINT, ...)?
Nearly identical means:
- Categorical attributes: with high probability, the attribute we choose for the split is the same attribute a batch learner would choose, yielding an identical decision tree.
- Continuous attributes: discretize them into categorical ones.
(Forget concept shift/drift for now.)

Hoeffding Bound
Also known as the additive Chernoff bound. Given:
- r: a real-valued random variable
- n: the number of independent observations of r
- R: the range of r
the true mean of r is at least r_avg − ε with probability 1 − δ, i.e., P(μ_r ≥ r_avg − ε) ≥ 1 − δ, where:
ε = √( R² ln(1/δ) / (2n) )
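
As a quick numerical illustration (a minimal sketch; the R, n, and δ values are made up), the bound can be evaluated directly:

```python
# Minimal sketch: evaluate the Hoeffding epsilon for given R, n, delta.
# The example numbers are illustrative, not from the slides.
from math import log, sqrt

def hoeffding_epsilon(value_range, n, delta):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): the true mean is within
    epsilon of the observed mean with probability at least 1 - delta."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))

# Information gain lies in [0, log2(num_classes)]; for 2 classes R = 1.
for n in (100, 1_000, 10_000):
    print(n, hoeffding_epsilon(value_range=1.0, n=n, delta=1e-6))
```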

Hoeffding Bound Properties
- The Hoeffding bound is independent of the data distribution.
- The error ε decreases as n (the number of samples) increases.
- At each node, we accumulate enough samples (n) before we make a split; see the sketch below.
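
A minimal sketch of how this split test is used in a Hoeffding tree (in the spirit of Domingos & Hulten's VFDT): split only once the gain gap between the two best attributes exceeds ε. The `gain` function and the statistics bookkeeping are assumptions for illustration:

```python
# Minimal sketch of the Hoeffding-tree split decision (in the spirit of VFDT).
# `gain(stats, attr)` is a hypothetical function returning the information
# gain of splitting the accumulated node statistics on `attr`.
from math import log, sqrt

def should_split(stats, attrs, n, gain, delta=1e-6, value_range=1.0):
    """Split once the observed gain gap between the best and second-best
    attribute exceeds the Hoeffding epsilon; then, with probability at
    least 1 - delta, the chosen attribute matches the batch learner's."""
    epsilon = sqrt(value_range ** 2 * log(1 / delta) / (2 * n))
    gains = sorted((gain(stats, a), a) for a in attrs)
    (g2, _), (g1, best) = gains[-2], gains[-1]
    if g1 - g2 > epsilon:   # best attribute is reliably ahead
        return best
    return None             # keep accumulating samples at this node
```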

Nearly Identical?
- Categorical attributes: with high probability, the attribute we choose for the split is the same attribute a batch learner would choose; thus we obtain an identical decision tree.
- Continuous attributes: discretize them into categorical ones.