Ensemble Methods with Data Streams. Jungbeom Lee, CS240B
Outline: Intro; Ensemble in machine learning; Online ensemble algorithms; Future work
Intro: Previous class: data streams; Classifiers; Ensemble methods; Online algorithms
Classifiers (supervised learning)
The batch classification problem: Given a finite training set D = {(x, y)}, where y ∈ {y1, y2, …, yk} and |D| = n, find a function y = f(x) that can predict the y value for an unseen instance x.
The data stream classification problem: Given an infinite sequence of pairs of the form (x, y), where y ∈ {y1, y2, …, yk}, find a function y = f(x) that can predict the y value for an unseen instance x.
Example applications: fraud detection in credit card transactions; topic classification in a news aggregation site, e.g. Google News; translation of foreign languages.
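To make the stream setting concrete, here is a minimal sketch of a learner that sees each pair (x, y) once, predicts before it learns, and keeps only a fixed-size weight vector. The names OnlinePerceptron and prequential_accuracy are hypothetical, and the perceptron is only a stand-in for whatever stream classifier is actually used.

```python
from typing import Iterable, List, Sequence, Tuple


class OnlinePerceptron:
    """Hypothetical binary stream learner; labels are assumed to be +1 / -1."""

    def __init__(self, n_features: int) -> None:
        self.w: List[float] = [0.0] * n_features  # fixed-size state
        self.b = 0.0

    def predict(self, x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def learn_one(self, x: Sequence[float], y: int) -> None:
        # Mistake-driven update: one pass over the features per example.
        if self.predict(x) != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y


def prequential_accuracy(
    stream: Iterable[Tuple[Sequence[float], int]], model: OnlinePerceptron
) -> float:
    """Test-then-train: score each example before the model learns from it."""
    correct = total = 0
    for x, y in stream:  # the stream is consumed once and never stored
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0
```

The stream can be any iterator, including an unbounded generator; nothing in the loop needs the data set size n that the batch formulation assumes.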
Motivations: online mining differs from static mining in three respects.
Data volume: it is impossible to mine the entire data set at one time; we can afford only constant memory per data sample.
Changing data characteristics: previously learned models may become invalid.
Cost of learning: model updates can be costly; we can afford only constant time per data sample.
Ensemble
A set of classifiers whose individual decisions are combined in some way to classify new examples.
For an ensemble of classifiers to be more accurate than any of its individual members, one key to success is to use individual classifiers with error rates below 0.5, i.e. better than random guessing on a two-class problem.
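A small calculation illustrates why the 0.5 threshold matters. The sketch below assumes a two-class problem, a plain majority vote, and fully independent errors across classifiers (an idealization that real ensembles only approximate); majority_vote_error is a hypothetical helper name.

```python
from math import comb


def majority_vote_error(n_classifiers: int, p_error: float) -> float:
    """Probability that more than half of n independent classifiers err.

    Assumes every classifier has the same error rate p_error and that
    errors are independent. Use an odd n so that ties cannot occur.
    """
    k_min = n_classifiers // 2 + 1  # smallest number of wrong votes that loses
    return sum(
        comb(n_classifiers, k)
        * p_error ** k
        * (1.0 - p_error) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.6):
        print(p, majority_vote_error(21, p))
```

Under these assumptions, 21 classifiers with an individual error rate of 0.3 give a majority-vote error of about 0.026, while an individual error rate above 0.5 makes the vote worse than any single member.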
Reasons
Ensemble methods
Manipulating the training examples: Bagging, AdaBoost
Injecting randomness: C4.5 decision tree algorithm
Bagging algorithm
Bagging algorithm
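The algorithm shown on these slides did not survive extraction. As a stand-in, here is a minimal sketch of standard batch bagging: each base model is trained on a bootstrap sample drawn with replacement, and predictions are combined by unweighted majority vote. OneNN is a hypothetical toy base learner on one-dimensional inputs, chosen only to keep the example self-contained.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, str]


class OneNN:
    """Hypothetical toy base learner: 1-nearest-neighbour on a single feature."""

    def fit(self, data: Sequence[Example]) -> "OneNN":
        self.data = list(data)
        return self

    def predict(self, x: float) -> str:
        return min(self.data, key=lambda xy: abs(xy[0] - x))[1]


def bagging_fit(
    data: Sequence[Example],
    n_models: int,
    make_learner: Callable[[], OneNN],
    seed: int = 0,
) -> List[OneNN]:
    rng = random.Random(seed)
    n = len(data)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n draws with replacement from the training set.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        models.append(make_learner().fit(sample))
    return models


def bagging_predict(models: Sequence[OneNN], x: float) -> str:
    # Unweighted majority vote over the base models.
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]
```

Note that the bootstrap step needs the whole training set and its size n up front, which is exactly what a data stream does not provide.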
Online bagging algorithm
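The slide's own pseudocode is not available here. A common formulation of online bagging (due to Oza and Russell) replaces the bootstrap sample with a per-example draw: each arriving example is shown to each base model k times, with k drawn from a Poisson(1) distribution. The sketch below assumes that formulation; RunningMeanClassifier is a hypothetical incremental base learner used only for illustration.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Optional


def poisson1(rng: random.Random) -> int:
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


class RunningMeanClassifier:
    """Hypothetical incremental learner: predicts the class with nearest mean."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = {}
        self.total: Dict[str, float] = {}

    def learn_one(self, x: float, y: str) -> None:
        self.count[y] = self.count.get(y, 0) + 1
        self.total[y] = self.total.get(y, 0.0) + x

    def predict(self, x: float) -> Optional[str]:
        if not self.count:
            return None  # nothing learned yet
        return min(self.count, key=lambda c: abs(self.total[c] / self.count[c] - x))


class OnlineBagging:
    def __init__(
        self, make_learner: Callable[[], RunningMeanClassifier], n_models: int, seed: int = 0
    ) -> None:
        self.rng = random.Random(seed)
        self.models: List[RunningMeanClassifier] = [make_learner() for _ in range(n_models)]

    def learn_one(self, x: float, y: str) -> None:
        for model in self.models:
            # Poisson(1) copies approximate bootstrap sampling on a stream.
            for _ in range(poisson1(self.rng)):
                model.learn_one(x, y)

    def predict(self, x: float) -> Optional[str]:
        votes = Counter(p for p in (m.predict(x) for m in self.models) if p is not None)
        return votes.most_common(1)[0][0] if votes else None
```

Each example is processed once and then discarded, so memory and time per example stay constant for a fixed ensemble size.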
Online weighted bagging algorithm
AdaBoost algorithm
AdaBoost algorithm
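The slide content is missing, so the following is a sketch of the standard batch AdaBoost procedure for two classes (labels +1 / -1), not necessarily the exact variant presented. The weak learner is a hypothetical decision stump on one feature; best_stump, adaboost_fit, and adaboost_predict are illustrative names.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, int]  # (alpha, threshold, polarity)


def stump_predict(x: float, threshold: float, polarity: int) -> int:
    return polarity if x > threshold else -polarity


def best_stump(xs: Sequence[float], ys: Sequence[int], w: Sequence[float]):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    srt = sorted(set(xs))
    thresholds = [srt[0] - 1.0] + [(a + b) / 2 for a, b in zip(srt, srt[1:])]
    best = None
    for t in thresholds:
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if stump_predict(x, t, pol) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best


def adaboost_fit(xs: Sequence[float], ys: Sequence[int], n_rounds: int) -> List[Stump]:
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble: List[Stump] = []
    for _ in range(n_rounds):
        err, t, pol = best_stump(xs, ys, w)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        err = max(err, 1e-12)  # avoid division by zero on a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, pol))
        # Raise the weight of misclassified examples, lower the rest.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def adaboost_predict(ensemble: Sequence[Stump], x: float) -> int:
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

On the toy data xs = 0..9 with labels [1, 1, 1, -1, -1, -1, 1, 1, 1, -1], no single stump fits all points, but three boosting rounds do. The reweighting loop revisits the full training set every round, which is the part an online variant has to approximate.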
Adaptive boosting algorithm
Experimental Results
Type of Data
Experimental Results
Experimental Results
Experimental Results
Future work: a better online algorithm for bagging; dealing with multiple data types