from Time-Changing Data Streams
Blaž Sovdat
August 27, 2014
THE STREAM MODEL

- Data arrives in the form of examples (tuples)
- Examples arrive sequentially, one by one
- No control over the speed and order of arrival
- The underlying "process" that generates stream examples might change (non-stationary data)
- Use a limited amount of memory, independent of the size of the stream (infinite data)

Example:
(adult, female, 3.141, 0.577)
(child, male, 2.1728, 0.1123)
(child, female, 2.1728, 1.12)
(child, male, 149, 1.23)
…
DATA STREAM ENVIRONMENT

Requirements of the data stream environment:
1) Process one example at a time, and inspect it only once
2) Use a limited amount of memory
3) Work in a limited amount of time
4) Be ready to predict at any time

Typical use of a data stream learner (the data stream prediction cycle):
a) The learner receives a new example from the stream (1)
b) The learner processes the example (2, 3)
c) The learner is ready for the next example (4)

Evaluation techniques differ from those used in the batch setting.

Alfred Bifet and Richard Kirkby. Data Stream Mining: A Practical Approach. 2009.
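The prediction cycle above can be sketched as a test-then-train loop. The `learner` and `stream` interfaces below are hypothetical illustrations, not the QMiner API:

```javascript
// Minimal sketch of the data stream prediction cycle (steps a-c),
// assuming a hypothetical learner with predict/update methods.
function runStream(learner, stream) {
    var errors = 0, n = 0;
    while (!stream.eof) {
        var example = stream.next();            // (a) receive a new example
        var yHat = learner.predict(example.x);  // (4) ready to predict at any time
        if (yHat !== example.y) { errors += 1; }
        learner.update(example.x, example.y);   // (b) process it once, then discard
        n += 1;                                 // (c) ready for the next example
    }
    return n > 0 ? errors / n : 0;              // prequential error estimate
}
```

Because each example is tested before it is used for training, the running error estimate comes for free, with no separate holdout set.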
INTERMEZZO: DECISION TREES

Example: ((male, first, adult), no)

Tom Mitchell. Machine Learning. McGraw Hill. 1997.
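To make the intermezzo concrete, here is a rough sketch of how a classification tree scores a candidate split by information gain, as in Mitchell's entropy-based treatment. The function names are illustrative, not from any library:

```javascript
// Entropy of a list of class labels, in bits.
function entropy(labels) {
    var counts = {}, n = labels.length;
    labels.forEach(function (l) { counts[l] = (counts[l] || 0) + 1; });
    var h = 0;
    Object.keys(counts).forEach(function (l) {
        var p = counts[l] / n;
        h -= p * Math.log(p) / Math.log(2);
    });
    return h;
}

// Information gain of a split: entropy before minus the
// size-weighted entropy of the resulting groups.
function infoGain(labels, groups) {
    var n = labels.length, remainder = 0;
    groups.forEach(function (g) { remainder += (g.length / n) * entropy(g); });
    return entropy(labels) - remainder;
}
```

A batch learner like CART scans the whole dataset to pick the attribute with the best score; the streaming question on the next slides is how to make that choice without ever seeing the whole dataset.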
INTERMEZZO: CART

L. Breiman, J. Friedman, R.A. Olshen, C.J. Stone. Classification and Regression Trees. CRC Press. 1984.
A PROBLEM

- Let's modify CART for a streaming setting
- Data is not available in advance; we only ever see a (small) sample of the stream
- When, and on which attribute, should we split?
- Which attribute is "the best" relative to the whole stream?
- Idea: apply the Hoeffding bound to confidently decide when to split
SIMPLIFIED HOEFFDING BOUND

Rajeev Motwani, Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press. 1995.
Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association. 1963.
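The slide body (an equation image) did not survive the export. For reference, the simplified form of the bound commonly used in the Hoeffding tree literature states: for a real-valued random variable with range $R$, after $n$ independent observations, the true mean differs from the sample mean by at most $\epsilon$ with probability at least $1 - \delta$, where

```latex
\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}
```

This is the standard simplified statement; the slide's exact formulation may differ in notation.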
APPLYING THE HOEFFDING BOUND

Elena Ikonomovska. Algorithms for Learning Regression Trees and Ensembles on Evolving Data Streams. PhD thesis. 2012.
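The resulting split rule can be sketched as follows: split on the best-scoring attribute only when the Hoeffding bound guarantees (with probability 1 − δ) that it truly beats the runner-up on the whole stream, or when the two are so close that a tie-break applies. The names and the assumption of a heuristic normalized to [0, 1] are illustrative, not QMiner's internals:

```javascript
// Hoeffding epsilon for a heuristic with range R, confidence delta,
// after n observed examples.
function hoeffdingEpsilon(R, delta, n) {
    return Math.sqrt((R * R * Math.log(1 / delta)) / (2 * n));
}

// Decide whether to split, given the scores of the best and
// second-best candidate attributes (assumed to lie in [0, 1]).
function shouldSplit(scoreBest, scoreSecond, n, delta, tieBreak) {
    var eps = hoeffdingEpsilon(1.0, delta, n);
    // Split if the best attribute is confidently better, or if the
    // bound is so tight that the remaining choice no longer matters.
    return (scoreBest - scoreSecond > eps) || (eps < tieBreak);
}
```

Note how epsilon shrinks as n grows: with few examples the learner waits, and with enough examples even small differences between attributes become statistically trustworthy.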
FAST INCREMENTAL MODEL TREES

The big picture.

Elena Ikonomovska. Algorithms for Learning Regression Trees and Ensembles on Evolving Data Streams. PhD thesis. 2012.
EXTENSIONS OF THE FIMT LEARNER

- Handling numeric attributes (histogram, BST, etc.)
- Stopping criteria (tree size, thresholds, etc.)
- Fitting a linear model in the leaves (unthresholded perceptron)
- Handling concept drift (with the Page-Hinkley test)
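The Page-Hinkley drift detector mentioned above can be sketched as below, following the textbook formulation (a running mean plus a cumulative-deviation test). The `alpha` and `lambda` parameters play the roles of the `phAlpha`/`phLambda` settings in the QMiner parameter listing; this sketch is not QMiner's actual implementation:

```javascript
// Page-Hinkley test for detecting an increase in a monitored signal
// (typically the model's error). alpha is the magnitude of change
// tolerated; lambda is the detection threshold.
function PageHinkley(alpha, lambda) {
    this.alpha = alpha;
    this.lambda = lambda;
    this.mean = 0;
    this.n = 0;
    this.cumSum = 0;     // cumulative deviation from the running mean
    this.minCumSum = 0;  // smallest cumulative deviation seen so far
}

// Feed one observation; returns true when drift is detected.
PageHinkley.prototype.update = function (x) {
    this.n += 1;
    this.mean += (x - this.mean) / this.n;
    this.cumSum += x - this.mean - this.alpha;
    this.minCumSum = Math.min(this.minCumSum, this.cumSum);
    return (this.cumSum - this.minCumSum) > this.lambda;
};
```

Intuitively, as long as the error fluctuates around its mean the cumulative sum stays near its minimum; a sustained increase makes the gap grow until it crosses `lambda` and drift is signaled.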
REGRESSION TREES IN QMINER

- Syntactically there is (almost) no difference between regression and classification
- A variant of the FIMT-DD learner is available in QMiner
- The learner is exposed via the QMiner JavaScript API
- Algorithm parameters and the data stream specification are passed in JSON format
- Several stopping and splitting criteria
- Change detection mechanism, using the Page-Hinkley test
- The model can be exported at any time (XML and DOT formats supported)
- Usage examples are available on GitHub

The algorithm expects three parameters when learning, or two when predicting:
1) a vector of discrete attribute values;
2) a vector of numeric attribute values;
3) the target variable value (only needed when learning).
// algorithm parameters
var algorithmParams = {
    "gracePeriod": 300,
    "splitConfidence": 1e-6,
    "tieBreaking": 0.005,
    "driftCheck": 1000,
    "windowSize": 100000,
    "conceptDriftP": false,
    "maxNodes": 15,
    "regLeafModel": "mean",
    "sdrThreshold": 0.1,
    "sdThreshold": 0.01,
    "phAlpha": 0.005,
    "phLambda": 50.0,
    "phInit": 100
};

// describe the data stream
var streamConfig = {
    "dataFormat": ["A", "B", "Y"],
    "A": { "type": "discrete", "values": ["t", "f"] },
    "B": { "type": "discrete", "values": ["t", "f"] },
    "Y": { "type": "numeric" }
};

// create a new learner
var ht = analytics.newHoeffdingTree(streamConfig, algorithmParams);

// process the stream
while (!streamData.eof) {
    /* parse example */
    ht.process(vec_discrete, vec_numeric, target);
}

// use the model
var val = ht.predict(["t", "f"], []);

// export the model
ht.exportModel({ "file": "./sandbox/ht/model.gv", "type": "DOT" });
THE END

- Been flirting with a NIPS 2013 paper
- A completely different approach to regression tree learning
- Essentially boils down to approximate nearest neighbor search
- Very general setting (metric-measure spaces)
- Strong theoretical guarantees

Samory Kpotufe, Francesco Orabona. Regression-tree Tuning in a Streaming Setting. NIPS 2013.