The Problem of Concept Drift: Definitions and Related Work — Alexey Tsymbal's paper. (April 29, 2004)


Abstract: A. Tsymbal, “The problem of concept drift: definitions and related work” (2004).
Real-World Problem: Concepts are often not stable but change with time.
–Weather prediction
–Customers’ preferences
The underlying data distribution may change with time.

Definitions and Peculiarities
Concept Drift
–Changes in the hidden context that can induce more or less radical changes in the target concept. The cause of the change is hidden and not known a priori.
–Example: the effect of a car accident on a yearly budget.
Concepts often reoccur
–Weather patterns such as El Niño and La Niña.
Hidden Context
–A dependency that is not given explicitly in the form of predictive features.

An Ideal Concept Drift Handling System
–Quickly adapts to concept drift.
–Is robust to noise and distinguishes noise from concept drift.
–Recognizes and reacts to reoccurring contexts, such as seasonal differences.

Types of Concept Drift
There are two kinds of concept drift:
–Sudden (abrupt, instantaneous)
–Gradual (moderate or slow)
Hidden changes can alter the target concept, but may also cause a change in the underlying data distribution.
–Example: a week of record warm temperatures.

Virtual Concept Drift
–The need to change the current model due to a change in the data distribution (sampling shift), even though the target concept itself stays the same.
Real Concept Drift
–A change in the target concept itself (concept shift).
Virtual concept drift often occurs together with real concept drift.

Systems for Handling Concept Drift
Three main approaches:
–Instance selection
–Instance weighting
–Ensemble learning (learning with multiple concept descriptions)

Systems for Handling Concept Drift (Instance Selection)
The goal is to select instances relevant to the current concept. Usually implemented via a window that moves over recently arrived instances and uses the learnt concepts for prediction only in the immediate future.
–The window size can be fixed or heuristically determined (adaptive).
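As an illustrative sketch (not from the paper), a fixed-size window can be implemented as follows; `MajorityLearner` is a toy stand-in for any base learner with a fit/predict interface:

```python
from collections import Counter, deque

class MajorityLearner:
    """Toy base learner: predicts the most frequent label it was fit on."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
    def predict(self, X):
        return [self.label for _ in X]

class SlidingWindowClassifier:
    """Instance selection via a fixed-size window: the base learner is
    retrained on the most recent instances only, so old concepts are
    forgotten as the window moves."""
    def __init__(self, base_learner, window_size=100):
        self.base = base_learner
        self.window = deque(maxlen=window_size)  # oldest instances fall out

    def observe(self, x, y):
        self.window.append((x, y))
        X = [xi for xi, _ in self.window]
        Y = [yi for _, yi in self.window]
        self.base.fit(X, Y)  # retrain on the current window only

    def predict(self, x):
        return self.base.predict([x])[0]
```

With a window of five instances, for example, five observations of a new concept are enough to displace the old one entirely; an adaptive variant would shrink the window when drift is suspected and grow it during stable periods.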

Systems for Handling Concept Drift (Instance Selection)
Case-based editing strategies in case-based reasoning that delete noisy, irrelevant, and redundant cases are also considered instance selection.

Systems for Handling Concept Drift (Instance Weighting)
Uses the ability of some learning algorithms, such as Support Vector Machines, to process weighted instances.
Weighting by:
–Age
–Relevance to the current concept
Instance weighting has been reported to handle concept drift worse than analogous instance selection techniques, likely due to overfitting the data.
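A minimal sketch of weighting by age (illustrative, not the paper's method): weights decay exponentially with age, here combined with a toy weighted-majority learner standing in for a weight-aware algorithm such as an SVM:

```python
import math

def age_weights(n, decay=0.5):
    """Exponential decay by age: the newest instance (index n-1) gets
    weight 1; older instances get exponentially smaller weights."""
    return [math.exp(-decay * (n - 1 - i)) for i in range(n)]

def weighted_majority_label(labels, weights):
    """Toy weighted learner: the label with the largest total weight wins."""
    totals = {}
    for lbl, w in zip(labels, weights):
        totals[lbl] = totals.get(lbl, 0.0) + w
    return max(totals, key=totals.get)
```

For labels `["a"] * 8 + ["b"] * 2`, a decay of 0.5 lets the two recent "b" instances outweigh the eight older "a" instances, while a decay of 0 reduces to a plain majority vote.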

Systems for Handling Concept Drift (Ensemble Learning)
Maintains a set of concept descriptions whose predictions are combined using voting or weighted voting, or from which the most relevant description is selected.
Complicated concept descriptions can be produced iteratively using feature construction (according to relevance).

All incremental ensemble approaches use some criteria to dynamically delete, reactivate, or create new ensemble members, normally based on the base models’ consistency with the current data.
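The delete/create criterion described above can be sketched as follows (a hypothetical illustration; the class names and thresholds are assumptions, not from the paper):

```python
from collections import Counter

class MajorityLearner:
    """Toy base learner: predicts the most frequent label it was fit on."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
    def predict(self, X):
        return [self.label for _ in X]

class DynamicEnsemble:
    """Each member is re-weighted by its accuracy on the newest batch;
    members inconsistent with the current data are deleted, and a fresh
    member trained on the new batch is added."""
    def __init__(self, learner_factory, max_members=5, min_acc=0.5):
        self.factory = learner_factory
        self.max_members = max_members
        self.min_acc = min_acc
        self.members = []  # list of (model, weight) pairs

    def update(self, X, y):
        kept = []
        for model, _ in self.members:
            preds = model.predict(X)
            acc = sum(p == t for p, t in zip(preds, y)) / len(y)
            if acc >= self.min_acc:  # delete inconsistent members
                kept.append((model, acc))
        fresh = self.factory()       # create a member for the new batch
        fresh.fit(X, y)
        kept.append((fresh, 1.0))
        kept.sort(key=lambda mw: mw[1], reverse=True)
        self.members = kept[: self.max_members]

    def predict(self, x):
        votes = {}
        for model, w in self.members:  # weighted voting
            lbl = model.predict([x])[0]
            votes[lbl] = votes.get(lbl, 0.0) + w
        return max(votes, key=votes.get)
```

After a batch labelled entirely with a new concept, any member that scores below `min_acc` on that batch is dropped, so the ensemble tracks the current concept.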

Base Learning Algorithms
–Rule-based learning
–Decision trees, including incremental decision trees
–Naïve Bayes
–SVMs
–Radial basis function networks
–Instance-based learning

Global Eager Learners
–Unable to adapt to local concept drift.
Concept drift is often local
–Record high temperatures in one part of the world do not necessarily mean that temperatures around the globe are higher.
Local Lazy Learners
–Able to adapt well to local concept drift due to their nature.
–Perform well with disjoint concepts.
–Easy to update (case-based learners).
–Allow easy sharing of knowledge for some problems; it is easier to maintain multiple distributed case bases.

Common Testing Datasets
STAGGER & Moving Hyperplane
–Allow controlling:
the type and rate of concept drift
context recurrence
presence of noise
irrelevant attributes
–Do not allow checking scalability.
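For illustration, a moving-hyperplane stream can be generated along these lines; the function and parameter names are assumptions, and published benchmark generators differ in details:

```python
import random

def moving_hyperplane_stream(n, dim=2, drift=0.01, seed=0):
    """Yields (x, y) pairs where y = 1 iff x lies above a hyperplane
    whose weights drift slightly after every instance; `drift` controls
    the rate of concept drift (0 gives a stationary concept)."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(dim)]
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        threshold = 0.5 * sum(w)  # keeps the two classes roughly balanced
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0
        yield x, y
        # gradual drift: perturb each hyperplane weight a little
        w = [wi + drift * rng.uniform(-1, 1) for wi in w]
```

Setting `drift` high approximates sudden drift, while a small value gives the slow, gradual kind; noise and irrelevant attributes can be added on top in the same spirit.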

Real-World Test Problems
–Flight simulator data
–Web page access data
–Text Retrieval Conference (TREC) data
–Credit card fraud data
–Breast cancer data
–Anonymous web browsing data
–US Census Bureau data
– data
Unfortunately, most real-world datasets contain little concept drift.

Theoretical Results
There is a maximal frequency of concept changes (rate of drift) that is acceptable by any learner; it implies a lower bound on the size of the window needed for drifting concepts to be learnable.
It is sufficient for a learner to see a fixed number of the most recent instances.
However, the window sizes in the theoretical bounds are too large to be practical.

Incremental (Online) Learning vs. Batch Learning
Most algorithms for handling concept drift assume incremental (online) learning environments rather than batch learning, because real-life data often needs to be processed in an online manner.
–Data streams → incremental learning
–Databases → batch learning

Criteria for Updating the Current Model
Many algorithms for handling concept drift update the model regularly as new data arrive.
–This can be very costly.
An alternative is to detect changes and adapt the model only when necessary, e.g.:
–Based on the average confidence of the model’s correct predictions on new instances.
–By observing the fraction of instances for which the confidence is below a given threshold.
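The second trigger (fraction of low-confidence predictions) can be sketched as follows; the threshold values are illustrative assumptions:

```python
from collections import deque

class ConfidenceDriftDetector:
    """Watches the fraction of recent predictions whose confidence is
    below `conf_threshold`; signals that the model should be adapted
    when that fraction exceeds `max_low_frac`."""
    def __init__(self, conf_threshold=0.7, max_low_frac=0.3, window=50):
        self.conf_threshold = conf_threshold
        self.max_low_frac = max_low_frac
        self.recent = deque(maxlen=window)  # 1 marks a low-confidence prediction

    def add(self, confidence):
        self.recent.append(1 if confidence < self.conf_threshold else 0)
        low_frac = sum(self.recent) / len(self.recent)
        return low_frac > self.max_low_frac  # True -> adapt the model
```

Because the check looks at a fraction over a window rather than a single prediction, an occasional low-confidence instance (noise) is tolerated, while a sustained drop, suggesting drift, triggers an update.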

Case-Based Criteria
–Problem-solution regularity
–Problem-distribution regularity
These may be good measures of the quality of a case base.
–In the real world, it is not easy to apply these measures as triggers for model updating, because the drift rate and the level of noise may vary drastically with time.

Conclusions
Two kinds of concept drift:
–Real (hidden contexts)
–Virtual (data distribution)
Three basic approaches:
–Instance selection
–Instance weighting
–Ensemble learning

There are problems with most real-world datasets.
–They contain little concept drift, or contain concept drift that is introduced artificially.
Criteria need to be developed for detecting crucial changes, so that the model is adapted only when necessary.
–Current triggers are not robust enough to differentiate between types of concept drift and different levels of noise.

Thank You