1 ENTROPY-BASED CONCEPT SHIFT DETECTION PETER VORBURGER, ABRAHAM BERNSTEIN IEEE ICDM 2006 Speaker: Li HueiJyun Advisor: Koh JiaLing Date:2007/11/6 1
2 OUTLINE Introduction Entropy and Concept Shift Adaption Calculating Entropy on Data Streams Algorithm Control Strategy using Entropy Measure Experimental setup Experimental Results Discussion of the Experiments Application to a Real-World Problem: Context Switches in Sensor Data Limitations, Future Work, and Conclusion 2
3 INTRODUCTION Problem: In many applications data is gathered over time, which raises the problem that the concepts to be learned may drift (i.e., change) over time. The increasing amount of data (e.g., multimedia content, data warehouse ) and limitation of computing power due to miniaturization (e.g., wearable computing) call for faster and more resource friendly algorithms. Motivation: the analysis of sensor data on wearable devices. 3
4 INTRODUCTION Context-awareness: A Scenario-based Approach for Direct Interruptablity Prediction on Wearable Devices Classifiers predict peoples’ anticipated behavior based on sensory input Contexts (or contextual situations) switch rather than gradually change Contextual information could be reused, even for new, not yet encountered situations An ongoing monitoring of the sensor stream is needed 4
5 INTRODUCTION Problem: online pattern matching mechanism comparing the sensor stream to the entire library of already known contexts is computational complex and not yet suitable for today’s wearable devices. Solution: indicate possible candidates (or hot spots) for context changes limiting the computationally intensive context (re-)determination on those candidates. 5
6 ENTROPY AND CONCEPT SHIFT ADAPTION Assumptions: As long as the distribution of older instances (features and target values) is similar to the distribution of new instances no concept drift occurred A distribution difference between older and more recent instances indicates a change in the target concept Measure the distribution inequality: If two distributions are equal, the entropy measure results in a value of 1 If they are absolutely different the measure will result in a value of 0 6
7 CALCULATING ENTROPY ON DATA STREAMS Sliding window technique: compares two windows, one presenting older and the other representing more recent instances in the stream Compare the two windows by counting and comparing all instances with respect to their class and stream membership Discretize the range of instance values to a fixed number of bins to take the approximate value distribution into account 7
8 CALCULATING ENTROPY ON DATA STREAMS A data stream: a sequence consisting of sequentially ordered tuples in time t i i (1, 2, 3, …) := (, l i ) where is the vector of all feature stream instances s n i at time t i The domain of the label stream l is discrete and contains all class values c C 8
9 CALCULATING ENTROPY ON DATA STREAMS H i : the resulting entropy at time t i and is defined as the mean of all data stream entropies H is at time t i where S : the number of feature-streams H is is calculated from the entropies H iscb
10 CALCULATING ENTROPY ON DATA STREAMS H iscb : represent the entropy of each class ( c C ) and bin ( b B ) given the stream s at time t i Bins: discrete aggregation of the values of each feature stream s : the probability that an instance occurs in the old window at time t i, belong to class c, with feature domain of stream s in bin b w iscb : depend on i, s, c, b
ALGORITHM CONTROL STRATEGY USING ENTROPY MEASURE Instance selection style algorithm 11
EXPERIMENTAL SETUP Real concept drifts: changes in the actual target concepts Virtual concept drifts: changes in the distribution Generate synthetic data set: H. Wang, W. Fan, P. S. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In KDD ‘03: proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining Real drift data set Virtual drift data set Mixed data set 12
EXPERIMENTAL SETUP Performance measure: Accuracy The area under the ROC-curve A representative set of benchmarks: Perfect benchmark: assumes an oracle-given ideal window size ξ for any point in time A selection of ensemble classifiers: the literature so far showed to have the highest accuracy and robustness against noise P. Vorburger and A. Bernstein. Entropy-based detection of real and virtual concept shifts. Working Paper – University of Zurich, Department of Informatics,
EXPERIMENTAL RESULTS 14
EXPERIMENTAL RESULTS The prediction quality against increasing noise levels 15
EXPERIMENTAL RESULTS Computational complexity Compare ensemble classifiers and the entropy measure based algorithm Measure the elapsed time: three committee classifiers: ±15s Entropy based algorithm: 148.6s Entropy calculation without Naïve Bayes model building: 1.1±0.1s 16
DISCUSSION OF THE EXPERIMENT Entropy measure outperforms the ensemble benchmark algorithm on real concept shifts Exhibit a greater predictive power while requiring less computational resources The entropy measure based algorithm showed the nearly the same robustness towards noise as the perfect benchmark and the committee classifiers 17
APPLICATION TO A REAL-WORLD PROBLEM: CONTEXT SWITCHES IN SENSOR DATA Data set: Audio: decomposed into 10 features accelerometer data recorded over a time of 15381s: merged in one single feature The wearable data acquisition set up: a microphone and three three-dimentional accelerometers attached on the subject’s shoulder, wrist, and leg 18
APPLICATION TO A REAL-WORLD PROBLEM: CONTEXT SWITCHES IN SENSOR DATA (A)walking (B)streetcar (C)office work (D)lecture (E)cafeteria (F)meeting 19
LIMITATIONS, FUTURE WORK, AND CONCLUSION Gradual concept drifts Find boundary conditions Recognize recurring concepts and exploit this information Generalizability The choice of the suitable parameters could be optimized 20
LIMITATIONS, FUTURE WORK, AND CONCLUSION Find a measure for detecting and measuring concept shifts as an analogue for context switches Formulation of entropy on data streams is capable to detect and measure concept shifts Algorithm with an entropy based instance selection strategy outperformed ensemble based algorithms on real concept shift data sets Given algorithm robustness towards noise, its sensitivity towards concept shifts, its computational efficiency, and predictive power on real concept shift data sets 21