Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL BING HU THANAWIN RAKTHANMANON YUAN HAO SCOTT EVANS STEFANO LONARDI EAMONN.


1 Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL BING HU THANAWIN RAKTHANMANON YUAN HAO SCOTT EVANS STEFANO LONARDI EAMONN KEOGH DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING REPORTED BY WANG YAWEN

2 Outline  Introduction  Definitions and Notation  MDL Modeling of Time Series  Algorithm  Experimental Evaluation  Complexity  Conclusion

3 Introduction  Choose the best representation and abstraction level  Discover the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series  Select the best parameters for particular algorithms  An important sub-routine in algorithms for classification, clustering and outlier discovery  Minimum Description Length (MDL) framework

4 Introduction  Dimension reduction  Discrete Fourier Transform (DFT)  Discrete Wavelet Transform (DWT)  Adaptive Piecewise Constant Approximation (APCA)  Piecewise Linear Approximation (PLA)  Choose the best abstraction level and/or representation of the data for a given task/dataset  Useful in its own right to understand/describe the data and an important sub-routine in algorithms for classification, clustering and outlier discovery

5 Introduction  Actual cardinality: 14, 500, 62  Intrinsic cardinality: 2, 2, 12

6 Introduction  Objective  Not simply to save memory  Increasing interest in using specialized hardware for data mining, but the complexity of implementing data mining algorithms in hardware typically grows superlinearly with the cardinality of the alphabet  Some data mining algorithms benefit from having the data represented in the lowest meaningful cardinality

7 Introduction  Objective  Most time series indexing algorithms critically depend on the ability to reduce the dimensionality or the cardinality of the time series, and on searching over the compacted representation in main memory  Remove the spurious precision induced by a cardinality/dimensionality that is too high in resource-limited devices  Create very simple outlier detection models

8 Introduction  MDL framework  Automatically discover the parameters that reflect the intrinsic model/cardinality/dimensionality of the data  Without requiring external information or expensive cross-validation search

9 Definitions and Notations  MDL is defined for discrete values  Reduce the original number of possible values to a manageable amount  The quantization makes no perceptible difference
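The quantization step above can be sketched as follows. The transcript does not show the exact mapping used by the authors, so this is a minimal sketch that assumes plain min-max scaling onto 2**bits integer levels; the function name `discretize` and the default of 8 bits are illustrative assumptions.

```python
def discretize(series, bits=8):
    """Map real values onto 2**bits integer levels (assumed min-max scaling)."""
    lo, hi = min(series), max(series)
    levels = 2 ** bits
    if hi == lo:
        # A constant series carries no shape information; map it to one level.
        return [0 for _ in series]
    scale = (levels - 1) / (hi - lo)
    return [round((x - lo) * scale) for x in series]
```

With 8 bits the series takes at most 256 distinct values, which is usually fine enough that a plot of the quantized series looks the same as the original.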

10 Definitions and Notations

11  How many bits it takes to represent a time series T
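The formula for this slide did not survive in the transcript. As a hedged illustration, two common ways to count the bits of a discretized series of length m are a fixed-length code (m times log2 of the cardinality) and an entropy-based estimate (m times the empirical entropy). Both function names below are assumptions made for this sketch.

```python
import math
from collections import Counter

def dl_fixed(series, cardinality):
    """Bits needed when every point is stored with a fixed-length code."""
    return len(series) * math.log2(cardinality)

def dl_entropy(series):
    """Bits needed under an ideal variable-length code (m times entropy)."""
    m = len(series)
    counts = Counter(series)
    entropy = -sum((n / m) * math.log2(n / m) for n in counts.values())
    return m * entropy
```

The entropy-based count never exceeds the fixed-length count when the cardinality is at least the number of distinct values actually present.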

12 Definitions and Notations  Convert a given time series to another representation or model  DFT, APCA, PLA

13 Definitions and Notations  DL(H): model cost  DL(T|H): correction cost (description cost or error term)  DL(T|H) = DL(T-H)

14 MDL Modeling of Time Series

15  APCA  Mean 8  16 possible values, DL(H) = 4
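The arithmetic on this slide can be checked with a short sketch: a single constant line drawn from 16 possible values costs log2(16) = 4 bits as the model, and the correction cost is the description length of the residual T-H. The residual is costed here with an entropy-based estimate, which is an assumption of this sketch, and the sample series is made up for illustration.

```python
import math
from collections import Counter

def description_length(values):
    """Entropy-based bit count of a discrete sequence (assumed coding scheme)."""
    m = len(values)
    counts = Counter(values)
    return -sum(n * math.log2(n / m) for n in counts.values())

def constant_model_cost(series, cardinality):
    """Total bits DL(H) + DL(T-H) for a single constant line at the mean."""
    mean = round(sum(series) / len(series))
    model_bits = math.log2(cardinality)          # DL(H): one value out of `cardinality`
    residual = [x - mean for x in series]        # T - H
    return model_bits + description_length(residual)

# Hypothetical series that hovers around 8 with cardinality 16.
example = [8, 8, 8, 8, 9, 7, 8, 8]
total_bits = constant_model_cost(example, 16)
```

A series that never leaves the value 8 has an all-zero residual, so its total cost is just the 4 model bits.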

16 MDL Modeling of Time Series

17

18 Algorithm  Discover the intrinsic cardinality and dimensionality of an input time series  Find the right model or data representation for the given time series

19 Algorithm

20  APCA  Constant lines  Dimensionality: m/2  d constant segments  d-1 pointers to indicate the offset of the end of each segment
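The APCA bookkeeping above can be sketched as a model cost of d segment values plus d-1 end-of-segment pointers. The sketch assumes each value costs log2(cardinality) bits and each pointer costs log2(m) bits; the last segment needs no pointer because it ends at m. The function names are illustrative.

```python
import math

def apca_model_bits(d, cardinality, m):
    """Assumed APCA model cost: d constant values plus d-1 offsets into a length-m series."""
    return d * math.log2(cardinality) + (d - 1) * math.log2(m)

def expand_segments(means, end_offsets):
    """Rebuild the piecewise-constant hypothesis H from segment means and end offsets."""
    out = []
    start = 0
    for mean, end in zip(means, end_offsets):
        out.extend([mean] * (end - start))
        start = end
    return out
```

For a single segment with 16 possible values this gives 4 bits, which matches the DL(H) = 4 example earlier in the slides.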

21 Algorithm  PLA  Starting value  Ending value  Ending offset
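A PLA hypothesis can be costed the same way. This sketch assumes every segment stores all three fields listed on the slide (starting value, ending value, ending offset); the exact formula is not given in the transcript, so the bit count below is an assumption, as are the function names.

```python
import math

def pla_model_bits(d, cardinality, m):
    """Assumed PLA model cost: per segment, two values and one offset."""
    return d * (2 * math.log2(cardinality) + math.log2(m))

def expand_pla(segments):
    """Rebuild H from (start_value, end_value, end_offset) triples by linear interpolation."""
    out = []
    start = 0
    for start_value, end_value, end_offset in segments:
        n = end_offset - start
        for i in range(n):
            frac = i / (n - 1) if n > 1 else 0.0
            out.append(round(start_value + (end_value - start_value) * frac))
        start = end_offset
    return out
```

Rounding keeps the reconstructed line on the integer grid so that the residual T-H stays discrete.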

22 Algorithm  DFT  Linear combination of sine waves  Half set of all coefficients  Subsets of the half set of coefficients to approximately regenerate T  Sort by absolute value  Use top-d coefficients  Inverse DFT  Constant bits (32 bits) for max and min value of the real parts and of the imaginary parts Hence
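The top-d selection above can be sketched with a naive DFT from the standard library. This is a minimal illustration, not the authors' implementation: it keeps the d largest-magnitude coefficients from the half set, mirrors them by conjugate symmetry, and inverts. Quantization of the coefficients and the bit accounting are left out, and the function names are assumptions.

```python
import cmath
import math

def dft(x):
    """Naive O(m^2) discrete Fourier transform of a real sequence."""
    m = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
        for k in range(m)
    ]

def top_d_reconstruction(x, d):
    """Approximate x using only the d largest coefficients of the half spectrum."""
    m = len(x)
    coeffs = dft(x)
    half = range(m // 2 + 1)                      # half set, including the DC term
    keep = sorted(half, key=lambda k: abs(coeffs[k]), reverse=True)[:d]
    kept = [0j] * m
    for k in keep:
        kept[k] = coeffs[k]
        kept[(m - k) % m] = coeffs[k].conjugate()  # mirror so the inverse is real
    return [
        (sum(kept[k] * cmath.exp(2j * math.pi * k * n / m) for k in range(m)) / m).real
        for n in range(m)
    ]
```

A pure sine wave is recovered from a single coefficient, while a noisy series needs more coefficients before the residual becomes cheap to encode.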

23 Experimental Evaluation  A detailed example on a famous problem  Baseline  L-Method: explain the residual error vs. size-of-model curve using all possible pairs of two regression lines  10  Bayesian Information Criterion based method  4

24 Experimental Evaluation  An example application in physiology

25 Experimental Evaluation  An example application in astronomy  Anomaly detector

26 Experimental Evaluation  An example application in cardiology

27 Experimental Evaluation  An example application in geosciences

28 Complexity  Space complexity  Linear in the size of the original data  Time complexity  O(mlog 2 m)

29 Conclusion  Simple methodology based on MDL  Robustly specify the intrinsic model, cardinality and dimensionality of time series data from a wide variety of domains  General and parameter-free

