
© 2009 IBM Corporation DUST: A Generalized Notion of Similarity between Uncertain Time Series Smruti R. Sarangi and Karin Murthy IBM Research Labs, Bangalore, India

Uncertainty in Data
 Uncertainty is introduced by the massive amount of sensor data (millions of sensors feed a server, whose analytics drive business decisions)
 Privacy-preserving techniques: a certain degree of uncertainty is sometimes introduced intentionally

Outline
 Motivation
 Generalized Distance Measure
  – Properties of a Distance Measure
  – Algebraic Derivation
 DUST Distance
  – Computation
  – Properties
  – Examples
 Results
  – Setup
  – Classification, Motif Detection, 1-NN Search
 Conclusion

What does Uncertain Data Look Like?
Each element of an uncertain time series is modeled as

x = r(x) + ε(x)

where x is the observed value, r(x) is the real (original) value, and ε(x) is an error term drawn from a known error distribution.
[Figure: an uncertain time series, decomposed into its observed, original, and error components]
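As an illustration (not part of the original slides), a minimal Python sketch of this data model; the sine wave and the Gaussian error with standard deviation sigma are made-up examples:

```python
import numpy as np

def observe(real_series, sigma=0.1, seed=0):
    """Simulate an uncertain time series: x = r(x) + eps(x),
    with eps drawn i.i.d. from N(0, sigma^2) (an assumed error model)."""
    rng = np.random.default_rng(seed)
    real = np.asarray(real_series, dtype=float)
    return real + rng.normal(0.0, sigma, size=real.shape)

# Example: a noisy observation of a sine wave
t = np.linspace(0, 2 * np.pi, 100)
observed = observe(np.sin(t), sigma=0.2)
```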

Data Mining on Uncertain Time Series
Tasks such as clustering, classification, and pattern discovery require at least a partial order on the distances between time series elements: are x and x' closer to each other than y and y'? A total order between the distances is better still, because it ensures that all pairs are comparable and makes the distances easy to store and manage later. We therefore need a distance function between uncertain time series elements.

Distance between Uncertain Time Series
Is T2 closer to T1, or is T3 closer to T1?
[Figure: three value-vs-time plots of T1, T2, and T3 — in the first case the answer doesn't matter, in the second it is clearly T3, and in the third it is ambiguous: T2 or T3?]

How to Measure the Distance between two Time Series Elements?
Consider two values x = r(x) + ε(x) and x' = r(x') + ε(x').
Axiom: the distance between x and x' should say something about the normal Euclidean distance between r(x) and r(x').
Prior approaches:
1. Compute the a priori probability distribution of the random variable X = r(x) – r(x')
2. Work with only the mean and standard deviation of X
Neither is satisfactory: X is not a distance measure, and it is hard to work with probabilities.

Resolving the Question
 T2 should be closer to T1 than T3
  – It is possible that T2 and T1 are the same time series; T2 just has some additional error.
  – T3 and T1 can never be the same time series, because the last value diverges too much.
Euclidean distance (EUCL) and Dynamic Time Warping (DTW) pick T3; DUST picks T2.

Outline
 Motivation
 Generalized Distance Measure
  – Properties of a Distance Measure
  – Algebraic Derivation
 DUST Distance
  – Computation
  – Properties
  – Examples
 Results
  – Setup
  – Classification, Motif Detection, 1-NN Search
 Conclusion

Arriving at a Distance Measure
Properties of a Distance Measure
1. Non-negativity: d(A,B) ≥ 0
2. Identity of indiscernibles: d(A,B) = 0 iff A = B
3. Symmetry: d(A,B) = d(B,A)
4. Triangle inequality: d(A,B) + d(A,C) ≥ d(B,C)
5. The distance should be similar to EUCL or DTW if the magnitude of the error is small (extra condition for an uncertain distance measure)
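As a side note (not from the slides), properties 1–4 can be spot-checked numerically for any candidate per-element distance; here plain |x − y| serves as a stand-in:

```python
import itertools, random

def check_metric(d, samples, tol=1e-9):
    """Spot-check the four distance-measure axioms on sample points."""
    for a, b, c in itertools.product(samples, repeat=3):
        assert d(a, b) >= -tol                     # 1. non-negativity
        assert (d(a, b) <= tol) == (a == b)        # 2. identity of indiscernibles
        assert abs(d(a, b) - d(b, a)) <= tol       # 3. symmetry
        assert d(a, b) + d(a, c) >= d(b, c) - tol  # 4. triangle inequality

random.seed(0)
points = [random.uniform(-5, 5) for _ in range(15)]
check_metric(lambda x, y: abs(x - y), points)  # passes silently
```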

Extending Prior Work
Prior work: two time series are considered similar if

P(DIST(T1,T2) ≤ ε) ≥ τ, where DIST(T1,T2) = sqrt(Σ_i dist(T1[i],T2[i])²) and dist(x,y) = |x−y|

Assumption: P(DIST(T1,T2) ≤ ε) = p(DIST(T1,T2) = 0) · ε, irrespective of the size of ε.
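One way to read this similarity test (my sketch, not the authors' code) is as a Monte Carlo estimate over possible real values, here assuming each real value is distributed around its observation with Gaussian error of a hypothetical standard deviation sigma:

```python
import numpy as np

def prob_similar(t1, t2, sigma=0.2, eps=1.0, trials=10_000, seed=0):
    """Estimate P(DIST(T1, T2) <= eps) by sampling the hidden real series,
    assuming real value ~ N(observed value, sigma^2) for every element."""
    rng = np.random.default_rng(seed)
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)
    hits = 0
    for _ in range(trials):
        r1 = t1 + rng.normal(0.0, sigma, t1.shape)  # one possible real T1
        r2 = t2 + rng.normal(0.0, sigma, t2.shape)  # one possible real T2
        if np.sqrt(np.sum((r1 - r2) ** 2)) <= eps:
            hits += 1
    return hits / trials

# Similar if the probability estimate clears a threshold tau
tau = 0.9
print(prob_similar([0, 1, 2], [0.1, 1.0, 2.1]) >= tau)
```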

Some Algebra
P(DIST(T1,T2) ≤ ε) > P(DIST(T1,T3) ≤ ε)
⇔ p(DIST(T1,T2) = 0) > p(DIST(T1,T3) = 0)   (by the assumption above)
⇔ Π_i p(dist(T1[i],T2[i]) = 0) > Π_i p(dist(T1[i],T3[i]) = 0)
⇔ Σ_i −log(p(dist(T1[i],T2[i]) = 0)) < Σ_i −log(p(dist(T1[i],T3[i]) = 0))
Since dist(x,y) depends only on |x−y| (proved in the paper), define φ(x) = p(dist(0,x) = 0); each term above is then −log(φ(|T1[i] − T2[i]|)).
Definition: dust(x,y) = sqrt(−log(φ(|x−y|)) + log(φ(0)))
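For intuition (a sketch under assumptions, not from the slides): with Gaussian errors of standard deviation sigma on both values and a flat prior on the real values, φ(d) is proportional to exp(−d²/(4σ²)), and dust reduces to a scaled Euclidean distance:

```python
import math

def dust(x, y, sigma):
    """dust(x, y) = sqrt(-log(phi(|x - y|)) + log(phi(0))), specialized to
    Gaussian errors, where phi(d) ~ exp(-d^2 / (4 sigma^2)); the
    proportionality constant cancels between the two log terms."""
    d = abs(x - y)
    return math.sqrt(d * d / (4 * sigma ** 2))  # equals |x - y| / (2 sigma)

print(dust(1.0, 2.0, sigma=0.5))  # 1.0
```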

Some Algebra - II
P(DIST(T1,T2) ≤ ε) > P(DIST(T1,T3) ≤ ε)
⇔ Σ_i −log(p(dist(T1[i],T2[i]) = 0)) < Σ_i −log(p(dist(T1[i],T3[i]) = 0))   (as before)
⇔ Σ_i dust(T1[i],T2[i])² < Σ_i dust(T1[i],T3[i])²   (adding the constant n·log(φ(0)) to both sides, by the definition of dust)
Definition: DUST(T1,T2) = sqrt(Σ_i dust(T1[i],T2[i])²)
Hence DUST(T1,T2) < DUST(T1,T3): DUST behaves like a standard distance measure.
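A runnable sketch (not the authors' code) of the series-level distance, reusing the Gaussian closed form from above; the series and sigma are invented examples:

```python
import numpy as np

def dust_series(t1, t2, sigma=0.5):
    """DUST(T1, T2) = sqrt(sum_i dust(T1[i], T2[i])^2), with the Gaussian
    closed form dust(x, y) = |x - y| / (2 sigma)."""
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)
    d = np.abs(t1 - t2) / (2 * sigma)
    return float(np.sqrt(np.sum(d ** 2)))

t1 = [0.0, 1.0, 2.0, 3.0]
t2 = [0.1, 1.1, 1.9, 3.2]   # small error everywhere
t3 = [0.0, 1.0, 2.0, 8.0]   # large divergence at the last value
print(dust_series(t1, t2) < dust_series(t1, t3))  # True: T2 is closer to T1
```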

Outline
 Motivation
 Generalized Distance Measure
  – Properties of a Distance Measure
  – Algebraic Derivation
 DUST Distance
  – Computation
  – Properties
  – Examples
 Results
  – Setup
  – Classification, Motif Detection, 1-NN Search
 Conclusion

Computing the DUST Distance
Offline computation — compute dust(0, Δx) over a range of Δx:
1. Assume values are independent
2. Use Bayes' theorem, combining the original distribution of the data with the error distribution
3. Arrive at the final solution through numerical integration
 Save the values in a lookup table
 Compress it using a piece-wise linear representation
Online computation — given |x−y|: if it falls beyond the last segment of the lookup table, use the last segment; otherwise perform a binary search to find the right segment, and calculate dust(0, Δx) from that segment.
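A simplified sketch of this scheme (my illustration, not the authors' implementation); the Gaussian closed form stands in for the numerically integrated dust values, and the grid size and range are arbitrary choices:

```python
import bisect
import numpy as np

# --- Offline: tabulate dust(0, dx) at grid breakpoints; in the real scheme
# these values come from numerical integration over the error distribution ---
SIGMA = 0.5
GRID = np.linspace(0.0, 5.0, 64).tolist()   # segment breakpoints
TABLE = [dx / (2 * SIGMA) for dx in GRID]   # dust(0, dx) at each breakpoint

def dust_lookup(dx):
    """Online: binary-search the segment, then interpolate linearly.
    Beyond the last breakpoint, reuse (extrapolate) the last segment."""
    dx = abs(dx)
    i = bisect.bisect_right(GRID, dx)       # binary search
    i = min(max(i, 1), len(GRID) - 1)       # clamp to a valid segment
    x0, x1 = GRID[i - 1], GRID[i]
    y0, y1 = TABLE[i - 1], TABLE[i]
    return y0 + (y1 - y0) * (dx - x0) / (x1 - x0)

print(dust_lookup(1.3))  # ~1.3 / (2 * 0.5) = 1.3
```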

The dust Distance
For the Normal distribution, the dust distance is exactly the Euclidean distance (up to a constant scale factor). For other distributions, dust ultimately converges to the Euclidean distance.
[Figure: dust(0, Δx) curves for the Normal distribution and for other distributions]

Combining Multiple Distributions
Let the values in a time series have different error distributions f1 … fn, with standard deviations σ1 … σn, and choose σe = min(σ1, …, σn)/5.
Each f is replaced by an adjusted distribution f'(x):
– f'(x) = f(x) for η1 ≤ x ≤ η2 (the region we are interested in)
– f'(x) = N(0, σe) for x < η1 or x > η2 (the region we are not interested in)
[Figure: time series T1 and T2 whose elements carry Normal, Uniform, and Exponential error distributions]
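A rough sketch of this adjustment (my reading of the slide, not the authors' code); the densities are unnormalized and the uniform example is invented:

```python
import math

def adjusted_pdf(f, eta1, eta2, sigma_e):
    """Return f'(x): the original density f inside [eta1, eta2],
    and N(0, sigma_e) tails outside it (unnormalized sketch)."""
    def gaussian(x):
        return math.exp(-x * x / (2 * sigma_e ** 2)) / (sigma_e * math.sqrt(2 * math.pi))
    def f_prime(x):
        if eta1 <= x <= eta2:
            return f(x)         # region of interest: keep the original density
        return gaussian(x)      # outside: rapidly decaying Gaussian tail
    return f_prime

# Example: a uniform error density on [-1, 1]
uniform = lambda x: 0.5 if -1.0 <= x <= 1.0 else 0.0
f_prime = adjusted_pdf(uniform, -1.0, 1.0, sigma_e=0.1)
print(f_prime(0.5), f_prime(2.0))
```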

Combining Multiple Normal Distributions
When multiple normal distributions with different standard deviations are combined, the resulting dust distances converge to the same distance function.
[Figure: dust curves for several normal distributions with different standard deviations]

Results

Classification Accuracy
No error: 77%, DUST: 72%, Euclidean distance: 62%

Classification Accuracy: Dynamic Time Warping
No error: 78%, DUST: 74%, Euclidean distance: 67%

Top-k Motifs: EEG Dataset
[Figure: top-k motif results on the EEG dataset; annotations mark a region of anomalous behavior and the superior performance of DUST]

Number of Matches vs. Standard Deviation for k-NN Classification – Wafer Dataset
[Figure: number of matches as a function of standard deviation, for DUST and Euclidean distance]

Conclusions
 Uncertainty in data is increasingly prevalent in
  – Sensor data
  – Privacy-preserving techniques
 Conventional approaches don't produce good results when mining uncertain data
 We propose the novel metric DUST
  – Incorporates theoretical measures of similarity
  – Easy to compute
 DUST makes up for half the accuracy lost due to uncertainty
