Based on Slides by D. Gunopulos (UCR)

Slides:



Advertisements
Similar presentations
Indexing Time Series Based on original slides by Prof. Dimitrios Gunopulos and Prof. Christos Faloutsos with some slides from tutorials by Prof. Eamonn.
Advertisements

Choosing Distance Measures for Mining Time Series Data
Indexing DNA Sequences Using q-Grams
Patch to the Future: Unsupervised Visual Prediction
December 5, 2013Computer Vision Lecture 20: Hidden Markov Models/Depth 1 Stereo Vision Due to the limited resolution of images, increasing the baseline.
HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.
Indexing Time Series. Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time.
Indexing Time Series Based on Slides by C. Faloutsos (CMU) and D. Gunopulos (UCR)
Efficient Similarity Search in Sequence Databases Rakesh Agrawal, Christos Faloutsos and Arun Swami Leila Kaghazian.
Data Mining: Concepts and Techniques Mining time-series data.
Slide 1 EE3J2 Data Mining Lecture 16 Unsupervised Learning Ali Al-Shahib.
Reza Sherkat ICDE061 Reza Sherkat and Davood Rafiei Department of Computing Science University of Alberta Canada Efficiently Evaluating Order Preserving.
Distance Functions for Sequence Data and Time Series
1 ISI’02 Multidimensional Databases Challenge: representation for efficient storage, indexing & querying Examples (time-series, images) New multidimensional.
Spatial and Temporal Data Mining
Cluster Analysis (1).
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept.
Using Relevance Feedback in Multimedia Databases
Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.
Mining Long Sequential Patterns in a Noisy Environment Jiong Yang, Wei Wang, Philip S. Yu, Jiawei Han SIGMOD 2002.
Spatial and Temporal Databases Efficiently Time Series Matching by Wavelets (ICDE 98) Kin-pong Chan and Ada Wai-chee Fu.
1 Theory I Algorithm Design and Analysis (11 - Edit distance and approximate string matching) Prof. Dr. Th. Ottmann.
Indexing Time Series.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Time Series I.
Wei Cheng 1, Xiaoming Jin 1, and Jian-Tao Sun 2 Intelligent Data Engineering Group, School of Software, Tsinghua University 1 Microsoft Research Asia 2.
Multimedia and Time-series Data
CS910: Foundations of Data Analytics Graham Cormode Time Series Analysis.
Analysis of Constrained Time-Series Similarity Measures
Video Mosaics AllisonW. Klein Tyler Grant Adam Finkelstein Michael F. Cohen.
Dynamic Time Warping Algorithm for Gene Expression Time Series
Similarity based Retrieval from Sequence Databases using Automata as Queries 作者 : A. Prasad Sistla, Tao Hu, Vikas howdhry 出處 :CIKM 2002 ACM 指導教授 : 郭煌政老師.
Spatial-Temporal Models in Location Prediction Jingjing Wang 03/29/12.
Friends and Locations Recommendation with the use of LBSN By EKUNDAYO OLUFEMI ADEOLA
The Landmark Model: An Instance Selection Method for Time Series Data C.-S. Perng, S. R. Zhang, and D. S. Parker Instance Selection and Construction for.
CS 4487/6587 Algorithms for Image Analysis
Fast Subsequence Matching in Time-Series Databases Author: Christos Faloutsos etc. Speaker: Weijun He.
Discovering Deformable Motifs in Time Series Data Jin Chen CSE Fall 1.
Identifying Patterns in Time Series Data Daniel Lewis 04/06/06.
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
ICDE, San Jose, CA, 2002 Discovering Similar Multidimensional Trajectories Michail VlachosGeorge KolliosDimitrios Gunopulos UC RiversideBoston UniversityUC.
Object Recognition Based on Shape Similarity Longin Jan Latecki Computer and Information Sciences Dept. Temple Univ.,
Project Lachesis: Parsing and Modeling Location Histories Daniel Keeney CS 4440.
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept. of Electronic.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image and.
 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.
Course14 Dynamic Vision. Biological vision can cope with changing world Moving and changing objects Change illumination Change View-point.
Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Shing Chen Author : Jessica K. Ting Michael K. Ng Hongqiang Rong Joshua Z. Huang 國立雲林科技大學.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Time Series databases Text databases.
FastMap : Algorithm for Indexing, Data- Mining and Visualization of Traditional and Multimedia Datasets.
1 Double-Patterning Aware DSA Template Guided Cut Redistribution for Advanced 1-D Gridded Designs Zhi-Wen Lin and Yao-Wen Chang National Taiwan University.
Mustafa Gokce Baydogan, George Runger and Eugene Tuv INFORMS Annual Meeting 2011, Charlotte A Bag-of-Features Framework for Time Series Classification.
Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.
Fast Subsequence Matching in Time-Series Databases.
Fast Approximate Query Answering over Sensor Data with Deterministic Error Guarantees Chunbin Lin Joint with Etienne Boursier, Jacque Brito, Yannis Katsis,
Distance Functions for Sequence Data and Time Series
Time Series Filtering Time Series
Jiawei Han Department of Computer Science
Distance Functions for Sequence Data and Time Series
School of Computer Science & Engineering
The Functional Space of an Activity Ashok Veeraraghavan , Rama Chellappa, Amit Roy-Chowdhury Avinash Ravichandran.
Robust Similarity Measures for Mobile Object Trajectories
Finding Similar Time Series
Data Mining: Concepts and Techniques — Chapter 8 — 8
Data Mining: Concepts and Techniques — Chapter 8 — 8
Handwritten Characters Recognition Based on an HMM Model
Time Series Filtering Time Series
Data Mining: Concepts and Techniques — Chapter 8 — 8
Presentation transcript:

Based on Slides by D. Gunopulos (UCR) Indexing Time Series Based on Slides by D. Gunopulos (UCR)

Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Time Series databases Image and video databases Data Mining

Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time intervals Stock price movements Volume of sales over time Daily temperature readings ECG data all NYSE stocks A time series database is a large collection of time series

Time Series Problems (from a databases perspective) The Similarity Problem X = x1, x2, …, xn and Y = y1, y2, …, yn Define and compute Sim(X, Y) E.g. do stocks X and Y have similar movements? Retrieve efficiently similar time series (Similarity Queries)

Types of queries whole match vs sub-pattern match range query vs nearest neighbors all-pairs query

Examples Find companies with similar stock prices over a time interval Find products with similar sell cycles Cluster users with similar credit card utilization Find similar subsequences in DNA sequences Find scenes in video streams

distance function: by expert (eg, Euclidean distance) day $price 1 365 distance function: by expert (eg, Euclidean distance)

Problems Define the similarity (or distance) function Find an efficient algorithm to retrieve similar time series from a database (Faster than sequential scan) The Similarity function depends on the Application

Euclidean Similarity Measure View each sequence as a point in n-dimensional Euclidean space (n = length of each sequence) Define (dis-)similarity between sequences X and Y as p=1 Manhattan distance p=2 Euclidean distance

Advantages Easy to compute: O(n) Allows scalable solutions to other problems, such as indexing clustering etc...

Disadvantages Does not allow for different baselines Stock X fluctuates at $100, stock Y at $30 Does not allow for different scales Stock X fluctuates between $95 and $105, stock Y between $20 and $40

x x3 x2 x1 t

[Goldin and Kanellakis, 1995] Normalization [Goldin and Kanellakis, 1995] Normalize the mean and variance for each sequence Let µ(X) and (X) be the mean and variance of sequence X Replace sequence X by sequence X’, where x’i = (xi - µ (X) )/ (X)

Similarity definition still too rigid But.. Similarity definition still too rigid Does not allow for noise or short-term fluctuations Does not allow for phase shifts in time Does not allow for acceleration-deceleration along the time dimension etc ….

Example

A general similarity framework involving a transformation rules language [Jagadish, Mendelzon, Milo] Each rule has an associated cost

Examples of Transformation Rules Collapse adjacent segments into one segment new slope = weighted average of previous slopes new length = sum of previous lengths [l1, s1] [l2, s2] [l1+l2, (l1s1+l2s2)/(l1+l2)]

Combinations of Moving Averages, Scales, and Shifts [Rafiei and Mendelzon, 1998] Moving averages are a well-known technique for smoothening time sequences Example of a 3-day moving average x’i = (xi–1 + xi + xi+1)/3

Disadvantages of Transformation Rules Subsequent computations (such as the indexing problem) become more complicated We will see later why!

Dynamic Time Warping [Berndt, Clifford, 1994] Allows acceleration-deceleration of signals along the time dimension Basic idea Consider X = x1, x2, …, xn , and Y = y1, y2, …, yn We are allowed to extend each sequence by repeating elements Euclidean distance now calculated between the extended sequences X’ and Y’ Matrix M, where mij = d(xi, yj)

Dynamic Time Warping [Berndt, Clifford, 1994] X Y warping path j = i – w j = i + w

Restrictions on Warping Paths Monotonicity Path should not go down or to the left Continuity No elements may be skipped in a sequence Warping Window | i – j | <= w

Formulation Let D(i, j) refer to the dynamic time warping distance between the subsequences x1, x2, …, xi y1, y2, …, yj D(i, j) = | xi – yj | + min { D(i – 1, j), D(i – 1, j – 1), D(i, j – 1) }

Solution by Dynamic Programming Basic implementation = O(n2) where n is the length of the sequences will have to solve the problem for each (i, j) pair If warping window is specified, then O(nw) Only solve for the (i, j) pairs where | i – j | <= w

Longest Common Subsequence Measures (Allowing for Gaps in Sequences) Gap skipped

Basic LCS Idea Sim(X,Y) = |LCS| or Sim(X,Y) = |LCS| /n Edit Distance is another possibility

Probabilistic Generative Modeling Method [Ge & Smyth, 2000] Previous methods primarily “distance based”, this method “model based” Basic ideas Given sequence Q, construct a model MQ(i.e. a probability distribution on waveforms) Given a new pattern Q’, measure similarity by computing p(Q’|MQ)

The model MQ a discrete-time finite-state Markov model each segment in data corresponds to a state data in each state typically generated by a regression curve a state to state transition matrix is provided

On entering state i, a duration t is drawn from a state-duration distribution p(t) the process remains in state i for time t after this, the process transits to another state according to the state transition matrix

Example: output of Markov Model Solid lines: the two states of the model Dashed lines: the actual noisy observations

Landmarks [Perng et. al., 2000] Similarity definition much closer to human perception (unlike Euclidean distance) A point on the curve is a n-th order landmark if the n-th derivative is 0 Thus, local max and mins are first order landmarks Landmark distances are tuples (e.g. in time and amplitude) that satisfy the triangle inequality Several transformations are defined, such as shifting, amplitude scaling, time warping, etc

Similarity Models Euclidean and Lp based Edit Distance and LCS based Probabilistic (using Markov Models) Landmarks Next, how to index a database for similarity retrieval…

References [Goldin & Kanellakis 95] On Similarity Queries for Time-Series Data: Constraint Specification and Implementation. CP 1995: 137-153 [Jagadish et al. 95] Similarity-Based Queries. PODS 1995: 36-45 [Rafiei and Mendelson] Querying Time Series Data Based on Similarity. TKDE 12(5): 675-693 (2000) [Berndt and Clifford 94] Using Dynamic Time Warping to Find Patterns in Time Series. KDD Workshop 1994: 359-370 [Ge and Smyth 2000] Deformable Markov model templates for time-series pattern matching. KDD 2000: 81-90 [Perng et al. 2000] Landmarks: a New Model for Similarity-based Pattern Querying in Time Series Databases. ICDE 2000: 33-42