Presentation transcript:

Themis Palpanas, VLDB, Aug 2004

Fair Use Agreement
This agreement covers the use of all slides on this CD-ROM; please read carefully.
- You may freely use these slides for teaching, if:
  - you send me an email telling me the class number/university in advance
  - my name and address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides)
- You may freely use these slides for a conference presentation, if:
  - you send me an email telling me the conference name in advance
  - my name appears on each slide you use
- You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
(c) Eamonn Keogh

Indexing Large Human-Motion Databases
Eamonn Keogh, Themis Palpanas, Victor B. Zordan, Dimitrios Gunopulos (University of California, Riverside)
Marc Cardle (University of Cambridge)

Motion Capture
- records motion data from live actors
- used for data-driven animation

Motion Capture in Games Industry
Street NBA Madden

Motion Capture in Movie Industry
Troy, Lord of the Rings

Motivation
- motion capture data
  - segmented into short sequences, stored in motion libraries
  - composed to create long, realistic motion sequences
- important to find similar sequences
  - form a pool of similar sequences
  - choose the most promising one to continue the motion

Motivation
- Dynamic Time Warping (DTW)
  - considers only local adjustments in time to match two time series
  - however, sometimes global adjustments are required
- DTW is used extensively; uniform scaling is complementary
  - the combination of both techniques offers a rich, high-quality result set
[Figure: DTW vs. Uniform Scaling]

Uniform Scaling
- time series
  - query, Q, length n
  - candidate, C, length m (m > n)
- stretch Q to length p (n ≤ p ≤ m): $Q^p$
  - $Q^p_j = Q_{\lceil j \cdot n/p \rceil}$, $1 \le j \le p$
- scaling factor, $sf = p/n$
  - max scaling factor, $sf_{max} = m/n$
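
The stretching rule on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `stretch` is hypothetical, and the slide's 1-based indices are mapped onto Python's 0-based lists.

```python
def stretch(q, p):
    """Stretch q (length n) to length p using Q^p_j = Q_ceil(j*n/p), 1 <= j <= p."""
    n = len(q)
    if p < n:
        raise ValueError("p must be at least len(q)")
    # (j*n + p - 1) // p is an integer ceiling of j*n/p, which avoids
    # floating-point rounding; the trailing -1 converts to a 0-based index.
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


q = [1.0, 2.0, 3.0, 4.0]
stretched = stretch(q, 6)          # scaling factor sf = 6/4 = 1.5
print(stretched)                   # [1.0, 2.0, 2.0, 3.0, 4.0, 4.0]
```

Stretching by repetition keeps every original value and only duplicates some of them, so a scaling factor of 1 returns the query unchanged.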

Problem Statement
- given
  - time series, Q
  - database of candidate time series, {D}
- find $\arg\min_p \{ dist(Q^p, \{D\}) \}$
  - $dist(Q^p, \{D\})$ = Euclidean distance between time series
- challenges
  - quickly solve the problem for two time series
  - extend the solution to scale up to large time series databases

Outline
- Speeding Up Search
- Scaling Up To Large Databases
- Experimental Evaluation
- Related Work
- Conclusions

Best Uniform Scaling Match
- brute force algorithm:
  - for each time series in {D}
    - for each sf, 1 ≤ sf ≤ $sf_{max}$
      - compute distance between the two time series
  - find the best overall match
- time complexity: O(|D|(m-n))
  - extremely expensive!
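
The brute force loop above might look as follows in Python. This is a sketch under stated assumptions: the names `stretch` and `best_uniform_scaling_match` are hypothetical, and the stretched query of length p is assumed to be compared against the first p points of each candidate (the slides only say "Euclidean distance between time series").

```python
import math


def stretch(q, p):
    """Q^p_j = Q_ceil(j*n/p), written with 0-based list indexing."""
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def best_uniform_scaling_match(q, database):
    """Return (distance, candidate index, stretched length p) of the best match."""
    best = (math.inf, None, None)
    n = len(q)
    for idx, c in enumerate(database):
        # Try every stretched length p between n and the candidate length m.
        for p in range(n, len(c) + 1):
            # Assumption: compare Q^p with the length-p prefix of the candidate.
            d = math.dist(stretch(q, p), c[:p])
            if d < best[0]:
                best = (d, idx, p)
    return best


database = [[5, 5, 5, 5, 5, 5], [0, 0, 1, 1, 0, 0]]
print(best_uniform_scaling_match([0, 1, 0], database))   # (0.0, 1, 6)
```

Every candidate is visited once per admissible length p, which is where the |D|(m-n) factor on the slide comes from.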

Lower Bounding Uniform Scaling
- lower bound the distance between two time series, for any sf, 1 ≤ sf ≤ $sf_{max}$
- desiderata:
  - fast to compute
  - tight bound
- results in fast pruning of candidates that are guaranteed not to belong to the solution
  - compute distance only for time series not pruned by the lower bound

Lower Bounding Uniform Scaling
- assume:
  - candidate C, length m = 100
  - query Q, length 80
  - wish to find the best match for any scaling of Q to a length between n and m

Lower Bounding Uniform Scaling
- assume:
  - candidate C, length 100
  - query Q, length 80
  - wish to find the best match for any scaling of Q to a length between n and m
- build envelopes U and L, length n = 80:
  - $U_i = \max( C_{\lfloor (i-1) \cdot m/n \rfloor + 1}, \ldots, C_{\lceil i \cdot m/n \rceil} )$
  - $L_i = \min( C_{\lfloor (i-1) \cdot m/n \rfloor + 1}, \ldots, C_{\lceil i \cdot m/n \rceil} )$
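
As a rough illustration of envelope construction, the sketch below builds an upper and a lower envelope of length n over a candidate of length m. The function name `envelopes` is hypothetical. The index window is also an assumption: it takes candidate points i through floor(i*m/n) (1-based), which is every position that query point i can be aligned with when the stretched query is compared against a prefix of the candidate. This window differs from the one printed on the slide and is not claimed to be the authors' exact choice.

```python
def envelopes(c, n):
    """Return (upper, lower) envelopes of length n over candidate c (length m >= n)."""
    m = len(c)
    upper, lower = [], []
    for i in range(1, n + 1):
        # 1-based positions i .. floor(i*m/n) become the slice [i-1 : (i*m)//n].
        window = c[i - 1 : (i * m) // n]
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


upper, lower = envelopes([0, 3, 1, 4, 2, 5], 3)
print(upper)   # [3, 4, 5]
print(lower)   # [0, 1, 1]
```

Because m ≥ n, each window contains at least one point, so `max` and `min` are always defined.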

Lower Bounding Uniform Scaling
- assume:
  - candidate C, length 100
  - query Q, length 80
  - wish to find the best match for any scaling of Q to a length between n and m
- compute lower bound: [equation not preserved in transcript]
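
The lower-bound equation itself is missing from the transcript, so the following is only a sketch of a standard envelope-based bound, not a reproduction of the slide. It assumes the prefix comparison and the index window described in the comments; all function names are hypothetical. Each query point contributes its squared distance to the envelope when it falls outside it, and nothing when it falls inside.

```python
import math
import random


def stretch(q, p):
    """Q^p_j = Q_ceil(j*n/p), written with 0-based list indexing."""
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def envelopes(c, n):
    """Assumed window: candidate positions i .. floor(i*m/n) for query point i."""
    m = len(c)
    windows = [c[i - 1 : (i * m) // n] for i in range(1, n + 1)]
    return [max(w) for w in windows], [min(w) for w in windows]


def lower_bound(q, upper, lower):
    """Distance from q to the envelope; points inside the envelope add nothing."""
    total = 0.0
    for qi, u, l in zip(q, upper, lower):
        if qi > u:
            total += (qi - u) ** 2
        elif qi < l:
            total += (qi - l) ** 2
    return math.sqrt(total)


def exact_best(q, c):
    """Best distance over all stretched lengths, comparing with prefixes of c."""
    return min(math.dist(stretch(q, p), c[:p]) for p in range(len(q), len(c) + 1))


# Empirical check that the bound never exceeds the true best distance.
rng = random.Random(0)
for _ in range(200):
    q = [rng.uniform(-1, 1) for _ in range(8)]
    c = [rng.uniform(-1, 1) for _ in range(10)]
    u, l = envelopes(c, len(q))
    assert lower_bound(q, u, l) <= exact_best(q, c) + 1e-9
```

The bound costs one pass over the query, whereas the exact computation needs one distance evaluation per admissible length.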

Envelope Indexing
- dimensionality of envelopes is high
  - reduce dimensionality by approximating them
  - Piecewise Constant Approximation
- assume query Q, length 80
  - approximated with 8 points
- compute approximation of lower bound: [equation not preserved in transcript]
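
One common way to reduce an envelope with a piecewise constant approximation is sketched below; the transcript does not preserve the authors' formula, so treat this as an assumption. Each segment of the upper envelope is replaced by its maximum, each segment of the lower envelope by its minimum, and each segment of the query by its mean. The names `paa`, `reduce_envelope` and `reduced_lower_bound` are hypothetical, and the series length is assumed to be a multiple of the number of segments (as with 80 points and 8 segments).

```python
import math


def paa(series, segments):
    """Piecewise constant approximation: the mean of each equal-width segment."""
    w = len(series) // segments
    return [sum(series[k * w : (k + 1) * w]) / w for k in range(segments)]


def reduce_envelope(upper, lower, segments):
    """Widen the envelope per segment so it still encloses the original one."""
    w = len(upper) // segments
    u_hat = [max(upper[k * w : (k + 1) * w]) for k in range(segments)]
    l_hat = [min(lower[k * w : (k + 1) * w]) for k in range(segments)]
    return u_hat, l_hat


def reduced_lower_bound(q, upper, lower, segments):
    """Envelope bound computed on the reduced representation only."""
    w = len(q) // segments
    u_hat, l_hat = reduce_envelope(upper, lower, segments)
    total = 0.0
    for qb, u, l in zip(paa(q, segments), u_hat, l_hat):
        # Each reduced point stands for w original points, hence the weight.
        if qb > u:
            total += w * (qb - u) ** 2
        elif qb < l:
            total += w * (qb - l) ** 2
    return math.sqrt(total)


print(reduced_lower_bound([3, 5, 0, 0], [1, 2, 3, 4], [0, 0, 1, 2], 2))  # sqrt(10)
```

On this example the full-resolution envelope bound is sqrt(18), so the reduced bound is looser but still a lower bound, which is the property an index needs to prune safely.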

Algorithms for Secondary Storage
- use a multidimensional index
  - VA-file -> FastScan algorithm
  - R-tree -> RtreeProbe algorithm
- 2-pass algorithms:
  1. scan approximated envelopes, prune search space
  2. find exact answer using original series
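
The two-pass idea can be illustrated in memory, without a VA-file or an R-tree. The sketch below is not FastScan or RtreeProbe: it uses a plain sorted scan as a stand-in for the index, full-resolution envelopes instead of approximated ones, and the same prefix-comparison and window assumptions as a simple envelope bound. All names are hypothetical.

```python
import math


def stretch(q, p):
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def envelopes(c, n):
    # Assumed window: candidate positions i .. floor(i*m/n) for query point i.
    m = len(c)
    windows = [c[i - 1 : (i * m) // n] for i in range(1, n + 1)]
    return [max(w) for w in windows], [min(w) for w in windows]


def lower_bound(q, upper, lower):
    total = 0.0
    for qi, u, l in zip(q, upper, lower):
        if qi > u:
            total += (qi - u) ** 2
        elif qi < l:
            total += (qi - l) ** 2
    return math.sqrt(total)


def exact_best(q, c):
    return min(math.dist(stretch(q, p), c[:p]) for p in range(len(q), len(c) + 1))


def filter_and_refine(q, database):
    """Return (best distance, best candidate index, number of exact computations)."""
    n = len(q)
    # Pass 1: a cheap lower bound per candidate; no exact distances yet.
    bounds = sorted((lower_bound(q, *envelopes(c, n)), idx)
                    for idx, c in enumerate(database))
    best_dist, best_idx, exact_calls = math.inf, None, 0
    # Pass 2: refine in order of increasing bound and stop as soon as the
    # next bound cannot beat the best exact distance found so far.
    for lb, idx in bounds:
        if lb >= best_dist:
            break
        exact_calls += 1
        d = exact_best(q, database[idx])
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_dist, best_idx, exact_calls


database = [[0, 0, 1, 1, 0, 0], [5, 5, 5, 5, 5, 5], [9, 9, 9, 9, 9, 9]]
print(filter_and_refine([0, 1, 0], database))   # (0.0, 0, 1)
```

In this toy run only one of the three candidates reaches the exact distance computation; the third returned value is a simple way to observe how much work the bound saves.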

Datasets Used
- motion capture
  - data from 124 sensors placed on human actors
- mixed bag
  - time series coming from: medicine, manufacturing, environmental monitoring, economics, sensor data
- experimented with time series databases of:
  - size 5,000 – 80,000
  - time series length 64 – 1,024 points

Main Memory Experiments
- assume database fits in memory
- measure pruning power:
  - fraction of times each approach calls distance function
- our technique:
  - 1 order of magnitude faster than CD-criterion
  - 3 orders of magnitude faster than brute force

Disk-Based Experiments
- comparison of:
  - brute force
  - FastScan
  - RtreeProbe

Case Study
- video

Related Work
- Dynamic Time Warping (DTW)
  - [Yi & Faloutsos’00] [Keogh’02] [Zhu & Shasha’03] [Fung & Wong’03]
- Longest Common SubSequence (LCSS)
  - [Das et al.’97] [Vlachos et al.’03]
- uniform scaling
  - [Argyros & Ermopoulos’03]

Conclusions
- studied the utility of uniform scaling similarity matching
  - applications in: motion capture libraries, music retrieval, historical handwritten archives
- introduced the first lower bounding technique for uniform scaling
- proposed an indexing method for bounding envelopes
  - suitable for very large time series databases
- experimentally evaluated the efficiency of the technique
- demonstrated the quality of results with real motion capture data
