Scaling and Warping in Time Series Querying Dear Reader: This file contains larger, full color versions of the images in “Scaling and Warping in Time Series.

Slides:



Advertisements
Similar presentations
Working with Profiles in IX1D v 3 – A Tutorial © 2006 Interpex Limited All rights reserved Version 1.0.
Advertisements

Indexing DNA Sequences Using q-Grams
A General Algorithm for Subtree Similarity-Search The Hebrew University of Jerusalem ICDE 2014, Chicago, USA Sara Cohen, Nerya Or 1.
Word Spotting DTW.
Why Systolic Architecture ?. Motivation & Introduction We need a high-performance, special-purpose computer system to meet specific application. I/O and.
 Have a title and use over half the page.  Needs to show trends.  Put the independent variable on the X-axis.  Dependent variable goes on the.
Relevance Feedback Retrieval of Time Series Data Eamonn J. Keogh & Michael J. Pazzani Prepared By/ Fahad Al-jutaily Supervisor/ Dr. Mourad Ykhlef IS531.
Chapter 9: The Normal Distribution
LIAL HORNSBY SCHNEIDER
HCI 530 : Seminar (HCI) Damian Schofield. HCI 530: Seminar (HCI) Transforms –Two Dimensional –Three Dimensional The Graphics Pipeline.
Themis Palpanas1 VLDB - Aug 2004 Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully. You may freely use.
Petrophase 2008 Poster Presentation Title
CBF Dataset Two-Pat Dataset Euclidean DTW Increasingly Large Training.
Making Time-series Classification More Accurate Using Learned Constraints © Chotirat “Ann” Ratanamahatana Eamonn Keogh 2004 SIAM International Conference.
This material in not in your text (except as exercises) Sequence Comparisons –Problems in molecular biology involve finding the minimum number of edit.
Excellence Justify the choice of your model by commenting on at least 3 points. Your comments could include the following: a)Relate the solution to the.
Time Series Bitmap Experiments This file contains full color, large scale versions of the experiments shown in the paper, and additional experiments which.
Understanding and Comparing Distributions
Keystone Problems… Keystone Problems… next Set 19 © 2007 Herbert I. Gross.
Chemometrics Method comparison
Saeed Ebrahimijam SPRING Faculty of Business and Economics Department of Banking and Finance Doğu Akdeniz Üniversitesi FINA417.
Adobe Forms THE FORM ELEMENT PANEL. Creating a form using the Adobe FormsCentral is a quick and easy way to distribute a variety of forms including surveys.
Ch 8.1 Numerical Methods: The Euler or Tangent Line Method
Example 11.4 Demand and Cost for Electricity Modeling Possibilities.
Data Collection & Processing Hand Grip Strength P textbook.
The Power of Moving Averages in Financial Markets By: Michael Viscuso.
STA Lecture 161 STA 291 Lecture 16 Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately)
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
CS 6825: Binary Image Processing – binary blob metrics
Previous Page Next Page EXIT Created by Professor James A. Sinclair, Ph.D. MMXI Working with Fractions Continued 3 4, Proper fractions Proper fractions.
OLAP : Blitzkreig Introduction 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema :
Disclosure risk when responding to queries with deterministic guarantees Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University.
The Integers. The Division Algorithms A high-school question: Compute 58/17. We can write 58 as 58 = 3 (17) + 7 This forms illustrates the answer: “3.
Incorporating Dynamic Time Warping (DTW) in the SeqRec.m File Presented by: Clay McCreary, MSEE.
Developing an Algorithm
Shape-based Similarity Query for Trajectory of Mobile Object NTT Communication Science Laboratories, NTT Corporation, JAPAN. Yutaka Yanagisawa Jun-ichi.
Time series Model assessment. Tourist arrivals to NZ Period is quarterly.
Basic Measurement and Statistics in Testing. Outline Central Tendency and Dispersion Standardized Scores Error and Standard Error of Measurement (Sm)
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Warm Up 1.What does the data to the right tell you? 2.Are there any trends that you notice about plant height?
OLAP Recap 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema : Hierarchical Dimensions.
IPlant Collaborative Discovery Environment RNA-seq Basic Analysis Log in with your iPlant ID; three orange icons.
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
Exact indexing of Dynamic Time Warping
UNIT 5.  The related activities of sorting, searching and merging are central to many computer applications.  Sorting and merging provide us with a.
COMP 5331 Project Roadmap I will give a brief introduction (e.g. notation) on time series. Giving a notion of what we are playing with.
Day 3 – May 9 – WBL Chapter 2 Kinematics: Description of Motion PC141 Intersession 2013Slide 1 A bit of terminology… Kinematics (the topic of the.
Photo Story. How to use Photo Story Photo Story 3 can be located in the Accessories folder on school computers. You will need to have your pictures already.
Indexes Anthony Sealey University of Toronto This material is distributed under an Attribution-NonCommercial-ShareAlike 3.0 Unported Creative Commons License,
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Time Series databases Text databases.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
The expected value The value of a variable one would “expect” to get. It is also called the (mathematical) expectation, or the mean.
The Normal Approximation for Data. History The normal curve was discovered by Abraham de Moivre around Around 1870, the Belgian mathematician Adolph.
Algorithm Analysis with Big Oh ©Rick Mercer. Two Searching Algorithms  Objectives  Analyze the efficiency of algorithms  Analyze two classic algorithms.
The Data Collection and Statistical Analysis in IB Biology John Gasparini The Munich International School Part II – Basic Stats, Standard Deviation and.
Kinematics = the study of Motion Kinematics = the study of Motion.
Sequences of numbers appear in diverse situations. If you divide a cake in half, and then divide the remaining half in half, and continue dividing in.
8.2 Estimating Population Means
8.2 Estimating Population Means
Travelling to School.
How to Start This PowerPoint® Tutorial
OptiSystem applications: BER analysis of BPSK with RS encoding
Supervised Time Series Pattern Discovery through Local Importance
Time Series Filtering Time Series
Chapter 8 Search and Sort
6.2 Grid Search of Chi-Square Space
How to Start This PowerPoint® Tutorial
Time Series Filtering Time Series
Module Recognition Algorithms
Simple Case Studies Using MASS
Presentation transcript:

Scaling and Warping in Time Series Querying Dear Reader: This file contains larger, full color versions of the images in “Scaling and Warping in Time Series Querying”. In addition, there are some extra experiments which we could not fit into the paper.

Euclidean DTW Uniform Scaling SWM  If we attempt simple Euclidean matching (after truncating the longer sequence) we get a large error because we are mapping part of the flight of one sequence to the takeoff drive in the other.  If we simply use DTW to match the entire sequences we get a large error because we are trying to explain part of the sequence in one attempt (the bounce from the mat) that simply does not exist in the other sequence.  If we attempt just uniform scaling, we get the best match when we stretch the shorter sequence by 112%. However the local alignment, particularly of the takeoff drive and up-flight is quite poor.  Finally, when we match the two sequences with SWM, we get an intuitive alignment between the two sequences. The global stretching (once again at 112%) allows DTW to align the small local differences. In this case, the fact that DTW needed to map a single point on time series onto 4 points in the other time series suggests an important local difference in one of these sequences. Inspection of the original videos suggest that the athlete misjudged his approach and attempted a clumsy correction just before his takeoff drive. Indexing video: There is increasing interest in indexing sports data, both from sports fans who may wish to find particular types of shots or moves, and from coaches who are interested in analyzing their athletes performance over time. As a concrete example, we consider the high jump. We can automatically collect the athlete’s center of mass information from video and convert a time series (It is possible to correct for the cameras pan and tilt). We found that when we issued queries to a database of high jumps, we only got intuitive answers when doing SWM. It is easy to see why if we look at two particular examples from the same athlete, and consider all possible matching options, as shown in the Figure on the left. From top to bottom: Video plays in presentation mode

C = candidate match Q = query C Q (rescaled 1.54 ) happy birth -day to you dear C Q (rescaled 1.40) 140 Query by Humming: The need for both local and global alignment when working with music has been extensively demonstrated. For completeness we will briefly review it here. In the Figure on the left we demonstrate the problems with universally familiar piece of music, Happy Birthday to You. For clarity of illustration, the music was produced by the second author on a keyboard and converted into a pitch contour. Two performances of Happy Birthday to You aligned with different metrics. Both performances were performed in the same key, but are shifted in the Y-axis for visual clarity. (top) Because the query sequence was performed at a much faster tempo, direct application of DTW fails to produce an intuitive alignment. (center) Rescaling the shorter performance by a scaling factor of 1.54 seems to improve the alignment, but note for example that the higher pitched note produced on the third “birth..” of the candidate is forced to align with the lower note of the third “happy..” in the query. (bottom) Only the application of both uniform scaling and DTW produces the correct alignment. Click to Play sound files

Pruning Power The following slide shows how the pruning power of the proposed lower bounding measure varies as the lengths of data change on different datasets. –For a majority of datasets, the pruning power increased with the length of data, suggesting that the proposed algorithm is likely to perform well in real-life environment, in which long sequences of data are collected for a long period of time. –More than 60% of the datasets obtained a pruning power above 90%. All but two of the datasets exhibited a pruning power of over 60% at length Even at length 16, over 60% pruning power was achieved in three-fourths of the datasets.

Average Pruning Power The following slide shows the pruning power averaged over all datasets; 87% of data sequences of length 1024 and 65% of data sequences of length 16 did not require computation of the actual time warping distances.

Pruning Power – Raw Numbers Pruning Power vs. Length of Data Dimensi o n EEGERP Data Reality C h e c k ATTASBallbeam Buoy S e n s o r BurstBurstinChaoticCSTRDarwinEarthquakeEegEvaporator Dimensi o n Foetal ECG Glass F ur n a c e Great L a k e s Koski ECGLeleccumMemoryNetworkOceanOcean ShearPacket PGT50 A lp h a PGT50 C D C1 5 Power D a t a Power P l a n t Dimensi o n Random W al k Robot ArmShuttleSoil TempSpeech Spot E x r a t e s Standard & Poo r Steamgen Synthetic Cont rol TideTongueWindingWoolAverage

Varying Scaling Factor The following slide shows the effect of varying the range of allowed scaling factors on pruning power. –Note the x-axis indicates the upper bound range of allowed scaling factor. The lower bound range of allowed scaling factor is the reciprocal of the upper bound. For instance, the label 2.0 indicates that the range of allowed scaling factor is between 1 / 2.0 = 0.5 and 2.0. In particular, the label 1.0 indicates that the time warping distance was calculated without scaling. It also implied that the size of the range was not increasing linearly. –However, the important observation is that for all sizes of ranges, a pruning power of over 90% was achieved in nearly three- fourths of the datasets. –For almost all datasets, the pruning powers never dropped below 60%.

Varying Scaling Factor We note that vigorously fluctuating datasets are far less common than smooth datasets. –The following slide illustrates this claim by showing the pruning power averaged over all the datasets, as the range of allowed scaling factor changes. For most ranges of scaling factors, the pruning powers achieved are above 90%.

Pruning Power vs. Scaling Factor SFEEGERP Data Reality Check ATTAS Ballbea m Buoy Sensor BurstBurstinChaoticCSTRDarwinEarthquakeEeg Evaporato r SFFoetal ECG Glass Furnace Great Lakes Koski ECG Leleccu m MemoryNetworkOceanOcean ShearPacket PGT50 Alpha PGT50 CDC15 Power Data Power Plant SF Random Walk Robot ArmShuttle Soil Temp Speech Spot Exrates Standard & Poor Steamg en Synthetic Control TideTongueWindingWoolAverage

Relative Page Access The following slide shows the relative page access of the datasets versus the length of data. –The relative page access varied significantly from approximately less than 0.2 and up to less than 1.8. –For short data of length 16, the relative page access is almost always larger than 1, suggesting that it is generally not a wise idea to use index for short data. However, as the length of data increases, the relative page access decreases in general, as evident in the slide following next, which shows that the relative page access decreases as the length of data increases. –This is an important result for the proposed index, signaling that the index is likely to perform progressively better as the length of data increases.

Average Relative Page Access The following slide also shows that the relative page access approaches 1 when the length of data is between 64 and 128. –This suggests that the use of index should be considered if the length of data is greater than 64. –The fact that about half of the datasets achieved a relative page access below 1 at length 64 and that over 70% of the datasets achieved less-than-one relative page access at length 1024 backed up the above claim.

Relative Page Access vs. Length of Data Dimensi o n EEGERP Data Reality C h e c k ATTASBallbeam Buoy S e n s o r BurstBurstinChaoticCSTRDarwinEarthquakeEegEvaporator Dimensi o n Foetal ECG Glass F ur n a c e Great L a k e s Koski ECGLeleccumMemoryNetworkOceanOcean ShearPacket PGT50 A lp h a PGT50 C D C1 5 Power D a t a Power P l a n t Dimensi o n Random W al k Robot ArmShuttleSoil TempSpeech Spot E x r a t e s Standard & Poo r Steamgen Synthetic Cont rol TideTongueWindingWoolAverage Relative Page Access – Raw Numbers

Query Time The following slide shows the actual running time of the range queries as calculated by the difference between two calls to gettimeofday before and after the queries. –The time is averaged over all 50 queries performed for each length of data of each dataset. –It shows that the query generally runs very fast. All queries completed within a fraction of a second for all datasets of length 16 and 32, and all queries completed in the magnitude of minutes. –Note the logarithmic scale in both axes.

Average Query Time The following slide shows the running time averaged over all datasets. –It suggested that most queries actually completed well within a minute, even for the larger length of data. And even for the largest length of data, queries completed in 560 seconds on average. –Recall from previous slides that linear scan perform better for datasets of shorter lengths; however, as the following slide shows, the query time for those datasets is actually not significant anyway. –Moreover, the proposed index can significantly reduce the query time for datasets of longer lengths. Combining both advantages, our proposed index is capable as an all-round solution suitable for datasets of all lengths.

Query Time vs. Length of Data Dimensi o n EEGERP Data Reality C h e c k ATTASBallbeam Buoy S e n s o r BurstBurstinChaoticCSTRDarwinEarthquakeEeg Evaporato r Dimensi o n Foetal ECG Glass F ur n a c e Great L a k e s Koski ECGLeleccumMemoryNetworkOceanOcean ShearPacket PGT50 A lp h a PGT50 C D C1 5 Power Data Power P l a n t Dimensi o n Random W al k Robot ArmShuttleSoil TempSpeech Spot E x r a t e s Standard & Poo r Steamgen Synthetic Cont rol TideTongueWindingWoolAverage Query Time – Raw Numbers