Dynamic Time Warping Algorithm for Gene Expression Time Series

Slides:



Advertisements
Similar presentations
Indexing Time Series Based on original slides by Prof. Dimitrios Gunopulos and Prof. Christos Faloutsos with some slides from tutorials by Prof. Eamonn.
Advertisements

Introduction to Neural Networks Computing
Longest Common Subsequence
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Word Spotting DTW.
Dynamic Programming Dynamic Programming is a general algorithm design technique for solving problems defined by recurrences with overlapping subproblems.
Inexact Matching of Strings General Problem –Input Strings S and T –Questions How distant is S from T? How similar is S to T? Solution Technique –Dynamic.
74 th EAGE Conference & Exhibition incorporating SPE EUROPEC 2012 Automated seismic-to-well ties? Roberto H. Herrera and Mirko van der Baan University.
Clustered alignments of gene- expression time series data Adam A. Smith, Aaron Vollrath, Cristopher A. Bradfield and Mark Craven Department of Biosatatistics.
Space/Time Tradeoff and Heuristic Approaches in Pairwise Alignment.
Branch and Bound Searching Strategies
Indexing Time Series. Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time.
Indexing Time Series Based on Slides by C. Faloutsos (CMU) and D. Gunopulos (UCR)
Praktikum zur Analyse von Formen - Abstandsmaße - Helmut Alt Freie Universität Berlin.
Introduction to Bioinformatics Algorithms Dynamic Programming: Edit Distance.
Making Time-series Classification More Accurate Using Learned Constraints © Chotirat “Ann” Ratanamahatana Eamonn Keogh 2004 SIAM International Conference.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Distance Functions for Sequence Data and Time Series
Announcements Take home quiz given out Thursday 10/23 –Due 10/30.
Based on Slides by D. Gunopulos (UCR)
1 Branch and Bound Searching Strategies 2 Branch-and-bound strategy 2 mechanisms: A mechanism to generate branches A mechanism to generate a bound so.
Using Relevance Feedback in Multimedia Databases
Computational Biology, Part 2 Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
1 Theory I Algorithm Design and Analysis (11 - Edit distance and approximate string matching) Prof. Dr. Th. Ottmann.
Exact Indexing of Dynamic Time Warping
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Sequence Alignment.
CSE554Laplacian DeformationSlide 1 CSE 554 Lecture 8: Laplacian Deformation Fall 2012.
So far: Historical introduction Mathematical background (e.g., pattern classification, acoustics) Feature extraction for speech recognition (and some neural.
Chapter 9 Superposition and Dynamic Programming 1 Chapter 9 Superposition and dynamic programming Most methods for comparing structures use some sorts.
Comp. Genomics Recitation 2 12/3/09 Slides by Igor Ulitsky.
CS910: Foundations of Data Analytics Graham Cormode Time Series Analysis.
Qualitative approximation to Dynamic Time Warping similarity between time series data Blaž Strle, Martin Možina, Ivan Bratko Faculty of Computer and Information.
Analysis of Constrained Time-Series Similarity Measures
Video Mosaics AllisonW. Klein Tyler Grant Adam Finkelstein Michael F. Cohen.
Pairwise Sequence Alignment (I) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 22, 2005 ChengXiang Zhai Department of Computer Science University.
S DTW: COMPUTING DTW DISTANCES USING LOCALLY RELEVANT CONSTRAINTS BASED ON SALIENT FEATURE ALIGNMENTS K. Selçuk Candan Arizona State University Maria Luisa.
K. Selçuk Candan, Maria Luisa Sapino Xiaolan Wang, Rosaria Rossini
Incorporating Dynamic Time Warping (DTW) in the SeqRec.m File Presented by: Clay McCreary, MSEE.
Mining Time Series.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
We want to calculate the score for the yellow box. The final score that we fill in the yellow box will be the SUM of two other scores, we’ll call them.
Sequence Comparison Algorithms Ellen Walker Bioinformatics Hiram College.
1 Similarity-based matching for face authentication Christophe Rosenberger Luc Brun ICPR 2008.
Chapter 2: Getting to Know Your Data
Exact indexing of Dynamic Time Warping
Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.
COMP 5331 Project Roadmap I will give a brief introduction (e.g. notation) on time series. Giving a notion of what we are playing with.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image and.
Sequence Alignment.
Optimization in Engineering Design 1 Introduction to Non-Linear Optimization.
1 Dynamic Time Warping and Minimum Distance Paths for Speech Recognition Isolated word recognition: Task : Want to build an isolated ‘word’ recogniser.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
Example 2 You are traveling by a canoe down a river and there are n trading posts along the way. Before starting your journey, you are given for each 1
Dynamic Programming for the Edit Distance Problem.
Accelerometer-Based Character Recognition Pen
Supervised Time Series Pattern Discovery through Local Importance
Dynamic Programming Dynamic Programming is a general algorithm design technique for solving problems defined by recurrences with overlapping subproblems.
Sequence comparison: Dynamic programming
Distance Functions for Sequence Data and Time Series
Digital Systems: Hardware Organization and Design
Distance Functions for Sequence Data and Time Series
School of Computer Science & Engineering
Digital Systems: Hardware Organization and Design
The Functional Space of an Activity Ashok Veeraraghavan , Rama Chellappa, Amit Roy-Chowdhury Avinash Ravichandran.
Pairwise sequence Alignment.
Cyclic string-to-string correction
Dynamic Time Warping and training methods
Dynamic Programming-- Longest Common Subsequence
Presentation transcript:

Dynamic Time Warping Algorithm for Gene Expression Time Series Elena Tsiporkova

What is Special about Time Series Data? Gene expression time series are expected to vary not only in terms of expression amplitudes, but also in terms of time progression since biological processes may unfold with different rates in response to different experimental conditions or within different organisms and individuals. time

Why Dynamic Time Warping? Any distance (Euclidean, Manhattan, …) which aligns the i-th point on one time series with the i-th point on the other will produce a poor similarity score. A non-linear (elastic) alignment produces a more intuitive similarity measure, allowing similar shapes to match even if they are out of phase in the time axis.

Warping Function Time Series A 1 is n m pk To find the best alignment between A and B one needs to find the path through the grid P = p1, … , ps , … , pk ps = (is , js ) which minimizes the total distance between them. P is called a warping function. js ps Time Series B 1 p1

Time-Normalized Distance Measure Time Series A Time-normalized distance between A and B : 1 is n m pk D(A , B ) = d(ps): distance between is and js ws > 0: weighting coefficient. js ps Best alignment path between A and B : P0 = (D(A , B )). Time Series B 1 p1

reduction of the search space Optimisations to the DTW Algorithm Time Series A The number of possible warping paths through the grid is exponentially explosive! 1 is n m Restrictions on the warping function: monotonicity continuity boundary conditions warping window slope constraint. reduction of the search space js Time Series B 1

Restrictions on the Warping Function Monotonicity: is-1 ≤ is and js-1 ≤ js. The alignment path does not go back in “time” index. Continuity: is – is-1 ≤ 1 and js – js-1 ≤ 1. The alignment path does not jump in “time” index. j j i i Guarantees that features are not repeated in the alignment. Guarantees that the alignment does not omit important features.

Restrictions on the Warping Function Boundary Conditions: i1 = 1, ik = n and j1 = 1, jk = m. The alignment path starts at the bottom left and ends at the top right. Warping Window: |is – js| ≤ r, where r > 0 is the window length. A good alignment path is unlikely to wander too far from the diagonal. r m j j i (1,1) i n Guarantees that the alignment does not consider partially one of the sequences. Guarantees that the alignment does not try to skip different features and gets stuck at similar features.

Restrictions on the Warping Function Slope Constraint: ( jsp – js0) / ( isp – is0) ≤ p and ( isq – is0) / ( jsq – js0) ≤ q , where q ≥ 0 is the number of steps in the x-direction and p ≥ 0 is the number of steps in the y-direction. After q steps in x one must step in y and vice versa: S = p / q [0 , ]. The alignment path should not be too steep or too shallow. j ≤ p ≤ q i Prevents that very short parts of the sequences are matched to very long ones.

The Choice of the Weighting Coefficient Time-normalized distance between A and B : Weighting Coefficient Definitions Symmetric form ws = (is – is-1) + (js – js-1), then C = n + m. Asymmetric form ws = (is – is-1), then C = n. Or equivalently, ws = (js – js-1), then C = m. D(A , B ) = complicates optimisation Seeking a weighting coefficient function which guarantees that: is independent of the warping function. Thus D(A , B ) = can be solved by use of dynamic programming.

Symmetric DTW Algorithm (warping window, no slope constraint) Time Series A g(n, m) 1 i n m Initial condition: g(1,1) = 2d(1,1). DP-equation: g(i, j – 1) + d(i, j) g(i, j) = min g(i – 1, j – 1) + 2d(i, j) . g(i – 1, j) + d(i, j) Warping window: j – r ≤ i ≤ j + r. Time-normalized distance: D(A , B ) = g(n, m) / C C = n + m. i = j - r j 1 1 2 Time Series B 1 i = j + r g(1,1)

Asymmetric DTW Algorithm (warping window, no slope constraint) Time Series A g(n, m) 1 i n m Initial condition: g(1,1) = d(1,1). DP-equation: g(i, j – 1) g(i, j) = min g(i – 1, j – 1) + d(i, j) . g(i – 1, j) + d(i, j) Warping window: j – r ≤ i ≤ j + r. Time-normalized distance: D(A , B ) = g(n, m) / C C = n. i = j - r j 1 1 Time Series B 1 i = j + r g(1,1)

Quazi-symmetric DTW Algorithm (warping window, no slope constraint) Time Series A g(n, m) 1 i n m Initial condition: g(1,1) = d(1,1). DP-equation: g(i, j – 1) + d(i, j) g(i, j) = min g(i – 1, j – 1) + d(i, j) . g(i – 1, j) + d(i, j) Warping window: j – r ≤ i ≤ j + r. Time-normalized distance: D(A , B ) = g(n, m) / C C = n + m. i = j - r j 1 1 1 Time Series B 1 i = j + r g(1,1)

DTW Algorithm at Work Start with the calculation of g(1,1) = d(1,1). Time Series A Calculate the first row g(i, 1) = g(i–1, 1) + d(i, 1). 1 i n m Calculate the first column g(1, j) = g(1, j) + d(1, j). Move to the second row g(i, 2) = min(g(i, 1), g(i–1, 1), g(i – 1, 2)) + d(i, 2). Book keep for each cell the index of this neighboring cell, which contributes the minimum score (red arrows). i = j - r j Carry on from left to right and from bottom to top with the rest of the grid g(i, j) = min(g(i, j–1), g(i–1, j–1), g(i – 1, j)) + d(i, j). Trace back the best path through the grid starting from g(n, m) and moving towards g(1,1) by following the red arrows. Time Series B 1 i = j + r

DTW Algorithm: Example Time Series A -0.87 -0.84 -0.85 -0.82 -0.23 1.95 1.36 0.60 0.0 -0.29 -0.88 -0.91 -0.84 -0.82 -0.24 1.92 1.41 0.51 0.03 -0.18 0.51 0.51 0.49 0.49 0.35 0.17 0.21 0.33 0.41 0.49 0.27 0.27 0.26 0.25 0.16 0.18 0.23 0.25 0.31 0.68 0.13 0.13 0.13 0.12 0.08 0.26 0.40 0.47 0.49 0.49 -0.60 -0.65 -0.71 -0.58 -0.17 0.77 1.94 -0.46 -0.62 -0.68 -0.63 -0.32 0.74 1.97 0.08 0.08 0.08 0.08 0.10 0.31 0.47 0.57 0.62 0.65 0.06 0.06 0.06 0.07 0.11 0.32 0.50 0.60 0.65 0.68 0.04 0.04 0.06 0.08 0.11 0.32 0.49 0.59 0.64 0.66 0.02 0.05 0.08 0.11 0.13 0.34 0.49 0.58 0.63 0.66 Euclidean distance between vectors Time Series B