Variable Penalty Dynamic Time Warping for Aligning Chromatography Data
David Clifford, Research Scientist
June 2009
CSIRO Issues in aligning multiple - MS spectra
Talk Outline
- Gas Chromatography Mass Spectrometry: examples and properties
- Dynamic time warping: origins in speech recognition
- Uses in the 21st century: aligning GC-MS data
- Central idea of the talk: variable penalty DTW, joint work with Glenn Stone
- Results of alignment and how to do it
Gas Chromatography
- Separates a gas into its constituent parts
- These elute from the machine over a period of 40 minutes
- Measures quantity several times a second
- Does not identify compounds
- Gold standard in analytical chemistry
- Slow process, expensive technology
Uses of Gas Chromatography
- Wine chemistry
- Meat quality
- Metabolomic studies
- Data format is similar to Liquid Chromatography-MS etc.
Goal of this talk
- How can we align two signals?
- How can we align many signals?
- Dynamic time warping: yes, but it overdoes the warping
- Variable penalty DTW: balances warping with alignment needs
- VPdtw package now available on CRAN
Before and After Alignment
Calling for a taxi...
- The system matches what you say with a database of placenames
- Dynamic time warping was invented in the late 1960s and early 1970s to do this kind of matching
- DTW can expand or contract your words to match placenames
- DTW is a natural choice for matching speech
- Speed of speech differs between individuals
- Ums and ahs need to be cut out, etc.
- DTW is a very fast algorithm and achieves the global optimum
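As a minimal sketch of the dynamic programming behind DTW (not the speech systems mentioned above, and not the VPdtw package), the function below fills a cumulative cost table using the absolute difference as the local cost. The name `dtw_cost` and the choice of local cost are assumptions made for illustration.

```python
def dtw_cost(reference, query):
    """Minimum total |r - q| over all monotone warping paths (symmetric DTW)."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = best cost of aligning reference[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # diagonal: advance both signals
                cost[i - 1][j],      # reuse a query point (query is expanded)
                cost[i][j - 1],      # reuse a reference point (query is contracted)
            )
    return cost[n][m]


if __name__ == "__main__":
    reference = [0, 0, 1, 2, 1, 0]
    query = [0, 1, 2, 2, 1, 0, 0]  # same shape, different timing
    print(dtw_cost(reference, query))
```

Each of the (n+1)(m+1) cells is filled in constant time, and the last cell holds the optimum over every monotone path, which is the sense in which the method is both fast and globally optimal.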
Dynamic Time Warping [Figure; axes: REFERENCE, QUERY]
No alignment [Figure; axes: REFERENCE, QUERY]
Alignment by Shift [Figure; axes: REFERENCE, QUERY]
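Alignment by shift can be sketched as a search over integer offsets. The function name `best_shift`, the mean absolute difference score and the `max_shift` bound are illustrative assumptions, not something stated on the slide.

```python
def best_shift(reference, query, max_shift):
    """Integer shift s minimising the mean |reference[i] - query[i - s]|.

    A positive s moves the query to the right (later in time).
    """
    best = None
    for s in range(-max_shift, max_shift + 1):
        # Compare only the region where the two signals overlap after shifting.
        pairs = [
            (reference[i], query[i - s])
            for i in range(len(reference))
            if 0 <= i - s < len(query)
        ]
        if not pairs:
            continue
        score = sum(abs(r - q) for r, q in pairs) / len(pairs)
        if best is None or score < best[0]:
            best = (score, s)
    return best[1]


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
    query = [0, 1, 5, 1, 0, 0, 0, 0, 0, 0]  # peak arrives two samples early
    print(best_shift(reference, query, max_shift=3))
```

A single shift moves every point by the same amount, so it cannot fix signals whose peaks drift by different amounts along the run.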
Linear Transformation (Shift and Stretch) [Figure; axes: REFERENCE, QUERY]
Parametric Time Warping [Figure; axes: REFERENCE, QUERY]
Symmetric Dynamic Time Warping [Figure; axes: REFERENCE, QUERY]
Asymmetric Dynamic Time Warping [Figure; axes: REFERENCE, QUERY]
Sakoe-Chiba DTW (bound on shift): memory-efficient variation of DTW, faster method [Figure; axes: REFERENCE, QUERY]
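The bound on shift can be sketched by restricting the dynamic programme to cells within `max_shift` of the diagonal. The function name and the parameter name are assumptions. For clarity this sketch still allocates the full table; a memory-efficient version would store only the band.

```python
def dtw_cost_banded(reference, query, max_shift):
    """Symmetric DTW cost with paths restricted to |i - j| <= max_shift."""
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit query indices inside the band around the diagonal.
        lo = max(1, i - max_shift)
        hi = min(m, i + max_shift)
        for j in range(lo, hi + 1):
            d = abs(reference[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Stays infinite if the end point lies outside the band.
    return cost[n][m]


if __name__ == "__main__":
    reference = [0, 0, 0, 1, 5, 1, 0, 0]
    query = [0, 1, 5, 1, 0, 0, 0, 0]
    for bound in (0, 1, 2):
        print(bound, dtw_cost_banded(reference, query, bound))
```

With a bound of 0 only the diagonal is allowed, so the result is the plain sum of pointwise differences. Widening the band to the true offset of the peak lets the cost fall to zero while visiting roughly n(2·max_shift + 1) cells instead of n·m.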
Dynamic Time Warping: guaranteed global optimum, but lots of non-diagonal moves [Figure; axes: REFERENCE, QUERY]
DTW and GC-MS: DTW overdoes the warping. Let's examine the path [Figure; axes: REFERENCE, QUERY]
Rotate our view: it's a complicated warp
Paths found with two different penalties
Why do we need to care about this?
- Analysis is based on peak area, and overwarping will affect peak shape and area
- Overwarping introduces artificial features into the data
- Overwarping occurs due to too many non-diagonal moves
- Solution #1: penalise non-diagonal moves
- Solution #2: variable penalty dependent on size of peaks
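Solution #1 can be sketched by adding a constant `penalty` to every non-diagonal move in a symmetric DTW and counting those moves on the optimal path. The function names and the toy signals are assumptions for illustration only.

```python
def dtw_path(reference, query, penalty=0.0):
    """Symmetric DTW where each non-diagonal move costs an extra `penalty`.

    Returns (total cost, list of (i, j) index pairs on the optimal path).
    """
    n, m = len(reference), len(query)
    cost = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    cost[0][0] = abs(reference[0] - query[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j] + penalty, (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1] + penalty, (i, j - 1)))
            best, back[i][j] = min(candidates)  # ties resolve to the diagonal
            cost[i][j] = best + abs(reference[i] - query[j])
    # Walk the back-pointers from the end of both signals to the start.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return cost[n - 1][m - 1], path


def non_diagonal_moves(path):
    """Count moves that advance only one of the two signals."""
    return sum(1 for (a, b), (c, d) in zip(path, path[1:]) if (c - a, d - b) != (1, 1))


if __name__ == "__main__":
    reference = [0, 2, 0, 50, 0, 0]  # small baseline bump, then an aligned peak
    query = [0, 0, 2, 50, 0, 0]
    for pen in (0, 5):
        total, path = dtw_path(reference, query, penalty=pen)
        print(pen, total, non_diagonal_moves(path))
```

In this toy case the unpenalised path warps around the small baseline bump, while a penalty of 5 makes the straight diagonal cheaper and leaves the signal untouched.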
Variable penalty DTW
- Minimise over paths w [objective function not preserved in the text]
- Choose the penalty vector using a dilation of the signals
- Large penalty with large peaks
- Minimise this function using dynamic programming
- Easy to implement
- How does it compare to DTW, constant penalty DTW, and parametric time warping?
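One way to sketch a variable penalty in code is an asymmetric DTW in which every reference index i is matched to a query index w[i], and a non-diagonal step into index i costs `penalty[i]`. The step set {0, 1, 2}, the fixed end points and the indexing of the penalty are assumptions of this sketch, and it should not be read as the VPdtw package's implementation.

```python
def vp_dtw(reference, query, penalty):
    """Asymmetric DTW with a position-dependent penalty on non-diagonal steps.

    Steps w[i] - w[i-1] may be 0 (repeat a query point), 1 (diagonal) or
    2 (skip a query point). The path is pinned to w[0] = 0 and w[-1] = len(query) - 1,
    so the sketch assumes the end point is reachable (e.g. equal-length signals).
    Returns (total cost, w).
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[1] * m for _ in range(n)]
    cost[0][0] = abs(reference[0] - query[0])
    for i in range(1, n):
        for j in range(m):
            options = []
            if j >= 1:
                options.append((cost[i - 1][j - 1], 1))  # diagonal, no penalty
            options.append((cost[i - 1][j] + penalty[i], 0))
            if j >= 2:
                options.append((cost[i - 1][j - 2] + penalty[i], 2))
            # min returns the first minimal item, so ties favour the diagonal.
            best, step = min(options, key=lambda o: o[0])
            cost[i][j] = best + abs(reference[i] - query[j])
            back[i][j] = step
    # Trace the chosen steps back from the final query index.
    w = [m - 1]
    for i in range(n - 1, 0, -1):
        w.append(w[-1] - back[i][w[-1]])
    w.reverse()
    return cost[n - 1][m - 1], w


if __name__ == "__main__":
    reference = [0, 0, 0, 5, 9, 5, 0, 0]
    query = [0, 5, 9, 5, 0, 0, 0, 0]       # peak arrives two samples early
    penalty = [1, 1, 1, 20, 20, 20, 1, 1]  # large around the peak, small elsewhere
    print(vp_dtw(reference, query, penalty))
    print(vp_dtw(reference, query, [20] * 8))
```

With the variable penalty the non-diagonal steps all fall in the baseline and the peak is traversed diagonally, so its shape is preserved. With the large penalty applied everywhere, warping is too expensive and the peak is left unaligned.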
Key Ingredient for VPdtw
- Penalty vector: proportional to a dilation of the signal
- There is some subjectivity here in balancing the need for alignment against the effect on the raw signals
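A dilation can be sketched as a running maximum over a flat window, scaled to give the penalty vector. The window definition (`span` samples on each side), the `scale` factor and the function names are assumptions here; the package's own `dilation` function may define its arguments differently.

```python
def dilation(signal, span):
    """Running maximum over indices i - span .. i + span, clipped at the ends."""
    n = len(signal)
    return [max(signal[max(0, i - span):min(n, i + span + 1)]) for i in range(n)]


def penalty_from(signal, span, scale):
    """Penalty vector proportional to the dilated signal."""
    return [scale * value for value in dilation(signal, span)]


if __name__ == "__main__":
    reference = [0, 0, 0, 5, 9, 5, 0, 0]
    print(dilation(reference, span=1))
    print(penalty_from(reference, span=1, scale=2))
```

Dilation widens each peak, so the penalty is already high on the shoulders of a peak and warping is pushed out into the baseline. The span and scale are where the subjectivity comes in.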
Before Alignment – can’t see detail but
Check Alignment #1
Check Alignment #2
Check Alignment #3
How far are points moved by alignment?
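Given a warping expressed as a list w, where reference index i is matched to query index w[i], the distance each point moves can be summarised as below. The representation of the path and the dictionary keys are assumptions chosen for this sketch.

```python
def shift_summary(w):
    """Summarise a warping w, where reference index i is matched to query index w[i]."""
    shifts = [j - i for i, j in enumerate(w)]      # how far each point is moved
    steps = [b - a for a, b in zip(w, w[1:])]      # step taken between neighbours
    return {
        "shifts": shifts,
        "max_abs_shift": max(abs(s) for s in shifts),
        "diagonal": steps.count(1),   # query advances in step with the reference
        "expanded": steps.count(0),   # a query point is repeated
        "dropped": steps.count(2),    # a query point is skipped
    }


if __name__ == "__main__":
    print(shift_summary([0, 0, 0, 1, 2, 3, 5, 7]))
```

Plotting `shifts` against the reference index shows where along the run the alignment moved points and by how much.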
VPdtw package: now on CRAN, GPL 2
- Functions: VPdtw, dilation, plot.VPdtw, print.VPdtw
- Usage:
    result <- VPdtw(reference, query, penalty, maxshift = 350)
    print(result)
    plot(result, "Before")
    plot(result, "After")
    plot(result, "Shifts")
    plot(result)
- Many queries, one penalty
- One query, many penalties
- Reference can be NULL
Comparisons: Time
Summary
- Introduced GC-MS data
- This talk is really about improving data quality
- Improvement via alignment
    - without data reduction
    - without unnatural features
    - via fast computation
- VPdtw available on CRAN
    - Faster
    - Better than available alternatives
References
- DTW: Vintsyuk, T. K. Kibernetika
- DTW: Sakoe, H., and Chiba, S. Proceedings of the International Congress on Acoustics, Budapest, Hungary, 1971; paper 20 c 13.
- Parametric Time Warping: Eilers, P.H.C. Anal. Chem
- VPdtw: Clifford, Stone, Montoliu, Rezzi, Martin, Guy, Bruce and Kochhar. Alignment Using Variable Penalty Dynamic Time Warping. Anal. Chem., 2009, 81 (3), pp 1000–1007
Thank you
David Clifford, Research Scientist
Statistical Bioinformatics - Agribusiness
CSIRO Division of Mathematics, Informatics and Statistics
VPdtw package: plot(result, "Before") [Figure]
VPdtw package: plot(result, "After") [Figure]
VPdtw package: print(result)
    Reference is NULL.
    Query column # 13 is chosen at random.
    Query matrix is made up of 16 samples of length
    Single Penalty vector supplied by user.
    Max allowed shift is 150.
Table columns: Cost, Overlap, Max Obs Shift, # Diag Moves, # Expanded, # Dropped; one row per query, Query #1 to Query #16 [values not preserved]