Visually Mining and Monitoring Massive Time Series
Amy Karlson, V. Shiv Naga Prasad
15 February 2004
CMSC 838S
Images courtesy of Jessica Lin and Eamonn Keogh
Lin, J., Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
What are Time Series?
Simply: observations of a variable made over time
Typical across a wide variety of domains:
  Medicine
  Physiology
  Finance
  Microbiology
  Meteorology
  Surveillance
Motivation: Critical Decision Making
Domains
  Spacecraft Launch
  Medicine
Research Directions
  Mining Archives
    Extract rules, patterns, regularities
  Visualizing Streams
Novel visualization and interaction for:
  Query by content
  Motif discovery
  Anomaly detection
Some Visual Time Series Systems
TimeSearcher (Hochheiser and Shneiderman): direct manipulation pattern query
ThemeRiver (Havre, Hetzler, Whitney & Nowell, InfoVis 2000): theme strength over time
Spirals (Weber et al.): periodic data with known period
(figure label: dot.com stocks)
VizTree
Construct a subsequence tree to span the space of subsequences of a given time series.
Use this to collect statistics about the series.
Size of the structure is independent of the length of the series.
VizTree Approach - Overview
Place windows along the time series to obtain subsequences.
Quantize along the time and value dimensions to obtain sequences of discrete symbols.
Construct a subsequence tree to represent all possible such sequences.
Collect frequencies of traversal of the branches of the subsequence tree.
Use these for motif and anomaly detection, and for comparing time series.
Subsequences
Place windows along the time series to obtain subsequences.
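A minimal sketch of the windowing step, assuming the series is an in-memory Python list. The function name `sliding_windows` and the `step` parameter are illustrative and are not taken from the VizTree tool.

```python
def sliding_windows(series, window, step=1):
    """Yield (start_index, subsequence) for each window position.

    `window` is the subsequence length; `step` is how far the window
    moves between positions (an assumed parameter, 1 by default).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(series) - window + 1, step):
        yield start, series[start:start + window]


series = [1, 3, 2, 5, 4, 6]
windows = list(sliding_windows(series, window=4))
for start, subsequence in windows:
    print(start, subsequence)
```

With a step of 1, a series of length n yields n - window + 1 subsequences, and a series shorter than the window yields none.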
Discretization
Subsequences are patterns.
Take windows along the time series; the window length is the subsequence length.
Discretize the range of data values: one symbol for each quantum.
Divide each window into segments and represent each segment with one symbol.
Symbolic Aggregate approXimation (SAX)
[Figure: one subsequence, with its quantization levels, segments, and representative symbols]
Discrete version = acdcbdba
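A rough sketch of how one subsequence could be turned into a symbolic word. It assumes the usual published SAX recipe: z-normalize the window, average each segment, and pick quantization levels that are equiprobable under a standard normal distribution. The function name `sax_word` is hypothetical, and the sketch assumes the window length is a multiple of the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(subsequence, n_segments, alphabet="abcd"):
    """Convert one subsequence into a word of len == n_segments."""
    if len(subsequence) % n_segments != 0:
        raise ValueError("window length must be a multiple of n_segments")

    # Z-normalize so that words describe shape, not offset or scale.
    mu, sigma = mean(subsequence), pstdev(subsequence)
    if sigma == 0:
        normalized = [0.0] * len(subsequence)
    else:
        normalized = [(x - mu) / sigma for x in subsequence]

    # Time dimension: one average value per segment.
    seg_len = len(subsequence) // n_segments
    averages = [
        mean(normalized[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # Value dimension: breakpoints that split N(0, 1) into equal-area bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in averages)


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))  # prints "abcd"
```

The alphabet size sets the resolution along the value axis and the number of segments sets the resolution along the time axis.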
Subsequence Tree - example
[Figure: subsequence tree of depth 2 with branches labeled a, b, c at each level]
symbols = {a, b, c}, segments per window = 2
Tree spans the space of subsequences.
Branching factor = number of symbols (size of alphabet)
Depth = number of segments per window
Branch thickness ~ frequency of occurrence of the subsequence.
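One way to picture the subsequence tree is as a trie whose nodes count how many words pass through them. The sketch below is an assumption-laden illustration, not the VizTree implementation: the names `Node`, `build_tree`, and `branch_count` are made up, and branches are created lazily rather than spanning every possible word up front.

```python
class Node:
    """One branch of the tree; count is what would drive its thickness."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(words):
    """Insert each symbolic word, counting traversals of every branch."""
    root = Node()
    for word in words:
        node = root
        node.count += 1
        for symbol in word:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def branch_count(root, prefix):
    """Return how many words start with `prefix` (0 if never seen)."""
    node = root
    for symbol in prefix:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


# Words over symbols {a, b, c} with 2 segments per window.
words = ["ab", "ab", "ac", "ba", "cc", "ab"]
root = build_tree(words)
print(branch_count(root, "a"), branch_count(root, "ab"))  # prints "4 3"
```

However many words are inserted, the tree never has more than (alphabet size) ** depth leaves, here 3 ** 2 = 9, which is why the structure does not grow with the length of the series.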
VizTree Tool Demo
Query by Content
Subsequence Matching: finding known patterns
Chunking: breaking a time series into individual series
  Methods
    Time (e.g. power usage)
    Shape (e.g. heart beats)
Search Approaches
  Exact - Slow
  Approximate - Fast
Exploration: VizTree
Hypothesis Testing: VizTree
Motif Discovery
Finding unknown patterns
Not exact matches
VizTree allows exploration at varying levels of precision
  E.g., cc** vs. ccac
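The cc** vs. ccac example can be read as matching at two levels of precision, where `*` stands for "any symbol". A small sketch under that assumption follows; the helper names and the sample words are invented for illustration.

```python
from collections import Counter


def matches(word, pattern):
    """True if `word` fits `pattern`, where '*' matches any one symbol."""
    return len(word) == len(pattern) and all(
        p == "*" or p == w for w, p in zip(word, pattern)
    )


def pattern_frequency(word_counts, pattern):
    """Total occurrences of all words that fit the pattern."""
    return sum(c for word, c in word_counts.items() if matches(word, pattern))


# Made-up symbolic words, one per window position.
words = ["ccac", "ccab", "ccac", "abdc", "ccbd", "dcba"]
counts = Counter(words)

print(pattern_frequency(counts, "cc**"))  # coarse pattern: prints 4
print(pattern_frequency(counts, "ccac"))  # exact word: prints 2
```

In tree terms, the coarse pattern corresponds to a branch near the root and the exact word to one of its leaves.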
Anomaly Detection
Finding abnormal patterns.
Use data already seen to identify anomalies
Identified by thin branches
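Thin branches correspond to words that occur rarely. The sketch below flags such words with a simple relative-frequency cutoff; the 5% threshold and the name `rare_positions` are assumptions for illustration, and in the tool this judgment is made visually rather than by a fixed rule.

```python
from collections import Counter


def rare_positions(words, max_fraction=0.05):
    """Return (position, word) for words whose share is <= max_fraction."""
    counts = Counter(words)
    total = len(words)
    return [
        (i, word)
        for i, word in enumerate(words)
        if counts[word] / total <= max_fraction
    ]


# Made-up data: two common shapes and one that appears only once.
words = ["abcd"] * 30 + ["abcc"] * 19 + ["dcba"]
rare = rare_positions(words)
print(rare)  # prints [(49, 'dcba')]
```

Keeping the position alongside the word makes it possible to jump back to the part of the series where the unusual pattern occurred.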
Comparing Series: Diff Tree
Same parameters, same tree structure
Compare the test branch frequencies with respect to the reference branch frequencies
  Blue = underrepresented
  Green = overrepresented
  Red = equivalent
  Thickness = magnitude
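A sketch of the comparison behind a diff tree, assuming both series were discretized with the same parameters so their words are directly comparable. It compares relative frequencies per word; the tolerance used to call two frequencies "equivalent" and the function names are assumptions, not values from the tool.

```python
from collections import Counter


def branch_differences(reference_words, test_words):
    """Relative frequency in test minus relative frequency in reference."""
    if not reference_words or not test_words:
        raise ValueError("both series must contribute at least one word")
    ref, test = Counter(reference_words), Counter(test_words)
    return {
        word: test[word] / len(test_words) - ref[word] / len(reference_words)
        for word in set(ref) | set(test)
    }


def classify(difference, tolerance=0.05):
    """Map a difference to the slide's categories (tolerance is assumed)."""
    if abs(difference) <= tolerance:
        return "equivalent"        # red on the slide
    if difference > 0:
        return "overrepresented"   # green on the slide
    return "underrepresented"      # blue on the slide


reference = ["ab"] * 5 + ["ba"] * 4 + ["dd"] * 1
test = ["ab"] * 8 + ["ba"] * 1 + ["dd"] * 1
diffs = branch_differences(reference, test)
labels = {word: classify(d) for word, d in diffs.items()}
print(sorted(labels.items()))
```

The absolute value of each difference plays the role of branch thickness, and its sign picks the colour.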
Thoughts on VizTree (Vis.)
Most of “discovery” is implicit
  Manual search
  Parameter setting might be an issue
  Automation might help
Tree Visualization
  Use of real estate?
  Effective?
  Intuitive?
  Alternatives?
Thoughts on VizTree (HCI)
Primarily a tool for researchers now
(Also, we might have an outdated version)
Even so, some HCI suggestions:
  Indication of how tree detail relates to tree overview
  Zoom into a specific area of the time series (rather than zoom+scroll)
  Selection in subsequence detail relates to subsequence overview
  Unfortunately, least interesting patterns are most easily accessed (branches at root)
    “snap to branch” or “snap to intersection”?
  Ability to turn off highlighting (undo)
Summary: Unique Contributions
Fundamental support for aperiodic series
Scalable
  Resource requirements do not grow linearly with the length of the series
Rich visual feature set
  Global summaries
  Diff-trees between multiple series
  Local patterns and anomalies