VizTree Huyen Dao and Chris Ackermann
Introducing example: These are two "random" bit sequences. One was generated by a computer and the other by a human. Which is which?
Introducing example: Not really random! The sequence labeled HUMAN came from subjects who tried to create randomness by alternating.
What does VizTree do? It analyzes time series data and illustrates motifs and anomalies with ‘Subsequence Trees’. (Figure: length of subsequence = 3)
Creating a Subsequence Tree …
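For the bit-sequence example, a subsequence tree can be thought of as a trie built from every window of a fixed length, where each branch carries the number of windows passing through it (its thickness). The sketch below is a minimal illustration under that assumption; the function names and the sample bit string are hypothetical, not taken from VizTree.

```python
from collections import Counter


def subsequence_counts(seq: str, length: int = 3) -> Counter:
    """Count every subsequence of the given length seen by a sliding window."""
    return Counter(seq[i:i + length] for i in range(len(seq) - length + 1))


def build_tree(counts: Counter) -> dict:
    """Nest the counted subsequences into a trie.

    Each node is {"count": n, "children": {...}}; a node's count is the number
    of windows whose prefix passes through it (the branch thickness).
    """
    root = {"count": 0, "children": {}}
    for word, n in counts.items():
        node = root
        node["count"] += n
        for symbol in word:
            node = node["children"].setdefault(symbol, {"count": 0, "children": {}})
            node["count"] += n
    return root


def print_tree(node: dict, depth: int = 0) -> None:
    # Children are printed in sorted order, so the first symbol is the top branch.
    for symbol, child in sorted(node["children"].items()):
        print("  " * depth + f"{symbol} ({child['count']})")
        print_tree(child, depth + 1)


if __name__ == "__main__":
    bits = "0101101010010101"  # hypothetical sample sequence
    print_tree(build_tree(subsequence_counts(bits, 3)))
```

A strongly alternating sequence, like the human-generated one, concentrates its counts in the `010` and `101` branches, which is what makes it stand out in the tree.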
Discretizing: Only discrete data can be drawn as a subsequence tree, but most data is continuous and needs to be converted. Two steps convert continuous data into a tree structure: 1. PAA 2. SAX
PAA: A. Piecewise aggregate approximation (PAA) of the time series: 1. Divide the time series into n segments of equal length. 2. Assign each segment a coefficient equal to the average of the values in that segment.
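The two PAA steps translate almost directly into code. This is a minimal sketch that assumes the series length is a multiple of n; the general PAA definition also handles other lengths, which this version deliberately does not.

```python
def paa(series, n_segments: int) -> list:
    """Piecewise Aggregate Approximation.

    Splits the series into n_segments equal-length segments and replaces each
    segment by the mean of its values. For simplicity, this sketch requires the
    series length to be a multiple of n_segments.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    size = len(series) // n_segments
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]


if __name__ == "__main__":
    print(paa([1, 2, 3, 4, 5, 6], 3))  # [1.5, 3.5, 5.5]
```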
SAX: B. Create an alphabet over the distribution space of the time series: a. Divide the value range into x regions such that a segment has equal probability of falling into any one of them. b. Assign symbols to the regions from top to bottom. C. Assign each segment of the PAA the symbol of the region in which its coefficient resides. (Figure: regions labeled a, b, c) The time series becomes a string: ‘b c b a b’
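Steps B and C can be sketched as follows. The assumption here (standard in SAX, not stated on the slide) is that the series has been z-normalized and is roughly Gaussian, so equiprobable regions come from the standard normal quantiles. Following the slide, the highest region gets ‘a’. The coefficient values in the usage line are made up to reproduce the slide's string.

```python
from statistics import NormalDist


def breakpoints(alphabet_size: int) -> list:
    """Cut points that split N(0, 1) into alphabet_size equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(paa_coefficients, alphabet_size: int) -> str:
    """Map (z-normalized) PAA coefficients to symbols.

    Symbols are assigned top to bottom, as on the slide: the highest region is 'a'.
    """
    cuts = breakpoints(alphabet_size)
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    word = []
    for c in paa_coefficients:
        region_from_bottom = sum(c > b for b in cuts)  # 0 = lowest region
        word.append(letters[alphabet_size - 1 - region_from_bottom])
    return "".join(word)


if __name__ == "__main__":
    print(breakpoints(3))                               # about [-0.43, 0.43]
    print(sax_word([0.0, -1.0, 0.1, 1.2, -0.2], 3))     # bcbab
```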
Tree of continuous data: Instead of Boolean values, the branches of the tree represent the symbols: – the top branch represents a – the bottom branch represents the last letter of the alphabet. A larger alphabet means more branches at each node. (Figure: tree with branches labeled a and b; window size = 3, # of symbols = 3, alphabet size = 2)
Sliding window length: Specifies the time span of the pattern being matched (figures: length = 12, length = 24). An appropriate length can be determined with the ruler.
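# of symbols per window: Specifies how many equal-length segments (symbols) the sliding window is divided into. The right value depends on the sliding window length and on how often the values change. (Figure: a window of length 24 becomes ‘b c b a b’ with 5 symbols and ‘c a’ with 2 symbols)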
Alphabet size: A larger alphabet makes – the discrete representation more fine-grained – the tree harder to read. (Figure: with alphabet a, b, c the window becomes ‘b c b a b’; with alphabet a, b it becomes ‘b b a a a’)
Parameters: Length of the sliding window – for focusing on certain intervals. # of symbols per window – the size of the pattern being analyzed (the depth of the tree). Alphabet size – the number of discrete values.
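The three parameters combine into one pipeline: slide a window of the chosen length over the series, discretize each window with PAA and SAX, and collect one word per window position. This sketch assumes each window is z-normalized before discretization (common for SAX on subsequences, but an assumption here) and that the window length is a multiple of the number of symbols; `sax_windows` and its argument names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def znorm(xs):
    mu, sd = mean(xs), pstdev(xs)
    # A flat window has no shape; map it to all zeros instead of dividing by 0.
    return [0.0] * len(xs) if sd == 0 else [(x - mu) / sd for x in xs]


def paa(xs, w):
    size = len(xs) // w
    return [mean(xs[i * size:(i + 1) * size]) for i in range(w)]


def to_symbol(c, cuts, letters):
    # Top-to-bottom assignment as on the slide: the highest region maps to 'a'.
    return letters[len(cuts) - sum(c > b for b in cuts)]


def sax_windows(series, window_length, n_symbols, alphabet_size):
    """Return (start_index, word) for every sliding window of the series."""
    if window_length % n_symbols:
        raise ValueError("window_length must be a multiple of n_symbols in this sketch")
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    result = []
    for start in range(len(series) - window_length + 1):
        coeffs = paa(znorm(series[start:start + window_length]), n_symbols)
        result.append((start, "".join(to_symbol(c, cuts, letters) for c in coeffs)))
    return result


if __name__ == "__main__":
    print(sax_windows([1, 1, 0, 0, 1, 1, 0, 0], 4, 2, 2))
```

Raising `n_symbols` deepens the tree, while raising `alphabet_size` widens every node, which is the readability trade-off described on the alphabet size slide.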
Time Series Data Mining Tasks: 1. Subsequence matching 2. Time series motif discovery 3. Anomaly detection
Advanced settings: Cull trivial matches, i.e., ignore – consecutive strings that are identical: ‘dcb’, ‘dcb’ – consecutive strings where no pair of corresponding symbols is more than one symbol apart: ‘dcb’, ‘cba’. Chunking: use non-overlapping windows instead of actually sliding the window.
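Both culling rules and chunking can be sketched over a list of per-window words. This is an illustration under assumptions, not VizTree's code: each word is compared with the word of the immediately preceding window, symbol distance is the distance between letters in the alphabet, and the `mode` parameter is a hypothetical way of choosing between the slide's two rules.

```python
def cull_trivial(words, mode="exact"):
    """Drop words that trivially repeat the word of the previous window.

    mode="exact": drop a word identical to its predecessor ('dcb', 'dcb').
    mode="one_apart": also drop a word whose symbols are each at most one
    letter away from its predecessor's ('dcb', 'cba').
    Returns (window_index, word) pairs for the words that are kept.
    """
    kept = []
    for i, word in enumerate(words):
        if i > 0:
            prev = words[i - 1]
            if word == prev:
                continue
            if mode == "one_apart" and all(abs(ord(p) - ord(c)) <= 1 for p, c in zip(prev, word)):
                continue
        kept.append((i, word))
    return kept


def chunk(series, window_length):
    """Non-overlapping windows (chunking) instead of a sliding window."""
    return [series[i:i + window_length]
            for i in range(0, len(series) - window_length + 1, window_length)]


if __name__ == "__main__":
    print(cull_trivial(["dcb", "dcb", "cba", "aaa"], mode="one_apart"))  # [(0, 'dcb'), (3, 'aaa')]
    print(chunk(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5]]
```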
VizTree and Data Mining Tasks: Subsequence Matching. The user does not have to know the exact pattern to query; a concise description of the pattern is enough. Selecting a branch shows all matching subsequences and highlights their occurrences in the time series.
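Selecting a node at depth k fixes the first k symbols, so in terms of the per-window words, branch selection is a prefix match. The sketch below assumes the words are stored as (start_index, word) pairs; the function names and sample words are hypothetical.

```python
def select_branch(indexed_words, prefix):
    """Start positions of all windows whose word lies under the branch `prefix`."""
    return [start for start, word in indexed_words if word.startswith(prefix)]


def highlight_spans(indexed_words, prefix, window_length):
    """(start, end) index ranges to highlight in the original time series; end is exclusive."""
    return [(s, s + window_length) for s in select_branch(indexed_words, prefix)]


if __name__ == "__main__":
    words = [(0, "abc"), (1, "bca"), (2, "abb"), (3, "cab"), (4, "abc")]
    print(select_branch(words, "ab"))         # [0, 2, 4]
    print(highlight_spans(words, "abc", 12))  # [(0, 12), (4, 16)]
```

A short prefix such as ‘ab’ acts as the concise, partial description of a pattern; a full-length path matches one exact word.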
VizTree and Data Mining Tasks: Time Series Motif Discovery. Motif: “previously unknown, frequently occurring patterns”. Discovery is simple: frequently occurring patterns appear as thick branches. Traditional motif discovery algorithms are slow; because VizTree builds frequency into the visualization, motifs can be found quickly, and VizTree highlights where they occur (Lin et al. 2005).
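Since branch thickness is word frequency, the thickest branches correspond to the most common words. This is a minimal sketch assuming (start_index, word) pairs as input; `find_motifs` and the sample data are hypothetical.

```python
from collections import Counter


def find_motifs(indexed_words, top_k=3):
    """Return (word, count, start_positions) for the top_k most frequent words."""
    counts = Counter(word for _, word in indexed_words)
    motifs = []
    for word, n in counts.most_common(top_k):  # ties keep first-seen order
        positions = [start for start, w in indexed_words if w == word]
        motifs.append((word, n, positions))
    return motifs


if __name__ == "__main__":
    words = [(0, "ab"), (1, "bb"), (2, "ab"), (3, "ba"), (4, "ab")]
    print(find_motifs(words, 1))  # [('ab', 3, [0, 2, 4])]
```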
VizTree and Data Mining Tasks: Anomaly Discovery. Simple cases: look for very thin branches in the subsequence tree. More complex cases: Diff Trees, where thick branches of vivid green or blue indicate anomalies in the second time series (Lin et al. 2005).
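The simple case, very thin branches, amounts to finding words that occur rarely. This sketch assumes a fixed count threshold (`max_count`, hypothetical); VizTree itself leaves the judgment to the user looking at the tree.

```python
from collections import Counter


def find_rare_patterns(indexed_words, max_count=1):
    """Map each word seen at most max_count times to its start positions."""
    counts = Counter(word for _, word in indexed_words)
    return {word: [s for s, w in indexed_words if w == word]
            for word, n in counts.items() if n <= max_count}


if __name__ == "__main__":
    words = [(0, "ab"), (1, "bb"), (2, "ab"), (3, "ba"), (4, "ab")]
    print(find_rare_patterns(words))  # {'bb': [1], 'ba': [3]}
```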
Diff Tree: Contains an analysis of two time series, A and B, and shows the frequency of patterns in B relative to their frequency in A. Two values are used in its creation: – Support: whether a pattern is overrepresented (occurs more frequently) or underrepresented (occurs less frequently) in B – Confidence: how prevalent the pattern is in A – Support is mapped to branch thickness – Confidence is mapped to branch color intensity. Also: Surprisingness ranks the most anomalous patterns.
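The slide does not give formulas for support and confidence, so the sketch below uses hypothetical stand-ins that only follow the descriptions above: support is the relative frequency of a word in B minus its relative frequency in A (positive means overrepresented in B), and confidence is the word's relative frequency in A. Surprisingness is not modeled.

```python
from collections import Counter


def diff_scores(words_a, words_b):
    """Hypothetical support/confidence stand-ins for a Diff Tree.

    support    = relative frequency in B - relative frequency in A
    confidence = relative frequency in A
    """
    if not words_a or not words_b:
        raise ValueError("both word lists must be non-empty")
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    n_a, n_b = len(words_a), len(words_b)
    scores = {}
    for word in freq_a.keys() | freq_b.keys():
        fa, fb = freq_a[word] / n_a, freq_b[word] / n_b
        scores[word] = {"support": fb - fa, "confidence": fa}
    return scores


if __name__ == "__main__":
    a = ["ab", "ab", "ba", "bb"]
    b = ["ab", "ba", "ba", "ba"]
    for word, s in sorted(diff_scores(a, b).items()):
        print(word, s)
```

In this toy data, ‘ba’ gets the largest positive support, so it would be drawn as the thickest branch in the diff tree.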
What is great about VizTree? Simple graphical representation: – Straightforward – Powerful: can show many different subsequences in a simple tree structure – Subsequences are described simply and understandably as strings. Quick analysis: – Subsequence trees and diff trees render quickly – Since the relevant information (pattern frequency) is encoded in the tree, motifs and anomalies can be spotted quickly.
Weaknesses: It is difficult to find the right combination of parameters. – One idea would be to superimpose the effect of the parameters (discrete values, sliding window length, etc.) on the original graph. Zooming is rather inconvenient. – This could be solved with another zooming technique, such as a fish-eye view. Usability could be improved: – It would be informative to see how the alphabet is defined over the dataset. – The subtree view does not indicate where in the main tree it lies, so the user can lose track. – The time series scales are not adjustable, which makes it hard to place subsequences in time. – Nodes are hard to select.