1 VizTree Huyen Dao and Chris Ackermann

2 Introducing example
Sequence 1: 10001000101001000101010100001010 100010101110111101011010010111010 010101001110101010100101001010101 110101010010101010110101010010110 010111011110100011100001010000100 111010100011100001010101100101110 101
Sequence 2: 010110010111100110100100001000101 001101101011100001010101110111110 001101101101111110100110010010001 101000111100110110100010111100010 110100110110011010000001001100010 011100000111010011001011000010100 10
These are two random bit sequences. One was generated by a computer and the other by humans. Which is which?

3 Introducing example
[Figure: the two bit sequences from the previous slide, one of them labeled HUMAN.]
Not really random! The human subjects tried to create randomness by alternating between 0 and 1.
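The alternation effect can be checked numerically. The sketch below is not part of VizTree; `alternation_rate` is a hypothetical helper that measures how often adjacent bits differ. For independent fair coin flips the expected rate is 0.5, so a noticeably higher rate suggests deliberate alternation.

```python
def alternation_rate(bits):
    """Return the fraction of adjacent positions in `bits` whose values differ.

    `bits` is a string such as "0110". This is an illustrative statistic,
    not something the slides say VizTree computes.
    """
    if len(bits) < 2:
        raise ValueError("need at least two bits")
    changes = sum(1 for left, right in zip(bits, bits[1:]) if left != right)
    return changes / (len(bits) - 1)


if __name__ == "__main__":
    print(alternation_rate("0101"))  # strictly alternating
    print(alternation_rate("0011"))  # one change among three adjacent pairs
```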

4 What does VizTree do?
Analyzes time series data.
Illustrates motifs and anomalies with ‘subsequence trees’.
[Figure: binary subsequence tree; length of subsequence = 3.]

5 Creating a Subsequence Tree
[Figure: a subsequence tree being built from a bit sequence.]

6 Creating a Subsequence Tree 2
[Figure: a subsequence tree being built from a bit sequence, continued.]
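A minimal sketch of how such a tree could be built, assuming a window that slides one symbol at a time and a nested-dictionary layout. The function names and the data structure are illustrative choices, not taken from VizTree. Each node's count corresponds to the thickness of its branch.

```python
def build_subsequence_tree(sequence, length=3):
    """Count every window of `length` symbols in a nested-dict tree.

    Each node is {"count": int, "children": {symbol: node}}. A node's count
    is the number of windows whose prefix leads to that node.
    """
    root = {"count": 0, "children": {}}
    for start in range(len(sequence) - length + 1):
        node = root
        node["count"] += 1
        for symbol in sequence[start:start + length]:
            node = node["children"].setdefault(
                symbol, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


def branch_count(tree, pattern):
    """Return how many windows start with `pattern` (0 if the branch is absent)."""
    node = tree
    for symbol in pattern:
        node = node["children"].get(symbol)
        if node is None:
            return 0
    return node["count"]


if __name__ == "__main__":
    tree = build_subsequence_tree("0101101", length=3)
    # Windows are 010, 101, 011, 110, 101.
    print(branch_count(tree, "101"))  # 2
    print(branch_count(tree, "1"))    # 3
```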

7 Discretizing
Only discrete data can be visualized. Most data is continuous and needs to be converted.
Continuous data is converted into a tree structure in several steps:
1. PAA
2. SAX

8 PAA
A. Piecewise aggregate approximation (PAA) of the time series:
1. Divide the time series into n segments of equal length.
2. Assign each segment a coefficient equal to the average of the values in that segment.
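The two PAA steps can be sketched in a few lines of plain Python. The function name `paa` is an illustrative choice. When the series length is not a multiple of n, this sketch lets segment lengths differ by at most one value, which is an assumption not stated on the slide.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each of n segments."""
    size = len(series)
    if not 0 < n_segments <= size:
        raise ValueError("n_segments must be between 1 and len(series)")
    coefficients = []
    for i in range(n_segments):
        # Integer arithmetic keeps segment boundaries as even as possible.
        start = i * size // n_segments
        end = (i + 1) * size // n_segments
        segment = series[start:end]
        coefficients.append(sum(segment) / len(segment))
    return coefficients


if __name__ == "__main__":
    print(paa([1, 2, 3, 4, 5, 6], 3))  # [1.5, 3.5, 5.5]
```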

9 SAX
B. Create an alphabet on the distribution space of the time series:
a. Divide the range into x regions such that a segment has equal probability of falling into any one of them.
b. Assign symbols to the regions from top to bottom.
C. Assign each segment of the PAA a symbol based on the region in which its coefficient falls.
[Figure: regions labeled a, b, c.]
The time series becomes a string: ‘b c b a b’
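A hedged sketch of the symbol assignment. It assumes the PAA coefficients come from a z-normalized series and that the equiprobable regions are cut from a standard normal distribution, which is the usual SAX choice but is not stated on the slide. Symbols are assigned from top to bottom as the slide describes, so ‘a’ is the highest region. The function name `sax_symbols` is illustrative.

```python
from bisect import bisect_right
from statistics import NormalDist


def sax_symbols(coefficients, alphabet_size):
    """Map PAA coefficients to letters using equiprobable Gaussian regions.

    Assumes the coefficients are already z-normalized. Following the slide,
    'a' labels the top region and later letters label lower regions.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    standard_normal = NormalDist()
    # Breakpoints split the standard normal into equal-probability regions.
    breakpoints = [
        standard_normal.inv_cdf(k / alphabet_size)
        for k in range(1, alphabet_size)
    ]
    letters = []
    for value in coefficients:
        region_from_bottom = bisect_right(breakpoints, value)
        region_from_top = alphabet_size - 1 - region_from_bottom
        letters.append(chr(ord("a") + region_from_top))
    return "".join(letters)


if __name__ == "__main__":
    print(sax_symbols([1.0, 0.0, -1.0], alphabet_size=3))  # abc
```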

10 Tree of continuous data
Instead of Boolean values, the branches represent the symbols:
- the top branch represents a
- the bottom branch represents the last letter
A larger alphabet means more branches.
[Figure: subsequence tree with branches labeled a and b; window size = 3, # of symbols = 3, alphabet size = 2.]

11 Sliding window length
Specifies the time frame of the pattern that is being matched.
An appropriate length can be determined by using the ruler.
[Figure: time series with sliding windows of length = 12 and length = 24.]

12 # of symbols per window
Specifies how many discrete segments are fit into the given time window.
Depends on the sliding window size and the frequency of value changes.
[Figure: a window of length = 24 encoded as ‘b c b a b’ and as ‘c a’.]

13 Alphabet size
Larger alphabet:
- Discrete representation is more fine-grained.
- Tree is difficult to read.
[Figure: the same time series encoded with the alphabet a, b, c as ‘b c b a b’ and with the alphabet a, b as ‘b b a a a’.]

14 Parameters
Length of the sliding window
- For focusing on certain intervals
# of symbols per window
- The size of the pattern being analyzed
Alphabet size
- The number of discrete values

15 Time Series Data Mining Tasks
1. Subsequence matching
2. Time series motif discovery
3. Anomaly detection

16 Advanced settings
Cull trivial matches:
- Consecutive strings that are the same: ‘dcb’, ‘dcb’
- Consecutive strings in which no pair of corresponding symbols is more than one symbol apart: ‘dcb’, ‘cba’
Chunking instead of actually sliding the window
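The two culling rules can be sketched as follows. Comparing each string against the last string that was kept (rather than against its immediate predecessor) is an assumption, and the function names are illustrative. Identical strings fall out as a special case of the one-symbol-apart rule.

```python
def is_trivial_match(previous, current):
    """True if every pair of corresponding symbols is at most one letter apart."""
    if len(previous) != len(current):
        return False
    return all(abs(ord(p) - ord(c)) <= 1 for p, c in zip(previous, current))


def cull_trivial_matches(words):
    """Keep a word only if it is not a trivial match of the last kept word."""
    kept = []
    for word in words:
        if kept and is_trivial_match(kept[-1], word):
            continue
        kept.append(word)
    return kept


if __name__ == "__main__":
    # 'dcb' repeats, and 'cba' is one symbol away from 'dcb' in every position.
    print(cull_trivial_matches(["dcb", "dcb", "cba", "aaa"]))  # ['dcb', 'aaa']
```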

17 VizTree and Data Mining Tasks: Subsequence Matching
The exact pattern does not have to be known for a query: give a concise description of the pattern.
Selecting a branch shows all subsequence matches and highlights their occurrences in the time series.

18 VizTree and Data Mining Tasks: Time Series Motif Discovery
Motif: “previously unknown, frequently occurring patterns” (Lin et al. 2005)
Discovery is simple: frequently occurring patterns => thick branches.
Traditional motif discovery algorithms are slow.
VizTree builds frequency into the visualization, so motifs can be found quickly.
Highlights where motifs occur.
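The idea that frequent patterns correspond to thick branches can be sketched by counting windows of a symbol string and recording where each one starts. `find_motifs` is a hypothetical helper, not VizTree's API, and it counts overlapping windows without culling trivial matches.

```python
from collections import defaultdict


def find_motifs(symbols, length, top=1):
    """Return the `top` most frequent windows with their start positions.

    `symbols` is a string of discretized symbols. Ties keep first-seen order.
    """
    positions = defaultdict(list)
    for start in range(len(symbols) - length + 1):
        positions[symbols[start:start + length]].append(start)
    ranked = sorted(
        positions.items(), key=lambda item: len(item[1]), reverse=True
    )
    return ranked[:top]


if __name__ == "__main__":
    print(find_motifs("abcxabcyabc", length=3))  # [('abc', [0, 4, 8])]
```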

19 VizTree and Data Mining Tasks: Anomaly Discovery
Simple cases: observing very thin branches in subsequence trees.
More complex cases: diff trees. Thick branches of vivid green or blue indicate anomalies in the second time series.
Lin et al. 2005

20 Diff Tree
Contains an analysis of two time series, A and B.
Shows the frequency of patterns in B in relation to their frequency in A.
Two values are used in its creation:
- Support: is a pattern overrepresented (more frequently occurring) or underrepresented (less frequently occurring) in B?
- Confidence: how prevalent is the pattern in A?
- Support => thickness of branches
- Confidence => color intensity of branches
Also: Surprisingness: ranks the most anomalous patterns.
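The slide does not give formulas for support, confidence, or surprisingness, so the sketch below does not reproduce them. It only illustrates the underlying comparison, assuming relative window frequencies: a positive value means a pattern is overrepresented in B, a negative value means it is underrepresented. The function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(symbols, length):
    """Return {pattern: share of all windows} for windows of `length` symbols."""
    windows = [
        symbols[i:i + length] for i in range(len(symbols) - length + 1)
    ]
    if not windows:
        raise ValueError("series is shorter than the window length")
    total = len(windows)
    return {pattern: count / total for pattern, count in Counter(windows).items()}


def frequency_difference(series_a, series_b, length):
    """Relative frequency in B minus relative frequency in A, per pattern.

    This is an illustrative stand-in, not VizTree's support or confidence.
    """
    freq_a = relative_frequencies(series_a, length)
    freq_b = relative_frequencies(series_b, length)
    patterns = set(freq_a) | set(freq_b)
    return {p: freq_b.get(p, 0.0) - freq_a.get(p, 0.0) for p in patterns}


if __name__ == "__main__":
    diff = frequency_difference("0101", "0000", length=2)
    for pattern in sorted(diff):
        print(pattern, round(diff[pattern], 3))
```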

21 What is great about VizTree?
Simple graphical representation:
- Straightforward
- Powerful: can show lots of different subsequences in a simple tree structure
- Simple, easy-to-understand description of subsequences through strings
Quick analysis:
- The subsequence trees and diff trees render quickly.
- Since the relevant information is encoded in the tree, motifs and anomalies can be spotted quickly.

22 Weaknesses
It is difficult to find the right combination of parameters.
- An idea would be to superimpose the effect of the parameters on the original graph (discrete values, sliding window length, etc.).
Zooming is rather inconvenient.
- This could be solved by using another zooming technique, such as fish-eye.
Usability could be improved.
- It would be informative to see how the alphabet is defined over the dataset.
- The subtree view does not indicate where it is in the main tree, so it is easy to lose track.
- The time series scales are not adjustable, so it can be hard to place subsequences in time.
- Nodes are hard to select.

