Presentation transcript:

VizTree Huyen Dao and Chris Ackermann

Introducing example These are two random bit sequences. One sequence was generated by a computer and the other by humans. Which is which?

Introducing example Not really random! Subjects tried to create randomness by alternating; the alternating sequence is the HUMAN one.

What does VizTree do? Analysis of time series data. It illustrates motifs and anomalies with 'subsequence trees'. (Example shown: length of subsequence = 3.)

Creating a Subsequence Tree …

Discretizing Only discrete data can be visualized in the tree. Most data is continuous and needs to be converted. Two steps convert continuous data into the tree structure: 1. PAA 2. SAX

PAA A. Piecewise aggregate approximation (PAA) of the time series: 1. Divide the time series into n segments of equal length. 2. Assign each segment a coefficient equal to the average of the values in that segment.
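
As a rough illustration of this step, here is a minimal PAA sketch in Python with NumPy. The function name paa and the use of array_split for near-equal segments are my own choices for the sketch, not VizTree's code.

    import numpy as np

    def paa(series, n_segments):
        # Piecewise Aggregate Approximation: split the series into
        # n_segments (roughly) equal-length pieces and keep each piece's mean.
        series = np.asarray(series, dtype=float)
        segments = np.array_split(series, n_segments)
        return np.array([segment.mean() for segment in segments])

    # e.g. paa([1, 2, 3, 4, 5, 6, 7, 8, 9], 3) -> array([2., 5., 8.])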

SAX B. Create an alphabet over the distribution of the time series: a. Divide the value range into regions such that a segment has equal probability of falling into any one of them. b. Assign symbols (a, b, c, …) to the regions from top to bottom. C. Assign each PAA segment the symbol of the region in which it resides. The time series becomes a string: 'bcbab'.
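
A minimal sketch of this two-step encoding, assuming the series is z-normalized and using Gaussian breakpoints for the equiprobable regions (as in the original SAX work); sax_word is an illustrative name and SciPy's norm.ppf supplies the breakpoints.

    import numpy as np
    from scipy.stats import norm

    def sax_word(series, n_segments, alphabet_size):
        # z-normalize, apply PAA, then map each coefficient to a letter
        # using breakpoints that cut the normal curve into equiprobable regions.
        x = np.asarray(series, dtype=float)
        x = (x - x.mean()) / (x.std() + 1e-12)
        coeffs = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
        breakpoints = norm.ppf(np.linspace(0.0, 1.0, alphabet_size + 1)[1:-1])
        letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
        # Note: this assigns 'a' to the lowest region (the usual SAX convention);
        # the slide labels regions from the top down, which simply reverses the letters.
        return "".join(letters[np.searchsorted(breakpoints, c)] for c in coeffs)

    # e.g. sax_word(some_series, n_segments=5, alphabet_size=3) might yield 'bcbab'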

Tree of continuous data Instead of Boolean values, the branches of the tree represent the symbols: the top branch represents 'a' and the bottom branch represents the last letter of the alphabet. A larger alphabet means more branches. (Figure: window size = 3, # of symbols = 3, alphabet size = 2.)
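
To make the tree construction concrete, here is a sketch that slides a window over the series, converts each window to its SAX word (reusing sax_word from the sketch above), and counts word frequencies. A word-to-count map is a flattened stand-in for the tree; in VizTree these counts drive branch thickness.

    from collections import defaultdict

    def subsequence_counts(series, window_len, n_segments, alphabet_size):
        # Count how often each SAX word occurs over all sliding windows.
        counts = defaultdict(int)
        for start in range(len(series) - window_len + 1):
            word = sax_word(series[start:start + window_len], n_segments, alphabet_size)
            counts[word] += 1
        return dict(counts)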

Sliding window length Specifies the time frame of the pattern being matched (the figure contrasts length = 12 with length = 24). An appropriate length can be determined by using the ruler.

# of symbols per window Specifies how many discrete segments fit into the given sliding window. The right value depends on the sliding window size and on how quickly the values change (the figure shows a window of length 24 encoded as 'bcbab' with 5 symbols versus 'ca' with 2).

Alphabet size With a larger alphabet the discrete representation is more fine-grained, but the tree becomes difficult to read (the figure contrasts alphabet {a, b, c}, giving 'bcbab', with alphabet {a, b}, giving 'bbaaa').

Parameters Length of the sliding window: for focusing on certain intervals. # of symbols per window: the size of the pattern being analyzed. Alphabet size: the number of discrete values.

Time Series Data Mining Tasks 1. Subsequence matching 2. Time series motif discovery 3. Anomaly detection

Advanced settings Cull trivial matches: consecutive strings that are identical ('dcb', 'dcb'), or consecutive strings where no pair of corresponding symbols is more than one symbol apart ('dcb', 'cba'). Chunking (non-overlapping windows) can be used instead of actually sliding the window.
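
A sketch of how the culling rule above could be checked; the predicate name and the letter-distance test are my reading of the slide, not VizTree's implementation.

    def is_trivial_match(prev_word, word):
        # Two consecutive SAX words count as a trivial match if they are identical,
        # or if no pair of corresponding symbols is more than one letter apart,
        # e.g. ('dcb', 'dcb') or ('dcb', 'cba').
        if prev_word == word:
            return True
        return all(abs(ord(a) - ord(b)) <= 1 for a, b in zip(prev_word, word))

    # Chunking, the other advanced setting, would simply step the window by its
    # own length instead of by one position.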

VizTree and Data Mining Tasks: Subsequence Matching You do not have to know the exact pattern to query: a concise description of the pattern is enough. Selecting a branch shows all subsequence matches and highlights their occurrences in the time series.
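
One way to read this slide in code: selecting a branch amounts to a prefix query over the per-window SAX words (sax_word from the earlier sketch). This is an assumption about the mechanics for illustration, not VizTree's source.

    def find_matches(series, branch_prefix, window_len, n_segments, alphabet_size):
        # Return the start offset of every sliding window whose SAX word begins
        # with the selected branch's symbols, mimicking the branch-selection
        # highlighting described above.
        hits = []
        for start in range(len(series) - window_len + 1):
            word = sax_word(series[start:start + window_len], n_segments, alphabet_size)
            if word.startswith(branch_prefix):
                hits.append(start)
        return hits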

VizTree and Data Mining Tasks: Time Series Motif Discovery A motif is a 'previously unknown, frequently occurring pattern' (Lin et al. 2005). Discovery is simple: frequently occurring patterns show up as thick branches. Traditional motif discovery algorithms are slow; VizTree builds frequency into the visualization, so motifs can be found quickly, and it highlights where they occur.

VizTree and Data Mining Tasks: Anomaly Discovery Simple cases: observe very thin branches in the subsequence tree. More complex cases: use diff trees, where thick branches in vivid green or blue indicate anomalies in the second time series (Lin et al. 2005).

Diff Tree A diff tree compares two time series, A and B, showing the frequency of each pattern in B relative to its frequency in A. Two values are used in its creation: support, which measures whether a pattern is overrepresented (more frequent) or underrepresented (less frequent) in B, and confidence, which measures how prevalent the pattern is in A. Support maps to the thickness of the branches and confidence to their color intensity. A surprisingness score additionally ranks the most anomalous patterns.
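
To make the roles of support, confidence, and surprisingness concrete, here is a sketch that compares the word-frequency maps of two series (as produced by subsequence_counts above). The slide does not give VizTree's exact formulas, so the ones below are stand-ins.

    def diff_tree_stats(counts_a, counts_b):
        # Compare the relative frequency of every SAX word in series B against
        # series A. Support (over/under-representation in B) would map to branch
        # thickness, confidence (prevalence in A) to color intensity, and a
        # surprise score ranks the most anomalous words.
        total_a = sum(counts_a.values()) or 1
        total_b = sum(counts_b.values()) or 1
        stats = {}
        for word in set(counts_a) | set(counts_b):
            freq_a = counts_a.get(word, 0) / total_a
            freq_b = counts_b.get(word, 0) / total_b
            stats[word] = {
                "support": freq_b - freq_a,     # > 0: overrepresented in B
                "confidence": freq_a,           # how prevalent the pattern is in A
                "surprise": abs(freq_b - freq_a) / (freq_a + 1e-9),
            }
        return stats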

What is great about VizTree? Simple graphical representation: it is straightforward yet powerful, showing many different subsequences in a single tree structure, with an easy-to-understand string description of each subsequence. Quick analysis: the subsequence trees and diff trees render quickly, and since the relevant information is encoded in the tree, motifs and anomalies can be spotted quickly.

Weaknesses It is difficult to find the right combination of parameters; one idea would be to superimpose the effect of the parameters (discrete values, sliding window length, etc.) on the original graph. Zooming is rather inconvenient; this could be solved with another zooming technique, such as fish-eye. Usability could be improved: it would be informative to see how the alphabet is defined over the dataset, the subtree view does not indicate where it sits in the main tree (so it is easy to lose track), the time series scales are not adjustable (so it can be hard to place subsequences in time), and nodes are hard to select.