Visually Mining and Monitoring Massive Time Series Amy Karlson V. Shiv Naga Prasad 15 February 2004 CMSC 838S Images courtesy of Jessica Lin and Eamonn.


Visually Mining and Monitoring Massive Time Series
Amy Karlson and V. Shiv Naga Prasad
15 February 2004, CMSC 838S
Images courtesy of Jessica Lin and Eamonn Keogh
Lin, J., Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.

What are Time Series?
• Simply: observations of a variable made over time
• Typical across a wide variety of domains
  - Medicine, physiology, finance, microbiology, meteorology, surveillance

Motivation: Critical Decision Making
• Domains
  - Spacecraft launch
  - Medicine
• Research directions
  - Mining archives: extract rules, patterns, regularities
  - Visualizing streams: novel visualization and interaction for
    - Query by content
    - Motif discovery
    - Anomaly detection

Some Visual Time Series Systems
• TimeSearcher: direct-manipulation pattern query
• ThemeRiver: theme strength over time
• Spirals: periodic data with known period
[Figures: dot.com stocks (Havre, Hetzler, Whitney & Nowell, InfoVis 2000); Hochheiser and Shneiderman; Weber et al.]

VizTree
• Construct a subsequence tree to span the space of subsequences of a given time series.
• Use this to collect statistics about the series.
• Size of the structure is independent of the length of the series.

VizTree Approach - Overview
• Place windows along the time series to obtain subsequences.
• Quantize along the time and value dimensions to obtain sequences of discrete symbols.
• Construct a subsequence tree to represent all possible such sequences.
• Collect frequencies of traversal of the branches of the subsequence tree.
• Use these for motif and anomaly detection, and for comparing time series.

Subsequences
• Place windows along the time series to obtain subsequences.
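
The windowing step can be sketched in a few lines of Python. This is only an illustration: the function name `sliding_windows` and the `step` parameter are assumptions made for the example and do not come from the slides or the VizTree tool.

```python
def sliding_windows(series, window, step=1):
    """Return every subsequence of length `window`, moving `step` points at a time.

    Hypothetical helper: the name and the `step` parameter are assumptions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # The last valid start index is len(series) - window.
    return [series[i:i + window] for i in range(0, len(series) - window + 1, step)]


if __name__ == "__main__":
    print(sliding_windows([1, 2, 3, 4, 5], window=3))
```

A series shorter than the window simply yields no subsequences.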

Discretization
• Subsequences are patterns.
• Take windows along the time series: length of window ~ length of subsequence.
• Discretize the range of data: one symbol for each quantum.
• Divide the window into segments: represent one segment with one symbol.

Symbolic Aggregate approXimation (SAX)
[Figure: one subsequence divided into segments, with quantization levels and a representative symbol for each segment. Discrete version = acdcbdba]
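
A minimal sketch of how one subsequence could be turned into a SAX word follows. It assumes the usual SAX recipe (z-normalize the window, average each segment, then cut the value axis at equiprobable Gaussian breakpoints); the function name `sax_word`, the default four-letter alphabet, and the requirement that the window length be a multiple of the segment count are assumptions made for the example.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax_word(subsequence, n_segments, alphabet="abcd"):
    """Convert one subsequence into a word of `n_segments` symbols (hypothetical helper)."""
    if not subsequence or n_segments <= 0 or len(subsequence) % n_segments != 0:
        raise ValueError("subsequence length must be a positive multiple of n_segments")

    # 1. z-normalize the window; a perfectly flat window maps to all zeros.
    mean, std = fmean(subsequence), pstdev(subsequence)
    z = [(x - mean) / std if std > 0 else 0.0 for x in subsequence]

    # 2. Quantize along time: replace each segment by its mean value.
    seg_len = len(z) // n_segments
    segment_means = [fmean(z[i:i + seg_len]) for i in range(0, len(z), seg_len)]

    # 3. Quantize along value: breakpoints split a standard normal
    #    distribution into len(alphabet) equally likely regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in segment_means)


if __name__ == "__main__":
    print(sax_word([0, 0, 1, 1, 2, 2, 3, 3], n_segments=4))  # abcd
```

A steadily rising window therefore maps to the word abcd, one symbol per segment.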

Subsequence Tree - example
[Figure: subsequence tree with branches labeled a, b, c at each level]
• Symbols = {a, b, c}
• Number of segments per window = 2
• Tree spans the space of subsequences.
  - Branching factor ~ number of symbols (size of alphabet)
  - Depth ~ number of segments per window
  - Branch thickness ~ frequency of occurrence of the subsequence
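
The tree can be approximated with a small counting trie, sketched below. The names `Node`, `build_tree`, and `frequency` are assumptions for the example. Unlike the slide's tree, which spans every possible word, this sketch only creates the branches that actually occur and treats missing branches as having frequency zero.

```python
class Node:
    """One branch point; `count` is how many words passed through it."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(words):
    """Insert every symbolic word into a trie, counting branch traversals."""
    root = Node()
    for word in words:
        node = root
        node.count += 1
        for symbol in word:
            node = node.children.setdefault(symbol, Node())
            node.count += 1
    return root


def frequency(root, prefix):
    """Traversal count of the branch spelled by `prefix` (0 if never visited)."""
    node = root
    for symbol in prefix:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


if __name__ == "__main__":
    tree = build_tree(["ab", "ac", "ab", "ba"])
    print(frequency(tree, "a"), frequency(tree, "ab"), frequency(tree, "cc"))
```

With an alphabet of 3 symbols and 2 segments per window the tree has at most 3 × 3 = 9 leaves, however long the series is, which is why the size of the structure does not depend on the length of the series.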

VizTree Tool Demo

Query by Content: Subsequence Matching
• Finding known patterns
• Chunking: breaking a time series into individual series
  - Methods
    - Time (e.g. power usage)
    - Shape (e.g. heart beats)
• Search approaches
  - Exact: slow
  - Approximate: fast
[Diagram labels: Exploration, Hypothesis Testing, VizTree]

Motif Discovery
• Finding unknown patterns
• Not exact matches
• VizTree allows exploration at varying levels of precision
  - E.g., cc** vs. ccac
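
The difference between cc** and ccac can be illustrated with a wildcard count over symbolic words. This is a rough sketch, not the tool's behavior: the function name `match_count` is an assumption, and `*` is taken to stand for any single symbol.

```python
from collections import Counter


def match_count(word_counts, pattern):
    """Total occurrences of words matching `pattern`, where '*' matches any one symbol."""
    return sum(
        count
        for word, count in word_counts.items()
        if len(word) == len(pattern)
        and all(p in ("*", s) for p, s in zip(pattern, word))
    )


if __name__ == "__main__":
    counts = Counter(["ccac", "ccab", "ccac", "abca"])
    print(match_count(counts, "cc**"))  # coarse: every word starting with cc
    print(match_count(counts, "ccac"))  # precise: one specific word
```

The coarse pattern groups similar but non-identical subsequences together, while the fully specified word counts only one shape.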

Anomaly Detection
• Finding abnormal patterns
• Use data already seen to identify anomalies
• Identified by thin branches
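
"Thin branches" correspond to words that occur rarely. A simple way to surface them, assuming the series has already been converted to one symbolic word per window, is sketched below; the function name `rare_words` and the `max_count` threshold are assumptions for the example.

```python
from collections import Counter


def rare_words(words, max_count=1):
    """Return (window index, word) pairs for words seen at most `max_count` times."""
    counts = Counter(words)
    return [(i, w) for i, w in enumerate(words) if counts[w] <= max_count]


if __name__ == "__main__":
    print(rare_words(["ab", "ab", "ab", "ca", "ab"]))
```

The window index points back to where in the series the unusual pattern occurred.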

Comparing Series: Diff Tree
• Same parameters imply the same tree structure
• Compare the test branch frequencies with respect to the reference branch frequencies
  - Blue = underrepresented
  - Green = overrepresented
  - Red = equivalent
  - Thickness = magnitude
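
One simple way to compute such a comparison is to subtract relative word frequencies, as sketched below. This is an illustrative choice and not necessarily the measure VizTree uses; the function name `diff_frequencies` and the `tolerance` parameter are assumptions for the example.

```python
from collections import Counter


def diff_frequencies(reference_words, test_words, tolerance=0.05):
    """Label each word by how its relative frequency in the test series
    compares with the reference series. Returns {word: (label, magnitude)}."""
    if not reference_words or not test_words:
        raise ValueError("both series must contain at least one word")
    ref, test = Counter(reference_words), Counter(test_words)
    result = {}
    for word in sorted(set(ref) | set(test)):
        delta = test[word] / len(test_words) - ref[word] / len(reference_words)
        if abs(delta) <= tolerance:
            label = "equivalent"        # drawn red on the slide
        elif delta > 0:
            label = "overrepresented"   # drawn green on the slide
        else:
            label = "underrepresented"  # drawn blue on the slide
        result[word] = (label, abs(delta))  # magnitude ~ branch thickness
    return result


if __name__ == "__main__":
    reference = ["aa", "aa", "ab", "ab"]
    test = ["aa", "aa", "aa", "ab"]
    for word, (label, magnitude) in diff_frequencies(reference, test).items():
        print(word, label, magnitude)
```

Because both series are discretized with the same parameters, every word addresses the same branch in both trees, so the comparison is a branch-by-branch subtraction.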

Thoughts on VizTree (Vis.)
• Most of the “discovery” is implicit
  - Manual search
  - Parameter setting might be an issue
  - Automation might help
• Tree visualization
  - Use of real estate?
  - Effective? Intuitive?
  - Alternatives?

Thoughts on VizTree (HCI)
• Primarily a tool for researchers now (also, we might have an outdated version)
• Even so, some HCI suggestions:
  - Indication of how tree detail relates to tree overview
  - Zoom into a specific area of the time series (rather than zoom+scroll)
  - Selection in subsequence detail relates to subsequence overview
  - Unfortunately, the least interesting patterns are the most easily accessed (branches at root): “snap to branch” or “snap to intersection”?
  - Ability to turn off highlighting (undo)

Summary: Unique Contributions
• Fundamental support for aperiodic series
• Scalable
  - Resource requirements do not grow linearly with the length of the series
• Rich visual feature set
  - Global summaries
  - Diff-trees between multiple series
  - Local patterns and anomalies