ITree: Exploring Time-Varying Data using Indexable Tree Yi Gu and Chaoli Wang Michigan Technological University Presented at IEEE Pacific Visualization.

Presentation transcript:

iTree: Exploring Time-Varying Data using Indexable Tree Yi Gu and Chaoli Wang Michigan Technological University Presented at IEEE Pacific Visualization Symposium 28 February 2013 Sydney, Australia

TAC-based time-varying data visualization
Time-activity curve (TAC)
– Time-varying medical imaging data [Fang et al. 2007]
– Importance analysis
– Multiscale data clustering
– Temporal sequencing
– Trend identification
What can iTree do for us?
– Handle ever-growing size and complexity (efficient data compacting)
– Index and query TACs adaptively (effective data indexing)
– Interact with space-time data (intuitive visual exploration)

Symbolic Aggregate ApproXimation (SAX)
– First convert the time series to a piecewise aggregate approximation (PAA) representation, then convert the PAA to symbols
– The conversion takes linear time [Lin et al. 2003]
– A SAX word can be represented by symbols (e.g., a, b, c) or bits (e.g., 00, 01, 10 or 0^2, 1^2, 2^2)
[Figure: a time series C converted to the SAX word "baabccbc" (word length: 8; bit cardinality: 2), with the breakpoints marked. Source: Keogh's SIGKDD 2007 tutorial slide]

SAX for time-varying volume data (1)
Handle time-varying data
– Use groups of voxels over time intervals by going through voxel by voxel for the 1st time step, then the 2nd, etc.
– Modify the original SAX/iSAX algorithms to
    • Better differentiate SAX words (effectiveness)
    • Improve computational performance (efficiency)
    • Make iSAX amenable to visual mapping (visualization)
PAA conversion
– Convert a TAC T of length n to a PAA C of length w
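The PAA conversion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `paa` and the handling of lengths that are not divisible by w are assumptions made for the example.

```python
def paa(tac, w):
    """Reduce a TAC of length n to a PAA of length w (the mean of each segment).

    Hypothetical helper for illustration; segment boundaries use integer
    arithmetic so that n does not have to be a multiple of w.
    """
    n = len(tac)
    if not 1 <= w <= n:
        raise ValueError("w must satisfy 1 <= w <= len(tac)")
    result = []
    for i in range(w):
        start = i * n // w          # first sample of segment i
        end = (i + 1) * n // w      # one past the last sample of segment i
        segment = tac[start:end]
        result.append(sum(segment) / len(segment))
    return result
```

For example, `paa([1, 2, 3, 4, 5, 6, 7, 8], 4)` averages consecutive pairs of samples and yields a four-value representation of the eight-step curve.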

SAX for time-varying volume data (2)
Transfer function based breakpoint identification
– H': histogram after logarithm and normalization of the original histogram
– H: new histogram obtained by multiplying H' by the opacity value
[Figure: histogram before and after]
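The slide does not state how breakpoints are derived from H, so the sketch below makes several assumptions, all hypothetical: `log(1 + count)` is used to avoid the logarithm of zero, normalization divides by the maximum, and breakpoints are placed so that each symbol covers roughly the same mass of H.

```python
import math

def weighted_histogram(counts, opacity):
    """Build H from raw bin counts and a per-bin opacity in [0, 1].

    Assumption: H' = log(1 + count) scaled by its maximum; H = H' * opacity.
    """
    logged = [math.log(1 + c) for c in counts]
    peak = max(logged) or 1.0            # avoid dividing by zero for an empty histogram
    h_prime = [v / peak for v in logged]
    return [hp * a for hp, a in zip(h_prime, opacity)]

def equal_mass_breakpoints(h, num_symbols):
    """Return bin indices that split h into num_symbols parts of similar mass.

    Assumption: equal-mass partitioning. A breakpoint value b means
    "the boundary lies just before bin b". A single very heavy bin can
    produce repeated breakpoints.
    """
    total = sum(h)
    if total <= 0:
        raise ValueError("histogram has no mass")
    breakpoints = []
    running = 0.0
    target = 1                           # index of the next breakpoint to place
    for bin_index, mass in enumerate(h):
        running += mass
        while target < num_symbols and running >= total * target / num_symbols:
            breakpoints.append(bin_index + 1)
            target += 1
    return breakpoints
```

With this weighting, bins that the transfer function makes transparent contribute no mass, so the symbols concentrate on the value ranges that are actually visible.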

SAX for time-varying volume data (3)
SAX word generation
– Construct an alphabet Φ and transform C into an array of symbols Ĉ to form a SAX word
– Distance between two symbols
– Distance between two SAX words
– The distance between two SAX words is a lower bound of the Euclidean distance defined on the PAA representation
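The formulas on this slide did not survive in the transcript. The sketch below therefore follows the standard SAX definitions from [Lin et al. 2003] rather than the modified ones used in the talk: symbols are integer indices into a sorted breakpoint list, adjacent symbols have distance zero, and the word distance is scaled by sqrt(n/w). All function names are hypothetical.

```python
import math
from bisect import bisect_right

def sax_word(paa_values, breakpoints):
    """Map each PAA value to a symbol index (0 = lowest interval)."""
    return [bisect_right(breakpoints, v) for v in paa_values]

def symbol_distance(a, b, breakpoints):
    """Smallest possible gap between values that fall in symbols a and b."""
    if abs(a - b) <= 1:
        return 0.0                      # same or adjacent intervals can touch
    return breakpoints[max(a, b) - 1] - breakpoints[min(a, b)]

def sax_distance(word_q, word_s, breakpoints, n):
    """Distance between two SAX words of a length-n series."""
    w = len(word_q)
    total = sum(symbol_distance(a, b, breakpoints) ** 2
                for a, b in zip(word_q, word_s))
    return math.sqrt(n / w) * math.sqrt(total)

def paa_distance(paa_q, paa_s, n):
    """Euclidean distance defined on the PAA representation."""
    w = len(paa_q)
    total = sum((q - s) ** 2 for q, s in zip(paa_q, paa_s))
    return math.sqrt(n / w) * math.sqrt(total)
```

Because every symbol distance is the minimum gap between two intervals, `sax_distance` can never exceed `paa_distance` for the same pair of curves, which is the lower-bounding property the slide refers to.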

SAX lower bounding
– Exact (Euclidean) distance D(Q, S)
– Lower bounding distance D_LB(Q, S)
– Lower bounding means that for all Q and S, we have D_LB(Q', S') ≤ D(Q, S)
[Figure: raw data (Q, S) and their approximate representations (Q', S'). Source: Keogh's SIGKDD 2007 tutorial slide]

SAX construction (in sec)
– A word length of 8 to 12 and 16 to 32 quantization levels are appropriate for the quality and speed tradeoff
– SAX construction takes less than 10 minutes, excluding I/O time

iSAX for time-varying volume data (1)
iSAX organizes SAX words hierarchically
– A node represents a set of TACs with the same or similar SAX words
– Split a node when the number of SAX words exceeds a certain threshold
– How to split?
    • The original iSAX chooses the left-most symbol with the smallest bit cardinality to split
    • We choose the symbol covering the largest value range to split
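The two splitting rules can be contrasted with a small sketch. It assumes that each symbol position of a node is described either by its current bit cardinality or by a bounded (low, high) value range; both representations and both function names are hypothetical.

```python
def choose_split_original(bit_cardinalities):
    """Original iSAX rule: the left-most position with the smallest bit cardinality."""
    # min() returns the first minimum, which gives the left-most tie-break.
    return min(range(len(bit_cardinalities)), key=bit_cardinalities.__getitem__)

def choose_split_by_range(symbol_ranges):
    """Rule from the talk: the position whose symbol covers the largest value range.

    symbol_ranges holds one (low, high) pair per symbol position.
    Ties go to the left-most position (an assumption).
    """
    widths = [high - low for low, high in symbol_ranges]
    return max(range(len(widths)), key=widths.__getitem__)
```

When breakpoints are not evenly spaced, the two rules can pick different positions: a symbol at a high bit cardinality may still span a wide range of values.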

Comparison
[Figure: original breakpoint identification and symbol splitting]
[Figure: our new breakpoint identification and symbol splitting]

iSAX for time-varying volume data (2)
iSAX construction
– Voxel IDs for each terminal node are saved into a file
– Use the SAX word itself as the file name to facilitate search
Out-of-core acceleration strategy
– Partition all voxels or groups into at most 2^w buckets and save each non-empty bucket into a file
– Choose the file with the largest voxel/group count to split if it is larger than a threshold δ_n
– Continue until no file is larger than δ_n
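The bucket-splitting loop can be sketched in memory as follows. This is only an illustration under stated assumptions: a dictionary stands in for the files (its keys play the role of the SAX-word file names), PAA values are assumed to lie in [0, 1] with evenly spaced breakpoints, the coarsest left-most symbol is refined on each split, and `max_bits` is a hypothetical cap that stops the refinement of identical curves.

```python
def build_buckets(paa_by_voxel, threshold, max_bits=8):
    """Partition voxels into buckets keyed by an iSAX-style word.

    Each key is a tuple of (symbol, bits) pairs, one pair per PAA position.
    """
    def key_for(paa, bits):
        # Quantize each value with its own number of bits (uniform breakpoints).
        return tuple((min(int(v * (1 << b)), (1 << b) - 1), b)
                     for v, b in zip(paa, bits))

    w = len(next(iter(paa_by_voxel.values())))
    buckets = {}
    # Initial pass: one bit per symbol gives at most 2**w buckets.
    for voxel_id, paa in paa_by_voxel.items():
        buckets.setdefault(key_for(paa, [1] * w), []).append(voxel_id)

    while True:
        splittable = [k for k, ids in buckets.items()
                      if len(ids) > threshold and any(b < max_bits for _, b in k)]
        if not splittable:
            break
        key = max(splittable, key=lambda k: len(buckets[k]))   # largest bucket first
        bits = [b for _, b in key]
        pos = min(range(w), key=lambda i: bits[i])             # coarsest, left-most symbol
        bits[pos] += 1
        for voxel_id in buckets.pop(key):
            new_key = key_for(paa_by_voxel[voxel_id], bits)
            buckets.setdefault(new_key, []).append(voxel_id)
    return buckets
```

In an out-of-core setting each dictionary entry would be a file on disk, so only the bucket being split has to be loaded at any time.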

iSAX for time-varying volume data (3)
Approximate and exact search
– Both take the PAA representation and a threshold δ as input
– Approximate search only compares each file name with the SAX word converted from the input PAA, and returns the voxels of the files whose distance is less than δ
– Exact search needs an additional step: compute the PAA-based distance to the input PAA and return those voxels that have a distance less than δ
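The difference between the two searches can be sketched as below. The sketch is an assumption-laden illustration: a dictionary keyed by SAX word stands in for the index files, the breakpoints are hypothetical, the word distance follows the standard SAX definition, and the common sqrt(n/w) scale factor is dropped from both distances.

```python
import math
from bisect import bisect_right

BREAKPOINTS = [0.25, 0.5, 0.75]          # hypothetical breakpoints for 4 symbols

def to_word(paa):
    """Convert PAA values to a SAX word (tuple of symbol indices)."""
    return tuple(bisect_right(BREAKPOINTS, v) for v in paa)

def word_distance(a, b):
    """Lower-bounding distance between two SAX words."""
    total = 0.0
    for x, y in zip(a, b):
        if abs(x - y) > 1:
            total += (BREAKPOINTS[max(x, y) - 1] - BREAKPOINTS[min(x, y)]) ** 2
    return math.sqrt(total)

def paa_distance(p, q):
    """Euclidean distance between two PAA representations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def build_index(paa_by_voxel):
    """Group voxel IDs by SAX word; the keys play the role of file names."""
    index = {}
    for voxel_id, paa in paa_by_voxel.items():
        index.setdefault(to_word(paa), []).append(voxel_id)
    return index

def approximate_search(query_paa, delta, index):
    """Compare only the index keys with the query word."""
    query_word = to_word(query_paa)
    hits = []
    for word, voxel_ids in index.items():
        if word_distance(query_word, word) < delta:
            hits.extend(voxel_ids)
    return hits

def exact_search(query_paa, delta, index, paa_by_voxel):
    """Refine the approximate result with the PAA-based distance."""
    candidates = approximate_search(query_paa, delta, index)
    return [v for v in candidates
            if paa_distance(query_paa, paa_by_voxel[v]) < delta]
```

Since the word distance never exceeds the PAA distance, the approximate result is a superset of the exact one, and the refinement step only has to touch the voxels of the matching buckets.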

iTree (1)
From iSAX (internal) hierarchy to iTree (external)
– The number of non-empty children of the root is fairly large
    • Solution: level promoting
– iSAX has a large number of hierarchy levels with small fanout (2)
    • Solution: sibling grouping
– Sibling nodes are not arranged according to their similarity
    • Solution: sibling reordering
– Resulting properties
    • The height of the iTree is determined by the maximal bit cardinality for representing any symbol in the SAX words
    • The iTree is balanced: no node has an excessively large fanout
    • Neighboring sibling nodes have a higher degree of similarity in terms of spatial closeness and temporal trend

iTree (2)
iTree drawing and focus+context visualization
– Hyperbolic layout [Lamping and Rao 1996]
    • Accommodate a large number of nodes
    • Allow focus+context interaction
– Add the time ring to indicate the time dimension
Query in multiple coordinated views (volume view, iTree view and SAX view)

iSAX/iTree construction (in sec)
– The number of nodes is reduced by an order of magnitude from iSAX to iTree

Brute-force/approx./exact search (in sec)
– Brute-force search does not use any indexing scheme; it simply goes over the PAA representation of the data to identify similar voxels
– The time cost of approximate search does not increase much from the current interval to all time steps, since it only uses the names of the index files for distance computation

Summary
iTree
– Data organization, visual representation and user interaction framework for time-varying data analysis and visualization
– Applicable for tackling big time-varying data sets
Limitations
– Breakpoint identification depends on the input transfer function
– Blockwise TACs lead to block discontinuity in data classification
Future work
– Motif finding (locate previously unknown, frequently occurring patterns)
– Time-varying multivariate data
Acknowledgements
– U.S. National Science Foundation