FlowString: Partial Streamline Matching using Shape Invariant Similarity Measure for Exploratory Flow Visualization
Jun Tao, Chaoli Wang, Ching-Kuang Shene
Michigan Technological University
Presented at IEEE Pacific Visualization Symposium, March 5, 2014, Yokohama, Japan
FlowString interface (figure labels: Query result; Streamline set; Query string; Alphabet and vocabulary; Parameters; Streamline query: Textual, Visual)
First look of FlowString (crayfish)
Streamline similarity measures Proximity-based measures –Leverage spatial proximity between integral curves Feature-based measures –Extract geometrical, topological or domain specific features for similarity analysis Distribution-based measures –Capture feature distributions for more robust similarity comparison Transformation-based measures –Map data properties or features into a transformed space for similarity measuring
Our solution Shape-based measure –Extract features that are invariant under translation, rotation and scaling –Support flexible partial streamline matching Approach –Advocate a vocabulary approach –Construct character-level alphabet and word-level vocabulary –Design intuitive and convenient user interface and interaction
Terms (1/2) Character (low-level shape descriptor) –Unique local shape primitive extracted from streamlines Alphabet –A set of characters describing various local shapes Word (high-level shape descriptor) –A sequence of characters encoding a streamline shape pattern Vocabulary –A set of words describing various regional shapes
Terms (2/2) String –Mapping of a global streamline to a sequence of characters Substring –Encoding a portion of the corresponding streamline
Notations Character –a (same order) –a’ (reversed order) –A (both orders) Multiple characters with common features ( | ) –(a_1 | a_2 | … | a_m) Word concatenation ( | and & ) –[abc]|[bbc] (segments that match either abc or bbc) –[abc]&[bbc] (segments that match both abc and bbc, some distance apart) Other symbols –a+ (single character repetition) –? and * (wildcard symbols)
Outline of FlowString approach Alphabet generation –Streamline resampling –Dissimilarity measure –Affinity propagation clustering String operation –Streamline suffix tree –Vocabulary construction –Exact vs. approximate search
Streamline resampling (1/2) Goal: local features with the same shape but different scales should receive a similar number of sample points Criteria: –A streamline segment between two sample points should be simple enough (no feature is ignored) –The density of sample points should be related to the local feature size Solution: maintain a constant accumulated curvature between two neighboring sample points along the streamline
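The constant-accumulated-curvature rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a streamline is given as a polyline of 3D points, approximates accumulated curvature by the sum of turning angles at interior vertices, and uses a hypothetical function name `resample_by_curvature` with a hypothetical threshold parameter `step` (in radians).

```python
import math

def turning_angle(p, q, r):
    """Angle (radians) between segments p->q and q->r of a 3D polyline."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - q[i] for i in range(3)]
    nu = math.sqrt(sum(c * c for c in u))
    nv = math.sqrt(sum(c * c for c in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate segment: treat as no turning
    cos_angle = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def resample_by_curvature(points, step):
    """Keep a vertex each time the accumulated turning angle reaches `step`.

    `step` is an assumed tuning parameter; the endpoints are always kept.
    """
    if len(points) < 3:
        return list(points)
    samples = [points[0]]
    accumulated = 0.0
    for i in range(1, len(points) - 1):
        accumulated += turning_angle(points[i - 1], points[i], points[i + 1])
        if accumulated >= step:
            samples.append(points[i])
            accumulated = 0.0
    samples.append(points[-1])
    return samples
```

Because turning angles do not change under uniform scaling, two arcs with the same shape but different radii receive the same number of samples, while a straight segment keeps only its endpoints.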
Streamline resampling (2/2) Neighborhood size r = 7
Character concatenation (a): characters assigned to all sample points, which produces a deterministic shape (b) and (c): characters assigned to every r-1 sample points, which produces different shapes
Dissimilarity measure Dissimilarity between the local shapes of two sample points (P_a and P_b) –Use Procrustes distance, which minimizes a measure of shape difference –Ignore geometric positions and orientations –Require a registration (Procrustes superimposition) before distance calculation
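To illustrate what Procrustes superimposition removes, here is a simplified sketch restricted to 2D shapes, where the optimal rotation has a closed form when points are written as complex numbers. This is an assumption made to keep the example dependency-free: streamlines are 3D, where the registration is typically solved with an SVD, and the slides do not state which variant is used. The function name `procrustes_distance_2d` is hypothetical, and both shapes are assumed to have the same number of points in corresponding order.

```python
import math

def _center_and_scale(points):
    """Remove translation and scale: zero centroid, unit norm."""
    centroid = sum(points) / len(points)
    centered = [p - centroid for p in points]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centered))
    if norm == 0.0:
        raise ValueError("all points coincide; shape is undefined")
    return [p / norm for p in centered]

def procrustes_distance_2d(shape_a, shape_b):
    """Full Procrustes distance between two 2D point sequences.

    After centering and scaling, the best rotation is absorbed by taking
    the modulus of the complex inner product, so the result is invariant
    under translation, rotation and uniform scaling.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must have the same number of points")
    za = _center_and_scale([complex(x, y) for x, y in shape_a])
    zb = _center_and_scale([complex(x, y) for x, y in shape_b])
    inner = sum(a.conjugate() * b for a, b in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))
```

A shape compared with a translated, rotated and scaled copy of itself yields a distance of approximately zero, while shapes that differ yield a positive value of at most 1.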
Affinity propagation clustering Apply affinity propagation for clustering –Simultaneously consider all data points as potential exemplars –Automatically determine the best number of clusters Perform two-level clustering to generate characters
Character generation (1/3) Second-level clustering result
Character generation (2/3) First-level clustering result
Character generation (3/3) Original shape primitives
Streamline suffix tree Convert each streamline to a string using the alphabet Construct a suffix tree to enable efficient operations on these strings –Linear time and space cost to construct the tree –Transform the problem of searching for a string into searching for a node in the tree –O(m+z) search time, where m is the length of the query string and z is its number of occurrences
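The idea of turning a string search into a node lookup can be sketched with a naive suffix trie. This is a stand-in for illustration only: it takes quadratic time and space to build, unlike the linear-time suffix tree named on the slide, and the names `build_suffix_trie` and `find_occurrences` are hypothetical. Each streamline is assumed to be already encoded as a Python string with one letter per character of the alphabet.

```python
def build_suffix_trie(strings):
    """Insert every suffix of every string; each node records where it occurs."""
    root = {"children": {}, "hits": []}
    for string_id, text in enumerate(strings):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "hits": []}
                )
                # (string_id, start) marks one occurrence of the path label
                node["hits"].append((string_id, start))
    return root

def find_occurrences(root, pattern):
    """Walk down the trie along `pattern`; the node reached lists all matches."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["hits"]
```

Once the trie is built, a query of length m follows m edges and then reads off the z stored occurrences, which mirrors the O(m+z) search cost.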
Vocabulary construction Automatically identify meaningful words to construct the vocabulary –Select the most common patterns from the streamlines (i.e., detect the most frequently occurring substrings) –Achieve this through a simple depth-first traversal of the streamline suffix tree –O(n) time, where n is the total length of the original strings (i.e., the number of nodes is linear in n)
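A depth-first traversal that collects frequent substrings might look like the sketch below. It uses a naive counting trie instead of a true suffix tree, so it does not achieve the O(n) bound, and the thresholds `min_len`, `max_len` and `min_count` are hypothetical parameters introduced for illustration; the slides do not state the actual selection criteria beyond frequency.

```python
def build_count_trie(strings, max_len):
    """Trie of all substrings up to max_len; each entry is [count, children]."""
    root = {}
    for text in strings:
        for start in range(len(text)):
            node = root
            for ch in text[start:start + max_len]:
                entry = node.setdefault(ch, [0, {}])
                entry[0] += 1
                node = entry[1]
    return root

def frequent_words(strings, min_len=2, max_len=4, min_count=2):
    """Depth-first traversal that keeps substrings occurring at least min_count times."""
    root = build_count_trie(strings, max_len)
    words = []
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        for ch, (count, children) in node.items():
            if count < min_count:
                continue  # no extension of a rare substring can be frequent
            word = prefix + ch
            if len(word) >= min_len:
                words.append((word, count))
            stack.append((word, children))
    # Most frequent first; among ties, longer words first, then alphabetical
    words.sort(key=lambda wc: (-wc[1], -len(wc[0]), wc[0]))
    return words
```

For example, `frequent_words(["abcabc", "xabcy"], max_len=3)` keeps `abc`, `ab` and `bc`, each of which occurs three times, and discards substrings such as `ca` that occur only once.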
Exact vs. approximate search (1/2) The need for approximate search –Pairs of characters differ in how similar their shapes are –Patterns that repeat a certain shape a different number of times often look similar k-approximate search using dynamic programming, where k is a threshold on the edit distance Extend to handle single character repetition ( + ) and multiple characters with common features ( | )
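A basic k-approximate substring search by dynamic programming can be sketched as below. This is the standard edit-distance formulation in which a match may start at any text position; it is not the authors' extended algorithm and does not handle the `+` and `|` operators. Unit costs are an assumption: the optional `sub_cost` argument (a hypothetical hook) shows where a shape-dependent substitution cost could be supplied, as the slide suggests character similarities differ.

```python
def approx_find(text, pattern, k, sub_cost=None):
    """Return (end_index, distance) for text positions where some substring
    ending there is within edit distance k of `pattern`."""
    if sub_cost is None:
        # Assumed default: unit cost for any mismatch
        sub_cost = lambda a, b: 0 if a == b else 1
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    matches = []
    for j, text_char in enumerate(text):
        cur = [0]  # zero cost: a match may begin at any text position
        for i in range(1, m + 1):
            cur.append(min(
                prev[i - 1] + sub_cost(pattern[i - 1], text_char),  # substitute or match
                prev[i] + 1,      # extra character in the text
                cur[i - 1] + 1,   # pattern character left unmatched
            ))
        if cur[m] <= k:
            matches.append((j, cur[m]))
        prev = cur
    return matches
```

With `k = 0` the function reports only exact matches; raising `k` admits strings that differ by a few insertions, deletions or substitutions, which is how differing repetition counts of a shape can still be matched.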
Exact vs. approximate search (2/2) Figure panels: –Exact matching: EE –Exact matching: FF –Exact matching: (E|F)(E|F) –Approx. matching (k = 15): (E|F)(E|F) Legend: –E: spiral with large torsion –F: spiral with small torsion
Parameter settings
Timing performance
Solar plume
Tornado
Two swirls
Summary FlowString –Robust partial streamline matching using shape invariant features –Characters / alphabets and words / vocabulary metaphors –Intuitive user interface and interaction support Future work –Conduct domain expert evaluation –Extend FlowString to handle multiple data sets –Release FlowString to benefit the community Acknowledgements –U.S. National Science Foundation
Thank you!