On the use of Graph Search Techniques for the Analysis of Extreme-scale Combustion Simulation Data
Janine Bennett (1), William McLendon III (1), Gaurav Bansal (2), Peer-Timo Bremer (3), Jacqueline Chen (1), Hemanth Kolla (1)
(1) Sandia National Laboratories, (2) Intel, (3) Lawrence Livermore National Laboratory
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL
Approved for Unlimited Unclassified Release, SAND # C
HPC resources generate large, complex, multivariate data sets
Recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories
Details: Lifted Ethylene Jet
– 1.3 billion grid points
– 22 chemical species, vector, & particle data
– 7.5 million CPU hours on 30,000 processors
– 112,500 time steps (data stored every 375th)
– 240 TB of raw field data + 50 TB particle data
Efficiently characterizing & tracking intermittent features defined by multiple variables poses significant research challenges!
Our contribution: a framework for characterizing complex events in large-scale multivariate data
Introduce attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
– Defined by multiple variables
– Spanning an arbitrary number of time steps
– Representation achieves drastic data reductions
Provide a mechanism for querying ARGs
– Identify events conditioned on a variety of metrics
Demonstrate results on large-scale combustion simulation data
Related work
Topology: Segment domain into features according to function behavior
– Level-set behavior: Reeb graph, contour tree, and variants [Carr et al. 2003, Pascucci et al. 2007, Mascarenhas et al. 2006, van Kreveld et al. 2004]
– Gradient behavior: Morse and Morse-Smale complex [Edelsbrunner 2003, Gyulassy et al. 2007, 2008, Gunther et al. 2011]
Multivariate feature analysis: Many correlation-based feature definitions [Gosink et al. 2007, Chen et al. 2011, Jaenicke et al. 2007, Sauber et al. 2006, Schneider et al. 2008, Bennett et al. 2011]
Feature tracking graphs: Capture spatial-temporal relationships [Edelsbrunner et al. 2004, Bremer et al. 2010, Muelder et al. 2009, Widanagamaachchi et al. 2012]
Graph search algorithms: Identify patterns in large-scale graphs [Barret et al. 2007, Berry et al. 2007, Gregor et al. 2005, Siek et al. 2002]
What is an attributed relational graph (ARG)?
ARG nodes correspond to spatial features
– Each ARG node encodes: feature type, time step, and optional per-feature statistics
ARG edges encode relationships between features
– Spatial overlap metric
– Supports feature tracking over time
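The node and edge attributes listed above can be sketched as a small data structure. This is a minimal illustration only: the class names `FeatureNode` and `ARG`, the field names, and the sample values are assumptions made for this example and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureNode:
    """One spatial feature (hypothetical layout)."""
    node_id: int
    feature_type: str          # e.g. "temperature"
    time_step: int
    stats: tuple = ()          # optional (name, value) pairs


@dataclass
class ARG:
    """Attributed relational graph: typed nodes, overlap-weighted edges."""
    nodes: dict = field(default_factory=dict)   # node_id -> FeatureNode
    edges: dict = field(default_factory=dict)   # (u, v) -> overlap value

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, u, v, overlap):
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges[(u, v)] = overlap

    def neighbors(self, u):
        """Return (other_node_id, overlap) for every edge touching u."""
        return [(b if a == u else a, w)
                for (a, b), w in self.edges.items() if u in (a, b)]


# Illustrative values only.
arg = ARG()
arg.add_node(FeatureNode(0, "temperature", 1))
arg.add_node(FeatureNode(1, "temperature", 2, (("mean", 1200.0),)))
arg.add_node(FeatureNode(2, "reaction rate OH", 2))
arg.add_edge(0, 1, 40)   # same feature type, consecutive time steps
arg.add_edge(1, 2, 7)    # different feature types, same time step
```

Storing the overlap on the edge keeps the graph independent of the feature geometry once it has been built.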
ARG Nodes: Segment domain into relevant features
– Many options for segmenting the domain into features
– Often, features of interest are defined by a threshold around minima or maxima of a particular variable
[Figure: function f over the (x, y) domain]
ARG Nodes: Merge trees encode features of interest defined by a single variable for a range of thresholds
– The tree encodes how the features behave as the function values are swept from the maximum to the minimum of the range of interest
[Figure: function f over the (x, y) domain]
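The maximum-to-minimum sweep can be illustrated on a 1D signal with a union-find structure. This is a simplified sketch, not the authors' parallel implementation: the function name `merge_tree_1d` and the output format (a list of maxima plus a list of merge events) are assumptions chosen for brevity.

```python
def merge_tree_1d(f):
    """Sweep a 1D scalar field from high to low values.

    Returns (maxima, merges):
      maxima: indices where a new component is born (leaves of the tree)
      merges: (saddle_index, surviving_max, absorbed_max) tuples
    """
    n = len(f)
    parent = [None] * n   # None marks samples the sweep has not reached yet

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    maxima, merges = [], []
    for i in sorted(range(n), key=lambda k: -f[k]):
        parent[i] = i
        # Components of already-swept neighbours (left and right sample).
        roots = sorted({find(j) for j in (i - 1, i + 1)
                        if 0 <= j < n and parent[j] is not None})
        if not roots:
            maxima.append(i)          # no swept neighbour: local maximum
            continue
        survivor = roots[0]
        parent[i] = survivor          # sample joins an existing component
        for r in roots[1:]:
            # Two components meet here; the one with the higher maximum survives.
            keep, drop = (survivor, r) if f[survivor] >= f[r] else (r, survivor)
            merges.append((i, keep, drop))
            parent[drop] = keep
            survivor = keep
    return maxima, merges


maxima, merges = merge_tree_1d([1, 5, 2, 4, 0, 3])
```

In this sketch each component is represented by its highest maximum, so every merge event records which branch of the tree continues downward and which one ends at the saddle.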
ARG Nodes: Refine the tree to increase granularity of possible segmentations
[Figure: function f over the (x, y) domain]
ARG Nodes: Features are defined as all sub-trees above a user-specified threshold
[Figure: two views of function f over the (x, y) domain]
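For a single fixed threshold, the sub-trees above the threshold correspond to connected regions where the function is at or above that value. The sketch below extracts those regions directly with a flood fill on a 2D grid instead of cutting a merge tree; it is a stand-in technique for illustration, and the function name `features_above` and the 4-neighbour connectivity are assumptions.

```python
from collections import deque


def features_above(grid, threshold):
    """Return a list of features, each a set of (row, col) cells with value >= threshold."""
    rows, cols = len(grid), len(grid[0])
    label = [[-1] * cols for _ in range(rows)]   # -1 means unlabelled
    features = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or label[r][c] != -1:
                continue
            fid = len(features)
            cells = set()
            queue = deque([(r, c)])
            label[r][c] = fid
            while queue:                          # breadth-first flood fill
                y, x = queue.popleft()
                cells.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and label[ny][nx] == -1
                            and grid[ny][nx] >= threshold):
                        label[ny][nx] = fid
                        queue.append((ny, nx))
            features.append(cells)
    return features


sample = [[5, 4, 0, 0],
          [0, 0, 0, 3],
          [0, 0, 3, 3]]
low_cut = features_above(sample, 3)
high_cut = features_above(sample, 4.5)
```

The advantage of the merge tree over this direct approach is that the tree is computed once and then yields the segmentation for any threshold in the range, whereas the flood fill must be rerun per threshold.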
ARG Edges: An overlap-based metric is used to encode feature behavior over time
[Figure: features at time steps t = 1, 2, 3, 4]
ARG Edges: The same metric is used to encode relationships between different types of features
[Figure: features at time steps t = 1, 2, 3, 4]
ARG Edges: Relationships can span multiple time steps
[Figure: features at time steps t = 1, 2, 3, 4]
ARG Edges: Edge labels indicate degree of overlap between associated features
[Figure: ARG with overlap values as edge labels]
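One simple way to realize an overlap-based edge label is to count the grid cells two features share. The slides do not specify the exact metric, so the shared-cell count, the function name `overlap_edges`, and the `min_overlap` parameter below are all assumptions made for illustration. The sketch also assumes that the features within one list do not overlap each other.

```python
def overlap_edges(features_a, features_b, min_overlap=1):
    """Map (index_in_a, index_in_b) -> number of shared grid cells.

    features_a and features_b are lists of sets of grid cells, for example
    two time steps of one variable or two variables at one time step.
    """
    owner = {}                                # cell -> index of its feature in b
    for j, cells in enumerate(features_b):
        for cell in cells:
            owner[cell] = j
    counts = {}
    for i, cells in enumerate(features_a):
        for cell in cells:
            j = owner.get(cell)
            if j is not None:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: n for pair, n in counts.items() if n >= min_overlap}


step_1 = [{(0, 0), (0, 1), (0, 2)}, {(3, 3)}]
step_2 = [{(0, 1), (0, 2), (0, 3)}, {(5, 5)}]
edges = overlap_edges(step_1, step_2)
```

Because the same function accepts any two lists of features, it covers tracking over time, relationships between different feature types, and relationships that span several time steps.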
Once the ARG is constructed, we can search for patterns of interest
[Figure: example patterns: co-occurrence; multi-way co-occurrence; time-lag features]
Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
MTGL: Multi-Threaded Graph Library
– Open source software
– Given ARG and template
– Filter: Remove all edges in ARG that cannot belong to template
– Match: Find all possible template matches in filtered ARG
Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
[Figure: ARG, template pattern, and template walk]
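The two phases can be sketched in plain Python. This does not use or reproduce the MTGL API and is not the template-walk algorithm shown on the slides; it is a small backtracking stand-in, and the function names, the type labels, and the treatment of edges as undirected are assumptions made for this example.

```python
def filter_edges(arg_types, arg_edges, tmpl_types, tmpl_edges):
    """Phase 1: drop ARG edges whose endpoint types match no template edge."""
    allowed = {frozenset((tmpl_types[a], tmpl_types[b])) for a, b in tmpl_edges}
    return [(u, v) for u, v in arg_edges
            if frozenset((arg_types[u], arg_types[v])) in allowed]


def match(arg_types, kept_edges, tmpl_types, tmpl_edges):
    """Phase 2: list every assignment of template nodes to distinct ARG nodes."""
    adjacent = {frozenset(e) for e in kept_edges}
    tmpl_nodes = list(tmpl_types)
    candidates = sorted({n for e in kept_edges for n in e})
    results = []

    def extend(assign):
        if len(assign) == len(tmpl_nodes):
            results.append(dict(assign))
            return
        t = tmpl_nodes[len(assign)]
        for n in candidates:
            if n in assign.values() or arg_types[n] != tmpl_types[t]:
                continue
            assign[t] = n
            # Every template edge with both ends assigned must exist in the ARG.
            if all(frozenset((assign[a], assign[b])) in adjacent
                   for a, b in tmpl_edges if a in assign and b in assign):
                extend(assign)
            del assign[t]

    extend({})
    return results


# Hypothetical ARG: T = temperature, R = reaction rate OH, D = diffusion OH.
arg_types = {0: "T", 1: "R", 2: "D", 3: "T", 4: "R"}
arg_edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
# Template: a reaction-rate feature overlapping a diffusion feature.
tmpl_types = {"a": "R", "b": "D"}
tmpl_edges = [("a", "b")]

kept = filter_edges(arg_types, arg_edges, tmpl_types, tmpl_edges)
found = match(arg_types, kept, tmpl_types, tmpl_edges)
```

Filtering first shrinks the graph that the more expensive matching phase has to explore; in this toy case only one of the four edges survives.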
Case study: identification of deflagration fronts in HCCI combustion data
– Turbulent auto-ignitive mixture of Di-Methyl Ether under homogeneous charge compression ignition (HCCI) conditions
– Deflagration fronts: spatially collocated extrema of chemical reaction rates and diffusive fluxes
[Figure panels: Reaction rate of OH; Diffusion of OH]
Case study: ARG representation encodes complex relationships very compactly
Raw output data size: 78.2 GB (grid size = 560 x 560 x 560)
– 703 MB/variable * 6 variables for 19 time steps
Meta-data: computed in parallel on ORNL’s Lens system
– 3 feature families, each encoding size, minimum, maximum, mean, and variance of 6 different variables
Feature family      Structure geometries    Hierarchy & statistics
temperature         4.0 GB                  319 MB
diffusion OH        3.6 GB                  11 MB
reaction rate OH    4.3 GB                  534 MB
– Data-dependent costs: O(minutes) per time step
– Structure geometries only needed for ARG construction (not queries)
– Size of ARG: 504 KB
Under 1 GB required for fully flexible exploration and search on commodity hardware
– O(seconds) for searches
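The sizes quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes 1 GB = 1024 MB and 1 MB = 1024 KB, which the slide does not state, and the variable names are introduced here only for the calculation.

```python
MB_PER_GB = 1024          # assumed binary units
KB_PER_MB = 1024

# Raw output: 703 MB per variable, 6 variables, 19 time steps.
raw_mb = 703 * 6 * 19
raw_gb = raw_mb / MB_PER_GB

# Data needed at query time: hierarchy & statistics for the 3 families, plus the ARG.
hierarchy_mb = 319 + 11 + 534
arg_mb = 504 / KB_PER_MB
query_mb = hierarchy_mb + arg_mb

# Ratio of raw data to the data needed for exploration and search.
reduction = raw_mb / query_mb

print(f"raw output: {raw_mb} MB = {raw_gb:.2f} GB")
print(f"query-time data: {query_mb:.1f} MB")
print(f"reduction factor: about {reduction:.0f}x")
```

Under these assumptions the raw total agrees with the quoted 78.2 GB to within rounding, the query-time data stays below 1 GB, and the reduction is roughly two orders of magnitude. The structure geometries (4.0 + 3.6 + 4.3 GB) are excluded because they are only needed while the ARG is being built.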
Case study: Searching the ARG
[Figure labels: template; connected components; 1462 nodes; edges]
[Figure panels: a subset of the deflagration fronts identified; a subset of the full ARG]
– Full ARG size: 6563 nodes and 8903 edges
Conclusion & future work
Introduced attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
Provided a mechanism for querying ARGs
Demonstrated results on large-scale combustion simulation data
Some domain knowledge required to construct ARG
– Which variables define features of interest
– Range of potential time-lags between features
Opportunities for future work
– GUI tool for specifying search template patterns
    Leveraging per-feature statistics in queries
– Linked views of ARG, search results, domain visualization
– Dynamic ARGs
    Don’t require feature thresholds to be specified in advance
    Instead these are runtime parameters to be explored
Questions?
Janine Bennett
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.