On the use of Graph Search Techniques for the Analysis of Extreme-scale Combustion Simulation Data

Janine Bennett¹, William McLendon III¹, Guarav Bansal², Peer-Timo Bremer³, Jacqueline Chen¹, Hemanth Kolla¹
¹ Sandia National Laboratories, ² Intel, ³ Lawrence Livermore National Laboratory

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. Approved for Unlimited Unclassified Release, SAND # C

HPC resources generate large, complex, multivariate data sets
Details: Lifted Ethylene Jet
– 1.3 billion grid points
– 22 chemical species, vector, and particle data
– 7.5 million CPU hours on 30,000 processors
– 112,500 time steps (data stored every 375th step)
– 240 TB of raw field data + 50 TB of particle data
Recent data sets were generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories.
Efficiently characterizing and tracking intermittent features defined by multiple variables poses significant research challenges.
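
The storage figures above can be cross-checked with simple arithmetic. The Python sketch below assumes that saving every 375th step produces exactly 112,500 / 375 snapshots (it ignores whether the initial state is also written) and that the 240 TB of field data is spread evenly across those snapshots; both are simplifying assumptions, not statements from the slides.

```python
TOTAL_STEPS = 112_500   # time steps in the lifted ethylene jet run
SAVE_INTERVAL = 375     # field data stored every 375th step
RAW_FIELD_TB = 240      # raw field data, TB
PARTICLE_TB = 50        # particle data, TB


def stored_snapshots(total_steps: int, interval: int) -> int:
    """Snapshots written when every `interval`-th step is saved (assumes no extra initial dump)."""
    return total_steps // interval


snapshots = stored_snapshots(TOTAL_STEPS, SAVE_INTERVAL)
field_tb_per_snapshot = RAW_FIELD_TB / snapshots  # average, assuming equal-sized snapshots
total_tb = RAW_FIELD_TB + PARTICLE_TB

print(snapshots)                        # 300
print(round(field_tb_per_snapshot, 2))  # 0.8
print(total_tb)                         # 290
```

Under these assumptions, each stored snapshot averages roughly 0.8 TB of field data.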

Our contribution: a framework for characterizing complex events in large-scale multivariate data
Introduce attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
– Defined by multiple variables
– Spanning an arbitrary number of time steps
– Representation achieves drastic data reductions
Provide a mechanism for querying ARGs
– Identify events conditioned on a variety of metrics
Demonstrate results on large-scale combustion simulation data

Related work
Topology: segment the domain into features according to function behavior
– Level-set behavior: Reeb graph, contour tree, and variants [Carr et al. 2003, Pascucci et al. 2007, Mascarenhas et al. 2006, van Kreveld et al. 2004]
– Gradient behavior: Morse and Morse-Smale complex [Edelsbrunner 2003, Gyulassy et al. 2007, 2008, Gunther et al. 2011]
Multivariate feature analysis: many correlation-based feature definitions [Gosink et al. 2007, Chen et al. 2011, Jaenicke et al. 2007, Sauber et al. 2006, Schneider et al. 2008, Bennett et al. 2011]
Feature tracking graphs: capture spatial-temporal relationships [Edelsbrunner et al. 2004, Bremer et al. 2010, Muelder et al. 2009, Widanagamaachchi et al. 2012]
Graph search algorithms: identify patterns in large-scale graphs [Barret et al. 2007, Berry et al. 2007, Gregor et al. 2005, Siek et al. 2002]

What is an attributed relational graph (ARG)?
ARG nodes correspond to spatial features
– Each ARG node encodes:
  – Feature type
  – Time step
  – Optional per-feature statistics
ARG edges encode relationships between features
– Spatial overlap metric
– Supports feature tracking over time
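
One way to hold an ARG in memory is sketched below in Python. The class and field names (`ARGNode`, `ARG`, `feature_type`, `stats`), the feature-type strings, and the `"volume"` statistic are hypothetical, and storing each edge's overlap as an integer label is an assumption based on the overlap-labeled edges described in these slides; this is not the authors' data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ARGNode:
    """One spatial feature (hypothetical layout)."""
    node_id: int
    feature_type: str  # e.g. which variable defines the feature
    time_step: int
    stats: Dict[str, float] = field(default_factory=dict)  # optional per-feature statistics


@dataclass
class ARG:
    nodes: Dict[int, ARGNode] = field(default_factory=dict)
    # Directed edges (source id, target id) -> overlap label.
    edges: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def add_node(self, node: ARGNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, u: int, v: int, overlap: int) -> None:
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("add both endpoints before adding the edge")
        self.edges[(u, v)] = overlap

    def successors(self, u: int) -> List[int]:
        """Targets of edges leaving u (linear scan; fine for a sketch)."""
        return [v for (src, v) in self.edges if src == u]


arg = ARG()
arg.add_node(ARGNode(0, "reaction_rate_OH", time_step=1, stats={"volume": 42.0}))
arg.add_node(ARGNode(1, "reaction_rate_OH", time_step=2))
arg.add_edge(0, 1, overlap=25)
print(arg.successors(0))  # [1]
```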

ARG Nodes: Segment the domain into relevant features
There are many options for segmenting the domain into features. Often, features of interest are defined by thresholding a particular variable around its minima or maxima.
[Figure: scalar field f over the (x, y) domain]
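
A minimal sketch of threshold-based segmentation, assuming the variable is sampled on a 2D grid stored as nested Python lists and that features are 4-connected regions at or above the threshold. The function name `threshold_features` is hypothetical; the slides do not prescribe a particular connectivity or data layout.

```python
from collections import deque
from typing import List


def threshold_features(f: List[List[float]], threshold: float) -> List[List[int]]:
    """Label 4-connected regions where f >= threshold; 0 marks background."""
    rows, cols = len(f), len(f[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if f[r][c] >= threshold and labels[r][c] == 0:
                # Breadth-first flood fill of a new feature.
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and f[ny][nx] >= threshold and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


grid = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.6],
    [0.1, 0.0, 0.5, 0.9],
]
for row in threshold_features(grid, 0.5):
    print(row)
# [1, 1, 0, 0]
# [1, 0, 0, 2]
# [0, 0, 2, 2]
```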

ARG Nodes: Merge trees encode features of interest defined by a single variable for a range of thresholds
The tree records how the connected components of the superlevel sets appear and merge as the function value is swept from the maximum to the minimum of the range of interest.
[Figure: scalar field f over the (x, y) domain]
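
The sweep can be sketched with a union-find structure, in the spirit of the join-tree construction of Carr et al. 2003: vertices are processed from the highest to the lowest function value, a vertex with no processed neighbors starts a new branch (a maximum), and a vertex touching several processed components merges them. This Python sketch assumes the function is given as a value per vertex plus adjacency lists, breaks ties by vertex id, and returns the augmented tree in which every vertex is a node; the name `merge_tree` is hypothetical.

```python
from typing import Dict, List, Tuple


def merge_tree(values: Dict[int, float], adj: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Augmented merge tree of superlevel sets as (upper, lower) arcs."""
    parent: Dict[int, int] = {}  # union-find forest over swept vertices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    lowest: Dict[int, int] = {}  # component root -> lowest vertex swept so far
    arcs: List[Tuple[int, int]] = []
    # Sweep from maximum to minimum; ties broken by vertex id.
    for v in sorted(values, key=lambda u: (-values[u], u)):
        parent[v] = v
        roots = {find(u) for u in adj[v] if u in parent and u != v}
        for r in sorted(roots):
            arcs.append((lowest[r], v))  # component r continues down into v
            parent[r] = v                # merge component r into v's component
        lowest[v] = v  # v is now its component's root and lowest vertex
    return arcs


# A 1D path 0-1-2-3-4 with values 5, 2, 4, 1, 3.
values = {0: 5.0, 1: 2.0, 2: 4.0, 3: 1.0, 4: 3.0}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
arcs = merge_tree(values, adj)
print(arcs)  # [(0, 1), (2, 1), (1, 3), (4, 3)]
```

Here maxima 0 and 2 merge at vertex 1, and that branch joins maximum 4 at the global minimum, vertex 3.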

ARG Nodes: Refine the tree to increase the granularity of possible segmentations
[Figure: scalar field f over the (x, y) domain]

ARG Nodes: Features are defined as all subtrees above a user-specified threshold
[Figure: two panels of the scalar field f over the (x, y) domain]
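
Given a merge tree, the features for one threshold are the connected pieces of the tree that lie at or above it. The Python sketch below assumes arcs are stored as (upper, lower) pairs, so every node except the root has exactly one lower neighbor, and labels each feature by the lowest node that still passes the threshold. The arcs are hard-coded for a five-vertex path with values 5, 2, 4, 1, 3 so the block stands alone; the name `features_above` is hypothetical.

```python
from typing import Dict, List, Tuple


def features_above(values: Dict[int, float], arcs: List[Tuple[int, int]],
                   threshold: float) -> Dict[int, List[int]]:
    """Group merge-tree nodes with value >= threshold into subtrees (features)."""
    down = {upper: lower for upper, lower in arcs}  # one lower neighbor per node
    features: Dict[int, List[int]] = {}
    for node, value in values.items():
        if value < threshold:
            continue
        root = node
        # Walk down the tree while staying at or above the threshold.
        while root in down and values[down[root]] >= threshold:
            root = down[root]
        features.setdefault(root, []).append(node)
    return {root: sorted(members) for root, members in features.items()}


values = {0: 5.0, 1: 2.0, 2: 4.0, 3: 1.0, 4: 3.0}
arcs = [(0, 1), (2, 1), (1, 3), (4, 3)]  # (upper, lower) arcs for the path 0-1-2-3-4
print(features_above(values, arcs, 2.5))  # {0: [0], 2: [2], 4: [4]}
print(features_above(values, arcs, 1.5))  # {1: [0, 1, 2], 4: [4]}
```

Lowering the threshold from 2.5 to 1.5 merges the maxima 0 and 2 into a single feature, which is the choice the user-specified threshold controls.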

ARG Edges: An overlap-based metric is used to encode feature behavior over time
[Figure: features of f over the (x, y) domain at time steps t = 1, 2, 3, 4]
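
The slides do not spell out the exact overlap metric. As a hedged sketch, one simple choice is the number of grid cells two features share: given two label grids (0 for background), counting co-labeled cells yields weighted edges from features at one time step to features at the next. The function name `overlap_counts` and the raw-count metric are assumptions; a normalized measure would fit the same pattern.

```python
from collections import Counter
from typing import Dict, List, Tuple


def overlap_counts(labels_t: List[List[int]],
                   labels_next: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Count cells shared by feature a at time t and feature b at the next step (0 = background)."""
    counts: Counter = Counter()
    for row_t, row_next in zip(labels_t, labels_next):
        for a, b in zip(row_t, row_next):
            if a and b:  # both cells belong to some feature
                counts[(a, b)] += 1
    return dict(counts)


t1 = [[1, 1, 0, 0],
      [1, 0, 0, 2],
      [0, 0, 2, 2]]
t2 = [[1, 1, 0, 0],
      [1, 1, 0, 0],
      [0, 0, 0, 2]]
print(overlap_counts(t1, t2))  # {(1, 1): 3, (2, 2): 1}
```

Feature 2 shrinks from three cells to one but still overlaps its successor, so the tracking edge (2, 2) survives with a small label.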

ARG Edges: The same metric is used to encode relationships between different types of features
[Figure: features at time steps t = 1, 2, 3, 4]

ARG Edges: Relationships can span multiple time steps
[Figure: features at time steps t = 1, 2, 3, 4]

ARG Edges: Edge labels indicate the degree of overlap between the associated features
[Figure: example ARG edges labeled with overlap values]
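
Because the same overlap metric applies across feature types and across several time steps, ARG edges can be assembled in one pass over feature pairs. The Python sketch below represents each feature as a hypothetical (type, time step, set of cell indices) triple, links pairs at most `max_lag` steps apart, orients edges forward in time, and labels each edge with the number of shared cells. The all-pairs loop is for clarity only; a practical implementation would likely compare only features whose spatial extents can overlap.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Feature = Tuple[str, int, FrozenSet[int]]  # (feature type, time step, cell indices)


def build_edges(features: Dict[int, Feature], max_lag: int) -> Dict[Tuple[int, int], int]:
    """Overlap-labeled edges between features at most max_lag time steps apart."""
    edges: Dict[Tuple[int, int], int] = {}
    for u, v in combinations(sorted(features), 2):
        if features[u][1] > features[v][1]:
            u, v = v, u  # orient edges forward in time; same-step pairs keep id order
        _, t_u, cells_u = features[u]
        _, t_v, cells_v = features[v]
        if t_v - t_u > max_lag:
            continue
        overlap = len(cells_u & cells_v)
        if overlap > 0:
            edges[(u, v)] = overlap
    return edges


features = {
    0: ("reaction_rate_OH", 1, frozenset({1, 2, 3})),
    1: ("diffusion_OH", 1, frozenset({2, 3, 4})),
    2: ("reaction_rate_OH", 2, frozenset({3, 4})),
    3: ("diffusion_OH", 4, frozenset({3})),
}
print(build_edges(features, max_lag=1))  # {(0, 1): 2, (0, 2): 1, (1, 2): 2}
```

Features 0 and 1 co-occur at the same time step (a cross-type edge), while feature 3 is more than one step away from every other feature and stays isolated.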

Once the ARG is constructed, we can search for patterns of interest
[Figure: example search patterns over features: co-occurrence, multi-way co-occurrence, time-lag]

Searches are performed using a two-phase subgraph isomorphism heuristic: filtering & matching
MTGL: Multi-Threaded Graph Library
– Open-source software
– Given an ARG and a template:
  – Filter: remove all edges in the ARG that cannot belong to the template
  – Match: find all possible template matches in the filtered ARG
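
MTGL itself is a C++ library, and the Python sketch below does not use its API. It only illustrates the two phases under simplifying assumptions: a template is a set of typed nodes plus directed edges carrying a minimum-overlap constraint, the filter is a single pass over ARG edges, and matching is a plain backtracking search for non-induced subgraph matches. The slides also show a "template walk", which this sketch does not model. All names, including the node types, are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
TemplateEdge = Tuple[str, str]


def filter_edges(arg_types: Dict[int, str], arg_edges: Dict[Edge, int],
                 tmpl_types: Dict[str, str],
                 tmpl_edges: Dict[TemplateEdge, int]) -> Dict[Edge, int]:
    """Phase 1: keep only ARG edges that could play the role of some template edge."""
    kept: Dict[Edge, int] = {}
    for (u, v), overlap in arg_edges.items():
        for (a, b), min_overlap in tmpl_edges.items():
            if (arg_types[u] == tmpl_types[a] and arg_types[v] == tmpl_types[b]
                    and overlap >= min_overlap):
                kept[(u, v)] = overlap
                break
    return kept


def match(arg_types: Dict[int, str], kept: Dict[Edge, int],
          tmpl_types: Dict[str, str],
          tmpl_edges: Dict[TemplateEdge, int]) -> List[Dict[str, int]]:
    """Phase 2: backtracking search for injective, edge-preserving assignments."""
    order = sorted(tmpl_types)
    results: List[Dict[str, int]] = []

    def consistent(assign: Dict[str, int]) -> bool:
        # Each template edge with both endpoints assigned must exist in the
        # filtered ARG with enough overlap for this particular template edge.
        return all(kept.get((assign[a], assign[b]), -1) >= min_overlap
                   for (a, b), min_overlap in tmpl_edges.items()
                   if a in assign and b in assign)

    def extend(assign: Dict[str, int]) -> None:
        if len(assign) == len(order):
            results.append(dict(assign))
            return
        t = order[len(assign)]
        for node, node_type in sorted(arg_types.items()):
            if node_type == tmpl_types[t] and node not in assign.values():
                assign[t] = node
                if consistent(assign):
                    extend(assign)
                del assign[t]

    extend({})
    return results


arg_types = {0: "reaction_rate_OH", 1: "diffusion_OH", 2: "reaction_rate_OH",
             3: "diffusion_OH", 4: "temperature"}
arg_edges = {(0, 1): 3, (2, 3): 1, (0, 2): 4, (2, 4): 5}
tmpl_types = {"r": "reaction_rate_OH", "d": "diffusion_OH"}
tmpl_edges = {("r", "d"): 2}  # co-occurrence with at least 2 shared cells

kept = filter_edges(arg_types, arg_edges, tmpl_types, tmpl_edges)
print(kept)                                            # {(0, 1): 3}
print(match(arg_types, kept, tmpl_types, tmpl_edges))  # [{'d': 1, 'r': 0}]
```

Filtering first shrinks the graph the backtracking phase has to explore; here only one of the four ARG edges survives.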

[Figure panels: ARG; template pattern; template walk]

Case study: identification of deflagration fronts in HCCI combustion data
– Turbulent auto-ignitive mixture of dimethyl ether under homogeneous charge compression ignition (HCCI) conditions
– Deflagration fronts: spatially collocated extrema of chemical reaction rates and diffusive fluxes
[Figure panels: reaction rate of OH; diffusion of OH]

Case study: ARG representation encodes complex relationships very compactly

Feature family      Structure geometries   Hierarchy & statistics
temperature         4.0 GB                 319 MB
diffusion OH        3.6 GB                 11 MB
reaction rate OH    4.3 GB                 534 MB

Raw output data size: 78.2 GB (grid size = 560 × 560 × 560)
– 703 MB/variable × 6 variables for 19 time steps
Meta-data: computed in parallel on ORNL's Lens system
– 3 feature families, each encoding per-feature size and the minimum, maximum, mean, and variance of 6 different variables
Data-dependent costs: O(minutes) per time step
– Structure geometries are only needed for ARG construction (not for queries)
– Size of ARG: 504 KB
Under 1 GB required for fully flexible exploration and search on commodity hardware
– O(seconds) for searches
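
The slides state that each feature family stores per-feature size together with the minimum, maximum, mean, and variance of several variables, computed in parallel. One way to make such statistics parallel-friendly is a mergeable accumulator: Welford's single-pass update within a process and the pairwise combination formula commonly attributed to Chan et al. across processes. The Python sketch below is an assumption about how this could be done, not a description of the authors' code; the class name `FeatureStats` is hypothetical and it reports the population variance.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class FeatureStats:
    """Size, min, max, mean and variance of one variable over one feature's cells."""
    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> None:
        """Welford's single-pass update with one sample."""
        self.count += 1
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "FeatureStats") -> "FeatureStats":
        """Combine partial results, e.g. from two ranks that each saw part of a feature."""
        if other.count == 0:
            return replace(self)
        if self.count == 0:
            return replace(other)
        n = self.count + other.count
        delta = other.mean - self.mean
        return FeatureStats(
            count=n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
        )

    @property
    def variance(self) -> float:
        """Population variance; 0.0 for an empty feature."""
        return self.m2 / self.count if self.count else 0.0


left, right = FeatureStats(), FeatureStats()
for x in (1.0, 2.0, 3.0):  # cells of one feature seen by process A
    left.add(x)
for x in (4.0, 5.0):       # remaining cells seen by process B
    right.add(x)
combined = left.merge(right)
print(combined.count, combined.minimum, combined.maximum, combined.mean, combined.variance)
# 5 1.0 5.0 3.0 2.0
```

Because `merge` is associative up to rounding, partial statistics can be reduced in any tree order across processes.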

Case study: Searching the ARG
[Figure panels: a subset of the full ARG (full size is 6563 nodes and 8903 edges); the search template; connected components: 1462 nodes, … edges; a subset of the deflagration fronts identified]

Conclusion & future work
Introduced attributed relational graphs (ARGs) as an efficient encoding scheme for relationships between spatial features
Provided a mechanism for querying ARGs
Demonstrated results on large-scale combustion simulation data
Some domain knowledge is required to construct the ARG
– Which variables define features of interest
– Range of potential time lags between features
Opportunities for future work
– GUI tool for specifying search template patterns
  – Leveraging per-feature statistics in queries
– Linked views of ARG, search results, and domain visualization
– Dynamic ARGs
  – Don't require feature thresholds to be specified in advance
  – Instead, these become runtime parameters to be explored

Questions?

Janine Bennett

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.