XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem
ASCR CS PI Meeting

Kenneth Moreland, Sandia National Laboratories
Berk Geveci, Kitware, Inc.
David Pugmire, Oak Ridge National Laboratory
David Rogers, Los Alamos National Laboratory
Kwan-Liu Ma, University of California at Davis
Hank Childs, University of Oregon

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2017-2554 PE
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem (IMD)

Novel Ideas
Enable visualization at extreme scale by addressing four interlocking challenges:
- Leverage Emerging Processor Technology with the VTK-m toolkit for many-core visualization
- Improve In Situ Integration with lighter-weight toolkits and better post hoc interaction
- Understand the Usability of new techniques with qualitative user studies
- Use Proxy Analysis to understand visualization behavior at extreme scale, both alone and coupled

Impact and Champions
The XVis project provides visualization research necessary for extreme-scale science in the DOE. Our studies on algorithms for emerging processor technology will carry existing software like ParaView and VisIt forward to future architectures. The associated proxy analysis will guide our software toward the most effective use of each platform and enable large-scale studies across numerous integrated software tools. Our in situ integration will make our software more accessible to simulations and the post hoc analysis more complete, as verified through user studies.
- XVis provided the funding to test the concepts and build the initial VTK-m framework.
- This software is the DOE solution for scientific visualization on multi- and many-core machines.
- Techniques for in situ integration are leveraged by the SENSEI and ALPINE projects.
- XVis members are working with application scientists in combustion and fusion to improve in situ usability.

Milestones/Dates/Status (Scheduled / Actual)
- 1.c Hybrid Parallel: FY16Q4 / FY17Q1
- 1.d Additional Algorithms: FY17Q4 / ongoing
- 1.e Function Characterization: FY17Q4 / ongoing
- 2.c Flyweight In Situ: FY16Q4 / ongoing
- 2.d Data Model Application: FY16Q3 / FY16Q3
- 2.e Memory Hierarchy Streaming: FY16Q4 / FY16Q4
- 2.f Interface for Post Hoc Interaction: FY16Q4 / ongoing
- 3.c/3.d Start/Continue Usability Studies: FY17Q4 / ongoing
- 3.e Apply Usability Studies: FY17Q4 / ongoing
- 4.a/4.b Mini-App Implementation/Characterization: FY16Q4 / ongoing
- 4.c Architectural Studies: FY17Q4 / ongoing

Principal Investigators: Kenneth Moreland (SNL), Berk Geveci (Kitware), David Pugmire (ORNL), David Rogers (LANL), Kwan-Liu Ma (UC Davis), Hank Childs (U Oregon)
March 2017
Scientific Visualization and Analysis Ecosystem for Large Data (VTK, ParaView*, VisIt*, Cinema, SENSEI, and VTK-m*)

Problem
DOE simulation scientists generate petabytes of data that they need to understand.

Solution
- Developed general-purpose scientific visualization and analysis libraries and tools, designed with parallelism in mind, to operate on the world's largest data sets.
- Collaborative open-source development model engaging multiple national laboratories, universities, and industry.

Impact
- Leading visualization and analysis solutions for Department of Energy scientists.
- Used at all ASCR supercomputing facilities, and worldwide in most HPC facilities.
- Over 1 million downloads of the software worldwide.
- Creates the capability for DOE to look at data from the world's largest simulations.

Steve Langer (LLNL): "We rely on this software for gaining an understanding of complex and spatial variations in backscattered light simulations using pF3D, where data may consist of over 400 billion zones per time step."

* ECP funded

Images: Core-collapse supernova from GenASiS (top). Novel visualization of ocean flow data, designed to show the 'mixing barrier' (left).

Notes

Problem: Scientists regularly generate massive data sets using simulations and need help understanding this data. They use visualization and analysis to obtain this understanding. Since there are so many simulations, and so many scientists, it is imperative that our community delivers reliable tools that enable scientists to do the analysis without assistance from visualization experts.

Solution
- Developed libraries and tools to take advantage of parallelism and advanced hardware.
- This ecosystem of tools takes advantage of an open-source development model across many institutions.
- Designed for remote, interactive visualization.
- This work represents significant partnering between ASCR, ASC, and private industry.

Impact
- Leading visualization and analysis solutions for Department of Energy scientists.
- Also used worldwide at most HPC installations.
- Over 1 million downloads of the software worldwide.
- Creates the capability for DOE to look at data from the world's largest simulations.

Images
Top: Core-collapse supernova from GenASiS.
Bottom: Philip Wolfram, LANL: "The enhanced visualization provided the capability to see physical features in the flow that were not readily apparent before, e.g., the mixing barrier over the continental shelf break."

Publications
Cinema: James Ahrens, Sébastien Jourdain, Patrick O'Leary, John Patchett, David H. Rogers, and Mark Petersen. "An Image-Based Approach to Extreme Scale In Situ Visualization and Analysis." In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '14), pp. 424-434, 2014. DOI: https://doi.org/10.1109/SC.2014.40
ParaView: James Ahrens, Berk Geveci, and Charles Law. "ParaView: An End-User Tool for Large Data Visualization." Visualization Handbook, Elsevier, 2005.
SENSEI: Utkarsh Ayachit, Brad Whitlock, Matthew Wolf, Burlen Loring, Berk Geveci, David Lonie, and E. Wes Bethel. "The SENSEI Generic In Situ Interface." In Proceedings of the 2nd Workshop on In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization (ISAV '16), pp. 40-44, 2016.
DOI: https://doi.org/10.1109/ISAV.2016.13
VisIt: H. Childs, E. Brugger, B. Whitlock, J. Meredith, S. Ahern, K. Bonnell, M. Miller, G. H. Weber, C. Harrison, D. Pugmire, T. Fogal, C. Garth, A. Sanderson, E. W. Bethel, M. Durant, D. Camp, J. M. Favre, O. Rübel, P. Navrátil, M. Wheeler, P. Selby, and F. Vivodtzev. "VisIt: An End-User Tool For Visualizing and Analyzing Very Large Data." In Proceedings of SciDAC 2011, Denver, CO, July 2011.
VTK: Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization Toolkit (4th ed.), Kitware, 2006.
VTK-m: Kenneth Moreland, Christopher Sewell, William Usher, Li-ta Lo, Jeremy Meredith, David Pugmire, James Kress, Hendrik Schroots, Kwan-Liu Ma, Hank Childs, Matthew Larsen, Chun-Ming Chen, Robert Maynard, and Berk Geveci. "VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures." IEEE Computer Graphics and Applications, 36(3), May/June 2016.
CHI (Human-Computer Interaction conference): Francesca Samsel, Mark Petersen, Terece Geld, Greg Abram, Joanne Wendelberger, and James Ahrens. "Colormaps that Improve Perception of High-Resolution Ocean Data." In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '15), pp. 703-710, 2015. DOI: http://dx.doi.org/10.1145/2702613.2702975

[Timeline figure, 1993 to 2017, of ecosystem milestones: first open-source release of VTK; user community expands DOE-wide through SciDAC; VTK-based distributed-memory tools; combined distributed-memory/shared-memory tools; in situ forms of the tools and first visualization of a 1-trillion-cell dataset; Cinema: image-based interactive visualization; SENSEI: generic in situ interface and 1M-rank in situ visualization; VTK-m: many-core version of VTK.]
Enabling DOE Visualization Tools with VTK-m

Objective
DOE's production visualization tools, representing hundreds of person-years of investment, are not ready for exascale architectures. Integrating VTK-m into our existing visualization tools provides functionality at exascale while leveraging previous investment.

Technology
- ParaView & VisIt: Interactive HPC visualization software that leverages VTK and MPI for large-scale jobs.
- VTK: Toolkit for 3D computer graphics, image processing, and visualization. Uses MPI for parallelization.
- VTK-m: Toolkit for the development and distribution of scientific visualization algorithms on multi-core processors and accelerators (see the sketch below).

Impact
- ParaView 5.3 and VisIt 2.11 offer VTK-m acceleration.
- The upcoming VTK 8.0 will offer VTK-m-accelerated filters.
- In situ tools demonstrated on Titan with thousands of GPUs.

Figure: VTK-m in situ with Catalyst/PyFR in batch mode on Titan with 5,000 GPUs.

[Timeline figure, 2012 to 2017: EAVL, Dax, and Piston being developed; VTK-m started by merging EAVL, Dax, and Piston; VTK-m 1.0.0 released; VTK-m added to VTK and ParaView; VTK-m added to VisIt; VTK-m live in situ with Catalyst/PyFR.]
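As an illustration of the VTK-m programming model referenced above, the following is a minimal sketch of a data-parallel worklet that computes vector magnitudes. It follows the VTK-m 1.x worklet API, though exact signature tags and headers differ between releases, and the field and function names are hypothetical.

```cpp
#include <vtkm/VectorAnalysis.h>
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/worklet/DispatcherMapField.h>
#include <vtkm/worklet/WorkletMapField.h>

// A worklet describes the per-element operation; VTK-m schedules it across
// whatever device backend (CUDA, TBB, OpenMP, serial) is available.
struct Magnitude : vtkm::worklet::WorkletMapField
{
  using ControlSignature = void(FieldIn, FieldOut);
  using ExecutionSignature = void(_1, _2);

  template <typename T>
  VTKM_EXEC void operator()(const vtkm::Vec<T, 3>& velocity, T& magnitude) const
  {
    magnitude = vtkm::Magnitude(velocity);
  }
};

// Hypothetical usage: 'velocityField' is an ArrayHandle of 3-vectors.
void ComputeSpeeds(
  const vtkm::cont::ArrayHandle<vtkm::Vec<vtkm::Float32, 3>>& velocityField,
  vtkm::cont::ArrayHandle<vtkm::Float32>& speeds)
{
  vtkm::worklet::DispatcherMapField<Magnitude> dispatcher;
  dispatcher.Invoke(velocityField, speeds);
}
```

The same worklet source compiles for every supported device, which is how the integrations into ParaView, VisIt, and VTK avoid maintaining per-architecture code paths.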
Live Demonstration of In Situ Visualization on Accelerator Processors

Objective
Scientific discovery using 100-petascale to exascale supercomputers requires integrated visualization solutions on new computational hardware. Multiple technologies had to be implemented and integrated.

Technology
- ParaView: existing HPC visualization infrastructure leveraging VTK over large MPI jobs with user interaction.
- Catalyst: in situ coupling of ParaView to simulations (see the sketch below).
- VTK-m: toolkit for the development and distribution of visualization algorithms on multi-core processors and accelerators.
- PyFR: CFD simulation running on 256 GPU devices of Oak Ridge's Titan supercomputer to analyze the turbulence behind a new serrated jet engine nozzle.

Impact
A demonstration at SC 2015 showed the integration of these four key technologies for live analysis of the simulation. The entire process from simulation to analysis to rendering happens locally on the GPUs without memory transfer. As growing limitations in network, power, and storage restrict the movement of data, sharing resources is critical for effective analysis.

Figures: Top right: The pockets of air in the jet wake reduce noise. Right: A live interactive visualization of a large simulation running on Titan, viewed and controlled on the show floor in Austin.

SAND 2016-0727 PE
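For context, this is a minimal sketch of the kind of adaptor a simulation calls each time step to hand data to Catalyst, based on the legacy ParaView Catalyst C++ API. The grid construction, function names, and the script name "pipeline.py" are hypothetical, and the actual PyFR integration differs.

```cpp
#include <vtkCPDataDescription.h>
#include <vtkCPInputDataDescription.h>
#include <vtkCPProcessor.h>
#include <vtkCPPythonScriptPipeline.h>
#include <vtkNew.h>
#include <vtkUnstructuredGrid.h>

namespace
{
vtkCPProcessor* Processor = nullptr;
}

// Called once at simulation startup: load a ParaView-exported analysis script.
void CatalystInitialize(const char* script) // e.g., "pipeline.py" (hypothetical)
{
  Processor = vtkCPProcessor::New();
  Processor->Initialize();
  vtkNew<vtkCPPythonScriptPipeline> pipeline;
  pipeline->Initialize(script);
  Processor->AddPipeline(pipeline.GetPointer());
}

// Called every time step: give Catalyst the simulation's grid only if the
// pipeline requests data at this time.
void CatalystCoProcess(vtkUnstructuredGrid* grid, double time, vtkIdType timeStep)
{
  vtkNew<vtkCPDataDescription> description;
  description->AddInput("input");
  description->SetTimeData(time, timeStep);
  if (Processor->RequestDataDescription(description.GetPointer()) != 0)
  {
    description->GetInputDescriptionByName("input")->SetGrid(grid);
    Processor->CoProcess(description.GetPointer());
  }
}

// Called once at shutdown.
void CatalystFinalize()
{
  Processor->Finalize();
  Processor->Delete();
  Processor = nullptr;
}
```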
Flyweight In Situ Analysis

Objective
- In situ analysis through tightly coupled simulation and analysis codes.
- Accelerator-enabled, scalable analysis and visualization through lightweight infrastructure.

Technology
- VTK-m: toolkit for the development and distribution of visualization algorithms on multi-core processors and accelerators.
- SENSEI: write-once, deploy-many in situ infrastructure with integration of Catalyst, Libsim, ADIOS, and GLEAN.
- Strawman (ALPINE): lightweight in situ visualization infrastructure for multi-physics HPC simulations (a bridge-style sketch follows below).

Impact
- Initial demonstration of VTK-m integration into emerging in situ infrastructures.
- Flyweight in situ analysis with minimal dependencies.
- Upcoming scalability study with multiple mini-apps.
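To make the "write-once, deploy-many" coupling pattern concrete, here is a generic, hypothetical bridge interface of the kind such infrastructures expose to a simulation. It is not the actual SENSEI or Strawman API; it only illustrates the initialize/execute/finalize structure and the zero-copy field description that keeps dependencies minimal.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, minimal in situ bridge: the simulation owns the data and
// passes lightweight descriptions of it to the analysis layer each step.
struct FieldView
{
  const char* name;        // e.g., "pressure" (hypothetical field name)
  const double* values;    // zero-copy pointer into simulation memory
  std::int64_t numValues;  // number of cells or points
};

class InSituBridge
{
public:
  // Called once before the first time step (load config, select analyses).
  virtual void Initialize(const char* configFile) = 0;

  // Called every time step; implementations may render, subsample, or
  // compute statistics directly on the simulation's memory.
  virtual void Execute(double time, std::int64_t step,
                       const std::vector<FieldView>& fields) = 0;

  // Called once after the last time step (flush outputs, release devices).
  virtual void Finalize() = 0;

  virtual ~InSituBridge() = default;
};
```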
Usability of In Situ Generated PDFs for Post Hoc Analysis

Application and Background
- Large-scale scientific simulations produce too much data for effective storage and analysis.
- Researchers need a way to quickly query large data subsets, sometimes using trial and error, to explore scientific phenomena.
- Many methods currently in use by scientific teams operate inefficiently on the full data.
- A study is being conducted with Dr. Jackie Chen, a combustion scientist at Sandia National Laboratories.

Goals and Challenges
- Features of interest are not known a priori and must be explored during post hoc analysis.
- Need to capture simulation trends in situ and store/represent them in an efficient format.
- Must have little impact on the simulation and fit smoothly into the scientists' existing workflow.
- The tools must be evaluated with a thorough usability study to determine their effectiveness and improve their functionality.

Results and Impact
- A set of in situ libraries with a post hoc visualization tool can capture trends efficiently using probability distribution functions (PDFs); a sketch of the basic idea follows below.
- Large-scale particle data is coupled with PDFs in situ to enable fast subset selection and analysis.
- Extensive performance tests ensure insignificant simulation and storage overheads.
- An ongoing usability study with expert users will evaluate both the in situ and post hoc tools.

Kwan-Liu Ma
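As an illustration of the in situ trend capture described above, the following generic sketch (not the project's actual library) reduces a scalar field to a histogram-based PDF each time step, so only the compact PDF, rather than the raw field, needs to be stored for post hoc querying. The bin count and the idea of passing in precomputed field bounds are hypothetical choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reduce a full-resolution scalar field (e.g., a hypothetical "temperature"
// field) to a normalized histogram that approximates its PDF. The histogram
// is tiny compared to the raw field, so it is cheap to store every time step.
std::vector<double> ComputePdf(const std::vector<double>& field,
                               double minValue, double maxValue,
                               std::size_t numBins = 128)
{
  std::vector<double> pdf(numBins, 0.0);
  if (field.empty() || maxValue <= minValue)
  {
    return pdf;
  }

  const double binWidth = (maxValue - minValue) / static_cast<double>(numBins);
  for (double value : field)
  {
    // Clamp so values at the extremes land in the first/last bin.
    std::size_t bin = static_cast<std::size_t>(
        std::min(std::max((value - minValue) / binWidth, 0.0),
                 static_cast<double>(numBins - 1)));
    pdf[bin] += 1.0;
  }

  // Normalize counts so the bins integrate (sum * binWidth) to 1.
  const double total = static_cast<double>(field.size()) * binWidth;
  for (double& p : pdf)
  {
    p /= total;
  }
  return pdf;
}
```

Post hoc, a scientist can query such per-step PDFs (e.g., "which time steps have significant mass above a threshold?") to select the small data subsets worth loading in full.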
Parallel Peak Pruning: Scalable SMP Contour Tree Computation
Hamish A. Carr (University of Leeds), Gunther H. Weber (LBNL/Vis Base), Christopher M. Sewell (LANL/XVis), James P. Ahrens (LANL/XVis)

Background and Problem
Contour trees summarize topological properties of isosurfaces, making them a valuable tool for data analysis, such as identifying:
- the most "important" isosurfaces based on prominence; and
- features in a wide range of applications, including burning regions (combustion simulations), halos (computational cosmology), atoms and bonds (computational chemistry), and pores and pockets (materials).
The optimal serial "sweep-and-merge" algorithm for computing contour trees is based on an inherently sequential metaphor and requires processing the data in sorted order.

Objective
- Effective utilization of multi-core architectures (per-compute-node resources) such as multi-core CPUs and GPUs.
- Need an efficient data-parallel contour tree computation algorithm.

Approach
- The new "parallel peak pruning" method finds peaks and their "governing saddles," creating superarcs in the join/split tree by recursive pruning of peak/saddle pairs and their corresponding regions.
- Many peak/saddle pairs can be pruned concurrently, effectively using data-parallel resources.
- The algorithm has formal guarantees of O(log(n) log(t)) parallel steps and O(n log(n)) work (i.e., number of primitive operations) for a grid with n vertices and t critical points; that is, its cost depends on data size and complexity. (A work-depth restatement of these bounds appears after the notes below.)
- Native OpenMP and NVIDIA Thrust implementations enable effective data-parallel contour tree computation on multi-core CPUs and GPUs.
- A recently finalized VTK-m port (https://gitlab.kitware.com/vtk/vtk-m) makes the new algorithm available to DOE and HPC stakeholders.

Results and Impact
- Significant parallel performance improvement: up to 10x parallel speed-up on multi-core CPUs and up to 50x speed-up on NVIDIA GPUs, compared to the optimal serial sweep-and-merge algorithm. (Tested on USGS GTOPO30 elevation maps.)
- Important building block for analysis on future exascale platforms.

Publication
"Parallel Peak Pruning for Scalable SMP Contour Tree Computation" by H. Carr, G. Weber, C. Sewell, and J. Ahrens won the best paper award at IEEE LDAV 2016.

Notes

Why are contour trees important?
- Contour trees track connected components of isosurfaces, thus summarizing topological properties of isosurfaces.
- Intuitively, contour trees encode the elevations at which islands/lakes emerge and at which they merge with each other as water is drained from, or added to, a landscape.
- They, and the related merge trees, are an important component in many data analysis pipelines. For example, merge trees have been used to compute statistics about burning regions in combustion simulations (BoxLib), identify halos in cosmology simulations (Nyx/BoxLib), and identify pores in porous media, e.g., for carbon sequestration (imaging data, Chombo simulations).
- Contour trees provide additional topological information and can be used, e.g., to find atoms and bonds in chemical simulations. Furthermore, they can be used for automatic parameter (isovalue, transfer function) determination for in situ visualization.

What is the problem?
- There is a de facto standard serial algorithm to compute contour trees that is computationally optimal.
- However, this algorithm is based on a sweeping/flooding metaphor that requires sorting the data and processing it in order.
- This approach poses severe limits when trying to parallelize it.

What is our objective?
We need to find a new algorithm that uses current per-node hardware effectively and addresses the increasing per-node concurrency of future architectures.

What is new about our approach?
- We completely rethought the approach to computing the merge tree and contour tree. Essentially, we identify its branches and "prune" them off.
- Unlike the previous approach, branches can be handled concurrently and almost independently. Since contour trees of large data sets have many branches, this exposes a lot of "surface area" for parallelism.

What is the impact/result?
- We now have an algorithm that scales well to many cores (see slide). This takes care of the on-node computation of contour trees for current and near-future architectures.
- It will serve as a building block for hybrid approaches that use it on individual nodes and combine the results across nodes (via MPI).
- This is important for exascale data analysis pipelines and "intelligent" automatic steering of visualization workflows when visualization is performed in situ.

References: XVis Impact Slide "Data-Parallel Algorithm for Contour Tree Construction" (LA-UR-17-20363)

Figure: (a) Terrain and selected contour lines. (b) The join tree and (c) the split tree record where maxima and minima, respectively, "meet." (d) The contour tree combines the join and split trees into a representation of the full connectivity of contour lines. (e) The branch decomposition orders features of the contour tree hierarchically based on a simplification measure, such as prominence (persistence).
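For readers who want the cost model spelled out, the stated guarantees combine in the standard work-depth (Brent-style) way. The restatement below is our own illustrative addition, not an additional result from the paper.

```latex
% Stated guarantees: W(n) = O(n \log n) work, D(n,t) = O(\log n \, \log t) depth
% (parallel steps). Brent's bound then gives the running time on p processors:
\[
  T_p(n, t) \;=\; O\!\left(\frac{n \log n}{p} + \log n \, \log t\right),
\]
% so speed-up over the serial O(n \log n) sweep-and-merge algorithm approaches p
% whenever the depth term is small relative to the per-processor share of the work.
```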
VTK-m Support for ECP Projects

VTK-m is being used for visualization and analysis in the "Coupled Monte Carlo Neutronics and Fluid Flow Simulation of Small Modular Reactors" ECP project.
- Direct visualization support for constructive solid geometry (CSG) allows scientists to perform more accurate analysis (see the sketch below).
- Conversion to mesh-based representations is not needed; such conversion is expensive (thousands of rods) and introduces errors.
- In situ analysis and visualization are possible without the need to copy and convert data.

Figures: A reactor pin assembly rendered directly from the native CSG representation. Detail view of the CSG representation of a single rod in the pin assembly.
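As background on what "direct" CSG evaluation means, the following generic sketch (not the project's actual code) composes implicit signed-distance functions with min/max to evaluate CSG unions, intersections, and differences at a sample point, which is the kind of test a renderer or analysis can perform without first meshing the geometry. The cylinder radii are made-up values.

```cpp
#include <algorithm>
#include <cmath>

// Signed distance to an infinite cylinder of radius r centered on the z-axis:
// negative inside, positive outside.
double CylinderSdf(double x, double y, double r)
{
  return std::sqrt(x * x + y * y) - r;
}

// Standard implicit CSG composition: union = min, intersection = max,
// difference(A, B) = max(A, -B).
double CsgUnion(double a, double b) { return std::min(a, b); }
double CsgIntersect(double a, double b) { return std::max(a, b); }
double CsgDifference(double a, double b) { return std::max(a, -b); }

// Hypothetical example: an annular rod cladding, i.e., the region between an
// outer and an inner cylinder, evaluated directly at a sample point.
bool InsideCladding(double x, double y)
{
  const double outer = CylinderSdf(x, y, 0.475); // outer radius (made up)
  const double inner = CylinderSdf(x, y, 0.418); // inner radius (made up)
  return CsgDifference(outer, inner) < 0.0;      // inside outer, outside inner
}
```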
Wavelet Compression in VTK-m

Background
- Supercomputing trends show the ability to generate data growing faster than the ability to store it, which will hinder visualization.
- One solution is to transform and reduce simulation data to a form small enough that post hoc analysis is possible.
- Wavelets are an excellent technology for this reduction and are used widely in other fields.

Technique
- Implemented wavelet compression in VTK-m (a generic sketch of the idea follows below).
- Enables portable performance over many architectures.
- Also soon to be available in many end-user products, due to VTK-m being adopted (VisIt, ParaView, ADIOS).
- Impact: wavelet compression "is now available" to simulation codes. (The quotes reflect that the code has not yet been merged back.)
- The level of data reduction can be controlled by the user.

Progress
The study demonstrates that the VTK-m approach is comparable to the reference CPU implementation and also performs well on the GPU.
"Achieving Portable Performance For Wavelet Compression Using VTK-m" by Li, Sewell, Clyne, and Childs. In submission to the EuroGraphics Symposium on Parallel Graphics and Visualization (EGPGV) 2017.

Figure: Performance study comparing our hardware-agnostic approach with a hardware-specific solution.
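To illustrate how wavelet-based reduction works in principle, here is a generic one-level Haar transform with coefficient thresholding; it is not the VTK-m implementation, whose filters and wavelet families differ. Dropping small detail coefficients is what lets the user trade accuracy for storage.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One level of the orthonormal Haar wavelet transform: split a signal into
// coarse averages and detail coefficients. Assumes an even-length input.
void HaarForward(const std::vector<double>& in,
                 std::vector<double>& averages, std::vector<double>& details)
{
  const double s = 1.0 / std::sqrt(2.0);
  averages.clear();
  details.clear();
  for (std::size_t i = 0; i + 1 < in.size(); i += 2)
  {
    averages.push_back(s * (in[i] + in[i + 1]));
    details.push_back(s * (in[i] - in[i + 1]));
  }
}

// Compression step: zero out detail coefficients below a user-chosen
// threshold. The threshold controls the accuracy/storage trade-off.
std::size_t ThresholdDetails(std::vector<double>& details, double threshold)
{
  std::size_t kept = 0;
  for (double& d : details)
  {
    if (std::fabs(d) < threshold) { d = 0.0; } else { ++kept; }
  }
  return kept; // number of significant coefficients that must be stored
}
```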
External Facelist Calculation in VTK-m

Background
- External facelist calculation is an essential algorithm for visualization: it produces renderable surfaces from three-dimensional volumes.
- It is needed in conjunction with popular algorithms like clipping, material interface reconstruction, and interval volumes.

Technique
- Implemented two variants of the algorithm in VTK-m, based on hashing and sorting, respectively (a generic sketch of the idea follows below).
- Both required advanced usage of VTK-m features.
- Impact: by implementing the algorithm in VTK-m, it is now available on multiple architectures. This is the first-ever many-core implementation of this algorithm.

Progress
The study demonstrates:
- serial performance comparable to the existing serial implementation,
- good parallel performance, and
- that the hashing-based variant is the fastest technique.
The algorithm has been contributed back to the VTK-m repository.
"External Facelist Calculation with Data-Parallel Primitives" by Lessley, Binyahib, Maynard, and Childs, EuroGraphics Symposium on Parallel Graphics and Visualization (EGPGV), 2016.
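The following is a generic, serial sketch of the underlying idea (not the VTK-m data-parallel implementation): every face of every cell is canonicalized and counted, and faces that occur exactly once are external. The tetrahedral connectivity layout is an assumption for illustration; the data-parallel variants replace the map with a global sort or a hash table built from data-parallel primitives.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <map>
#include <vector>

using Face = std::array<std::int64_t, 3>; // canonicalized triangle face

// Given tetrahedral cells (4 point ids per cell), return the faces that are
// referenced by exactly one cell: these are the external (boundary) faces.
std::vector<Face> ExternalFaces(
  const std::vector<std::array<std::int64_t, 4>>& tets)
{
  std::map<Face, int> counts;
  const int faceIndices[4][3] = { {0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3} };

  for (const auto& tet : tets)
  {
    for (const auto& f : faceIndices)
    {
      Face face = { tet[f[0]], tet[f[1]], tet[f[2]] };
      std::sort(face.begin(), face.end()); // canonical order so shared faces match
      ++counts[face];
    }
  }

  std::vector<Face> external;
  for (const auto& entry : counts)
  {
    if (entry.second == 1) { external.push_back(entry.first); }
  }
  return external;
}
```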
Notes (Lagrangian flow analysis)
This slide describes a new technique for doing flow analysis. Compared to the traditional method, it is faster, more accurate, and takes less storage. The technique is dependent on, and enabled by, in situ processing.

Background: Flow analysis is a family of techniques, including streamlines, pathlines, stream surfaces, line integral convolution, and Finite-Time Lyapunov Exponents (FTLE). Traditionally, these techniques are carried out by analyzing the trajectory of one or more particles; in the case of FTLE, the number of particles can be in the billions. The trajectories of particles are calculated via "advection": displacing the particle from a given location in the direction of the velocity at that location. (The velocity comes from the simulation field, and the displacement is computed with an ordinary differential equation.)

The idea here is to change the underlying operation for advection. Where the traditional method stores velocity fields, our new method calculates the trajectories of key particles in situ, and then those trajectories are saved. Rather than displacing a particle using the velocity field, new particle trajectories can be interpolated from the basis of trajectories calculated in situ. This idea, inspired by the Lagrangian frame of reference rather than the traditional Eulerian one, is faster, more accurate, and takes less storage. (We showed these things to be true in our study, where we also introduced this technique.)

Why is it more accurate? Because the traditional technique has to do temporal interpolation of velocity fields. As simulations save data less and less often (because disk is slowing down compared to compute), flow analysis techniques are getting increasingly inaccurate. This is not a problem for our Lagrangian technique, since it can access all of the data in situ.

Why does it take less storage? The amount of storage used can be determined by the simulation scientist. We say less storage because we assume they will opt for less storage. The storage can be set at any level; obviously, if the storage allotment is too low, the accuracy will suffer. However, we showed that the storage can be 64x smaller while maintaining the same accuracy as the traditional technique. (We also showed that it is 10x more accurate when using the same storage.)

Why is it faster? The traditional approach involves loading many velocity fields; as a result, calculating the trajectories of even a few particles requires loading a lot of data. Our new approach is organized around trajectories, so for a few particles only a few trajectories need to be loaded, which is a big win.

Figure: Visualization of two data sets used in the study.
Rendering in VTK-m

Background
- Rendering is an essential operation for visualization.
- There are many types of rendering: rasterization, ray tracing, and volume rendering.
- Rendering is also key for Cinema-style data compression, which addresses the I/O gap by saving many images instead of simulation data.

Progress
- Implemented rendering techniques in VTK-m (a minimal usage sketch follows below).
- Studies show excellent performance on GPU, CPU, and Xeon Phi.
- Comparison with industry standards (Intel OSPRay, NVIDIA OptiX) shows that our hardware-agnostic code has comparable performance.
- Code is available in a VTK-m branch (not yet merged back).

Impact
- For VTK-m to be successful, it is critical that it has performant, reliable rendering infrastructure.
- With this effort, we now have "future-proofed" rendering and can avoid external dependencies with respect to rendering.
- This will improve in situ integration, due to smaller binaries and less code complexity.

Publications: "VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures," Moreland et al., IEEE Computer Graphics and Applications.

Figure: Ray-traced rendering in VTK-m.
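For orientation, here is a minimal sketch of driving VTK-m's ray-traced rendering from a data set already in memory. It follows the vtkm::rendering classes as they existed around VTK-m 1.x, but exact constructor signatures and headers may differ between releases; the field name "pressure", the color table choice, and the output filename are hypothetical.

```cpp
#include <vtkm/cont/ColorTable.h>
#include <vtkm/cont/DataSet.h>
#include <vtkm/rendering/Actor.h>
#include <vtkm/rendering/CanvasRayTracer.h>
#include <vtkm/rendering/MapperRayTracer.h>
#include <vtkm/rendering/Scene.h>
#include <vtkm/rendering/View3D.h>

// Render a data set's "pressure" field (hypothetical name) with the
// ray-tracing mapper and write the image to disk. The same code runs on
// whichever device backend VTK-m was configured with.
void RenderPressure(const vtkm::cont::DataSet& dataSet)
{
  vtkm::rendering::Actor actor(dataSet.GetCellSet(),
                               dataSet.GetCoordinateSystem(),
                               dataSet.GetField("pressure"),
                               vtkm::cont::ColorTable("viridis"));

  vtkm::rendering::Scene scene;
  scene.AddActor(actor);

  vtkm::rendering::MapperRayTracer mapper;
  vtkm::rendering::CanvasRayTracer canvas(1024, 768);

  vtkm::rendering::View3D view(scene, mapper, canvas);
  view.Paint();
  view.SaveAs("pressure.pnm");
}
```

Because the canvas and mapper are VTK-m objects, the rendered image can be produced in situ on the same device that holds the simulation data, which is what keeps binaries small and avoids external rendering dependencies.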