SC05 November, 2005 Desktop techniques for the exploration of terascale size, time-varying data sets John Clyne & Alan Norton Scientific Computing Division National Center for Atmospheric Research Boulder, CO USA
National Center for Atmospheric Research
  More than just the atmosphere… from the earth’s oceans to the solar interior
  Space Weather, Turbulence, Atmospheric Chemistry, Climate, Weather, The Sun
Goals
  1. Improve scientists’ ability to investigate and understand complex phenomena found in high-resolution fluid flow simulations
    – Accelerate the analysis process and improve scientific productivity
    – Enable exploration of data sets heretofore impractical due to unwieldy size
    – Gain insight into physical processes governing fluid dynamics widely found in the natural world
  2. Demonstrate visualization’s ability to aid in the day-to-day scientific discovery process
Problem motivation: Analysis of high-resolution numerical turbulence simulations
  Simulations are huge!!
    – May require months of supercomputer time
    – Multivariate (typically 5 to 8 variables)
    – Time-varying data
    – A single experiment may yield terabytes of numerical data
  Analysis requirements are formidable
    – Numerical outputs simulate phenomena not easily observed!!!
    – Interesting domain regions (ROIs) may not be known a priori
  Additionally…
    – Historical focus of computing centers on batch processing
    – Dichotomy of batch and interactive processing needs
    – Currently available analysis tools are inadequate for large data needs: single-threaded, 32-bit, in-core algorithms that lack advanced visualization capabilities
    – Currently available visualization tools are ill-suited for analysis
[Numerical] models that can currently be run on typical supercomputing platforms produce data in amounts that make storage expensive, movement cumbersome, visualization difficult, and detailed analysis impossible. The result is a significantly reduced scientific return from the nation's largest computational efforts. And furthermore…
A sampling of various technology performance curves
  Not all technologies advance at the same rate!!!
Example: Compressible plume dynamics
  504x504x2048 grid, 5 variables (u, v, w, rho, temp)
  ~500 time steps saved
  9 TB of storage
  Six months of compute time required on 112 IBM SP RS/6000 processors
  Three months for post-processing
  Data may be analyzed for several years
  M. Rast; image courtesy of Joseph Mendoza, NCAR/SCD
Visualization and Analysis Platform for oceanic, atmospheric, and solar Research (VAPoR)
  Key components
    – Domain specific: numerically simulated turbulence in the natural sciences
    – Data processing language: data post-processing and quantitative analysis
    – Advanced visualization: identify spatial/temporal ROIs
    – Multiresolution: enable speed/quality tradeoffs
  A combination of visualization with a multiresolution data representation that provides sufficient data reduction to enable interactive work on time-varying data
  This work is funded in part through a U.S. National Science Foundation Information Technology Research program grant
Multiresolution Data Representation
  Geometry reduction (Schroeder et al., 1992; Lindstrom & Silva, 2001; Shaffer & Garland, 2001)
  Data reduction (Cignoni et al., 1994; Wilhelms & Van Gelder, 1994; Pascucci & Frank, 2001; Clyne, 2003)
  Wavelet-based progressive data access
    – Mathematical transforms similar to Fourier transforms
    – Invertible and lossless
    – Numerically efficient forward and inverse transforms
    – No additional storage costs
    – Permit hierarchical representations of functions
    – See Clyne, VIIP 2003
  Visualization pipeline (diagram) elements: Source data, Reduce, Data, Transform (e.g. iso, cut plane), Geometry, Render, Pixels, Analyze & Manipulate, Text, 2D graphics
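The wavelet properties listed on this slide (invertible, lossless, no additional storage, hierarchical) can be illustrated with a minimal sketch. It assumes an unnormalized Haar transform on a 1D signal whose length is a power of two; the slides do not say which wavelet family is used, and the names `haar_forward` and `approximation` are hypothetical, chosen only for this illustration.

```python
def haar_forward(signal):
    """Multi-level unnormalized Haar transform (an assumed wavelet choice).

    Returns a list with the same length as the input, laid out as
    [overall average, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages form the next coarser level; the pairwise
        # half-differences are the details needed to undo the averaging.
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def approximation(coeffs, size):
    """Rebuild a `size`-sample approximation from the first `size` coefficients.

    With size == len(coeffs) this is the full inverse transform.
    """
    if size < 1 or size & (size - 1) or size > len(coeffs):
        raise ValueError("size must be a power of two no larger than len(coeffs)")
    data = coeffs[:1]
    while len(data) < size:
        details = coeffs[len(data):2 * len(data)]
        refined = []
        for a, d in zip(data, details):
            refined.extend((a + d, a - d))
        data = refined
    return data


signal = [4, 6, 10, 12, 8, 6, 5, 3]
coeffs = haar_forward(signal)
print(approximation(coeffs, 2))            # coarse view from 2 coefficients
print(approximation(coeffs, len(signal)))  # full reconstruction
```

Reading only a prefix of the coefficient list yields a coarser approximation, which is the progressive-access idea: no separate copies of the data are stored per resolution level.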
Putting it all together
  Visual data browsing permits rapid identification of features of interest, reducing the data domain
  Multiresolution data representation affords a second level of data reduction by permitting speed/quality trade-offs, enabling rapid hypothesis testing
  Quantitative operators and data processing enable data analysis
  Result: Integrated environment for large-data exploration and discovery
  Goal: Avoid unnecessary and expensive full-domain calculations
    – Execute on human time scales!!!
  Workflow (diagram) elements: Visual data browsing, Data manipulation, Quantitative analysis; Refine / Coarsen
Compressible Convection M. Rast, 2002
Compressible plume
  Compressible plume data set shown at native and progressively coarser resolutions
  Resolution: 504x504x2048, 252x252x1024, 126x126x512, 63x63x256
  Problem size: Full, 1/8, 1/64, 1/512
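The resolution and problem-size pairs on this slide follow from halving every axis at each level, so each level holds 1/8 of the samples of the one before it. A minimal sketch of that arithmetic, with `resolution_hierarchy` as a hypothetical helper name:

```python
def resolution_hierarchy(dims, levels):
    """List (grid dimensions, problem-size denominator) per coarsening level.

    Level 0 is the native grid; each further level halves every axis.
    """
    hierarchy = []
    for level in range(levels + 1):
        factor = 2 ** level
        coarse = tuple(d // factor for d in dims)
        # Halving each of len(dims) axes shrinks the sample count by
        # factor ** len(dims), e.g. 8x per level for a 3D grid.
        hierarchy.append((coarse, factor ** len(dims)))
    return hierarchy


for grid, denominator in resolution_hierarchy((504, 504, 2048), 3):
    print("x".join(str(d) for d in grid), f"1/{denominator}")
```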
Rendering timings
  Data sets: Compressible Convection; Compressible Plume (504²x2048)
  Platforms: SGI Octane2, 1x600MHz R14k; SGI Origin, 10x600MHz R14k
  Reduced resolution affords responsive interaction while preserving all but the finest features
  Interactive!!
Derived quantities
  Derived quantities produced from the simulation’s field variables as a post-process
  p: pressure; ρ: density; T: temperature; m_e: electron mass; k: Boltzmann’s constant; h: Planck’s constant; also the ionization potential and Avogadro’s number
Calculation timings for derived quantities
  Platform: SGI Origin, 10x600MHz R14k
  Note: 1/2 resolution is 1/8 problem size, etc.
  Deriving new quantities on interactive time scales is only possible with data reduction
Error in approximations
  Error is highly dependent on the operation performed
  Algebraic operations tested introduced low error even after substantial coarsening
  Error grows rapidly for gradient calculation
  Point-wise error gives no indication of global (average) error
  Table: point-wise, normalized, maximum absolute error by resolution for P (Eq 1), Y (Eq 2), and the quantity of Eq 3; the error is 0 for all three at full resolution
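The distinction drawn on this slide between point-wise maximum error and global (average) error can be sketched as follows. The slides do not define the normalization, so dividing by the largest reference magnitude is an assumption here, and both function names are hypothetical.

```python
def max_normalized_abs_error(reference, approx):
    """Largest point-wise absolute error, divided by max |reference|.

    The normalization is an assumed convention, not taken from the slides.
    """
    if len(reference) != len(approx):
        raise ValueError("inputs must have the same length")
    scale = max(abs(r) for r in reference)
    if scale == 0:
        raise ValueError("reference is identically zero; cannot normalize")
    return max(abs(r - a) for r, a in zip(reference, approx)) / scale


def mean_normalized_abs_error(reference, approx):
    """Average point-wise absolute error under the same assumed normalization."""
    if len(reference) != len(approx):
        raise ValueError("inputs must have the same length")
    scale = max(abs(r) for r in reference)
    if scale == 0:
        raise ValueError("reference is identically zero; cannot normalize")
    total = sum(abs(r - a) for r, a in zip(reference, approx))
    return total / (len(reference) * scale)


# A single bad sample dominates the maximum but barely moves the mean.
reference = [1.0, 2.0, 3.0, 4.0]
approx = [1.0, 2.0, 3.0, 2.0]
print(max_normalized_abs_error(reference, approx))   # 0.5
print(mean_normalized_abs_error(reference, approx))  # 0.125
```

In practice the approximation would be a field computed from coarsened data and interpolated back onto the full-resolution grid before comparison; that step is omitted from this sketch.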
Integrated visualization and analysis on interactively selected subdomains
  Vertical vorticity of the flow: full domain seen from above; subdomain from side
  Mach number of the vertical velocity: full domain seen from above; subdomain from side
  Efficient analysis requires rapid calculation and visualization of unanticipated derived quantities. This can be facilitated by a combination of subdomain selection and resolution reduction.
A test of multiresolution analysis: Force balance in supersonic downflows
  Sites of supersonic downflow are also those of very high vertical vorticity.
  The cores of the vortex tubes are evacuated, with centripetal acceleration balancing that due to the inward-directed pressure gradient.
  Buoyancy forces are maximum on the tube periphery due to mass flux convergence.
  The same interpretation results from analysis at half resolution.
  Figure panels: Full resolution; Half resolution
  Subdomain selection and reduced resolution together yield data reduction by a factor of 128
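The factor of 128 is the product of two independent reductions. Half resolution on a 3D grid contributes a factor of 8, so the subdomain selection must account for the remaining factor of 16. A minimal sketch of that arithmetic follows; the subdomain extents used are hypothetical, picked only so that the subdomain is 1/16 of the full grid, since the slides give only the combined figure.

```python
from fractions import Fraction
from math import prod


def reduction_factor(full_dims, sub_dims, level):
    """Combined data reduction from a subdomain plus `level` axis halvings."""
    subdomain_factor = Fraction(prod(full_dims), prod(sub_dims))
    # Each level halves every axis, so the sample count drops by 2**ndim.
    resolution_factor = (2 ** level) ** len(full_dims)
    return subdomain_factor * resolution_factor


full = (504, 504, 2048)
sub = (126, 126, 2048)  # hypothetical ROI covering 1/16 of the domain
print(reduction_factor(full, sub, level=1))  # 128
```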
Summary
  Presented a prototype, integrated analysis environment aimed at aiding the investigation of high-resolution numerical fluid flow simulations
  Orders of magnitude data reduction achieved through:
    1. Visualization: Reduce full domain to ROI
    2. Multiresolution: Enable speed/quality trade-offs
  Coarsened data is frequently suitable for rapid hypothesis testing that may later be verified at full resolution
Future work
  Quantify and predict error in results obtained with various mathematical operations applied to coarsened data
  Investigate lossy and lossless data compression
  Add support for less regular meshes
  Explore other scientific domains
    – Climate, weather, atmospheric chemistry, …
Future???
  Figure panels: Original; 20:1 Lossy Compression
Acknowledgements
  Steering Committee
    – Nic Brummell, CU, JILA
    – Aimé Fournier, NCAR, IMAGe
    – Helene Politano, Observatoire de la Cote d'Azur
    – Pablo Mininni, NCAR, IMAGe
    – Yannick Ponty, Observatoire de la Cote d'Azur
    – Annick Pouquet, NCAR, ESSL
    – Mark Rast, NCAR, HAO
    – Duane Rosenberg, NCAR, IMAGe
    – Matthias Rempel, NCAR, HAO
    – Yuhong Fan, NCAR, HAO
  Developers
    – Alan Norton, NCAR, SCD
    – John Clyne, NCAR, SCD
  Research Collaborators
    – Kwan-Liu Ma, U.C. Davis
    – Hiroshi Akiba, U.C. Davis
    – Han-Wei Shen, Ohio State
    – Liya Li, Ohio State
  Systems Support
    – Joey Mendoza, NCAR, SCD
Questions???