1
HOW PETASCALE VISUALIZATION WILL CHANGE THE RULES
Hank Childs, Lawrence Berkeley Lab & UC Davis
10/12/09
2
Supercomputing 101
Why simulation? Simulations are sometimes more cost effective than experiments. The new model for science has three legs: theory, experiment, and simulation.
What is the "petascale"? 1 FLOPS = 1 FLoating point OPeration per Second. 1 GigaFLOPS = 1 billion FLOPS, 1 TeraFLOPS = 1,000 GigaFLOPS, 1 PetaFLOPS = 1,000,000 GigaFLOPS. PetaFLOPS + petabytes on disk + petabytes of memory = petascale.
Why petascale? More compute cycles, more memory, etc. lead to faster and/or more accurate simulations.
3
Petascale computing is here. Four existing petascale machines: LANL RoadRunner, ORNL Jaguar, Jülich JUGene, UTK Kraken.
4
Supercomputing is not slowing down. Two ~20 PetaFLOP machines will be online in 2011: LLNL Sequoia and NCSA Blue Waters.
Q: When does it stop?
A: Exascale is being actively discussed right now (http://www.exascale.org).
5
How does the petascale affect visualization? Large # of time steps, large ensembles, large scale, large # of variables.
6
Why is petascale visualization going to change the rules?
Michael Strayer (U.S. DoE Office of Science): "petascale is not business as usual." This is especially true for visualization and analysis!
Large scale data creates two incredible challenges: scale and complexity.
Scale is not "business as usual": the supercomputing landscape is changing, and the solution is that we will need "smart" techniques in production environments.
More resolution leads to more and more complexity: will the "business as usual" techniques still suffice?
Outline: What are the software engineering ramifications?
7
Production visualization tools use "pure parallelism" to process data.
[Diagram: pieces of data on disk (P0-P9), produced by the parallel simulation code, are distributed across processors; each processor runs its own Read → Process → Render pipeline in a parallelized visualization data flow network.]
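Pure parallelism is straightforward to express. Below is a minimal sketch, not VisIt's actual implementation, assuming MPI via mpi4py; read_piece, process, render_piece, and composite are hypothetical callables supplied by the caller. Every task reads its share of the pieces at full resolution, runs the same pipeline, and the partial results are gathered for compositing.

```python
# Minimal pure-parallelism sketch (illustrative only).
from mpi4py import MPI

def pure_parallel_vis(n_pieces, read_piece, process, render_piece, composite):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    my_pieces = range(rank, n_pieces, size)            # static round-robin assignment
    # Every task runs the full Read -> Process -> Render pipeline on its pieces.
    partial_images = [render_piece(process(read_piece(p))) for p in my_pieces]
    # Gather partial results on rank 0 and composite them into one image.
    gathered = comm.gather(partial_images, root=0)
    if rank == 0:
        return composite([img for sub in gathered for img in sub])
    return None
```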
8
Pure parallelism: pros and cons
Pros: easy to implement.
Cons: requires a large amount of primary memory; requires large I/O capabilities → requires big machines.
9
Pure parallelism performance is based on the number of bytes to process and on I/O rates.
Vis is almost always >50% I/O and sometimes 98% I/O.
The amount of data to visualize is typically O(total memory).
Relative I/O (the ratio of total memory to I/O bandwidth) is key.
[Diagram: FLOPs, memory, and I/O compared for a terascale machine versus a "petascale machine".]
10
Anecdotal evidence: relative I/O is getting slower.
Machine name      | Main memory | I/O rate | Time to write memory to disk
ASC Purple        | 49.0 TB     | 140 GB/s | 5.8 min
BGL-init          | 32.0 TB     | 24 GB/s  | 22.2 min
BGL-cur           | 69.0 TB     | 30 GB/s  | 38.3 min
Petascale machine | ??          | ??       | >40 min
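The last column is simply main memory divided by I/O rate. A quick sketch to confirm the arithmetic (assuming decimal TB and GB):

```python
# Reproduce the "time to write memory to disk" column: memory size / I/O rate.
machines = {
    "ASC Purple": (49.0e12, 140e9),   # (main memory in bytes, I/O rate in bytes/s)
    "BGL-init":   (32.0e12, 24e9),
    "BGL-cur":    (69.0e12, 30e9),
}
for name, (memory_bytes, io_rate) in machines.items():
    print(f"{name}: {memory_bytes / io_rate / 60:.1f} min")
# ASC Purple: 5.8 min, BGL-init: 22.2 min, BGL-cur: 38.3 min
```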
11
Why is relative I/O getting slower? “I/O doesn’t pay the bills” Simulation codes aren’t affected.
12
Recent runs of trillion-cell data sets provide further evidence that I/O dominates.
Weak scaling study: ~62.5M cells/core.
# cores  | Problem size | Type      | Machine
8K       | 0.5 TZ       | AIX       | Purple
16K      | 1 TZ         | Sun Linux | Ranger
16K      | 1 TZ         | Linux     | Juno
32K      | 2 TZ         | Cray XT5  | JaguarPF
64K      | 4 TZ         | BG/P      | Dawn
16K, 32K | 1 TZ, 2 TZ   | Cray XT4  | Franklin
2T cells, 32K procs on Jaguar: approx. I/O time 2-5 minutes, approx. processing time 10 seconds.
2T cells, 32K procs on Franklin: approx. I/O time 2-5 minutes, approx. processing time 10 seconds.
13
Assumptions stated:
I/O is a dominant term in visualization performance.
Supercomputing centers are procuring "imbalanced" petascale machines: the trend is towards massively multi-core nodes with lots of shared memory within a node, and since I/O is provisioned per node, more cores per node means less I/O bandwidth per core.
And: overall I/O bandwidth is also deficient.
14
Pure parallelism is not well suited for the petascale.
Emerging problem: pure parallelism emphasizes I/O and memory, and pure parallelism is the dominant processing paradigm for production visualization software.
Solution? There are "smart techniques" that de-emphasize memory and I/O: data subsetting, multi-resolution, out-of-core, and in situ.
15
Data subsetting eliminates pieces that don't contribute to the final picture.
[Diagram: the same parallelized visualization data flow network (Read → Process → Render on each processor), but only the pieces of data on disk that contribute to the picture are read and processed.]
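As an illustration of the idea (with hypothetical helper names, not VisIt's interface), a contour operation can consult per-piece metadata and skip any piece whose value range cannot intersect the requested isosurface, avoiding both the I/O and the processing for that piece:

```python
# Data subsetting sketch: use per-piece metadata to skip non-contributing pieces.
def contour_with_subsetting(pieces_metadata, contour_value, read_piece, contour):
    """pieces_metadata: list of (piece_id, min_value, max_value) tuples;
    read_piece and contour stand in for real I/O and contouring routines."""
    surfaces = []
    for piece_id, vmin, vmax in pieces_metadata:
        if not (vmin <= contour_value <= vmax):
            continue                     # piece cannot intersect the isosurface: skip its I/O
        data = read_piece(piece_id)      # only contributing pieces are read
        surfaces.append(contour(data, contour_value))
    return surfaces
```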
16
Data subsetting: pros and cons
Pros: less data to process (less I/O, less memory).
Cons: the extent of the optimization is data dependent; only applicable to some algorithms.
17
Multi-resolution techniques use coarse representations, then refine.
[Diagram: the same parallelized data flow network, reading coarse representations of the pieces and then refining selected pieces (e.g. P2 and P4).]
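A rough sketch of the idea (again with hypothetical helpers): read a cheap, coarse version of every piece and pay for full-resolution I/O only where it is needed.

```python
# Multi-resolution sketch: start coarse, refine selectively.
def multires_pipeline(piece_ids, read_coarse, needs_refinement, read_fine, process):
    results = []
    for piece_id in piece_ids:
        data = read_coarse(piece_id)       # cheap, low-resolution read
        if needs_refinement(data):         # e.g. piece intersects the region of interest
            data = read_fine(piece_id)     # full-resolution I/O only where needed
        results.append(process(data))
    return results
```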
18
Multi-resolution: pros and cons
Pros: avoids the I/O and memory requirements.
Cons: is it meaningful to process a simplified version of the data?
19
Out-of-core processing iterates pieces of data through the pipeline one at a time.
[Diagram: the same parallelized data flow network, with each processor streaming its pieces through Read → Process → Render one piece at a time.]
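A minimal sketch of the streaming loop, with read_piece, filters, and render standing in for real I/O, processing, and rendering routines; only one piece is resident in memory at a time.

```python
# Out-of-core sketch: execute the pipeline once per piece.
def stream_pipeline(piece_ids, read_piece, filters, render):
    for piece_id in piece_ids:           # e.g. 12 pieces -> each algorithm is called 12 times
        data = read_piece(piece_id)      # read just this piece from disk
        for f in filters:                # slice, contour, etc.
            data = f(data)
        render(data)                     # accumulate this piece's contribution to the image
        del data                         # the piece can be dropped before reading the next one
```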
20
Out-of-core: pros and cons
Pros: lower requirement for primary memory; doesn't require big machines.
Cons: still paying large I/O costs (slow!).
21
In situ processing does visualization as part of the simulation.
[Diagram: for contrast, the pure-parallelism picture again: pieces of data on disk (P0-P9), the parallel simulation code, and a separate parallelized visualization data flow network with Read → Process → Render on each processor.]
22
In situ processing does visualization as part of the simulation.
[Diagram: each processor (0 through 9) of the parallel simulation code runs GetAccessToData → Process → Render itself; the parallelized visualization data flow network is embedded in the simulation and operates on its in-memory data.]
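In code, the pattern looks roughly like the sketch below: the simulation's time-step loop calls the visualization routines directly on live, in-memory data, so nothing is written to disk for analysis. The names advance, get_mesh, contour, and render are placeholders supplied by the caller, not a particular library's API.

```python
# In situ sketch: visualization runs inside the simulation loop.
def simulation_with_in_situ(state, n_steps, advance, get_mesh, contour, render,
                            vis_every=10, contour_value=0.5):
    for step in range(n_steps):
        advance(state)                          # normal simulation work
        if step % vis_every == 0:
            mesh = get_mesh(state)              # direct access to the live mesh, no I/O
            surface = contour(mesh, contour_value)
            render(surface, f"step_{step:06d}.png")
```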
23
In situ: pros and cons
Pros: no I/O! Lots of compute power available.
Cons: very memory constrained; many operations not possible; once the simulation has advanced, you cannot go back and analyze it; the user must know what to look for a priori; it is an expensive resource to hold hostage!
24
Summary of Techniques and Strategies
Pure parallelism can be used for anything, but it takes a lot of resources.
Smart techniques can only be used situationally.
Petascale strategy 1: stick with pure parallelism and live with high machine costs & I/O wait times.
Other petascale strategies? Assumption: we can't afford massive dedicated clusters for visualization; we can fall back on the supercomputer, but only rarely.
25
Now we know the tools … what problem are we trying to solve?
Three primary use cases: exploration, confirmation, and communication.
Exploration examples: scientific discovery, debugging.
Confirmation examples: data analysis, images / movies, comparison.
Communication examples: data analysis, images / movies.
26
Notional decision process (spanning exploration, confirmation, and communication):
Need all data at full resolution? No → multi-resolution (debugging & scientific discovery).
Yes → Do operations require all the data? No → data subsetting (comparison & data analysis).
Yes → Do you know what you want to do a priori? Yes → in situ (data analysis & images / movies).
No → Do algorithms require all data in memory? Is interactivity required? No → out-of-core (data analysis & images / movies).
Yes → pure parallelism (anything & especially comparison).
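For concreteness, the same decision tree written as a small function; the parameter names are made up for illustration and are not anyone's API.

```python
# The notional decision process from the slide, encoded directly.
def choose_technique(need_full_resolution, ops_need_all_data, know_goal_a_priori,
                     algorithms_need_all_in_memory, interactivity_required):
    if not need_full_resolution:
        return "multi-resolution"        # debugging & scientific discovery
    if not ops_need_all_data:
        return "data subsetting"         # comparison & data analysis
    if know_goal_a_priori:
        return "in situ"                 # data analysis & images / movies
    if not algorithms_need_all_in_memory and not interactivity_required:
        return "out-of-core"             # data analysis & images / movies
    return "pure parallelism"            # anything, especially comparison
```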
27
Alternate strategy: smart techniques.
[Diagram: nearly all visualization and analysis work is handled by the smart techniques (multi-resolution, in situ, out-of-core, data subsetting); the remaining ~5% is done on the supercomputer.]
28
How Petascale Changes the Rules
We can't use pure parallelism alone any more; we will need algorithms to work in multiple processing paradigms.
An incredible research problem … but also an incredible software engineering problem.
29
Data flow networks … a love story
[Diagram: File Reader (source) → Slice Filter → Contour Filter → Renderer (sink), annotated with the Update and Execute phases.]
Work is performed by a pipeline. A pipeline consists of data objects and components (sources, filters, and sinks). Pipeline execution begins with a "pull", which starts the Update phase. Data flows from component to component during the Execute phase.
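A toy sketch of the idea in Python (illustrative only, not VisIt's or VTK's actual classes): a pull on the sink triggers Update calls up the pipeline, and each component then executes on the data flowing back down.

```python
# Toy data flow network: sources, filters, and sinks chained together.
class Component:
    def __init__(self, upstream=None):
        self.upstream = upstream

    def update(self):
        # The "pull": Update propagates toward the source, then data flows
        # back down through execute() calls (the Execute phase).
        input_data = self.upstream.update() if self.upstream else None
        return self.execute(input_data)

    def execute(self, input_data):
        raise NotImplementedError

class FileReader(Component):                 # source
    def execute(self, _):
        return list(range(10))               # stand-in for reading a mesh from disk

class ContourFilter(Component):              # a derived type of filter
    def execute(self, data):
        return [x for x in data if x % 2 == 0]   # stand-in for contouring

class Renderer(Component):                   # sink
    def execute(self, data):
        print("rendering", data)
        return data

# Pipeline execution begins with a "pull" on the sink.
Renderer(ContourFilter(FileReader())).update()
```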
30
Data flow networks: strengths
Flexible usage: networks can be multi-input / multi-output; interoperability of modules; embarrassingly parallel algorithms are handled by the base infrastructure.
Easy to extend: new derived types of filters.
[Diagrams: an inheritance tree (an abstract filter with slice, contour, and other derived filters) and a network showing the flow of data from a source through filters A, B, and C to a sink.]
31
Data flow networks: weaknesses
Execution of modules happens in stages; algorithms are executed one at a time.
Cache inefficient.
Memory footprint concerns.
Some implementations fix the data model.
32
Data flow networks: observations
The majority of the code investment is in algorithms (derived types of filters), not in the base classes (which manage data flow).
The source code for managing the flow of data is small and in one place.
Algorithms don't care about the data processing paradigm … they only care about operating on inputs and outputs.
33
Example filter: contouring
[Diagram: a Data Reader → Contour Filter → Rendering pipeline; inside the contour filter, the contour algorithm takes a mesh input and produces surface/line output.]
34
Example filter: contouring with data subsetting
[Diagram: the same Data Reader → Contour Filter → Rendering pipeline; the contour filter additionally communicates with the executive to discard pieces.]
35
Example filter: contouring with out-of-core
[Diagram: the same pipeline with the data split into 12 pieces that are streamed through; the contour algorithm is called 12 times.]
36
Example filter: contouring with multi-resolution techniques
[Diagram: the same Data Reader → Contour Filter → Rendering pipeline operating on a multi-resolution mesh input; the contour algorithm still maps a mesh input to surface/line output.]
37
Example filter: contouring with in situ
[Diagram: the Data Reader is crossed out and the contour filter is fed directly by the simulation code; the contour algorithm still maps a mesh input to surface/line output.]
For each example, the contour algorithm didn't change, just its context.
38
How big is this job?
Many algorithms are basically indifferent to the processing paradigm.
What percentage of a vis code is algorithms? What percentage is devoted to the "processing paradigm"? What percentage is other?
We can gain insight by looking at the breakdown in a real-world example (VisIt).
39
VisIt is a richly featured, turnkey application for large data.
The tool has two focal points: big data & providing a product for end users.
VisIt is an open source, end-user visualization and analysis tool for simulated and experimental data: >100K downloads on the web; R&D 100 award in 2005; used "heavily to exclusively" on 8 of the world's top 12 supercomputers; supports pure parallelism + out-of-core + data subsetting + in situ.
[Image: a 27-billion-element Rayleigh-Taylor instability (MIRANDA, BG/L).]
40
VisIt architecture & lines of code
[Diagram: VisIt's client side (viewer, gui, cli) and server side (mdserver, parallel/serial engine, libsim), plus plots, operators, databases, and support libraries & tools, each annotated with its line count, and on top of that custom interfaces, documentation, regression testing, user knowledge, the Wiki, and mailing list archives; the code for handling large data and parallel algorithms is roughly 32K lines out of 559K.]
Pure parallelism is the simplest paradigm. "Replacement" code may be significantly larger.
41
Summary
Petascale machines are not well suited for pure parallelism, because of its high I/O and memory costs. This will force production visualization software to utilize more processing paradigms.
The majority of existing investments can be preserved, thanks in large part to the elegant design of data flow networks.
Hank Childs, hchilds@lbl.gov … and questions???