1
HOW PETASCALE VISUALIZATION WILL CHANGE THE RULES
Hank Childs, Lawrence Berkeley Lab & UC Davis
10/12/09
2
Supercomputing 101
Why simulation? Simulations are sometimes more cost effective than experiments. The new model for science has three legs: theory, experiment, and simulation.
What is the "petascale"? 1 FLOPS = 1 FLoating point OPeration per Second. 1 GigaFLOPS = 1 billion FLOPS, 1 TeraFLOPS = 1,000 GigaFLOPS, 1 PetaFLOPS = 1,000,000 GigaFLOPS. PetaFLOPS + petabytes on disk + petabytes of memory = petascale.
Why petascale? More compute cycles, more memory, etc. lead to faster and/or more accurate simulations.
3
Petascale computing is here. Four existing petascale machines: LANL RoadRunner, ORNL Jaguar, Jülich JUGene, UTK Kraken.
4
Supercomputing is not slowing down. Two ~20 PetaFLOP machines will be online in 2011: LLNL Sequoia and NCSA Blue Waters.
Q: When does it stop?
A: Exascale is being actively discussed right now (http://www.exascale.org).
5
How does the petascale affect visualization? Large # of time steps, large ensembles, large scale, large # of variables.
6
Why is petascale visualization going to change the rules?
Michael Strayer (U.S. DoE Office of Science): "petascale is not business as usual." This is especially true for visualization and analysis!
Large scale data creates two incredible challenges: scale and complexity.
Scale is not "business as usual": the supercomputing landscape is changing, and the solution is that we will need "smart" techniques in production environments.
More resolution leads to more and more complexity: will the "business as usual" techniques still suffice?
Outline: What are the software engineering ramifications?
7
Production visualization tools use "pure parallelism" to process data.
[Diagram: pieces of data on disk (P0-P9), produced by the parallel simulation code, are distributed across processors; each processor runs its own Read → Process → Render pipeline in a parallelized visualization data flow network.]
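Pure parallelism is straightforward to express. Below is a minimal sketch, not VisIt's actual implementation, assuming MPI via mpi4py; read_piece, process, render_piece, and composite are hypothetical callables supplied by the caller. Every task reads its share of the pieces at full resolution, runs the same pipeline, and the partial results are gathered for compositing.

```python
# Minimal pure-parallelism sketch (illustrative only).
from mpi4py import MPI

def pure_parallel_vis(n_pieces, read_piece, process, render_piece, composite):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    my_pieces = range(rank, n_pieces, size)            # static round-robin assignment
    # Every task runs the full Read -> Process -> Render pipeline on its pieces.
    partial_images = [render_piece(process(read_piece(p))) for p in my_pieces]
    # Gather partial results on rank 0 and composite them into one image.
    gathered = comm.gather(partial_images, root=0)
    if rank == 0:
        return composite([img for sub in gathered for img in sub])
    return None
```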
8
Pure parallelism: pros and cons
Pros: easy to implement.
Cons: requires a large amount of primary memory; requires large I/O capabilities → requires big machines.
9
Pure parallelism performance is based on the number of bytes to process and on I/O rates.
Vis is almost always >50% I/O and sometimes 98% I/O.
The amount of data to visualize is typically O(total memory).
Relative I/O (the ratio of total memory to I/O bandwidth) is key.
[Diagram: FLOPs, memory, and I/O compared for a terascale machine versus a "petascale machine".]
10
Anecdotal evidence: relative I/O is getting slower.
Machine name      | Main memory | I/O rate | Time to write memory to disk
ASC Purple        | 49.0 TB     | 140 GB/s | 5.8 min
BGL-init          | 32.0 TB     | 24 GB/s  | 22.2 min
BGL-cur           | 69.0 TB     | 30 GB/s  | 38.3 min
Petascale machine | ??          | ??       | >40 min
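The last column is simply main memory divided by I/O rate. A quick sketch to confirm the arithmetic (assuming decimal TB and GB):

```python
# Reproduce the "time to write memory to disk" column: memory size / I/O rate.
machines = {
    "ASC Purple": (49.0e12, 140e9),   # (main memory in bytes, I/O rate in bytes/s)
    "BGL-init":   (32.0e12, 24e9),
    "BGL-cur":    (69.0e12, 30e9),
}
for name, (memory_bytes, io_rate) in machines.items():
    print(f"{name}: {memory_bytes / io_rate / 60:.1f} min")
# ASC Purple: 5.8 min, BGL-init: 22.2 min, BGL-cur: 38.3 min
```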
11
Why is relative I/O getting slower? “I/O doesn’t pay the bills” Simulation codes aren’t affected.
12
Recent runs of trillion-cell data sets provide further evidence that I/O dominates.
Weak scaling study: ~62.5M cells/core.
# cores  | Problem size | Type      | Machine
8K       | 0.5 TZ       | AIX       | Purple
16K      | 1 TZ         | Sun Linux | Ranger
16K      | 1 TZ         | Linux     | Juno
32K      | 2 TZ         | Cray XT5  | JaguarPF
64K      | 4 TZ         | BG/P      | Dawn
16K, 32K | 1 TZ, 2 TZ   | Cray XT4  | Franklin
2T cells, 32K procs on Jaguar: approx. I/O time 2-5 minutes, approx. processing time 10 seconds.
2T cells, 32K procs on Franklin: approx. I/O time 2-5 minutes, approx. processing time 10 seconds.
13
Assumptions stated:
I/O is a dominant term in visualization performance.
Supercomputing centers are procuring "imbalanced" petascale machines: the trend is towards massively multi-core nodes with lots of shared memory within a node, and since I/O is provisioned per node, more cores per node means less I/O bandwidth per core.
And: overall I/O bandwidth is also deficient.
14
Pure parallelism is not well suited for the petascale.
Emerging problem: pure parallelism emphasizes I/O and memory, and pure parallelism is the dominant processing paradigm for production visualization software.
Solution? There are "smart techniques" that de-emphasize memory and I/O: data subsetting, multi-resolution, out-of-core, and in situ.
15
Data subsetting eliminates pieces that don't contribute to the final picture.
[Diagram: the same parallelized visualization data flow network (Read → Process → Render on each processor), but only the pieces of data on disk that contribute to the picture are read and processed.]
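As an illustration of the idea (with hypothetical helper names, not VisIt's interface), a contour operation can consult per-piece metadata and skip any piece whose value range cannot intersect the requested isosurface, avoiding both the I/O and the processing for that piece:

```python
# Data subsetting sketch: use per-piece metadata to skip non-contributing pieces.
def contour_with_subsetting(pieces_metadata, contour_value, read_piece, contour):
    """pieces_metadata: list of (piece_id, min_value, max_value) tuples;
    read_piece and contour stand in for real I/O and contouring routines."""
    surfaces = []
    for piece_id, vmin, vmax in pieces_metadata:
        if not (vmin <= contour_value <= vmax):
            continue                     # piece cannot intersect the isosurface: skip its I/O
        data = read_piece(piece_id)      # only contributing pieces are read
        surfaces.append(contour(data, contour_value))
    return surfaces
```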
16
Data subsetting: pros and cons
Pros: less data to process (less I/O, less memory).
Cons: the extent of the optimization is data dependent; only applicable to some algorithms.
17
Multi-resolution techniques use coarse representations, then refine.
[Diagram: the same parallelized data flow network, reading coarse representations of the pieces and then refining selected pieces (e.g. P2 and P4).]
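A rough sketch of the idea (again with hypothetical helpers): read a cheap, coarse version of every piece and pay for full-resolution I/O only where it is needed.

```python
# Multi-resolution sketch: start coarse, refine selectively.
def multires_pipeline(piece_ids, read_coarse, needs_refinement, read_fine, process):
    results = []
    for piece_id in piece_ids:
        data = read_coarse(piece_id)       # cheap, low-resolution read
        if needs_refinement(data):         # e.g. piece intersects the region of interest
            data = read_fine(piece_id)     # full-resolution I/O only where needed
        results.append(process(data))
    return results
```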
18
Multi-resolution: pros and cons
Pros: avoids the I/O and memory requirements.
Cons: is it meaningful to process a simplified version of the data?
19
Out-of-core processing iterates pieces of data through the pipeline one at a time.
[Diagram: the same parallelized data flow network, with each processor streaming its pieces through Read → Process → Render one piece at a time.]
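A minimal sketch of the streaming loop, with read_piece, filters, and render standing in for real I/O, processing, and rendering routines; only one piece is resident in memory at a time.

```python
# Out-of-core sketch: execute the pipeline once per piece.
def stream_pipeline(piece_ids, read_piece, filters, render):
    for piece_id in piece_ids:           # e.g. 12 pieces -> each algorithm is called 12 times
        data = read_piece(piece_id)      # read just this piece from disk
        for f in filters:                # slice, contour, etc.
            data = f(data)
        render(data)                     # accumulate this piece's contribution to the image
        del data                         # the piece can be dropped before reading the next one
```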
20
Out-of-core: pros and cons
Pros: lower requirement for primary memory; doesn't require big machines.
Cons: still paying large I/O costs (slow!).
21
In situ processing does visualization as part of the simulation.
[Diagram: for contrast, the pure-parallelism picture again: pieces of data on disk (P0-P9), the parallel simulation code, and a separate parallelized visualization data flow network with Read → Process → Render on each processor.]
22
In situ processing does visualization as part of the simulation.
[Diagram: each processor (0 through 9) of the parallel simulation code runs GetAccessToData → Process → Render itself; the parallelized visualization data flow network is embedded in the simulation and operates on its in-memory data.]
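In code, the pattern looks roughly like the sketch below: the simulation's time-step loop calls the visualization routines directly on live, in-memory data, so nothing is written to disk for analysis. The names advance, get_mesh, contour, and render are placeholders supplied by the caller, not a particular library's API.

```python
# In situ sketch: visualization runs inside the simulation loop.
def simulation_with_in_situ(state, n_steps, advance, get_mesh, contour, render,
                            vis_every=10, contour_value=0.5):
    for step in range(n_steps):
        advance(state)                          # normal simulation work
        if step % vis_every == 0:
            mesh = get_mesh(state)              # direct access to the live mesh, no I/O
            surface = contour(mesh, contour_value)
            render(surface, f"step_{step:06d}.png")
```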
23
In situ: pros and cons
Pros: no I/O! Lots of compute power available.
Cons: very memory constrained; many operations not possible; once the simulation has advanced, you cannot go back and analyze it; the user must know what to look for a priori; it is an expensive resource to hold hostage!
24
Summary of Techniques and Strategies
Pure parallelism can be used for anything, but it takes a lot of resources.
Smart techniques can only be used situationally.
Petascale strategy 1: stick with pure parallelism and live with high machine costs & I/O wait times.
Other petascale strategies? Assumption: we can't afford massive dedicated clusters for visualization; we can fall back on the supercomputer, but only rarely.
25
Now we know the tools … what problem are we trying to solve?
Three primary use cases: exploration, confirmation, and communication.
Exploration examples: scientific discovery, debugging.
Confirmation examples: data analysis, images / movies, comparison.
Communication examples: data analysis, images / movies.
26
Notional decision process (spanning exploration, confirmation, and communication):
Need all data at full resolution? No → multi-resolution (debugging & scientific discovery).
Yes → Do operations require all the data? No → data subsetting (comparison & data analysis).
Yes → Do you know what you want to do a priori? Yes → in situ (data analysis & images / movies).
No → Do algorithms require all data in memory? Is interactivity required? No → out-of-core (data analysis & images / movies).
Yes → pure parallelism (anything & especially comparison).
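For concreteness, the same decision tree written as a small function; the parameter names are made up for illustration and are not anyone's API.

```python
# The notional decision process from the slide, encoded directly.
def choose_technique(need_full_resolution, ops_need_all_data, know_goal_a_priori,
                     algorithms_need_all_in_memory, interactivity_required):
    if not need_full_resolution:
        return "multi-resolution"        # debugging & scientific discovery
    if not ops_need_all_data:
        return "data subsetting"         # comparison & data analysis
    if know_goal_a_priori:
        return "in situ"                 # data analysis & images / movies
    if not algorithms_need_all_in_memory and not interactivity_required:
        return "out-of-core"             # data analysis & images / movies
    return "pure parallelism"            # anything, especially comparison
```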
27
Alternate strategy: smart techniques.
[Diagram: nearly all visualization and analysis work is handled by the smart techniques (multi-resolution, in situ, out-of-core, data subsetting); the remaining ~5% is done on the supercomputer.]
28
How Petascale Changes the Rules
We can't use pure parallelism alone any more; we will need algorithms to work in multiple processing paradigms.
An incredible research problem … but also an incredible software engineering problem.
29
Data flow networks … a love story
[Diagram: File Reader (source) → Slice Filter → Contour Filter → Renderer (sink), annotated with the Update and Execute phases.]
Work is performed by a pipeline. A pipeline consists of data objects and components (sources, filters, and sinks). Pipeline execution begins with a "pull", which starts the Update phase. Data flows from component to component during the Execute phase.
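A toy sketch of the idea in Python (illustrative only, not VisIt's or VTK's actual classes): a pull on the sink triggers Update calls up the pipeline, and each component then executes on the data flowing back down.

```python
# Toy data flow network: sources, filters, and sinks chained together.
class Component:
    def __init__(self, upstream=None):
        self.upstream = upstream

    def update(self):
        # The "pull": Update propagates toward the source, then data flows
        # back down through execute() calls (the Execute phase).
        input_data = self.upstream.update() if self.upstream else None
        return self.execute(input_data)

    def execute(self, input_data):
        raise NotImplementedError

class FileReader(Component):                 # source
    def execute(self, _):
        return list(range(10))               # stand-in for reading a mesh from disk

class ContourFilter(Component):              # a derived type of filter
    def execute(self, data):
        return [x for x in data if x % 2 == 0]   # stand-in for contouring

class Renderer(Component):                   # sink
    def execute(self, data):
        print("rendering", data)
        return data

# Pipeline execution begins with a "pull" on the sink.
Renderer(ContourFilter(FileReader())).update()
```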
30
Data flow networks: strengths
Flexible usage: networks can be multi-input / multi-output; interoperability of modules; embarrassingly parallel algorithms are handled by the base infrastructure.
Easy to extend: new derived types of filters.
[Diagrams: an inheritance tree (an abstract filter with slice, contour, and other derived filters) and a network showing the flow of data from a source through filters A, B, and C to a sink.]
31
Data flow networks: weaknesses
Execution of modules happens in stages; algorithms are executed one at a time.
Cache inefficient.
Memory footprint concerns.
Some implementations fix the data model.
32
Data flow networks: observations
The majority of the code investment is in algorithms (derived types of filters), not in the base classes (which manage data flow).
The source code for managing the flow of data is small and in one place.
Algorithms don't care about the data processing paradigm … they only care about operating on inputs and outputs.
33
Example filter: contouring
[Diagram: a Data Reader → Contour Filter → Rendering pipeline; inside the contour filter, the contour algorithm takes a mesh input and produces surface/line output.]
34
Example filter: contouring with data subsetting
[Diagram: the same Data Reader → Contour Filter → Rendering pipeline; the contour filter additionally communicates with the executive to discard pieces.]
35
Example filter: contouring with out-of-core
[Diagram: the same pipeline with the data split into 12 pieces that are streamed through; the contour algorithm is called 12 times.]
36
Example filter: contouring with multi-resolution techniques
[Diagram: the same Data Reader → Contour Filter → Rendering pipeline operating on a multi-resolution mesh input; the contour algorithm still maps a mesh input to surface/line output.]
37
Example filter: contouring with in situ
[Diagram: the Data Reader is crossed out and the contour filter is fed directly by the simulation code; the contour algorithm still maps a mesh input to surface/line output.]
For each example, the contour algorithm didn't change, just its context.
38
How big is this job?
Many algorithms are basically indifferent to the processing paradigm.
What percentage of a vis code is algorithms? What percentage is devoted to the "processing paradigm"? What percentage is other?
We can gain insight by looking at the breakdown in a real-world example (VisIt).
39
VisIt is a richly featured, turnkey application for large data.
The tool has two focal points: big data & providing a product for end users.
VisIt is an open source, end-user visualization and analysis tool for simulated and experimental data: >100K downloads on the web; R&D 100 award in 2005; used "heavily to exclusively" on 8 of the world's top 12 supercomputers; supports pure parallelism + out-of-core + data subsetting + in situ.
[Image: a 27-billion-element Rayleigh-Taylor instability (MIRANDA, BG/L).]
40
VisIt architecture & lines of code
[Diagram: VisIt's client side (viewer, gui, cli) and server side (mdserver, parallel/serial engine, libsim), plus plots, operators, databases, and support libraries & tools, each annotated with its line count, and on top of that custom interfaces, documentation, regression testing, user knowledge, the Wiki, and mailing list archives; the code for handling large data and parallel algorithms is roughly 32K lines out of 559K.]
Pure parallelism is the simplest paradigm. "Replacement" code may be significantly larger.
41
Summary
Petascale machines are not well suited for pure parallelism, because of its high I/O and memory costs. This will force production visualization software to utilize more processing paradigms.
The majority of existing investments can be preserved, thanks in large part to the elegant design of data flow networks.
Hank Childs, hchilds@lbl.gov … and questions???