
1 Hank Childs, University of Oregon Large Data Visualization

2 Announcements (1/2) Grading: – 6A, 6B, 7A graded – To do: 5, 7b (both 510 only) Current denominators: – 410: 65 points – 510: 85 points Feedback: fantastic job! – put them on your resumes…

3 Announcements (2/2) Final: – location, Fri, Dec 11th, 10:15am – 5 mins each for self-defined: PPT + demo, just PPT, or just demo – all should attend (even pre-defined folks) Schedule: – today: unstructured grids + FAQ on VTK – Wed: large data vis – Fri: Lagrangian flow (research)

4 Three Pillars of Science: Theory, Experiment, and Simulation  Why simulation?  Simulations are sometimes more cost-effective than experiments  But: simulations are only valuable if they are accurate  And: to achieve accuracy, simulations often require high-resolution meshes

5 What are the challenges with large data visualization?  Two grand challenges:  Handling the scale of the data  Reducing the complexity of the data

6 512^3 portion of turbulent flow data set Data source: P.K. Yeung and Diego Donzis How can we gain insight from increasingly complex data?

7 1024^3 portion of turbulent flow data set Data source: P.K. Yeung and Diego Donzis How can we gain insight from increasingly complex data?

8 2048^3 portion of turbulent flow data set Data source: P.K. Yeung and Diego Donzis How can we gain insight from increasingly complex data?

9 4096^3 portion of turbulent flow data set Data source: P.K. Yeung and Diego Donzis How can we gain insight from increasingly complex data?

10  We developed a statistical approach for identifying and understanding “worms” in large-scale turbulent data.  Techniques deal with both scale and complexity.  The best way to reduce the complexity is often application specific. K.P. Gaither, H. Childs, K. Schulz, C. Harrison, W. Barth, D. Donzis, P.K. Yeung, “Using Visualization and Data Analysis to Understand Critical Structures in Massive Time Varying Turbulent Flow Simulations”, IEEE Computer Graphics & Applications, 32(4):34–45, July/Aug. 2012. How can we gain insight from increasingly complex data?

11 Supercomputers and Simulation  Why supercomputers?  More mesh resolution → more memory and computation → supercomputers  Very expensive supercomputers can still be cost-effective. (Images: climate change, astrophysics, nuclear reactors. Image credit: Prabhat, LBNL.) The data sets resulting from these simulations are large and require special visualization techniques.

12 Defining “big data” for visualization  Big data: data that is too big to be processed in its entirety all at one time because it exceeds the available memory.
Criterion → Approach
– “In its entirety” → data subsetting / multi-resolution
– “All at one time” → streaming (e.g., out of core)
– “Exceeds available memory” → parallelism
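
To make the streaming (out-of-core) approach concrete, here is a minimal sketch, assuming a large scalar field stored as a hypothetical raw float32 file named "field.raw": it computes the global data range one block at a time, so only a small buffer is ever in memory.

```python
# Out-of-core (streaming) sketch: compute the global min/max of a large scalar
# field without loading the whole array at once.
# Assumption: the field is stored as raw float32 in "field.raw" (hypothetical).
import numpy as np

CHUNK = 16 * 1024 * 1024               # elements read per pass (~64 MB of float32)
vmin, vmax = np.inf, -np.inf

with open("field.raw", "rb") as f:
    while True:
        block = np.fromfile(f, dtype=np.float32, count=CHUNK)  # read one block
        if block.size == 0:
            break
        vmin = min(vmin, float(block.min()))
        vmax = max(vmax, float(block.max()))

print("global range:", vmin, vmax)
```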

13 Different Types of Parallelism Distributed-memory parallelism Shared-memory parallelism Mostly focusing on distributed-memory parallelism

14 Parallelization: Overview It turns out that many, many visualization operations follow “scatter, gather” – Scatter the pieces over processors – Perform the operation with no coordination (“embarrassingly parallel”) – Gather the results later (often with rendering) This is a very scalable approach; very little work is needed to achieve parallelism in this context.
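
A minimal sketch of the scatter/gather pattern, assuming mpi4py and numpy are available; the per-piece operation here (a simple maximum) is a stand-in for a real visualization filter.

```python
# Scatter/gather sketch with MPI: rank 0 scatters pieces, every rank works on
# its piece with no coordination, and rank 0 gathers the results.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    full = np.random.rand(size, 1_000_000)     # pretend this is the big dataset
    pieces = [full[i] for i in range(size)]    # one piece per processor
else:
    pieces = None

piece = comm.scatter(pieces, root=0)           # scatter the pieces
local_result = piece.max()                     # embarrassingly parallel work

results = comm.gather(local_result, root=0)    # gather the results
if rank == 0:
    print("per-piece maxima:", results)
```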

15 Parallelization & Data Flow Networks Every processor instantiates an identical data flow network. The data flow networks are only differentiated by the portion of the larger data set they work on. PE = Processing Element (a compute node). Visualization often reuses nodes from the supercomputer that generated the data.

16 Data Parallel Pipelines Duplicate pipelines run independently on different partitions of data. Slide courtesy of Ken Moreland, Sandia Lab

17 Data Parallel Pipelines Duplicate pipelines run independently on different partitions of data. Slide courtesy of Ken Moreland, Sandia Lab

18 Data Parallel Pipelines Some operations will work regardless. – Example: Clipping. Slide courtesy of Ken Moreland, Sandia Lab

19 Data Parallel Pipelines Some operations will work regardless. – Example: Clipping. Slide courtesy of Ken Moreland, Sandia Lab

20 Data Parallel Pipelines Some operations will work regardless. – Example: Clipping. Slide courtesy of Ken Moreland, Sandia Lab

21 Data Parallel Pipelines Some operations will have problems. – Example: External Faces Slide courtesy of Ken Moreland, Sandia Lab

22 Data Parallel Pipelines Some operations will have problems. – Example: External Faces Slide courtesy of Ken Moreland, Sandia Lab

23 Data Parallel Pipelines Ghost cells can solve most of these problems. Slide courtesy of Ken Moreland, Sandia Lab

24 Data Parallel Pipelines Ghost cells can solve most of these problems. Slide courtesy of Ken Moreland, Sandia Lab But some algorithms really have problems … will mention those later.
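
One common way to obtain ghost data is for neighboring ranks to exchange a layer of values along partition boundaries. Here is a minimal sketch, assuming mpi4py and numpy and a hypothetical 1D slab decomposition of a 3D field along x; the slab shape and field values are placeholders.

```python
# Ghost-layer exchange sketch: each rank owns a slab of a 3D field plus one
# ghost layer on each side in x, filled from its neighbors.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx, ny, nz = 16, 32, 32                                     # local slab (without ghosts)
field = np.full((nx + 2, ny, nz), rank, dtype=np.float64)   # +2 ghost layers in x

left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send my first real layer left; receive my left ghost layer from the left rank.
comm.Sendrecv(np.ascontiguousarray(field[1]),  dest=left,  recvbuf=field[0],  source=left)
# Send my last real layer right; receive my right ghost layer from the right rank.
comm.Sendrecv(np.ascontiguousarray(field[-2]), dest=right, recvbuf=field[-1], source=right)

# Filters evaluated near partition boundaries (external faces, node-centered
# interpolation, ...) can now look one cell past the boundary without further
# communication.
```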

25 (All parallel rendering slides courtesy Ken Moreland, Sandia Lab)

26 The Graphics Pipeline (rendering hardware): Points / Lines / Polygons → Geometric Processing (translation, lighting, clipping) → Rasterization (polygon filling, interpolation, texture application, hidden surface removal) → Frame Buffer → Display

27 Parallel Graphics Pipelines (diagram: several geometry (G) → rasterization (R) pipelines feeding a frame buffer (FB))

28 Sort First Parallel Rendering (diagram: the sorting network sits before the geometry stage, distributing work across the G → R pipelines ahead of the frame buffer)

29 Sort Middle Parallel Rendering (diagram: the sorting network sits between the geometry and rasterization stages)

30 Sort Last Parallel Rendering (diagram: the sorting network sits after rasterization, combining the per-pipeline results into the frame buffer)

31 Sort-First Bottleneck (diagram: polygons must be redistributed by the polygon sorter over the network before rendering)

32 Sort-Last Bottleneck (diagram: rendered results must be combined over the composition network)

33 Parallel rendering  So which is best?  Vis: images are normally ~1M pixels, while the data is normally really big  So: sort-last
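
As an illustration of sort-last compositing, here is a minimal sketch assuming mpi4py and numpy, with random placeholder renders: each rank produces a color buffer and a depth buffer for its piece, and the final image keeps, per pixel, the color with the smallest depth. A real implementation would composite with binary-swap or radix-k rather than gathering everything to one rank.

```python
# Sort-last compositing sketch: each rank "renders" its own piece into color
# and depth buffers; rank 0 composites by taking the nearest fragment per pixel.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

H, W = 1024, 1024                                   # ~1M pixel image
color = np.random.rand(H, W, 3).astype(np.float32)  # stand-in for this rank's render
depth = np.random.rand(H, W).astype(np.float32)     # stand-in z-buffer

colors = comm.gather(color, root=0)                 # naive gather for illustration
depths = comm.gather(depth, root=0)

if rank == 0:
    final_color, final_depth = colors[0], depths[0]
    for c, d in zip(colors[1:], depths[1:]):
        closer = d < final_depth                    # per-pixel depth test
        final_color[closer] = c[closer]
        final_depth[closer] = d[closer]
    print("composited image:", final_color.shape)
```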

34 Outline Overview Parallel Rendering Easy-to-Parallelize Algorithms Data Flow Networks and Smart Techniques

35 Parallel Isosurfacing Divide pieces over processors Each processor applies the isosurfacing algorithm to its pieces – There is no coordination required between pieces!! This is called “embarrassingly parallel” Each processor now contains a part of the large surface → parallel rendering
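
A minimal sketch of this per-piece contouring, assuming VTK's Python bindings and mpi4py; vtkRTAnalyticSource restricted to a per-rank sub-extent is a stand-in for each processor reading its own piece of a real dataset.

```python
# Parallel isosurfacing sketch: every rank runs the identical contour pipeline,
# differing only in the sub-extent (piece) it processes.
from mpi4py import MPI
import vtk

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Split the global z extent [0, 127] into one slab per rank (hypothetical split).
z0 = rank * 128 // size
z1 = (rank + 1) * 128 // size

source = vtk.vtkRTAnalyticSource()           # synthetic scalar field ("wavelet")
source.SetWholeExtent(0, 127, 0, 127, z0, z1)

contour = vtk.vtkContourFilter()             # same filter on every rank
contour.SetInputConnection(source.GetOutputPort())
contour.SetValue(0, 150.0)                   # isovalue
contour.Update()

piece = contour.GetOutput()
print(f"rank {rank}: {piece.GetNumberOfCells()} cells on its part of the surface")
```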

36 Parallel Isosurfacing … is it really that easy? What can go wrong: – You have cell-centered data & the algorithm works on node-centered data. – So you must interpolate to nodes – And end up with inconsistency at boundaries – And thus a broken isosurface Ghost data solves this problem.

37 Parallel efficiency + isosurfacing What are the barriers to parallel efficiency? – Parallel reads – Variation in isosurface calculation time – Parallel rendering

38 Parallelization similar to isosurfacing All the algorithms we discussed that are based on “Marching Cubes” parallelize the same way: – Slicing – Interval volumes – Box – Clip – Slicing by non-planes – Threshold – and many more… BUT: volume rendering, streamlines, and more are much harder…

39 Parallelizing Particle Advection Very difficult to parallelize Particle path is data-dependent – You don’t know where the particle will go – … so you don’t know which data to load In my opinion: hardest problem to efficiently parallelize in visualization Two axes: – Is the data big or small? – Do you have lots of particles or few?

40 Small data, small number of particles Serial processing!

41 Large data, small number of particles Still serial processing!

42 Small data, large number of seeds (diagram: each processor runs its own Read → Advect → Render data flow network against the same file from the simulation code) This scheme is referred to as parallelizing-over-particles. It parallelizes very well, but the key is that the data is small enough that it can fit in memory. It can also work on large data sets, but can really suffer with redundant I/O.
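
A minimal sketch of parallelizing-over-particles, assuming mpi4py and numpy, with a hypothetical analytic velocity field standing in for the small dataset that every rank holds in full: the seeds are split across ranks and each rank advects its own seeds independently.

```python
# Parallelize-over-particles sketch: every rank holds the full (small) velocity
# field and advects only its share of the seeds -- no communication needed.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def velocity(p):
    # Simple rotational field in the xy-plane (stand-in for the real data).
    return np.array([-p[1], p[0], 0.1])

# Distribute the seed points round-robin over ranks.
all_seeds = [np.array([np.cos(t), np.sin(t), 0.0]) for t in np.linspace(0, 6.28, 100)]
my_seeds = all_seeds[rank::size]

dt, nsteps = 0.01, 1000
my_paths = []
for seed in my_seeds:
    p, path = seed.copy(), [seed.copy()]
    for _ in range(nsteps):
        p = p + dt * velocity(p)       # forward Euler advection step
        path.append(p.copy())
    my_paths.append(np.array(path))

print(f"rank {rank}: advected {len(my_paths)} particles")
```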

43 Large data, large number of particles (diagram: pieces of data P0–P9 on disk, produced by the parallel simulation code, are assigned to processors, each running a Read → Advect → Render data flow network) This scheme is referred to as parallelizing-over-data. What if all particles are in P0 and never leave?
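
A minimal sketch of parallelizing-over-data, assuming mpi4py and numpy; the slab decomposition, velocity field, and termination rule are hypothetical simplifications. Each rank advects only the particles currently inside its slab of the domain and hands a particle to the owning rank when it crosses a slab boundary; starting all seeds on rank 0 shows the load imbalance the slide asks about.

```python
# Parallelize-over-data sketch: each rank owns one slab of the domain in x and
# advects only the particles inside it, handing particles off as they move.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

xmin, xmax = -2.0, 2.0
slab = (xmax - xmin) / size                      # my slab is [xmin + rank*slab, xmin + (rank+1)*slab)

def owner(p):
    return min(size - 1, max(0, int((p[0] - xmin) // slab)))

def velocity(p):
    return np.array([0.5, np.sin(p[0]), 0.0])    # pushes particles in +x, toward other slabs

# All seeds start in rank 0's slab (the "what if all particles are in P0" case).
particles = [np.array([xmin + 0.1, y, 0.0]) for y in np.linspace(-1, 1, 50)] if rank == 0 else []

dt, max_steps = 0.05, 200
for _ in range(max_steps):
    outgoing = [[] for _ in range(size)]
    kept = []
    for p in particles:
        p = p + dt * velocity(p)                 # one advection step
        if p[0] >= xmax:                         # left the domain: terminate
            continue
        (kept if owner(p) == rank else outgoing[owner(p)]).append(p)
    incoming = comm.alltoall(outgoing)           # hand off boundary-crossing particles
    particles = kept + [p for box in incoming for p in box]
    if comm.allreduce(len(particles), op=MPI.SUM) == 0:
        break                                    # every rank is done

print(f"rank {rank}: {len(particles)} active particles at exit")
```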

44 Parallelization with big data & lots of seed points & bad distribution  Two extremes: Partition data over processors and pass particles amongst processors − Parallel inefficiency! Partition seed points over processors and process the necessary data for advection − Redundant I/O! (diagram: notional streamline examples over pieces P0–P4 for parallelize-over-data vs. parallelize-over-particles)
Parallelizing over | I/O | Efficiency
Data | Good | Bad
Particles | Bad | Good
Hybrid algorithms: see “Pugmire, Childs, Garth, Ahern, Weber @ SC09”

