A Framework for Particle Advection for Very Large Data

A Framework for Particle Advection for Very Large Data. Hank Childs (LBNL/UCDavis), David Pugmire (ORNL), Christoph Garth (Kaiserslautern), David Camp (LBNL/UCDavis), Sean Ahern (ORNL), Gunther Weber (LBNL), Allen Sanderson (Univ. of Utah).

Particle advection is a foundational visualization algorithm: advecting particles creates integral curves.
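
For reference, here is a minimal, self-contained sketch of what stepping a single particle looks like with a fixed-step fourth-order Runge-Kutta integrator. The analytic velocity field and all names below are illustrative stand-ins, not code from the framework.

    #include <cstdio>

    struct Vec3 { double x, y, z; };

    // Hypothetical analytic velocity field standing in for an interpolated
    // simulation field.
    static Vec3 Velocity(const Vec3 &p)
    {
        return Vec3{ -p.y, p.x, 0.1 };   // simple helical flow
    }

    // One fixed-step RK4 advection step: advance the particle by h along the flow.
    static Vec3 RK4Step(const Vec3 &p, double h)
    {
        Vec3 k1 = Velocity(p);
        Vec3 k2 = Velocity(Vec3{ p.x + 0.5*h*k1.x, p.y + 0.5*h*k1.y, p.z + 0.5*h*k1.z });
        Vec3 k3 = Velocity(Vec3{ p.x + 0.5*h*k2.x, p.y + 0.5*h*k2.y, p.z + 0.5*h*k2.z });
        Vec3 k4 = Velocity(Vec3{ p.x + h*k3.x, p.y + h*k3.y, p.z + h*k3.z });
        return Vec3{ p.x + h/6.0*(k1.x + 2*k2.x + 2*k3.x + k4.x),
                     p.y + h/6.0*(k1.y + 2*k2.y + 2*k3.y + k4.y),
                     p.z + h/6.0*(k1.z + 2*k2.z + 2*k3.z + k4.z) };
    }

    int main()
    {
        Vec3 p{ 1.0, 0.0, 0.0 };                 // seed location
        const double h = 0.01;                   // step size
        for (int step = 0; step < 1000; ++step)  // the integral curve is the sequence of steps
            p = RK4Step(p, h);
        std::printf("final position: %g %g %g\n", p.x, p.y, p.z);
        return 0;
    }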

Particle advection is the duct tape of the visualization world: advecting particles is essential to understanding flow and other phenomena (e.g., magnetic fields)!

Outline: (1) a general system for particle-advection-based analysis; (2) efficient advection of particles.

Outline: (1) a general system for particle-advection-based analysis; (2) efficient advection of particles.

Goal: efficient code for a variety of particle advection workloads and techniques. We are cognizant of use cases ranging from 1 particle to well over 10K particles, so the handling of every particle and every evaluation needs to be efficient. We want to support many diverse flow techniques, so flexibility and extensibility are key. The system must fit within a data flow network design (i.e., as a filter).

Motivating examples of the system: FTLE, stream surfaces, streamlines, dynamical systems (e.g., Poincaré maps), statistics-based analysis, and more.

Design: the PICS filter (parallel integral curve system). Execution: (1) instantiate particles at seed locations; (2) step particles to form integral curves; (3) analysis is performed at each step; (4) termination criteria are evaluated at each step; (5) when all integral curves have completed, create the final output.
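
The execution phase can be summarized as a schematic loop. This is a simplified sketch of the steps listed above, with invented class and method names; it is not the actual avtPICSFilter implementation.

    #include <memory>
    #include <vector>

    // Schematic stand-ins for the framework pieces named on the slides.
    struct IntegralCurve
    {
        bool terminated = false;
        int  steps = 0;

        void AdvanceOneStep() { ++steps; }              // advect (IVP solver)
        void AnalyzeStep()    { /* analysis hook */ }   // per-step analysis
        void EvaluateTermination(int maxSteps)          // termination criteria
        {
            if (steps >= maxSteps) terminated = true;
        }
    };

    // Execution phase of a PICS-style filter, in the order given on the slide:
    // instantiate particles at seeds, step them, analyze and test termination
    // at each step, and create the final output when all curves are done.
    std::vector<std::unique_ptr<IntegralCurve>> Execute(int numSeeds, int maxSteps)
    {
        std::vector<std::unique_ptr<IntegralCurve>> curves;
        for (int i = 0; i < numSeeds; ++i)              // 1. instantiate at seeds
            curves.push_back(std::make_unique<IntegralCurve>());

        bool allDone = false;
        while (!allDone)
        {
            allDone = true;
            for (auto &c : curves)
            {
                if (c->terminated)
                    continue;
                c->AdvanceOneStep();                    // 2. step the particle
                c->AnalyzeStep();                       // 3. analysis per step
                c->EvaluateTermination(maxSteps);       // 4. termination test
                allDone = allDone && c->terminated;
            }
        }
        return curves;                                  // 5. create final output
    }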

Design: five major types of extensibility. (1) How to parallelize? (2) How do you evaluate the velocity field? (3) How do you advect particles? (4) Where are the initial particle locations? (5) How do you analyze the particle paths?

Inheritance hierarchy: avtPICSFilter is the base filter class, with derived types such as the streamline filter and your own derived type of PICS filter; avtIntegralCurve is the base curve class, with matching derived types such as avtStreamlineIC and your own derived type of integral curve. We disliked the "matching inheritance" scheme, but it achieved all of our design goals cleanly.
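
A sketch of the "matching inheritance" idea: a new analysis derives from both the PICS filter and the integral curve, and the derived filter hands back its matching curve type. The skeletons below are illustrative stand-ins, not the actual avt class declarations (the factory method name, in particular, is invented).

    #include <vector>

    // Illustrative stand-ins for the two base classes.
    class avtIntegralCurve
    {
      public:
        virtual ~avtIntegralCurve() = default;
        virtual void AnalyzeStep() = 0;          // analysis hook, called per step
    };

    class avtPICSFilter
    {
      public:
        virtual ~avtPICSFilter() = default;
        // The derived filter creates its matching derived curve type.
        virtual avtIntegralCurve *CreateIntegralCurve() = 0;
        virtual void CreateIntegralCurveOutput(
                         std::vector<avtIntegralCurve *> &curves) = 0;
    };

    // "Matching" pair for streamlines: the filter knows to make streamline curves.
    class avtStreamlineIC : public avtIntegralCurve
    {
      public:
        void AnalyzeStep() override { /* record location + scalars */ }
    };

    class avtStreamlineFilter : public avtPICSFilter
    {
      public:
        avtIntegralCurve *CreateIntegralCurve() override
        {
            return new avtStreamlineIC;          // the matching curve type
        }
        void CreateIntegralCurveOutput(
                 std::vector<avtIntegralCurve *> &) override
        {
            /* assemble polyline output from the stored steps */
        }
    };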

#1: How to parallelize? The base class is avtICAlgorithm, with derived types avtParDomICAlgorithm (parallel over data), avtSerialICAlgorithm (parallel over seeds), and avtMasterSlaveICAlgorithm.

#2: Evaluating the velocity field. The base class is avtIVPField, with derived types avtIVPVTKField, avtIVPVTKTimeVaryingField, avtIVPM3DC1Field, and your own higher-order field type. (IVP = initial value problem.)
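
A minimal sketch of the idea behind this extensibility point, assuming a simplified field interface (not the real avtIVPField signature): a new mesh or element type plugs in by providing its own evaluator.

    struct Vec3 { double x, y, z; };

    // Simplified stand-in for the avtIVPField abstraction: given a time and a
    // position, return the velocity there.
    class VelocityField
    {
      public:
        virtual ~VelocityField() = default;
        virtual Vec3 operator()(double t, const Vec3 &p) const = 0;
    };

    // Example evaluator: a steady analytic field standing in for, e.g.,
    // cell-wise interpolation on a VTK dataset or a higher-order element.
    class AnalyticField : public VelocityField
    {
      public:
        Vec3 operator()(double, const Vec3 &p) const override
        {
            return Vec3{ -p.y, p.x, 0.0 };
        }
    };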

#3: How do you advect particles? The base class is avtIVPSolver, with derived types avtIVPDopri5, avtIVPEuler, avtIVPLeapfrog, avtIVPM3DC1Integrator, and avtIVPAdamsBashforth. (IVP = initial value problem.)
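
The solvers share one stepping interface, so forward Euler and higher-order methods are interchangeable. A compact sketch, again with a simplified interface rather than the real avtIVPSolver API:

    #include <functional>

    struct Vec3 { double x, y, z; };
    using Field = std::function<Vec3(const Vec3 &)>;   // steady field, for brevity

    // Simplified stand-in for the avtIVPSolver abstraction: advance a position
    // by one step of size h through the field v.
    class Solver
    {
      public:
        virtual ~Solver() = default;
        virtual Vec3 Step(const Field &v, const Vec3 &p, double h) const = 0;
    };

    // Forward Euler: cheapest, least accurate.
    class EulerSolver : public Solver
    {
      public:
        Vec3 Step(const Field &v, const Vec3 &p, double h) const override
        {
            Vec3 k = v(p);
            return Vec3{ p.x + h*k.x, p.y + h*k.y, p.z + h*k.z };
        }
    };

    // Second-order Runge-Kutta (midpoint): one extra field evaluation per step.
    class RK2Solver : public Solver
    {
      public:
        Vec3 Step(const Field &v, const Vec3 &p, double h) const override
        {
            Vec3 k1 = v(p);
            Vec3 mid{ p.x + 0.5*h*k1.x, p.y + 0.5*h*k1.y, p.z + 0.5*h*k1.z };
            Vec3 k2 = v(mid);
            return Vec3{ p.x + h*k2.x, p.y + h*k2.y, p.z + h*k2.z };
        }
    };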

#4: Initial particle locations: avtPICSFilter::GetInitialLocations() = 0;

#5: How do you analyze the particle path? avtIntegralCurve::AnalyzeStep() = 0; every AnalyzeStep implementation also evaluates the termination criteria. avtPICSFilter::CreateIntegralCurveOutput(std::vector<avtIntegralCurve*> &) = 0; Examples: Streamline: store the location and scalars for the current step in data members. Poincaré: store the location for the current step in data members. FTLE: only store the location of the final step; no-op for preceding steps. NOTE: these derived types create very different types of outputs.
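
The contrast between derived curve types is easiest to see side by side. A schematic sketch with simplified names (not the actual avt implementations): the streamline curve appends every step, while an FTLE-style curve keeps only the latest position.

    #include <vector>

    struct Vec3 { double x, y, z; };

    class IntegralCurve
    {
      public:
        virtual ~IntegralCurve() = default;
        // Called once per advection step with the new position; this is also
        // where termination criteria would be evaluated.
        virtual void AnalyzeStep(const Vec3 &position) = 0;
    };

    // Streamline: store every location so the full curve can be output.
    class StreamlineCurve : public IntegralCurve
    {
      public:
        void AnalyzeStep(const Vec3 &position) override
        {
            path.push_back(position);
        }
        std::vector<Vec3> path;
    };

    // FTLE-style analysis: only the final position matters, so each step
    // simply overwrites the previous one.
    class FTLECurve : public IntegralCurve
    {
      public:
        void AnalyzeStep(const Vec3 &position) override
        {
            finalPosition = position;
        }
        Vec3 finalPosition{0, 0, 0};
    };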

Putting it all together: the PICS filter combines ::CreateInitialLocations() = 0, an avtICAlgorithm, an avtIVPSolver, an avtIVPField, and a vector of avtIntegralCurve objects, each with ::AnalyzeStep() = 0. Integral curves are sent to other processors with some derived types of avtICAlgorithm. (See http://www.visitusers.org/index.php?title=Pics_dev.)

Outline: (1) a general system for particle-advection-based analysis; (2) efficient advection of particles.

Advecting particles: the large data set is decomposed into blocks on the filesystem. What is the right strategy for getting particles and data together?

Strategy: load the blocks necessary for advection, going to the filesystem and reading each block. (The large data set is decomposed into blocks on the filesystem.)

Strategy: load the blocks necessary for advection. (The large data set is decomposed into blocks on the filesystem.)

"Parallelize over particles." Basic idea: particles are partitioned over PEs, and blocks of data are loaded as needed. Positives: indifferent to data size; trivial parallelization (partition particles over processors). Negative: redundant I/O (both across PEs and within a PE) is a significant problem.
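
A schematic of the parallelize-over-particles loop on one PE (the types and block-loading interface are hypothetical): the particle set is fixed, and whichever block the next step needs is loaded on demand, which is where the redundant I/O comes from.

    #include <cstdio>
    #include <map>
    #include <vector>

    // Hypothetical types: a particle knows which block it currently needs.
    struct Particle { int blockId; bool done = false; };
    struct Block    { /* cells, velocity data, ... */ };

    static Block LoadBlockFromDisk(int blockId)     // stand-in for real I/O
    {
        std::printf("loading block %d\n", blockId);
        return Block{};
    }

    // Parallelize-over-particles on one PE: the particle set is fixed, and
    // blocks are loaded on demand.  Different PEs may load the same block
    // (redundant I/O across PEs), and a bounded cache may evict and re-load
    // blocks (redundant I/O within a PE).
    void AdvectMyParticles(std::vector<Particle> &myParticles)
    {
        std::map<int, Block> cache;                 // unbounded here for simplicity
        for (auto &p : myParticles)
        {
            while (!p.done)
            {
                if (cache.find(p.blockId) == cache.end())
                    cache[p.blockId] = LoadBlockFromDisk(p.blockId);
                // ... advect p inside cache[p.blockId] until it exits the block
                // or terminates; the sketch just terminates it immediately.
                p.done = true;
            }
        }
    }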

"Parallelize over data" strategy: parallelize over blocks and pass particles between PEs.

"Parallelize over data." Basic idea: data is partitioned over PEs, and particles are communicated as needed. Positive: data is only loaded one time. Negative: starvation!
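
The mirror-image sketch for parallelize-over-data (hypothetical types, with communication abstracted away): each PE keeps its own blocks resident and forwards particles it does not own; if the active particles cluster in a few blocks, the remaining PEs starve.

    #include <vector>

    // Hypothetical types for the sketch.
    struct Particle { int blockId; bool done = false; };

    // Which PE owns a given block (static partitioning of blocks over PEs).
    static int OwnerOfBlock(int blockId, int numPEs) { return blockId % numPEs; }

    // Stand-in for sending a particle to another PE (e.g., via MPI in practice).
    static void SendParticleTo(int /*destPE*/, const Particle &/*p*/) {}

    // Parallelize-over-data on one PE: work only on particles whose current
    // block is owned here; forward the rest.  Data is read exactly once, but a
    // PE whose blocks contain no active particles sits idle (starvation).
    void ProcessActiveParticles(std::vector<Particle> &active, int myPE, int numPEs)
    {
        std::vector<Particle> stillMine;
        for (auto &p : active)
        {
            if (p.done)
                continue;
            int owner = OwnerOfBlock(p.blockId, numPEs);
            if (owner == myPE)
            {
                // ... advect p inside the locally resident block; it either
                // terminates or exits into a block with a (possibly) new owner.
                stillMine.push_back(p);
            }
            else
            {
                SendParticleTo(owner, p);
            }
        }
        active.swap(stillMine);
    }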

Contracts are needed to enable these different processing techniques. Parallelize-over-seeds: one execution per block, and only limited data is available at one time. Parallelize-over-data: one execution total, and the entire data set is available at one time.

Both parallelization schemes have serious flaws, and the two approaches trade off against each other: parallelize-over-data has good I/O behavior but bad parallel efficiency, while parallelize-over-particles has good parallel efficiency but bad I/O behavior. Hybrid algorithms aim to be good at both.

The master-slave algorithm is an example of a hybrid technique. It uses a "master-slave" model of communication, in which one process has unidirectional control over other processes. The algorithm adapts during runtime to avoid the pitfalls of parallelize-over-data and parallelize-over-particles, a nice property for production visualization tools. (Implemented in VisIt.) D. Pugmire, H. Childs, C. Garth, S. Ahern, G. Weber, "Scalable Computation of Streamlines on Very Large Datasets," SC09, Portland, OR, November 2009.

Master-slave hybrid algorithm: divide the PEs into groups of N and uniformly distribute the seed points to each group (e.g., P0-P15 split into groups, each with one master and several slaves). The master monitors the workload and makes decisions to optimize resource utilization. The slaves respond to commands from the master and report status when their work is complete.
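
A minimal sketch of the grouping step, assuming a simple rank-arithmetic scheme; the group size and master selection here are illustrative, not necessarily the exact assignment used in the paper.

    #include <cstdio>

    struct Role
    {
        int  groupId;
        int  masterRank;   // global rank of this group's master
        bool isMaster;
    };

    // Assign master/slave roles by splitting ranks into groups of size N.
    static Role AssignRole(int myRank, int groupSize)
    {
        Role r;
        r.groupId    = myRank / groupSize;
        r.masterRank = r.groupId * groupSize;    // first rank in the group
        r.isMaster   = (myRank == r.masterRank);
        return r;
    }

    int main()
    {
        const int numPEs = 16, groupSize = 4;    // e.g., P0..P15 in groups of 4
        for (int rank = 0; rank < numPEs; ++rank)
        {
            Role r = AssignRole(rank, groupSize);
            std::printf("P%-2d group %d %s\n", rank, r.groupId,
                        r.isMaster ? "master" : "slave");
        }
        return 0;
    }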

Master process pseudocode:

    while ( !done )
        if ( NewStatusFromAnySlave() )
            commands = DetermineMostEfficientCommand()
            for cmd in commands
                SendCommandToSlaves( cmd )

What are the possible commands?

Commands that can be issued by the master: Assign / Loaded Block, Assign / Unloaded Block, Handle OOB / Load, Handle OOB / Send (OOB = out of bounds). Setting up the nomenclature: with Assign / Loaded Block, the slave is given a particle that is contained in a block that is already loaded.

Commands that can be issued by the master (continued): with Assign / Unloaded Block, the slave is given a particle and loads the block.

Commands that can be issued by the master (continued): with Handle OOB / Load, the slave is instructed to load a block; the particle advection in that block can then proceed.

Commands that can be issued by the master (continued): with Handle OOB / Send, the slave is instructed to send the particle to another slave (slave J) that has loaded the block. A sketch of these command types follows below.
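
These four commands can be captured in a small message type that the master sends and a slave dispatches on. A hedged sketch, with an enum and handler invented for illustration (not the implementation from the paper):

    // Illustrative command vocabulary for master-to-slave messages.
    enum class Command
    {
        AssignLoadedBlock,     // particle's block is already resident on the slave
        AssignUnloadedBlock,   // slave must load the block, then advect
        HandleOOBLoad,         // out-of-bounds particle: load the new block locally
        HandleOOBSend          // out-of-bounds particle: send it to another slave
    };

    struct CommandMsg
    {
        Command cmd;
        int     particleId;
        int     blockId;
        int     destSlave;     // only meaningful for HandleOOBSend
    };

    // Sketch of a slave's dispatch on one received command.
    void HandleCommand(const CommandMsg &m)
    {
        switch (m.cmd)
        {
          case Command::AssignLoadedBlock:   /* advect in resident block   */ break;
          case Command::AssignUnloadedBlock: /* load block, then advect    */ break;
          case Command::HandleOOBLoad:       /* load the block it entered  */ break;
          case Command::HandleOOBSend:       /* ship particle to destSlave */ break;
        }
    }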

Master process pseudocode (revisited):

    while ( !done )
        if ( NewStatusFromAnySlave() )
            commands = DetermineMostEfficientCommand()
            for cmd in commands
                SendCommandToSlaves( cmd )

Driving factors: you don't want PEs to sit idle (the problem with parallelize-over-data), and you don't want PEs to spend all their time doing I/O (the problem with parallelize-over-particles). So you want to do the "least amount" of I/O that will prevent "excessive" idleness.

Heuristics: If no slave has the block, then load that block. If some slave does have the block: if that slave is not busy, have it process the particle; if that slave is busy, wait "a while"; if you've waited a "long time," have another slave load the block and process the particle.
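
A hedged sketch of that decision logic as the master might express it; the thresholds and bookkeeping structures are invented for illustration and are not taken from the paper.

    #include <map>
    #include <set>

    // Illustrative bookkeeping the master might keep per block / per slave.
    struct MasterState
    {
        std::map<int, std::set<int>> slavesWithBlock;  // blockId -> slaves holding it
        std::set<int>                busySlaves;       // slaves currently working
        std::map<int, int>           waitTime;         // particleId -> iterations waited
    };

    enum class Decision { LoadOnIdleSlave, SendToOwner, Wait, DuplicateLoad };

    // Heuristic from the slide: load if nobody has the block; use a non-busy
    // owner if one exists; otherwise wait a while, and after waiting too long
    // have another slave load the block.
    Decision Decide(MasterState &s, int particleId, int blockId, int maxWait)
    {
        auto it = s.slavesWithBlock.find(blockId);
        if (it == s.slavesWithBlock.end() || it->second.empty())
            return Decision::LoadOnIdleSlave;            // no one has the block

        for (int slave : it->second)
            if (s.busySlaves.count(slave) == 0)
                return Decision::SendToOwner;            // a non-busy owner exists

        if (s.waitTime[particleId]++ < maxWait)
            return Decision::Wait;                       // owners busy: wait a while

        return Decision::DuplicateLoad;                  // waited too long: duplicate
    }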

Master-slave in action. Iteration 0: S0 reads B0, S3 reads B1. Iteration 1: S1 passes points to S0, S4 passes points to S3, S2 reads B0.

Algorithm test cases: a core-collapse supernova simulation, a magnetic confinement fusion simulation, and a hydraulic flow simulation.

Workload distribution in parallelize-over-data: starvation, with 3 procs doing all the work!

Workload distribution in parallelize-over-particles: too much I/O; each streamline is computed by a single processor, with lots of redundant I/O.

Workload distribution in the master-slave algorithm: just right, with a good mix of processors working on different parts of the problem.

Workload distribution in the supernova simulation, for parallelization by data, by particles, and by the hybrid algorithm. (Figure: streamlines colored by the PE doing the integration, an interesting way to look at workload distribution.)

Astrophysics test case: total time to compute 20,000 streamlines (256M cells, 512 blocks). (Plots: seconds vs. number of procs, for uniform and non-uniform seeding, comparing parallelize-over-particles, parallelize-over-data, and hybrid.) The hybrid algorithm outperforms both traditional methods.

Astrophysics test case: number of blocks loaded. (Plots: blocks loaded vs. number of procs, for uniform and non-uniform seeding, comparing parallelize-over-particles, parallelize-over-data, and hybrid.) Parallelize-over-seeds does lots of redundant I/O, parallelize-over-blocks does very little, and the hybrid can adapt to do the right amount of I/O.

Summary: master-slave algorithm. This is the first-ever attempt at a hybrid algorithm for particle advection. The algorithm adapts during runtime to avoid the pitfalls of parallelize-over-data and parallelize-over-particles, a nice property for production visualization tools. It is implemented inside the VisIt visualization and analysis package.

Final thoughts. Summary: particle advection is important for understanding flow, and efficiently parallelizing this computation is difficult. We have developed a freely available system for doing this analysis on large data. Documentation: (PICS) http://www.visitusers.org/index.php?title=Pics_dev and (VisIt) http://www.llnl.gov/visit. Future work: UI extensions, including Python, and additional analysis techniques (FTLE and more).

Acknowledgements. Funding: this work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 through the Scientific Discovery through Advanced Computing (SciDAC) program's Visualization and Analytics Center for Enabling Technologies (VACET). Program Manager: Lucy Nowell. Master-slave algorithm: Dave Pugmire (ORNL), Hank Childs (LBNL/UCD), Christoph Garth (Kaiserslautern), Sean Ahern (ORNL), and Gunther Weber (LBNL). PICS framework: Hank Childs (LBNL/UCD), Dave Pugmire (ORNL), Christoph Garth (Kaiserslautern), David Camp (LBNL/UCD), and Allen Sanderson (Univ. of Utah).

Thank you!! A Framework for Particle Advection for Very Large Data. Hank Childs (LBNL/UCDavis), David Pugmire (ORNL), Christoph Garth (Kaiserslautern), David Camp (LBNL/UCDavis), Sean Ahern (ORNL), Gunther Weber (LBNL), Allen Sanderson (Univ. of Utah).