Parallel Simulation of Continuous Systems: A Brief Introduction
CS6236 Lecture, Oct. 19, 2005
Background
- Computer simulations of continuous systems
- Sample applications:
  - Civil engineering: building construction
  - Aerospace engineering: aircraft design
  - Mechanical engineering: machining
  - Systems biology: heart simulations
  - Computer engineering: semiconductor simulations
- Discrete models vs. continuous models
Outline
- Mathematical models and methods
- Parallel algorithm methodology
- Some active research areas
Mathematical Models
- Ordinary/partial differential equations
  - Laplace equation: $\nabla^2 u = 0$
  - Heat (diffusion) equation: $u_t = \alpha \nabla^2 u$
- Steady-state vs. time-dependent problems
- Convert into a discrete problem through numerical discretization:
  - Finite difference methods: structured grids
  - Finite element methods: local basis functions
  - Spectral methods: global basis functions
  - Finite volume methods: conservation laws
Example: 1-D Laplace Equation
- Laplace equation in one dimension: $y'' = 0$ on $(0, 1)$, with boundary values $y(0)$ and $y(1)$ given
- Finite difference approximation on a uniform grid $x_i = ih$, $h = 1/(n+1)$, with Jacobi iteration:
  $y_i^{(k+1)} = \bigl( y_{i-1}^{(k)} + y_{i+1}^{(k)} \bigr) / 2$
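A minimal serial sketch of this Jacobi iteration in C. The grid size, boundary values, iteration cap, and stopping tolerance are illustrative assumptions, not values from the slides.

    #include <stdio.h>
    #include <math.h>

    #define N 100          /* number of interior grid points (assumed) */

    int main(void) {
        double y[N + 2], z[N + 2];
        /* assumed boundary values: y(0) = 0, y(1) = 1 */
        for (int i = 0; i <= N + 1; i++) y[i] = 0.0;
        y[N + 1] = 1.0;

        /* Jacobi iteration: each interior point becomes the average
           of its two neighbors from the previous iterate */
        for (int k = 0; k < 10000; k++) {
            double diff = 0.0;
            for (int i = 1; i <= N; i++) {
                z[i] = 0.5 * (y[i - 1] + y[i + 1]);
                diff = fmax(diff, fabs(z[i] - y[i]));
            }
            for (int i = 1; i <= N; i++) y[i] = z[i];
            if (diff < 1e-8) break;     /* assumed stopping tolerance */
        }
        printf("y[N/2] = %f\n", y[N / 2]);
        return 0;
    }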
Example: 2-D Laplace Equation
- Laplace equation in two dimensions: $u_{xx} + u_{yy} = 0$ on the unit square, with boundary conditions given at all four sides
- Jacobi iteration averages the four neighbors:
  $u_{i,j}^{(k+1)} = \bigl( u_{i-1,j}^{(k)} + u_{i+1,j}^{(k)} + u_{i,j-1}^{(k)} + u_{i,j+1}^{(k)} \bigr) / 4$
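The corresponding 2-D update in C, as a sketch; the row-major storage layout and ghost-boundary convention are my assumptions.

    /* One Jacobi sweep over an (n+2) x (n+2) grid stored row-major;
       the outermost rows/columns hold the fixed boundary values. */
    void jacobi_sweep_2d(int n, const double *u, double *unew) {
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                unew[i * (n + 2) + j] = 0.25 * (u[(i - 1) * (n + 2) + j] +
                                                u[(i + 1) * (n + 2) + j] +
                                                u[i * (n + 2) + j - 1] +
                                                u[i * (n + 2) + j + 1]);
    }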
Parallel Programming Model
- Parallel computation: two or more tasks executing concurrently
- A task encapsulates a sequential program and local memory
- Tasks can be mapped to processors in various ways, including multiple tasks per processor
Performance Considerations
- Load balance: work divided evenly
- Concurrency: work done simultaneously
- Overhead: work not present in the serial computation
  - Communication
  - Synchronization
  - Redundant work
  - Speculative work
Example: 1-D Laplace Equation
- Define n tasks, one for each y_i
- Program for task i, i = 1, ..., n:

    initialize y[i]
    for k = 1, 2, ...
        if i > 1, send y[i] to task i-1
        if i < n, send y[i] to task i+1
        if i < n, recv y[i+1] from task i+1
        if i > 1, recv y[i-1] from task i-1
        y[i] = (y[i-1] + y[i+1]) / 2
    end
Design Methodology
- Partitioning (decomposition): decompose the problem into fine-grained tasks to maximize potential parallelism
- Communication: determine the communication pattern among tasks
- Agglomeration: combine tasks into coarser-grained tasks, if necessary, to reduce communication requirements or other costs
- Mapping: assign tasks to processors, subject to the tradeoff between communication cost and concurrency
Types of Partitioning
- Domain decomposition: partition the data
  - Example: grid points in a 1-, 2-, or 3-D mesh
- Functional decomposition: partition the computation
  - Example: components in a climate model (atmosphere, ocean, land, etc.)
Example: Domain Decomposition
- A 3-D mesh can be partitioned along any combination of one, two, or all three of its dimensions (a block-range helper is sketched below)
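As a concrete illustration, here is a standard block-distribution formula in C (my addition, not from the slides): it computes the contiguous range of indices owned by each task along one dimension, and applying it per dimension yields the 1-, 2-, or 3-D partitions.

    /* Block decomposition of n items among p tasks: task j (0-based)
       owns indices [lo, hi). The first n % p tasks get one extra item,
       so task sizes differ by at most one. */
    void block_range(int n, int p, int j, int *lo, int *hi) {
        int base = n / p, rem = n % p;
        *lo = j * base + (j < rem ? j : rem);
        *hi = *lo + base + (j < rem ? 1 : 0);
    }

For example, with n = 10 and p = 3, tasks 0, 1, 2 own [0,4), [4,7), and [7,10).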
Partitioning Checklist
- Identify at least an order of magnitude more tasks than processors in the target parallel system
- Avoid redundant computation or storage
- Make tasks reasonably uniform in size
- The number of tasks, rather than the size of each task, should grow as the problem size increases
Communication Issues
- Latency and bandwidth
- Routing and switching
- Contention, flow control, and aggregate bandwidth
- Collective communication:
  - One-to-many: broadcast, scatter
  - Many-to-one: gather, reduction, scan
  - All-to-all
  - Barrier
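A minimal sketch of a collective reduction. MPI is my choice of concrete message-passing library here (the slides speak only of abstract tasks); MPI_Allreduce combines one value per task and delivers the result to all of them.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each task contributes a local value; the reduction
           combines them into a global sum visible to all tasks */
        double local = (double)rank, global;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0) printf("global sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }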
Communication Checklist
- Communication should be reasonably uniform across tasks in frequency and volume
- As localized as possible
- Concurrent
- Overlapped with computation, if possible
- Not inhibiting concurrent execution of tasks
Agglomeration
- Communication is proportional to the surface area of a subdomain, whereas computation is proportional to its volume
- Higher-dimensional decompositions therefore have a more favorable communication-to-computation ratio (a numerical comparison follows the figure below)
- Increasing task sizes reduces communication but also reduces potential concurrency and flexibility
Surface-to-Volume Ratio (figure omitted)
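To make the surface-to-volume argument concrete, a small C sketch comparing per-task communication (surface points) against computation (volume points) for 1-D slab and 3-D block partitions of an n x n x n mesh. The mesh size, task count, and even-divisibility assumptions are mine.

    #include <stdio.h>
    #include <math.h>

    /* Per-task surface vs. volume for an n x n x n mesh among p tasks.
       Assumes p divides n evenly and, for the 3-D case, p is a cube. */
    int main(void) {
        double n = 128, p = 64;
        double vol = n * n * n / p;        /* points each task updates */
        /* 1-D slabs: each task owns an (n/p) x n x n slab and
           exchanges two n x n faces */
        double surf1d = 2 * n * n;
        /* 3-D blocks: each task owns a cube of side n / cbrt(p) and
           exchanges six faces */
        double side = n / cbrt(p);
        double surf3d = 6 * side * side;
        printf("1-D: surface/volume = %.4f\n", surf1d / vol);  /* 1.0000 */
        printf("3-D: surface/volume = %.4f\n", surf3d / vol);  /* 0.1875 */
        return 0;
    }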
Example: Agglomeration
- Define p tasks, each owning n/p of the y_i's (indices l through h)
- Program for task j, j = 1, ..., p:

    initialize y[l], ..., y[h]
    for k = 1, 2, ...
        if j > 1, send y[l] to task j-1
        if j < p, send y[h] to task j+1
        if j < p, recv y[h+1] from task j+1
        if j > 1, recv y[l-1] from task j-1
        for i = l to h
            z[i] = (y[i-1] + y[i+1]) / 2
        end
        y = z
    end
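A runnable MPI rendering of this agglomerated program, continuing with MPI as the assumed library; the problem size, boundary values, and fixed iteration count are also assumptions. MPI_Sendrecv is used so the paired exchanges cannot deadlock, and MPI_PROC_NULL turns the end-of-chain exchanges into no-ops.

    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    #define NGLOBAL 1024   /* total interior points (assumed) */
    #define NITER   1000   /* fixed iteration count (assumed)  */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int p, j;
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &j);

        int nloc = NGLOBAL / p;           /* assumes p divides NGLOBAL */
        double y[nloc + 2], z[nloc + 2];  /* y[0], y[nloc+1]: ghost cells */
        memset(y, 0, sizeof y);
        if (j == p - 1) y[nloc + 1] = 1.0;  /* assumed right boundary */

        int left  = (j > 0)     ? j - 1 : MPI_PROC_NULL;
        int right = (j < p - 1) ? j + 1 : MPI_PROC_NULL;

        for (int k = 0; k < NITER; k++) {
            /* exchange boundary values with both neighbors */
            MPI_Sendrecv(&y[1], 1, MPI_DOUBLE, left, 0,
                         &y[nloc + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&y[nloc], 1, MPI_DOUBLE, right, 1,
                         &y[0], 1, MPI_DOUBLE, left, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 1; i <= nloc; i++)
                z[i] = 0.5 * (y[i - 1] + y[i + 1]);
            memcpy(&y[1], &z[1], nloc * sizeof(double));
        }
        if (j == 0) printf("y[1] = %f\n", y[1]);
        MPI_Finalize();
        return 0;
    }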
Example: Overlap Comm/Comp
- Program for task j, j = 1, ..., p:

    initialize y[l], ..., y[h]
    for k = 1, 2, ...
        if j > 1, send y[l] to task j-1
        if j < p, send y[h] to task j+1
        for i = l+1 to h-1
            z[i] = (y[i-1] + y[i+1]) / 2
        end
        if j < p, recv y[h+1] from task j+1
        z[h] = (y[h-1] + y[h+1]) / 2
        if j > 1, recv y[l-1] from task j-1
        z[l] = (y[l-1] + y[l+1]) / 2
        y = z
    end
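The same overlap expressed with MPI nonblocking calls, as a sketch under the same assumptions as the previous example; only the loop body changes, so it is shown here as a standalone function.

    #include <mpi.h>

    /* One overlapped Jacobi iteration: start the ghost exchange with
       nonblocking calls, update the interior points that do not depend
       on ghost values, then finish the exchange and update the two edge
       points. left/right are neighbor ranks or MPI_PROC_NULL. */
    void jacobi_step_overlap(int nloc, double *y, double *z,
                             int left, int right) {
        MPI_Request req[4];
        MPI_Irecv(&y[0],        1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&y[nloc + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&y[1],        1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&y[nloc],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* interior points use only locally owned data */
        for (int i = 2; i <= nloc - 1; i++)
            z[i] = 0.5 * (y[i - 1] + y[i + 1]);

        /* wait for ghost values, then update the edge points */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        z[1]    = 0.5 * (y[0] + y[2]);
        z[nloc] = 0.5 * (y[nloc - 1] + y[nloc + 1]);
    }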
Mapping
- Two basic strategies for assigning tasks to processors:
  - Place tasks that can execute concurrently on different processors
  - Place tasks that communicate frequently on the same processor
- Problem: these two strategies often conflict
- In general, finding the optimal solution to this tradeoff is NP-complete, so heuristics are used to find a reasonable compromise
- Dynamic vs. static strategies
Mapping Issues
- Related concerns: partitioning, granularity, mapping, scheduling, load balancing
- Particularly challenging for irregular problems
- Some software tools: Metis, Chaco, Zoltan, etc.
Example: Atmosphere Model
- Partitioning: grid points in a 3-D finite difference model
  - Typically yields 10^5 to 10^7 tasks
- Communication:
  - 9-point stencil horizontally and 3-point stencil vertically
  - Physics computations in vertical columns
  - Global operations to compute total mass
Other Equations
- Heat (diffusion) equation: $u_t = \alpha u_{xx}$
- Laplace equation: $u_{xx} + u_{yy} = 0$
- Advection equation: $u_t + c\, u_x = 0$
- Wave equation: $u_{tt} = c^2 u_{xx}$
- Classification of second-order equations: parabolic, hyperbolic, and elliptic
- Methods for time-dependent equations:
  - Explicit vs. implicit
  - Finite-difference, finite-volume, finite-element
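For the time-dependent case, a sketch of one explicit (forward-time, centered-space) step for the 1-D heat equation; the grid, diffusivity, and time step are caller-supplied assumptions and must respect the stability bound noted in the comment.

    /* One explicit FTCS step for u_t = alpha * u_xx on n interior
       points with spacing dx; stable only if dt <= dx*dx / (2*alpha). */
    void heat_step(int n, double dt, double dx, double alpha,
                   const double *u, double *unew) {
        double r = alpha * dt / (dx * dx);
        for (int i = 1; i <= n; i++)
            unew[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    }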
CFL Condition for Stability
- Necessary condition named after Courant, Friedrichs, and Lewy
- The computational domain of dependence must contain the physical domain of dependence
- For the advection equation with speed $c$, this implies the time step must satisfy $\Delta t \le \Delta x / |c|$
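A trivial helper making the bound concrete; the advection-equation form of the condition is my assumption about which equation the slide applied it to.

    #include <math.h>

    /* Largest stable time step for explicit upwind advection:
       dt <= dx / |c| (i.e., a CFL number of at most 1). */
    double cfl_max_dt(double dx, double c) {
        return dx / fabs(c);
    }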
Active Research Areas
- Discrete-event simulation (DES) of continuous systems
Active Research Areas
- Coupling of different physics:
  - Load balancing
  - Different mathematical models
  - Continuous vs. discrete techniques
- Load balancing:
  - Manager-worker model
  - Irregular/unstructured problems
  - Dynamic load balancing
Summary
- Mathematical models for continuous systems
  - Ordinary and partial differential equations
  - Finite difference, finite volume, and finite element methods
- Parallel algorithm design: partitioning, communication, agglomeration, mapping
- Active research areas
References
- I. T. Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995
- A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, 2nd ed., Addison-Wesley, 2003
- M. J. Quinn, Parallel Computing: Theory and Practice, McGraw-Hill, 1994
- K. M. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988