Parallel Objects: Virtualization & In-Process Components
Orion Sky Lawlor, University of Illinois at Urbana-Champaign. POHLL-2002
Introduction

Parallel programming is hard:
- Communication takes time
  - Message startup cost
  - Bandwidth & contention
- Synchronization, race conditions
- Parallelism breaks abstractions
  - Flatten data structures
  - Hand off control between modules
- Harder than serial programming
Motivation

Parallel applications are either:
- Embarrassingly Parallel
  - Trivial, 1 RA-week effort
  - E.g. Monte Carlo, parameter sweep
  - Communication totally irrelevant to performance
Motivation

Parallel applications are either:
- Embarrassingly Parallel
- Excruciatingly Parallel
  - Massive, 1+ RA-year effort
  - E.g. “pure” MPI codes ≥10k lines
  - Communication, synchronization totally determine performance
Motivation

Parallel applications are either:
- Embarrassingly Parallel
- Excruciatingly Parallel
  - “We’ll be done in 6 months…”
  - Several parallel libraries, codes & groups; dynamic & adaptive
  - E.g. multiphysics simulation
Serial Solution: Abstract!

Build layers of software:
- High-level: libc, C++ STL, …
- Mid-level: OS kernel
  - Silently schedules processes
  - Keeps the CPU busy even when some processes block
  - Allows a process to ignore other processes
- Low-level: assembler
Parallel Solution: Abstract!

Middle layers are missing:
- High-level: ScaLAPACK, POOMA, …
- Mid-level: ? kernel
  - Silently schedules components
  - Keeps the CPU busy even when some components block
  - Allows a component to ignore other components
- Low-level: MPI
The missing middle layer:

- Provides dynamic computation and communication overlap, even across separate modules
- Handles inter-module handoff
- Pipelines communication
- Improves cache utilization (smaller components)
- Provides a nice layer for advanced features, like process migration
Examples: Multiprogramming
Examples: Pipelining
Middle Layer: Implementation

Real OS processes/threads:
- Robust, reliable, implemented
- High performance penalty
- No parallel features (migration!)

Converse/Charm++:
- In-process components: efficient
- Piles of advanced features
- AMPI: MPI interface to Charm++
- Application frameworks
Charm++

Parallel library for object-oriented C++ applications:
- Messaging via method calls
- Communication “proxy” objects
- Methods called by scheduler
  - System determines who runs next
- Multiple objects per processor
- Object migration fully supported
  - Even with broadcasts, reductions
Mapping Work to Processors
(Diagram: user view vs. system implementation)
AMPI

MPI interface, implemented on Charm++:
- Multiple “virtual processors” per physical processor
- Implemented as user-level threads
  - Very fast context switching
- MPI_Recv blocks only the virtual processor, not the physical one
- All the benefits of Charm++
Application Frameworks

Domain-specific interfaces: unstructured grids, structured grids, particle-in-cell.
- Provide a natural interface to application scientists (Fortran!)
- “Encapsulate” communication
- Built on Charm++
- Most popular interfaces to Charm++
Charm++ Features: Migration

Automatic load balancing:
- Balance load by migrating objects
- Application-independent
- Built-in data collection (CPU, network)
- Pluggable “strategy” modules

Adaptive job scheduler:
- Shrink/expand a parallel job by migrating objects
- Dramatic utilization improvement
Examples: Load Balancing

1. Adaptive refinement  2. Load balancer invoked  3. Chunks migrated
Examples: Expanding Job
Examples: Virtualization
Conclusions

Parallel applications need something like a “kernel”:
- A neutral party to mediate CPU use
  - Significant utilization gains
- An easy place to put good tools
  - Work migration support
  - Load balancing

Consider using Charm++!