Parallel Objects: Virtualization & In-Process Components
Orion Sky Lawlor, University of Illinois at Urbana-Champaign. POHLL-2002
Introduction

Parallel programming is hard:
- Communication takes time
  - Message startup cost
  - Bandwidth & contention
- Synchronization, race conditions
- Parallelism breaks abstractions
  - Flatten data structures
  - Hand off control between modules
- Harder than serial programming
Motivation

Parallel applications are either:
- Embarrassingly Parallel
  - Trivial, 1 RA-week effort
  - E.g. Monte Carlo, parameter sweep
  - Communication totally irrelevant to performance
Motivation

Parallel applications are either:
- Embarrassingly Parallel
- Excruciatingly Parallel
  - Massive, 1+ RA-year effort
  - E.g. “pure” MPI codes ≥10k lines
  - Communication, synchronization totally determine performance
Motivation

Parallel applications are either:
- Embarrassingly Parallel
- Excruciatingly Parallel
  - “We’ll be done in 6 months…”
  - Several parallel libraries, codes & groups; dynamic & adaptive
  - E.g. multiphysics simulation
Serial Solution: Abstract!

Build layers of software:
- High-level: libc, C++ STL, …
- Mid-level: OS kernel
  - Silently schedules processes
  - Keeps the CPU busy even when some processes block
  - Allows a process to ignore other processes
- Low-level: assembler
Parallel Solution: Abstract!

Middle layers are missing:
- High-level: ScaLAPACK, POOMA, …
- Mid-level: ? kernel
  - Silently schedules components
  - Keeps the CPU busy even when some components block
  - Allows a component to ignore other components
- Low-level: MPI
The missing middle layer:

- Provides dynamic computation and communication overlap, even across separate modules
- Handles inter-module handoff
- Pipelines communication
- Improves cache utilization (smaller components)
- Provides a nice layer for advanced features, like process migration
Examples: Multiprogramming
Examples: Pipelining
Middle Layer: Implementation

Real OS processes/threads:
- Robust, reliable, implemented
- High performance penalty
- No parallel features (migration!)

Converse/Charm++:
- In-process components: efficient
- Piles of advanced features
- AMPI: MPI interface to Charm++
- Application frameworks
Charm++

Parallel library for object-oriented C++ applications:
- Messaging via method calls
- Communication “proxy” objects
- Methods called by scheduler
  - System determines who runs next
- Multiple objects per processor
- Object migration fully supported
  - Even with broadcasts, reductions
Mapping Work to Processors
(Diagram: user view vs. system implementation)
AMPI

MPI interface, implemented on Charm++:
- Multiple “virtual processors” per physical processor
- Implemented as user-level threads
  - Very fast context switching
- MPI_Recv blocks only the virtual processor, not the physical one
- All the benefits of Charm++
Application Frameworks

Domain-specific interfaces: unstructured grids, structured grids, particle-in-cell.
- Provide a natural interface to application scientists (Fortran!)
- “Encapsulate” communication
- Built on Charm++
- Most popular interfaces to Charm++
Charm++ Features: Migration

Automatic load balancing:
- Balance load by migrating objects
- Application-independent
- Built-in data collection (CPU, network)
- Pluggable “strategy” modules

Adaptive job scheduler:
- Shrink/expand a parallel job by migrating objects
- Dramatic utilization improvement
Examples: Load Balancing

1. Adaptive refinement  2. Load balancer invoked  3. Chunks migrated
Examples: Expanding Job
Examples: Virtualization
Conclusions

Parallel applications need something like a “kernel”:
- A neutral party to mediate CPU use
  - Significant utilization gains
- An easy place to put good tools
  - Work migration support
  - Load balancing

Consider using Charm++!