1
Using Charm++ to Mask Latency in Grid Computing Applications
Gregory A. Koenig (koenig@cs.uiuc.edu)
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
2004 Charm++ Workshop
2
Problem: Latency Tolerance for Multi-Cluster Applications
Goal: Good performance for tightly-coupled applications running across multiple clusters in a single-campus Grid environment
Scenarios:
- Very large applications
- On-demand computing
Challenge: Masking the effects of latency on inter-cluster messages
[Diagram: Cluster A and Cluster B; intra-cluster latency in microseconds, inter-cluster latency in milliseconds]
4
Solution: Processor Virtualization
Charm++ chares and Adaptive MPI threads virtualize the notion of a processor:
- A programmer decomposes a program into a large number of virtual processors (see the sketch below).
- The adaptive runtime system maps virtual processors onto physical processors and may adjust this mapping as the program executes (load balancing).
- If one virtual processor mapped to a physical processor cannot make progress, another virtual processor on the same physical processor may be able to do useful work.
- No modification of application software or problem-specific tricks is necessary!
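A minimal Charm++ sketch of this overdecomposition idea (illustrative only; the module, file, and class names are invented, not taken from the slides):

    // jacobi.ci (Charm++ interface file, shown here as a comment block):
    //   mainmodule jacobi {
    //     readonly CProxy_Main mainProxy;
    //     mainchare Main { entry Main(CkArgMsg*); entry void done(); };
    //     array [1D] Worker { entry Worker(); };
    //   };

    #include "jacobi.decl.h"

    CProxy_Main mainProxy;   // readonly proxy, set at startup

    class Main : public CBase_Main {
      int remaining;
    public:
      Main(CkArgMsg* m) {
        delete m;
        mainProxy = thisProxy;
        // Overdecompose: create many more chares (virtual processors)
        // than physical PEs; the runtime maps them onto PEs and may
        // migrate them for load balance.
        remaining = 8 * CkNumPes();
        CProxy_Worker::ckNew(remaining);
      }
      void done() { if (--remaining == 0) CkExit(); }
    };

    class Worker : public CBase_Worker {
    public:
      Worker() {
        // Each chare would own one block of the decomposed problem.
        // While one chare waits on a slow inter-cluster message, the
        // scheduler runs other chares resident on the same PE.
        mainProxy.done();
      }
      Worker(CkMigrateMessage*) {}  // required for migratable array elements
    };

    #include "jacobi.def.h"

Because 8 * CkNumPes() chares share each PE, the scheduler always has other work to switch to while any one chare is blocked on a remote message.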
5
Hypothetical Timeline View of a Multi-Cluster Computation
[Timeline diagram: processors A, B, and C; the cross-cluster boundary separates A and B from C]
- Processors A and B are on one cluster; Processor C is on a second cluster
- Communication between clusters via high-latency WAN
- Processor virtualization allows latency to be masked
6
Charm++ on Virtual Machine Interface (VMI)
- Message data are passed along VMI "send chain" and "receive chain"
- Devices on each chain may deliver data directly, manipulate data, and/or pass data to the next device (see the sketch below)
[Stack diagram: Application → Charm++ / AMPI → Converse (machine layer) → VMI send chain and receive chain]
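The slides do not show VMI's actual device API, so the following C++ sketch only illustrates the general chain pattern with invented names (Message, Device, handleSend): each device on a chain may consume a message, transform it, or hand it to the next device.

    #include <cstddef>

    // Hypothetical sketch of a VMI-style device chain (invented names).
    struct Message {
      const void* buf;
      std::size_t len;
      int destNode;
    };

    class Device {
    public:
      explicit Device(Device* next) : next_(next) {}
      virtual ~Device() = default;
      // Return true once some device on the chain has consumed the message.
      virtual bool handleSend(Message& msg) {
        return next_ ? next_->handleSend(msg) : false;
      }
    protected:
      Device* next_;  // next device on the chain, or nullptr at the end
    };

    // Example device: injects artificial delay before forwarding, in the
    // spirit of the "delay device" used in the experiments that follow.
    class DelayDevice : public Device {
    public:
      DelayDevice(Device* next, int delayUs) : Device(next), delayUs_(delayUs) {}
      bool handleSend(Message& msg) override {
        /* ...sleep or schedule msg for delayUs_ microseconds... */
        return Device::handleSend(msg);
      }
    private:
      int delayUs_;
    };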
7
Description of Experiments
Experimental environment:
- Artificial latency environment: VMI "delay device" adds a pre-defined latency between arbitrary pairs of nodes
- TeraGrid environment: experiments run between NCSA and ANL machines (~1.725 ms one-way latency)
Experiments:
- Five-point stencil (2D Jacobi) for matrix sizes 2048x2048 and 8192x8192 (the update rule is sketched below)
- LeanMD molecular dynamics code running a 30,652-atom system
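For reference, the five-point stencil update used in the benchmark is the standard 2D Jacobi relaxation; a plain C++ sketch of one sweep (illustrative, not the benchmark code itself):

    #include <cstddef>
    #include <vector>

    // One relaxation sweep of the five-point stencil over the interior
    // of an n x n grid: each point is replaced by the average of its
    // four orthogonal neighbors (boundary rows/columns are held fixed).
    void jacobiSweep(const std::vector<std::vector<double>>& in,
                     std::vector<std::vector<double>>& out) {
      const std::size_t n = in.size();
      for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t j = 1; j + 1 < n; ++j)
          out[i][j] = 0.25 * (in[i-1][j] + in[i+1][j] +
                              in[i][j-1] + in[i][j+1]);
    }

In the parallel version, each block's boundary rows/columns must be exchanged with its neighbors each iteration; across clusters those exchanges incur the millisecond-scale latency the virtualization approach is designed to hide.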
8
Five-Point Stencil Results (P=2)
9
Five-Point Stencil Results (P=16)
10
Five-Point Stencil Results (P=32)
11
Five-Point Stencil Results (P=64)
12
LeanMD Results
13
Conclusion
Processor virtualization is a useful technique for masking latency in grid computing environments.
Future Work:
- Testing across NCSA-SDSC
- Leverage Charm++ prioritized messages
- Grid-topology-aware load balancer
- Processor speed normalization
- Leverage Adaptive MPI