1
Using Charm++ to Mask Latency in Grid Computing Applications
Gregory A. Koenig (koenig@cs.uiuc.edu)
Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign
2004 Charm++ Workshop
2
Problem: Latency Tolerance for Multi-Cluster Applications
Goal: Good performance for tightly-coupled applications running across multiple clusters in a single-campus Grid environment
Scenarios:
- Very large applications
- On-demand computing
Challenge: Masking the effects of latency on inter-cluster messages
[Diagram: Cluster A and Cluster B; intra-cluster latency in microseconds, inter-cluster latency in milliseconds]
4
Solution: Processor Virtualization
Charm++ chares and Adaptive MPI threads virtualize the notion of a processor:
- A programmer decomposes a program into a large number of virtual processors (see the sketch below).
- The adaptive runtime system maps virtual processors onto physical processors and may adjust this mapping as the program executes (load balancing).
- If one virtual processor mapped to a physical processor cannot make progress, another virtual processor on the same physical processor may be able to do useful work.
- No modification of application software or problem-specific tricks is necessary!
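A minimal Charm++ sketch of this overdecomposition idea (illustrative only; the module, file, and class names are invented, not taken from the slides):

    // jacobi.ci (Charm++ interface file, shown here as a comment block):
    //   mainmodule jacobi {
    //     readonly CProxy_Main mainProxy;
    //     mainchare Main { entry Main(CkArgMsg*); entry void done(); };
    //     array [1D] Worker { entry Worker(); };
    //   };

    #include "jacobi.decl.h"

    CProxy_Main mainProxy;   // readonly proxy, set at startup

    class Main : public CBase_Main {
      int remaining;
    public:
      Main(CkArgMsg* m) {
        delete m;
        mainProxy = thisProxy;
        // Overdecompose: create many more chares (virtual processors)
        // than physical PEs; the runtime maps them onto PEs and may
        // migrate them for load balance.
        remaining = 8 * CkNumPes();
        CProxy_Worker::ckNew(remaining);
      }
      void done() { if (--remaining == 0) CkExit(); }
    };

    class Worker : public CBase_Worker {
    public:
      Worker() {
        // Each chare would own one block of the decomposed problem.
        // While one chare waits on a slow inter-cluster message, the
        // scheduler runs other chares resident on the same PE.
        mainProxy.done();
      }
      Worker(CkMigrateMessage*) {}  // required for migratable array elements
    };

    #include "jacobi.def.h"

Because 8 * CkNumPes() chares share each PE, the scheduler always has other work to switch to while any one chare is blocked on a remote message.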
5
Hypothetical Timeline View of a Multi-Cluster Computation
[Timeline diagram: processors A, B, and C; the cross-cluster boundary separates A and B from C]
- Processors A and B are on one cluster; Processor C is on a second cluster
- Communication between clusters via high-latency WAN
- Processor virtualization allows latency to be masked
6
Charm++ on Virtual Machine Interface (VMI)
- Message data are passed along VMI "send chain" and "receive chain"
- Devices on each chain may deliver data directly, manipulate data, and/or pass data to the next device (see the sketch below)
[Stack diagram: Application → Charm++ / AMPI → Converse (machine layer) → VMI send chain and receive chain]
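The slides do not show VMI's actual device API, so the following C++ sketch only illustrates the general chain pattern with invented names (Message, Device, handleSend): each device on a chain may consume a message, transform it, or hand it to the next device.

    #include <cstddef>

    // Hypothetical sketch of a VMI-style device chain (invented names).
    struct Message {
      const void* buf;
      std::size_t len;
      int destNode;
    };

    class Device {
    public:
      explicit Device(Device* next) : next_(next) {}
      virtual ~Device() = default;
      // Return true once some device on the chain has consumed the message.
      virtual bool handleSend(Message& msg) {
        return next_ ? next_->handleSend(msg) : false;
      }
    protected:
      Device* next_;  // next device on the chain, or nullptr at the end
    };

    // Example device: injects artificial delay before forwarding, in the
    // spirit of the "delay device" used in the experiments that follow.
    class DelayDevice : public Device {
    public:
      DelayDevice(Device* next, int delayUs) : Device(next), delayUs_(delayUs) {}
      bool handleSend(Message& msg) override {
        /* ...sleep or schedule msg for delayUs_ microseconds... */
        return Device::handleSend(msg);
      }
    private:
      int delayUs_;
    };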
7
Description of Experiments
Experimental environment:
- Artificial latency environment: VMI "delay device" adds a pre-defined latency between arbitrary pairs of nodes
- TeraGrid environment: experiments run between NCSA and ANL machines (~1.725 ms one-way latency)
Experiments:
- Five-point stencil (2D Jacobi) for matrix sizes 2048x2048 and 8192x8192 (the update rule is sketched below)
- LeanMD molecular dynamics code running a 30,652-atom system
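For reference, the five-point stencil update used in the benchmark is the standard 2D Jacobi relaxation; a plain C++ sketch of one sweep (illustrative, not the benchmark code itself):

    #include <cstddef>
    #include <vector>

    // One relaxation sweep of the five-point stencil over the interior
    // of an n x n grid: each point is replaced by the average of its
    // four orthogonal neighbors (boundary rows/columns are held fixed).
    void jacobiSweep(const std::vector<std::vector<double>>& in,
                     std::vector<std::vector<double>>& out) {
      const std::size_t n = in.size();
      for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t j = 1; j + 1 < n; ++j)
          out[i][j] = 0.25 * (in[i-1][j] + in[i+1][j] +
                              in[i][j-1] + in[i][j+1]);
    }

In the parallel version, each block's boundary rows/columns must be exchanged with its neighbors each iteration; across clusters those exchanges incur the millisecond-scale latency the virtualization approach is designed to hide.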
8
Five-Point Stencil Results (P=2)
9
Five-Point Stencil Results (P=16)
10
Five-Point Stencil Results (P=32)
11
Five-Point Stencil Results (P=64)
12
LeanMD Results
13
Conclusion
Processor virtualization is a useful technique for masking latency in grid computing environments.
Future Work:
- Testing across NCSA-SDSC
- Leverage Charm++ prioritized messages
- Grid-topology-aware load balancer
- Processor speed normalization
- Leverage Adaptive MPI