Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs John Oleszkiewicz, Li Xiao, Yunhao Liu IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 1, JANUARY 2006 Presented by 張肇烜
Outline Introduction Design Rationale Methodology Results Analysis Conclusions
Introduction Large scientific parallel applications demand large amounts of memory space. Uniform use of resources at the parallel process level does not necessarily mean the system itself is evenly utilized. To ensure good CPU utilization, we must limit the number of processes assigned to each processor.
Introduction (cont.) The key problem is memory usage, and this problem has two parts: –Memory fragmentation –Paging overhead Network RAM has been proposed for use by sequential jobs in clusters to even out the memory load and reduce paging overhead.
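To make the paging-overhead point concrete, here is a minimal back-of-the-envelope sketch comparing the cost of servicing a page fault from local disk versus from idle remote memory. All latency and bandwidth numbers are illustrative assumptions, not measurements from the paper.

```python
# Illustrative comparison of page-fault service time when paging to local
# disk versus to idle remote memory (network RAM).
# The numbers below are assumptions for illustration only.

PAGE_SIZE = 4 * 1024            # bytes per page

def disk_fault_time(seek_ms=8.0, bandwidth_mb_s=50.0):
    """Approximate time to service one page fault from local disk."""
    transfer_ms = PAGE_SIZE / (bandwidth_mb_s * 1e6) * 1e3
    return seek_ms + transfer_ms

def network_ram_fault_time(rtt_ms=0.1, bandwidth_mb_s=100.0):
    """Approximate time to fetch one page from a remote node's memory."""
    transfer_ms = PAGE_SIZE / (bandwidth_mb_s * 1e6) * 1e3
    return rtt_ms + transfer_ms

if __name__ == "__main__":
    print(f"disk fault:        {disk_fault_time():.3f} ms")
    print(f"network RAM fault: {network_ram_fault_time():.3f} ms")
```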
Introduction (cont.) Existing network RAM techniques should not be directly applied to parallel jobs. –Processes from the same parallel job synchronize regularly, so paging delays on one node hold up the entire job. –Network RAM traffic can add to network congestion.
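A minimal sketch of the synchronization effect (an assumed model, not the paper's): in a barrier-synchronized job, each iteration lasts as long as the slowest process, so heavy paging on a single node slows the whole job.

```python
# Assumed model: a barrier-synchronized parallel job with 4 processes.
# Each iteration ends only when the slowest process arrives at the barrier,
# so paging delay on one overloaded node dominates the iteration time.
compute_ms = [10, 10, 10, 10]   # per-process compute time per iteration
paging_ms  = [0, 0, 0, 40]      # extra paging delay on one overloaded node

iteration_ms = max(c + p for c, p in zip(compute_ms, paging_ms))
print(iteration_ms)             # 50 ms: the job runs 5x slower due to one node
```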
Introduction (cont.) We propose a new peer-to-peer solution called Parallel Network RAM (PNR) that allows overloaded cluster nodes to utilize idle remote memory. Each node contacts a manager node and requests that it allocate network RAM on its behalf.
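A minimal sketch of the request flow just described, assuming a simple message-passing model; the class and field names are illustrative, not the paper's actual protocol.

```python
# Sketch of the client -> manager -> server interaction: an overloaded client
# asks a manager to allocate network RAM on its behalf; the manager finds
# idle memory on server nodes and returns the grants.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mb: int

@dataclass
class Manager:
    """Acts as a proxy: finds idle remote memory on behalf of a client node."""
    servers: list

    def allocate(self, client: str, mb_needed: int):
        grants = []
        for server in self.servers:
            if mb_needed == 0:
                break
            grant = min(server.free_mb, mb_needed)
            if grant > 0:
                server.free_mb -= grant
                mb_needed -= grant
                grants.append((server.name, grant))
        return grants   # (server node, MB granted) pairs backing the client's pages

# An overloaded node P4 asks the manager for 300 MB of network RAM.
manager = Manager(servers=[Node("P2", 200), Node("P6", 150), Node("P7", 100)])
print(manager.allocate(client="P4", mb_needed=300))
# [('P2', 200), ('P6', 100)]
```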
Design Rationale Diagram of Parallel Network RAM. Application 2 is assigned to nodes P3, P4, and P5, but utilizes the available memory space on other nodes, such as P2, P6, and P7.
Design Rationale (cont.) We propose a novel and effective technique called Parallel Network RAM (PNR). PNR does not coordinate with or receive information from the assumed centralized scheduler of the system. Managers act as proxies for clients to communicate with servers.
Design Rationale (cont.) We propose four different PNR designs. –Centralized PNR design (CEN) –Client-only PNR design (CLI) –Local manager PNR design (MAN) –Backbone PNR Design (BB)
Design Rationale (cont.) Centralized PNR design (CEN): A single central manager coordinates all client requests.
Design Rationale (cont.) Client-only PNR design (CLI): Each client node manages its own network RAM requests.
Design Rationale (cont.) Local manager PNR design (MAN): Nodes volunteer to act as local managers.
Design Rationale (cont.) Backbone PNR design (BB): A dedicated set of manager nodes forms a backbone that handles client requests.
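A hedged sketch of how the four designs differ in who plays the manager role; the selection rules below are illustrative simplifications, not the paper's implementation.

```python
# Illustrative comparison of the four PNR designs: each rule picks which node
# allocates network RAM on a client's behalf (assumed logic, for intuition only).

def pick_manager(design, client, all_nodes, volunteers=None, backbone=None):
    """Return the node that handles the client's network RAM request."""
    if design == "CEN":      # one central manager for the whole cluster
        return all_nodes[0]
    if design == "CLI":      # the client negotiates for itself
        return client
    if design == "MAN":      # a node that volunteered to act as a local manager
        return volunteers[all_nodes.index(client) % len(volunteers)]
    if design == "BB":       # one node out of a backbone of managers
        return backbone[all_nodes.index(client) % len(backbone)]
    raise ValueError(f"unknown design: {design}")

nodes = [f"P{i}" for i in range(1, 9)]
for design, kwargs in [("CEN", {}), ("CLI", {}),
                       ("MAN", {"volunteers": ["P2", "P6"]}),
                       ("BB",  {"backbone": ["P1", "P5"]})]:
    print(design, pick_manager(design, "P4", nodes, **kwargs))
```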
Methodology We use a large trace collected from the CM-5 parallel platform at the Los Alamos National Laboratory. To directly compare DP to the various PNR designs, we create another metric based on average response time (R):
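The slide omits the formula itself; one plausible form, assuming the metric is the relative response-time improvement of a PNR design over the DP baseline (an assumption, not reproduced from the paper):

```latex
% Assumed form of the comparison metric (illustrative, not from the paper):
% relative improvement in average response time R of a PNR design over DP.
\[
  \mathrm{Improvement} \;=\;
  \frac{\bar{R}_{\mathrm{DP}} - \bar{R}_{\mathrm{PNR}}}{\bar{R}_{\mathrm{DP}}}
  \times 100\%
\]
```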
Methodology (cont.) Experimental setup:
Results Base experiment: 64 nodes and 4000 jobs.
Results (cont.) Base experiment: 128 nodes and 4000 jobs.
Results (cont.) Base experiment: 64 nodes and 5000 jobs.
Results (cont.) Base experiment: 128 nodes and 5000 jobs.
Results (cont.) RAM experiments: 64 nodes and 4000 jobs.
Results (cont.) RAM experiments: 128 nodes and 4000 jobs.
Results (cont.) RAM experiments: 64 nodes and 5000 jobs.
Results (cont.) RAM experiments: 128 nodes and 5000 jobs.
Results (cont.) Space-sharing experiments: 64 nodes and 4000 jobs.
Results (cont.) Space-sharing experiments: 128 nodes and 4000 jobs.
Analysis PNR is very sensitive to network performance. The main limiting factor on the space-sharing system is coordination of network RAM allocation. Under light load, CLI is the best choice for a space-sharing system. CLI also does surprisingly well in certain situations when RAM is plentiful.
Analysis (cont.) When a high-performance network is available, PNR can produce pronounced performance gains. For heavily loaded systems, PNR can significantly reduce the response time of jobs as compared to DP.
Conclusions In this paper, we identified a novel way of reducing page fault service time and average response time in a cluster system running parallel processes. We proposed several different PNR designs and evaluated the performance of each under different conditions.