Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs John Oleszkiewicz, Li Xiao, Yunhao Liu IEEE TRANSACTIONS ON PARALLEL.

1 Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs John Oleszkiewicz, Li Xiao, Yunhao Liu IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 1, JANUARY 2006 Presented by 張肇烜

2 Outline • Introduction • Design Rationale • Methodology • Results • Analysis • Conclusions

3 Introduction • Large scientific parallel applications demand large amounts of memory space. • Uniform use of resources at the parallel process level does not necessarily mean the system itself is evenly utilized. • To ensure good CPU utilization, we must limit the number of processes in a parallel process.

4 Introduction (cont.) • The key problem is memory usage, and this problem has two parts: – Memory fragmentation – Paging overhead • Network RAM has been proposed for use by sequential jobs in clusters to even out memory load and reduce paging overhead.

5 Introduction (cont.) • Existing network RAM techniques should not be directly applied to parallel jobs. – Processes from the same parallel job synchronize regularly. – Network congestion.
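The synchronization point above can be made concrete with a minimal sketch. It assumes a simple model in which every step of a parallel job ends at a barrier, so the step takes as long as its slowest process. The function names and all timings are hypothetical illustrations, not figures from the paper.

```python
def step_time(per_process_times):
    """A synchronized step finishes only when its slowest process does."""
    return max(per_process_times)


def job_time(steps):
    """Total job time is the sum of the barrier-limited step times."""
    return sum(step_time(step) for step in steps)


# Hypothetical per-process step times in seconds (assumed values).
balanced = [[1.0, 1.0, 1.0, 1.0]] * 3    # no process is paging
one_paging = [[1.0, 1.0, 1.0, 5.0]] * 3  # one overloaded process pages

print(job_time(balanced))    # all processes finish each step together
print(job_time(one_paging))  # every step waits for the paging process
```

Under this model a single overloaded node delays every process of the job, which is why memory relief aimed at one process at a time may not help a parallel job much.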

6 Introduction (cont.) • We propose a new peer-to-peer solution called Parallel Network RAM (PNR) that allows overloaded cluster nodes to utilize idle remote memory. • Each node contacts a manager node and requests that it allocate network RAM on its behalf.
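The request flow described above, in which a node asks a manager to allocate network RAM on its behalf, can be sketched as follows. This is an illustrative toy only: the `Manager` class, its greedy largest-first policy, and the megabyte figures are assumptions made for the example, not the allocation algorithm from the paper.

```python
class Manager:
    """Hypothetical manager that tracks idle memory (MB) per node."""

    def __init__(self, idle_mb):
        self.idle_mb = dict(idle_mb)

    def allocate(self, client, needed_mb):
        """Grant needed_mb of remote memory to client.

        Returns a {server: mb} mapping, or None when the other nodes
        do not hold enough idle memory in total.
        """
        # Candidate servers: every other node with idle memory,
        # largest first (an assumed policy).
        candidates = sorted(
            ((node, mb) for node, mb in self.idle_mb.items()
             if node != client and mb > 0),
            key=lambda item: item[1],
            reverse=True,
        )
        if sum(mb for _, mb in candidates) < needed_mb:
            return None
        grants = {}
        remaining = needed_mb
        for node, mb in candidates:
            if remaining == 0:
                break
            take = min(mb, remaining)
            grants[node] = take
            self.idle_mb[node] -= take
            remaining -= take
        return grants


# Overloaded node P3 asks for 80 MB; idle sizes are made up.
manager = Manager({"P2": 64, "P3": 0, "P6": 32, "P7": 16})
print(manager.allocate("P3", 80))
```

The client never talks to the servers directly in this sketch; it only sees the grants the manager returns.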

7 Design Rationale • Diagram of Parallel Network RAM. Application 2 is assigned to nodes P3, P4, and P5, but utilizes the available memory space in other nodes, such as P2, P6, and P7.

8 Design Rationale (cont.) • We propose a novel and effective technique called Parallel Network RAM (PNR). • PNR does not coordinate with or receive information from the assumed centralized scheduler of the system. • Managers act as proxies for clients to communicate with servers.

9 Design Rationale (cont.) • We propose four different PNR designs. – Centralized PNR design (CEN) – Client-only PNR design (CLI) – Local manager PNR design (MAN) – Backbone PNR design (BB)

10 Design Rationale (cont.) • Centralized PNR design (CEN): one central manager coordinates all client requests.

11 Design Rationale (cont.) • Client-only PNR design (CLI)

12 Design Rationale (cont.) • Local manager PNR design (MAN): a node volunteers to act as the manager.

13 Design Rationale (cont.) • Backbone PNR design (BB)

14 Methodology • We use a large trace collected from the CM-5 parallel platform at the Los Alamos National Lab. • To directly compare DP to the various PNR designs, we create another metric based on average response time (R):

15 Methodology (cont.) • Experimental setup:

16 Results • Base experiment, 64 nodes and 4000 jobs:

17 Results (cont.) • Base experiment, 128 nodes and 4000 jobs:

18 Results (cont.) • Base experiment, 64 nodes and 5000 jobs:

19 Results (cont.) • Base experiment, 128 nodes and 5000 jobs:

20 Results (cont.) • RAM experiments, 64 nodes and 4000 jobs:

21 Results (cont.) • RAM experiments, 128 nodes and 4000 jobs:

22 Results (cont.) • RAM experiments, 64 nodes and 5000 jobs:

23 Results (cont.) • RAM experiments, 128 nodes and 5000 jobs:

24 Results (cont.) • Space sharing experiments, 64 nodes and 4000 jobs:

25 Results (cont.) • Space sharing experiments, 128 nodes and 4000 jobs:

26 Results (cont.)

31 Analysis • PNR is very sensitive to network performance. • The main limiting factor on the space sharing system is network RAM allocation coordination. • Under light load, CLI is the best choice for a space sharing system. • CLI does surprisingly well in certain situations, if RAM is plentiful.

32 Analysis (cont.) • When a high-performance network is available, PNR can produce pronounced performance gains. • For heavily loaded systems, PNR can significantly reduce the response time of jobs as compared to DP.

33 Conclusion • In this paper, we identified a novel way of reducing page fault service time and average response time in a cluster system running parallel processes. • We proposed several different PNR designs and evaluated the performance of each under different conditions.

