1 Parallel Applications 15-740 Computer Architecture Ning Hu, Stefan Niculescu & Vahe Poladian November 22, 2002.


1 Parallel Applications
15-740 Computer Architecture
Ning Hu, Stefan Niculescu & Vahe Poladian
November 22, 2002

2 Scaling Parallel Applications
Ning Hu, Stefan Niculescu, Vahe Poladian

3 The Question
- Can distributed shared memory, cache-coherent, non-uniform memory access architectures scale on parallel apps?
- What do we mean by scale:
  - achieve parallel efficiency of 60%,
  - for a fixed problem size,
  - while increasing the number of processors.
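The scaling criterion above can be made concrete with a minimal sketch. It assumes the usual textbook definitions (speedup = T1 / Tp, efficiency = speedup / p). The function names and the timing numbers are hypothetical illustrations, not measurements from the presentation.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run over the one-processor run."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, num_procs):
    """Speedup divided by processor count (1.0 means perfect scaling)."""
    return speedup(t_serial, t_parallel) / num_procs


def scales(t_serial, t_parallel, num_procs, threshold=0.60):
    """True if the run meets the 60% efficiency target from the slide."""
    return parallel_efficiency(t_serial, t_parallel, num_procs) >= threshold


# Hypothetical execution times (seconds) for one fixed problem size.
timings = {1: 128.0, 32: 5.0, 64: 4.0, 128: 2.5}

for procs in (32, 64, 128):
    eff = parallel_efficiency(timings[1], timings[procs], procs)
    verdict = "scales" if scales(timings[1], timings[procs], procs) else "does not scale"
    print(f"{procs:4d} processors: efficiency {eff:.2f} ({verdict})")
```

With the problem size held fixed, efficiency in this made-up data drops as processors are added, which is the behaviour the question is probing.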

4 DSM, cc-NUMA
- Each processor has a private cache
- Shared address space is constructed from the “public” memory of each processor
- Loads / stores are used to access memory
- Hardware ensures cache coherence
- Non-uniform: miss penalty for remote data is higher
- SGI Origin2000 chosen as an aggressive representative of this architectural family

5 Origin 2000 overview
- Nodes are placed at the vertices of hypercubes:
  - ensures that communication latency grows only by a constant step each time the number of nodes doubles (that is, logarithmically in the node count)
- Each node has two 195 MHz processors, each with its own 32 KB first-level cache and 4 MB second-level cache
- Total addressable memory is 32 GB
- Most aggressive in terms of the ratio of remote to local memory access latency
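The hypercube property above can be illustrated with a small sketch. It assumes an idealized binary hypercube in which nodes carry integer labels and two nodes are linked when their labels differ in exactly one bit. It ignores the details of the real Origin2000 interconnect, and the function names are hypothetical.

```python
def hypercube_hops(src, dst):
    """Links on a shortest path = number of differing label bits."""
    return bin(src ^ dst).count("1")


def hypercube_diameter(num_nodes):
    """Worst-case hop count in a hypercube with num_nodes nodes."""
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    return num_nodes.bit_length() - 1  # log2(num_nodes)


# Doubling the node count adds exactly one hop to the worst case.
for nodes in (4, 8, 16, 32, 64):
    print(f"{nodes:3d} nodes: worst case {hypercube_diameter(nodes)} hops")
```

In this model the worst-case distance rises by one hop per doubling of the machine, so latency grows with the logarithm of the node count rather than with the node count itself.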

6 Benchmarks
- SPLASH-2: Barnes-Hut, Ocean, Radix Sort, etc.
- 3 new: Shear Warp, Infer, and Protein
- Range of communication-to-computation ratio, temporal and spatial locality
- Initial sizes of problems determined from earlier experiments:
  - simulation with 256 processors,
  - implementation with 32 processors

7 Initial Experiments

8 Avg. Exec. Time Breakdown

9 Problem Size
- Idea: increase the problem size until the desired level of efficiency is achieved
- Question: feasible?
- Question: even if feasible, is it desirable?

10 Changing Problem Size

11 Why problem size helps
- Communication-to-computation ratio improves
- Less load imbalance, in both computation and communication costs
- Less waiting at synchronization
- Superlinearity effects of cache size:
  - help larger processor counts,
  - hurt smaller processor counts
- Less false sharing
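The first point can be sketched with a common textbook model, which is an assumption here and not taken from the slides: an n x n grid is split into square blocks, one per processor, each processor computes on every point of its block and exchanges only the points along the block's four edges. The function name and the sizes are hypothetical.

```python
import math


def comm_to_comp_ratio(n, num_procs):
    """Boundary points exchanged per interior point computed, per processor.

    Assumes an n x n grid split into num_procs square blocks.
    """
    side = n / math.sqrt(num_procs)   # edge length of one processor's block
    computation = side * side         # points updated by the processor
    communication = 4 * side          # points on the four block edges
    return communication / computation


# Fixed processor count, growing problem size: the ratio falls.
for n in (1024, 2048, 4096):
    print(f"n = {n}: ratio {comm_to_comp_ratio(n, 64):.6f}")
```

Under this model the ratio is 4 * sqrt(p) / n, so doubling the grid dimension halves the relative communication cost at a fixed processor count.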

12 Application Restructuring
- What kind of restructuring:
  - algorithmic changes, data partitioning
- Ways restructuring helps:
  - reduced communication,
  - better data placement,
  - static partitioning for better load balance
- Restructuring is app-specific and complex
- Bonus side effect:
  - scale well on Shared Virtual Memory (clustered workstations) systems

13 App Restructuring

14 Conclusions
- Original versions not scalable on cc-NUMA
- Simulation not accurate for quantitative results; implementation needed
- Increasing problem size is a poor solution
- App restructuring works:
  - restructured apps also perform well on SVM,
  - parallel efficiency of these versions is better
- However, to validate results, it is a good idea to run the restructured apps on a larger number of processors

15 STOP

