Published by Jordan Owens. Modified over 8 years ago.
1 Parallel Applications
15-740 Computer Architecture
Ning Hu, Stefan Niculescu & Vahe Poladian
November 22, 2002
2 Scaling Parallel Applications
Ning Hu, Stefan Niculescu, Vahe Poladian
3 The Question
- Can distributed shared memory (DSM), cache-coherent, non-uniform memory access (cc-NUMA) architectures scale on parallel applications?
- What do we mean by scale:
  - Achieve a parallel efficiency of 60%,
  - for a fixed problem size,
  - while increasing the number of processors.
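The scaling criterion above can be sketched with the usual definitions: speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the processor count. This is a minimal sketch; the function names and the timings in the usage note are hypothetical and are not measurements from the presentation.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup of a parallel run over the serial run of the same problem size."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, num_procs: int) -> float:
    """Fraction of ideal (linear) speedup achieved on num_procs processors."""
    return speedup(t_serial, t_parallel) / num_procs


def scales(t_serial: float, t_parallel: float, num_procs: int,
           threshold: float = 0.60) -> bool:
    """True if the run meets the 60% efficiency criterion (threshold from the slide)."""
    return parallel_efficiency(t_serial, t_parallel, num_procs) >= threshold
```

For example, with made-up timings of 100 s serial and 2.5 s on 64 processors, the speedup is 40 and the efficiency is 40 / 64 = 0.625, which meets the criterion; at 4.0 s on 64 processors the efficiency drops to about 0.39 and does not.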
4 DSM, cc-NUMA
- Each processor has a private cache
- Shared address space is constructed from the “public” memory of each processor
- Loads / stores are used to access memory
- Hardware ensures cache coherence
- Non-uniform: the miss penalty for remote data is higher than for local data
- SGI Origin2000 chosen as an aggressive representative of this architectural family
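To illustrate why the non-uniform miss penalty matters, the sketch below computes an average miss penalty as a weighted mix of local and remote penalties. The linear model, the function name, and the cycle counts in the usage note are assumptions for illustration only; they are not Origin2000 figures from the presentation.

```python
def average_miss_penalty(local_penalty: float, remote_penalty: float,
                         remote_fraction: float) -> float:
    """Average miss penalty when a fraction of misses is served from remote memory.

    Assumes a simple linear mix of the two penalties (illustrative model only).
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_penalty + remote_fraction * remote_penalty
```

With hypothetical penalties of 300 cycles local and 900 cycles remote, sending a quarter of the misses to remote memory raises the average from 300 to 450 cycles, which is why data placement shows up later as a restructuring goal.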
5 Origin 2000 overview
- Nodes are placed at the vertices of hypercubes:
  - Ensures that communication latency grows only linearly each time the number of nodes doubles (that is, logarithmically in the node count)
- Each node has two 195 MHz processors, each with its own 32 KB first-level cache and 4 MB second-level cache
- Total addressable memory is 32 GB
- Most aggressive in terms of remote-to-local memory access latency ratio
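The hypercube property can be sketched for an idealized binary hypercube, where each doubling of the node count adds one dimension and therefore one hop to the worst-case path. This is an assumption-laden model: it ignores the routers and any other detail of the real Origin2000 interconnect, and the function names are hypothetical.

```python
def hypercube_dimension(num_nodes: int) -> int:
    """Dimension (= worst-case hop count) of an ideal hypercube with num_nodes vertices."""
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    return num_nodes.bit_length() - 1


def hop_distance(node_a: int, node_b: int) -> int:
    """Hops between two vertices: the number of bits in which their IDs differ."""
    return bin(node_a ^ node_b).count("1")
```

Going from 32 to 64 vertices raises the worst-case distance from 5 to 6 hops, one extra hop per doubling.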
6 Benchmarks
- SPLASH-2: Barnes-Hut, Ocean, Radix Sort, etc.
- 3 new: Shear Warp, Infer, and Protein
- Cover a range of communication-to-computation ratios and of temporal and spatial locality
- Initial problem sizes determined from earlier experiments:
  - Simulation with 256 processors
  - Implementation with 32 processors
7 Initial Experiments
8 Avg. Exec. Time Breakdown
9 Problem Size
- Idea: increase the problem size until the desired level of efficiency is achieved
- Question: Is it feasible?
- Question: Even if feasible, is it desirable?
10 Changing Problem Size
11 Why problem size helps
- Communication-to-computation ratio improves
- Less load imbalance, in both computation and communication costs
- Less waiting at synchronization
- Superlinearity effects of cache size:
  - Helps larger processor counts
  - Hurts smaller processor counts
- Less false sharing
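The first point can be made concrete with a simple model that is not taken from the slides: assume an n x n grid computation with nearest-neighbor exchange, split into square blocks over P processors. Each block then computes on (n / sqrt(P))^2 points and exchanges roughly 4 * (n / sqrt(P)) boundary points, so the ratio falls as n grows. The function name and the grid sizes are hypothetical.

```python
import math


def comm_to_comp_ratio(grid_side: int, num_procs: int) -> float:
    """Boundary points exchanged per interior point computed, per processor.

    Illustrative model: square block partitioning of a grid_side x grid_side grid,
    with num_procs a perfect square that divides the grid evenly.
    """
    procs_per_side = math.isqrt(num_procs)
    if procs_per_side * procs_per_side != num_procs:
        raise ValueError("num_procs must be a perfect square")
    if grid_side % procs_per_side:
        raise ValueError("grid_side must be divisible by sqrt(num_procs)")
    block_side = grid_side // procs_per_side
    communication = 4 * block_side        # perimeter of one block
    computation = block_side * block_side  # area of one block
    return communication / computation
```

Under this model, doubling the grid side from 1024 to 2048 on 64 processors halves the ratio from 0.03125 to 0.015625, while quadrupling the processor count at a fixed grid size doubles it.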
12 Application Restructuring
- What kind of restructuring:
  - Algorithmic changes, data partitioning
- Ways restructuring helps:
  - Reduced communication
  - Better data placement
  - Static partitioning for better load balance
- Restructuring is app-specific and complex
- Bonus side effect: restructured apps also scale well on Shared Virtual Memory (clustered workstations) systems
13 App Restructuring
14 Conclusions
- Original versions are not scalable on cc-NUMA
- Simulation is not accurate for quantitative results; implementation is needed
- Increasing problem size is a poor solution
- App restructuring works:
  - Restructured apps also perform well on SVM
  - Parallel efficiency of these versions is better
- However, to validate the results, it is a good idea to run the restructured apps on a larger number of processors
15 STOP