A comparison of CC-SAS, MP and SHMEM on SGI Origin2000.

1 A comparison of CC-SAS, MP and SHMEM on SGI Origin2000

2 Three Programming Models  CC-SAS –Linear address space for shared memory  MP –Communicate with other processes explicitly via a message-passing interface  SHMEM –Communicate via one-sided get and put primitives
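To make the contrast between the three models concrete, here is a minimal sketch that computes the same parallel sum three ways. It only simulates the models with Python threads: `N`, `run_workers`, and the three `*_sum` functions are hypothetical names, the `queue.Queue` stands in for a message-passing library, and the `put` helper stands in for a SHMEM-style one-sided write. None of this is the code used in the presentation.

```python
import queue
import threading

N = 4  # number of simulated processors (hypothetical)


def run_workers(worker):
    """Start one thread per simulated processor and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # joining all threads doubles as a barrier


def cc_sas_sum(values):
    # CC-SAS: every worker stores directly into one shared array;
    # there is no explicit communication call at all.
    partial = [0] * N

    def worker(rank):
        partial[rank] = sum(values[rank::N])

    run_workers(worker)
    return sum(partial)


def mp_sum(values):
    # MP: data moves only through explicit, two-sided send/receive.
    inbox = queue.Queue()  # stands in for the receive queue of rank 0

    def worker(rank):
        inbox.put(sum(values[rank::N]))  # explicit "send"

    run_workers(worker)
    return sum(inbox.get() for _ in range(N))  # explicit "receive"


def shmem_sum(values):
    # SHMEM: one-sided put into a buffer owned by another processing element;
    # the target does not post a matching receive.
    symmetric = [[0] * N for _ in range(N)]  # one buffer per processing element

    def put(target_pe, offset, value):
        symmetric[target_pe][offset] = value

    def worker(rank):
        put(0, rank, sum(values[rank::N]))

    run_workers(worker)
    return sum(symmetric[0])
```

Calling each function on `list(range(100))` returns the same total; only the way the partial sums reach processor 0 differs.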

3 Platforms:  Tightly-coupled multiprocessors –SGI Origin2000: a cache-coherent distributed shared memory machine  Less tightly-coupled clusters –A cluster of workstations connected by Ethernet

4 Purpose  Compare the three programming models on Origin2000, a modern 64-processor hardware cache-coherent machine –We focus on scientific applications that access data regularly or predictably.

5 Questions to be answered  Can parallel algorithms be structured in the same way for good performance in all three models?  If there are substantial differences in performance among the three models, where are the key bottlenecks?  Do we need to change the data structures or algorithms substantially to remove those bottlenecks?

6 Applications and Algorithms  FFT –All-to-all communication (regular)  Ocean –Nearest-neighbor communication  Radix –All-to-all communication (irregular)  LU –One-to-many communication
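The communication patterns named above can be sketched as sets of (sender, receiver) pairs. This is an illustration only: the function names are hypothetical, the nearest-neighbor version assumes processes arranged on a non-periodic 2D grid, and the one-to-many version assumes the caller supplies the list of receivers. The slides do not specify either detail.

```python
def all_to_all(p):
    """Every process sends to every other process (FFT, Radix)."""
    return {(src, dst) for src in range(p) for dst in range(p) if src != dst}


def nearest_neighbor(rows, cols):
    """Each process talks only to its up/down/left/right grid neighbors (Ocean).

    Assumes a rows x cols process grid without wraparound.
    """
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            src = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    pairs.add((src, nr * cols + nc))
    return pairs


def one_to_many(root, receivers):
    """A single owner sends the same data to several processes (LU)."""
    return {(root, dst) for dst in receivers if dst != root}
```

The pair sets describe only who communicates with whom. What separates regular from irregular all-to-all is the message sizes: fixed and known in advance for FFT, data-dependent for Radix.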

7 Performance Result

8 Question:  Why is MP much worse than CC-SAS and SHMEM?

9 Analysis: Execution time = BUSY + LMEM + RMEM + SYNC, where BUSY: CPU computation time; LMEM: CPU stall time for local cache misses; RMEM: CPU stall time for sending/receiving remote data; SYNC: CPU time spent at synchronization events
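The breakdown above can be sketched as a small helper that sums the four components and reports each one as a fraction of the total. The class name `TimeBreakdown` is hypothetical, and the numbers in the usage note are made up for illustration; they are not measurements from the presentation.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    busy: float  # CPU computation time
    lmem: float  # stall time for local cache misses
    rmem: float  # stall time for sending/receiving remote data
    sync: float  # time spent at synchronization events

    def total(self):
        # Execution time = BUSY + LMEM + RMEM + SYNC
        return self.busy + self.lmem + self.rmem + self.sync

    def fractions(self):
        # Share of the total execution time taken by each component.
        t = self.total()
        return {name: getattr(self, name) / t
                for name in ("busy", "lmem", "rmem", "sync")}
```

For example, `TimeBreakdown(busy=4.0, lmem=1.0, rmem=3.0, sync=2.0)` has a total of 10.0, of which RMEM and SYNC together account for half. That is the kind of profile that points to communication overhead rather than computation as the bottleneck.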

10 Where does the time go in MP?

11 Improving MP performance  Remove the extra data copy –Allocate all data involved in communication in the shared address space  Reduce SYNC time –Use lock-free queue management for communication instead of locks
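One common way to get lock-free queue management is a single-producer, single-consumer ring buffer, sketched below. This is an assumption about the general technique, not the implementation described in the presentation, and `SpscRing` is a hypothetical name. The sketch shows only the index protocol: the producer alone writes `_tail` and the consumer alone writes `_head`, so no lock is needed. A real implementation on a machine like the Origin2000 would also need atomic loads and stores with appropriate memory ordering, which Python does not expose.

```python
class SpscRing:
    """Bounded queue for exactly one producer and one consumer."""

    def __init__(self, capacity):
        # One slot stays unused so that "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_put(self, item):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # queue is full, caller may retry later
        self._slots[self._tail] = item
        self._tail = nxt  # publish the slot only after it has been filled
        return True

    def try_get(self):
        if self._head == self._tail:
            return False, None  # queue is empty
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return True, item
```

To avoid the extra data copy as well, the queue can carry small descriptors instead of payloads. If the message data already lives in a shared buffer, the sender enqueues an `(offset, length)` pair and the receiver reads the bytes in place, for instance `ring.try_put((0, 5))` followed by slicing the shared buffer with the pair returned by `ring.try_get()`.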

12 Speedups under Improved MP

13 Why does CC-SAS perform best?

14  Extra packing/unpacking operations in MP and SHMEM  Extra packet queue management in MP  …

15 Speedups for Ocean

16 Speedups for Radix

17 Speedups for LU

18 Conclusions  Good algorithm structures are portable among programming models.  MP is much worse than CC-SAS and SHMEM on a hardware-coherent machine. However, it can achieve similar performance once the extra data copy and the queue synchronization overhead are removed.  Something about programmability

19 Future work  What about applications that have genuinely irregular, unpredictable, and naturally fine-grained data access and communication patterns?  What about software-based coherent machines (i.e., clusters)?
