Reactive NUMA A Design for Unifying S-COMA and CC-NUMA


1 Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA
Babak Falsafi and David A. Wood, University of Wisconsin

2 Some Terminology
NUMA: Non Uniform Memory Access
CC-NUMA: Cache Coherent NUMA
COMA: Cache Only Memory Architecture
S-COMA: Simple COMA

3 SMP Clusters
Approach for large-scale shared-memory parallel machines
Directory-based cache coherence
RAD (remote access device) responsible for remote memory access

4 CC-NUMA
First processor access causes page fault
OS maps virtual address to global physical address
RAD snoops memory bus
Block cache
Remote request

5 CC-NUMA
References global addresses directly
Remote cluster cache
Only holds remote data
Another level in cache hierarchy
Block cache is small
Sensitive to data allocation and placement
Good for scientific workloads
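
The CC-NUMA behavior on the two slides above can be sketched in a few lines of Python. This is an illustrative toy model, not the hardware described in the talk: the class name `BlockCache`, the direct-mapped organization, and the set count are all assumptions chosen to show why a small block cache that holds only remote data turns conflicting accesses into repeated remote requests.

```python
class BlockCache:
    """Toy direct-mapped block cache holding only remote blocks (hypothetical model)."""

    def __init__(self, num_sets, home_node):
        self.num_sets = num_sets
        self.home_node = home_node
        self.tags = [None] * num_sets   # block number currently stored in each set
        self.remote_requests = 0

    def access(self, block, owner_node):
        """Return where the access is serviced from."""
        if owner_node == self.home_node:
            return "local memory"       # local data never enters the block cache
        index = block % self.num_sets
        if self.tags[index] == block:
            return "block cache"        # hit on a previously fetched remote block
        self.tags[index] = block        # miss: replace whatever occupied the set
        self.remote_requests += 1
        return "remote request"


cache = BlockCache(num_sets=4, home_node=0)
trace = [(0, 1), (4, 1), (0, 1), (4, 1)]   # blocks 0 and 4 map to the same set
results = [cache.access(block, owner) for block, owner in trace]
```

In this trace every access is a remote request, because the two remote blocks keep evicting each other from the same set.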

6 S-COMA
First access causes page fault
OS initializes page table, RAD translation table and access control tags
Hits serviced by local memory
Misses detected by RAD
Inhibit memory
Request data

7 S-COMA
Remote data in memory or cache
Allocated/mapped at page granularity
OS handles allocation and migration
Large memory and cache
Fully associative
Large page size
Requires large-granularity spatial locality
Possible thrashing
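
As a rough illustration of the S-COMA access path described on the two slides above, the sketch below models page-granularity allocation with per-block access control tags. The names `SComaNode` and `BLOCKS_PER_PAGE`, and the choice of four blocks per page, are assumptions made for the example; the real RAD performs these checks in hardware.

```python
BLOCKS_PER_PAGE = 4  # assumed value, for illustration only


class SComaNode:
    """Toy model of one node's S-COMA page cache (hypothetical)."""

    def __init__(self):
        self.page_tags = {}      # page -> per-block valid bits (access control tags)
        self.page_faults = 0
        self.remote_fetches = 0

    def access(self, page, block):
        """Return where the access is serviced from."""
        if page not in self.page_tags:
            # First access to the page: the OS allocates a local frame
            # and initializes every block's tag to invalid.
            self.page_faults += 1
            self.page_tags[page] = [False] * BLOCKS_PER_PAGE
        tags = self.page_tags[page]
        if tags[block]:
            return "local memory"    # valid tag: hit serviced by local memory
        # Invalid tag: the miss is detected and the block is requested remotely.
        self.remote_fetches += 1
        tags[block] = True
        return "remote fetch"


node = SComaNode()
first = node.access(page=3, block=0)    # page fault, then remote fetch
second = node.access(page=3, block=0)   # now a local-memory hit
```

Once a block has been fetched it stays in local memory for as long as the page remains mapped, which is why S-COMA suits pages whose blocks are reused on the same node.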

8 R-NUMA
Combine S-COMA and CC-NUMA
Map CC-NUMA pages to global PA
Map S-COMA pages to local PA
Often requires no additional hardware
Distinguish 2 types of pages
Reuse pages: data used frequently on the same node
Communication pages: data exchanged between nodes

9 Switching Mechanism
Reuse pages: capacity and conflict misses, so use S-COMA
Communication pages: coherence misses, so use CC-NUMA
Detect refetches of evicted blocks
Trivial for read-only blocks in non-notifying protocol (still shared)
Additional hardware required for read-write blocks
Count refetches on per-node, per-page basis
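
The per-node, per-page refetch counting above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `ReactivePagePolicy`, the threshold value, and the choice to start every page in CC-NUMA mode are not taken from the slides.

```python
from collections import defaultdict


class ReactivePagePolicy:
    """Toy per-node, per-page refetch counter (hypothetical sketch)."""

    def __init__(self, threshold):
        self.threshold = threshold                   # assumed relocation threshold
        self.refetches = defaultdict(int)            # (node, page) -> refetch count
        self.mode = defaultdict(lambda: "CC-NUMA")   # assumption: pages start as CC-NUMA

    def record_refetch(self, node, page):
        """Count one refetch of an evicted block and return the page's mode."""
        key = (node, page)
        if self.mode[key] == "S-COMA":
            return "S-COMA"                  # already relocated on this node
        self.refetches[key] += 1
        if self.refetches[key] >= self.threshold:
            self.mode[key] = "S-COMA"        # treat as a reuse page from now on
        return self.mode[key]


policy = ReactivePagePolicy(threshold=3)
modes = [policy.record_refetch(node=0, page=7) for _ in range(3)]
```

Because the counters are keyed by node as well as page, the same page can be treated as a reuse page on one node while remaining a communication page on another.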

10 R-NUMA Figure

11 Qualitative Performance
Analysis of worst-case behavior
Performance depends on the S-COMA and CC-NUMA overheads
Realistically, R-NUMA is no more than 3 times worse than vanilla CC-NUMA or S-COMA
In practice the “bound” is much smaller

12 Quantitative Results

13 Conclusions
Dynamically react to program behavior
Exploit best caching strategy on a per-page basis
Worst-case performance is bounded
Quantitative results indicate R-NUMA is usually no worse than the best of CC-NUMA and S-COMA
If worse, still far better than the worst case
Never worse than both
Less sensitive to relocation threshold or overhead than S-COMA
Less sensitive to cache size than CC-NUMA

14 Questions
Sounds like a free lunch
Does R-NUMA really require no additional hardware?
Dynamic switching always looks good in research papers
What about in practice?

