Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA
Babak Falsafi and David A. Wood, University of Wisconsin
Some Terminology
NUMA: Non-Uniform Memory Access
CC-NUMA: Cache-Coherent NUMA
COMA: Cache-Only Memory Architecture
S-COMA: Simple COMA
SMP Clusters
An approach for building large-scale shared-memory parallel machines
Directory-based cache coherence across clusters (a generic directory entry is sketched below)
RAD (Remote Access Device) is responsible for remote memory accesses
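As a rough illustration of directory-based coherence, the home node of each memory block can keep an entry recording the block's state and which nodes hold copies. This is a generic sketch, not the specific protocol in the paper; the MSI-style states, field names, and 64-node limit are assumptions.

```c
#include <stdint.h>

/* Generic MSI-style state of a block at its home node (assumed encoding). */
typedef enum { DIR_UNCACHED, DIR_SHARED, DIR_MODIFIED } dir_state_t;

/* One directory entry per memory block: state plus a sharer bit vector.
 * Assumes at most 64 nodes purely for illustration. */
typedef struct {
    dir_state_t state;
    uint64_t    sharers;          /* bit i set => node i holds a copy */
} dir_entry_t;

/* Record that a node has obtained a read-only copy of the block. */
static void dir_add_sharer(dir_entry_t *e, int node)
{
    e->sharers |= (uint64_t)1 << node;
    e->state = DIR_SHARED;
}
```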
CC-NUMA
First access to a remote page causes a page fault
OS maps the virtual address to a global physical address (a fault-handler sketch follows)
RAD snoops the memory bus for accesses to remote addresses
Remote blocks are held in the block cache; misses trigger a remote request
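A minimal sketch of how the OS might handle that first touch under CC-NUMA: the fault handler simply installs a mapping from the faulting virtual page to the page's global physical address, after which the RAD services individual block misses from its block cache or via remote requests. All function and field names here are hypothetical, not taken from the paper.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12

/* Hypothetical per-page software state kept by the OS. */
typedef struct {
    uint64_t global_pa;   /* global physical address of the page's home copy */
    bool     mapped;
} page_info_t;

/* Hypothetical page-table installer provided by the VM system. */
void install_mapping(uint64_t vpn, uint64_t pfn);

/* CC-NUMA: first access faults; the OS maps the virtual page directly to
 * the remote page's global physical frame.  Later block misses are handled
 * in hardware by the RAD (block cache hit or remote request). */
void ccnuma_page_fault(uint64_t fault_va, page_info_t *pi)
{
    uint64_t vpn = fault_va >> PAGE_SHIFT;
    install_mapping(vpn, pi->global_pa >> PAGE_SHIFT);
    pi->mapped = true;
}
```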
CC-NUMA
References global addresses directly
Remote cluster cache (block cache) holds only remote data
Another level in the cache hierarchy
Block cache is small
Sensitive to data allocation and placement
Good for scientific workloads
S-COMA
First access causes a page fault
OS initializes the page table, the RAD translation table, and the access-control tags (sketched below)
Hits are serviced by local memory
Misses are detected by the RAD, which inhibits memory and requests the data from the home node
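By contrast, a sketch of the S-COMA first-touch path: the OS allocates a local page frame to shadow the remote page, maps the virtual page to that local frame, points the RAD's translation table at the page's true global address, and marks every block invalid in the access-control tags so the first access to each block misses and is fetched by the RAD. The names, the page size, and the 64-byte block size are assumptions for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE        4096
#define BLOCK_SIZE       64
#define BLOCKS_PER_PAGE  (PAGE_SIZE / BLOCK_SIZE)

/* Per-block access-control tag kept by the RAD (hypothetical encoding). */
typedef enum { TAG_INVALID, TAG_READ_ONLY, TAG_READ_WRITE } acc_tag_t;

/* Hypothetical RAD translation-table entry for one local page frame. */
typedef struct {
    uint64_t  global_pa;                 /* home address of the page */
    acc_tag_t tags[BLOCKS_PER_PAGE];     /* per-block access control */
} rad_xlate_t;

/* Hypothetical helpers provided by the VM system. */
uint64_t alloc_local_frame(void);
void install_mapping(uint64_t vpn, uint64_t pfn);

/* S-COMA: first access faults; the OS sets up a local shadow frame and the
 * RAD state.  Hits are then serviced by local memory; invalid blocks make
 * the RAD inhibit memory and fetch the data from the home node. */
void scoma_page_fault(uint64_t fault_va, uint64_t global_pa, rad_xlate_t *xl)
{
    uint64_t vpn = fault_va >> 12;
    uint64_t local_pfn = alloc_local_frame();

    xl->global_pa = global_pa;
    for (int i = 0; i < BLOCKS_PER_PAGE; i++)
        xl->tags[i] = TAG_INVALID;       /* every block misses at first */

    install_mapping(vpn, local_pfn);
}
```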
S-COMA
Remote data is cached in local memory or the processor caches
Allocated and mapped at page granularity
OS handles allocation and migration
Large, fully associative memory-based cache
Large page size requires coarse-grained spatial locality
Possible thrashing
R-NUMA
Combines S-COMA and CC-NUMA
CC-NUMA pages are mapped to global physical addresses
S-COMA pages are mapped to local physical addresses
Often requires no additional hardware
Distinguishes two types of pages (a per-page mode sketch follows):
Reuse pages: data used frequently by the same node
Communication pages: data exchanged between nodes
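A sketch of what that per-page choice might look like in software: each remote page carries a mode, and the physical address behind the virtual mapping is either the page's global PA (CC-NUMA, communication pages) or a local shadow frame (S-COMA, reuse pages). The type and function names are assumptions, not the paper's interface.

```c
#include <stdint.h>

/* How a given remote page is currently being cached on this node. */
typedef enum { MODE_CCNUMA, MODE_SCOMA } page_mode_t;

/* Hypothetical per-node record for one remote page. */
typedef struct {
    page_mode_t mode;
    uint64_t    global_pa;   /* global PA used while in CC-NUMA mode   */
    uint64_t    local_pa;    /* local shadow frame used in S-COMA mode */
} rnuma_page_t;

/* Physical address the virtual page should map to under the current mode:
 * communication (CC-NUMA) pages use the global PA, reuse (S-COMA) pages
 * use a local frame. */
static uint64_t rnuma_backing_pa(const rnuma_page_t *p)
{
    return (p->mode == MODE_CCNUMA) ? p->global_pa : p->local_pa;
}
```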
Switching Mechanism
Reuse pages suffer capacity and conflict misses, which S-COMA handles well
Communication pages suffer coherence misses, which CC-NUMA handles well
Detect refetches of previously evicted blocks
Trivial for read-only blocks in a non-notifying protocol (the block is still in the shared state)
Additional hardware is required for read-write blocks
Count refetches on a per-node, per-page basis (a counting sketch follows)
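A minimal sketch of the per-node, per-page refetch counting that drives the switch: each time the RAD refetches a block it previously evicted from this page, a counter is bumped; once the count crosses a relocation threshold, the OS is asked to remap the page from CC-NUMA to S-COMA mode. The counter width, threshold value, and function names are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define RELOCATION_THRESHOLD 64   /* illustrative value, not from the paper */

/* Hypothetical per-node, per-page refetch counter. */
typedef struct {
    uint32_t refetches;   /* blocks refetched after an earlier eviction */
    bool     relocated;   /* already converted to S-COMA?               */
} refetch_ctr_t;

/* Hypothetical OS upcall that remaps the page to a local S-COMA frame. */
void relocate_page_to_scoma(uint64_t vpn);

/* Called (conceptually by the RAD) when a block of this page is fetched
 * again after having been evicted from the block cache. */
void note_refetch(uint64_t vpn, refetch_ctr_t *c)
{
    if (c->relocated)
        return;
    if (++c->refetches >= RELOCATION_THRESHOLD) {
        c->relocated = true;
        relocate_page_to_scoma(vpn);   /* reuse page: switch to S-COMA */
    }
}
```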
R-NUMA Figure
Qualitative Performance
Analysis of worst-case behavior
Performance depends on the respective S-COMA and CC-NUMA overheads
Realistically, R-NUMA is no more than 3 times worse than vanilla CC-NUMA or S-COMA (illustrated below)
In practice the "bound" is much smaller
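The flavor of that bound can be shown with a simple ski-rental-style accounting; this is a simplification under assumed costs, not the paper's exact analysis. Let r be the cost of refetching one block remotely, R the cost of relocating a page into S-COMA mode, and suppose a page is relocated after k refetches.

```latex
% Illustrative worst-case accounting for one page (assumed cost model):
% R-NUMA pays at most k refetches while the page is still in CC-NUMA mode,
% then one relocation, then behaves like S-COMA for that page.
\[
  \mathrm{Cost}_{\text{R-NUMA}} \;\le\; k\,r \;+\; R \;+\; \mathrm{Cost}_{\text{S-COMA}}
\]
% Choosing the threshold so that k*r is comparable to R keeps the extra work
% within a small constant factor of whichever pure scheme (CC-NUMA or S-COMA)
% would have been best for that page; the paper's own cost model yields the
% factor of 3 quoted on this slide.
```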
Quantitative Results
Conclusions
Dynamically reacts to program behavior
Exploits the best caching strategy on a per-page basis
Worst-case performance is bounded
Quantitative results indicate R-NUMA is usually no worse than the best of CC-NUMA and S-COMA
When it is worse, it is still far below the worst-case bound, and never worse than both
Less sensitive to relocation threshold or overhead than S-COMA
Less sensitive to cache size than CC-NUMA
Questions
Sounds like a free lunch
Does R-NUMA really require no additional hardware?
Dynamic switching always looks good in research papers; how does it hold up in practice?