Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA
Babak Falsafi and David A. Wood
Computer Science Department, University of Wisconsin, Madison
Presented by Anita Lungu, February 17, 2006
Context and Motivation
- Large-scale distributed shared memory (DSM) parallel machines
- Directory coherence between SMP nodes
- Local access is fast; remote access is slow
- Problem: hide remote memory access latency
- Solutions:
  - Cache-Coherent NUMA (CC-NUMA): best when coherence misses dominate
  - Simple Cache-Only Memory Architecture (S-COMA): best when capacity misses dominate
- Opportunity: a hybrid, R-NUMA = CC-NUMA + S-COMA
  - Support both and dynamically select a protocol for each page
  - Better performance than either alone => best of both worlds
CC-NUMA
- Data elements are allocated at a home node
- Remote cluster cache:
  - Keeps only remote data
  - Block-level granularity
  - Small & fast (SRAM); can be larger & slower (DRAM)
- Advantage when:
  - The remote working set fits in the small block cache
  - Misses are mostly coherence misses
- Disadvantage when:
  - Many data accesses are remote (see the sketch below)
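A minimal Python sketch, assuming a tiny LRU cache, of the block-granularity behavior above: remote data is cached per block, and a remote working set larger than the block cache turns reuse into capacity misses. The names and sizes (RemoteBlockCache, BLOCK_SIZE, CACHE_BLOCKS) are illustrative, not from the paper.

    from collections import OrderedDict

    BLOCK_SIZE = 64       # bytes per coherence block (assumed)
    CACHE_BLOCKS = 4      # tiny capacity so evictions are visible

    class RemoteBlockCache:
        """LRU cache of remote data, kept at block granularity."""
        def __init__(self, capacity=CACHE_BLOCKS):
            self.capacity = capacity
            self.blocks = OrderedDict()            # block number -> data (ignored here)

        def access(self, addr):
            """Return 'hit' or 'miss' for a remote byte address."""
            block = addr // BLOCK_SIZE
            if block in self.blocks:
                self.blocks.move_to_end(block)     # refresh LRU position
                return "hit"
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)    # evict the LRU block
            self.blocks[block] = None              # fetch the block from its home node
            return "miss"

    cache = RemoteBlockCache()
    for addr in [0, 64, 128, 192, 256, 0]:         # 5 distinct blocks, capacity is 4
        print(hex(addr), cache.access(addr))
    # Block 0 is evicted before it is reused, so every access misses: the
    # capacity misses that hurt CC-NUMA when the remote working set does not
    # fit in the block cache.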
S-COMA
- Distributed main memory acts as a 2nd-level cache for remote data
- Data elements: no home node
- Allocation and mapping:
  - Page granularity (software)
  - Uses the standard virtual address translation hardware
- Coherence:
  - Block granularity (hardware)
- Extra hardware (sketched below):
  - Access-control tags: 2 bits per block, trigger to inhibit memory
  - Auxiliary SRAM translation table: converts local physical pages <-> global physical pages (home)
- Advantage when:
  - Misses are mostly capacity/cold misses
  - Remote data is reused often
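A minimal Python sketch, in the same illustrative spirit, of the two extra structures listed above: software maps a remote page into local memory at page granularity, per-block access-control tags inhibit memory on the first touch of each block, and an auxiliary table translates local physical pages back to global physical pages. The class name, sizes, and state encoding are assumptions, not the actual S-COMA hardware.

    PAGE_SIZE = 4096
    BLOCK_SIZE = 64
    BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE

    INVALID, READ_ONLY, READ_WRITE = 0, 1, 2      # fits in 2 bits per block

    class ScomaNode:
        def __init__(self):
            self.tags = {}   # local page -> per-block access-control states
            self.l2g = {}    # local physical page -> global physical page (home)

        def allocate_page(self, local_page, global_page):
            """Software maps a remote page into local memory (page granularity)."""
            self.l2g[local_page] = global_page
            self.tags[local_page] = [INVALID] * BLOCKS_PER_PAGE

        def access(self, local_addr, write=False):
            """Hardware checks the block's tag; INVALID (or read-only on a write)
            inhibits memory and triggers a block fetch from the home node."""
            page, offset = divmod(local_addr, PAGE_SIZE)
            block = offset // BLOCK_SIZE
            needed = READ_WRITE if write else READ_ONLY
            if self.tags[page][block] < needed:
                global_page = self.l2g[page]       # translate back to the home page
                # ...send a coherence request for (global_page, block)...
                self.tags[page][block] = needed
                return "block fetch"
            return "local hit"

    node = ScomaNode()
    node.allocate_page(local_page=10, global_page=777)
    print(node.access(10 * PAGE_SIZE + 100))       # first touch: block fetch
    print(node.access(10 * PAGE_SIZE + 100))       # reuse: served from local memory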
R-NUMA
- Classify remote pages:
  - Reuse pages: accessed many times by a node
  - Communication pages: used to communicate data between nodes
- Default all pages to CC-NUMA
- Dynamically change a page to S-COMA once it exceeds a threshold of remote capacity/conflict misses (refetches) in the block cache
- The decision is made per node (see the sketch below)
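A minimal Python sketch of the per-node, per-page policy just described: every remote page defaults to CC-NUMA, a counter tracks its capacity/conflict misses (refetches) in the block cache, and crossing a threshold reclassifies the page as S-COMA. The class name, bookkeeping, and threshold value are placeholders, not the paper's exact mechanism.

    THRESHOLD = 64   # refetches tolerated before relocation (illustrative value)

    class PageClassifier:
        def __init__(self, threshold=THRESHOLD):
            self.threshold = threshold
            self.refetches = {}       # page -> count of capacity/conflict misses
            self.scoma_pages = set()  # pages this node has relocated to S-COMA

        def mode(self, page):
            return "S-COMA" if page in self.scoma_pages else "CC-NUMA"

        def on_block_cache_miss(self, page, capacity_or_conflict):
            """Called when a remote block of `page` misses in the block cache."""
            if page in self.scoma_pages or not capacity_or_conflict:
                return self.mode(page)        # coherence/cold misses do not count
            self.refetches[page] = self.refetches.get(page, 0) + 1
            if self.refetches[page] > self.threshold:
                self.scoma_pages.add(page)    # relocate the page into local memory
            return self.mode(page)

    clf = PageClassifier(threshold=2)
    for _ in range(4):
        print(clf.on_block_cache_miss(page=42, capacity_or_conflict=True))
    # CC-NUMA, CC-NUMA, S-COMA (threshold crossed), S-COMA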
Qualitative Performance
- Worst-case scenario: a page is relocated from the block cache (CC-NUMA) to local memory (S-COMA) and never referenced again
- Worst-case performance depends on the cost of relocation (changing a page from CC-NUMA to S-COMA) relative to the cost of page allocation
- R-NUMA can be 3x worse than either CC-NUMA or S-COMA (a rough cost comparison follows)
- But... the threshold that gives the best worst-case performance is not the threshold that gives the best average performance
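A back-of-envelope Python sketch, with made-up latencies, of that worst case: the page pays a threshold's worth of CC-NUMA refetches, then the relocation cost, and is never touched again, so the relocation is pure overhead. BLOCK_FETCH, PAGE_RELOCATION, and THRESHOLD are hypothetical numbers, and the comparison here is only against CC-NUMA; the 3x figure on this slide comes from the paper's full analysis.

    BLOCK_FETCH = 1.0        # cost of one remote block (re)fetch, arbitrary units
    PAGE_RELOCATION = 100.0  # page fault + copy + remap when switching to S-COMA
    THRESHOLD = 100          # refetches tolerated before relocation

    def cc_numa_cost(refetches):
        return refetches * BLOCK_FETCH

    def r_numa_worst_case(threshold):
        # Pays the refetches like CC-NUMA, then relocates, then the page is
        # never referenced again, so the relocation buys nothing.
        return threshold * BLOCK_FETCH + PAGE_RELOCATION

    print("CC-NUMA:", cc_numa_cost(THRESHOLD))
    print("R-NUMA :", r_numa_worst_case(THRESHOLD))
    print("ratio  :", r_numa_worst_case(THRESHOLD) / cc_numa_cost(THRESHOLD))
    # With these numbers R-NUMA is 2x worse; how bad it can get is governed by
    # the relocation cost relative to the threshold's worth of block refetches.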
Base System Results
- Best case: R-NUMA reduces execution time by 37%
- Worst case: R-NUMA increases execution time by 57%
- CC-NUMA can be 179% worse than S-COMA
- S-COMA can be 315% worse than CC-NUMA
Sensitivity Results
1. S-COMA and R-NUMA sensitivity to page-fault and TLB invalidation overhead
2. R-NUMA sensitivity to relocation threshold value
3. CC-NUMA and R-NUMA sensitivity to cache size
Questions?