Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA Babak Falsafi and David A. Wood University of Wisconsin, Madison, 1997 Presented by: Jie Xiao Feb 6, 2008
Outline: Introduction CC-NUMA, S-COMA, R-NUMA Theoretical Results Simulation Results Pros & Cons
Introduction DSM clusters: remote-miss latency >> local-miss latency Looking for the best remote caching strategy!
Introduction Looking for the best remote caching strategy! Solutions: CC-NUMA: Cache-coherent Non-Uniform Memory Access S-COMA: Simple Cache-Only Memory Architecture Our approach: R-NUMA: Reactive NUMA
CC-NUMA block cache: small & fast
S-COMA page cache: sufficiently large (part of the local node’s main memory) page granularity OS handles allocation and migration
CC-NUMA vs. S-COMA Which one is better? Answer: it depends on the application! (1) Communication pages (little reuse) favor CC-NUMA (2) Reuse pages favor S-COMA
R-NUMA Dynamically switches a page from CC-NUMA to S-COMA mode Refetch count: per-node, per-page (hardware counters) Each node independently chooses the best protocol for each page Greater performance stability Not much extra hardware
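The relocation decision above can be sketched in software. This is an illustrative sketch only: the names, the software structure, and the threshold value are assumptions; in the actual design the refetch counters live in hardware and the OS performs the page relocation.

```python
# Hypothetical sketch of R-NUMA's per-node, per-page relocation decision.
# In the real design this logic is split between hardware counters and the OS.

RELOCATION_THRESHOLD = 64  # refetches before a page switches to S-COMA mode


class PageState:
    """Per-node state for one remote page (illustrative)."""
    def __init__(self):
        self.mode = "CC-NUMA"   # remote blocks cached in the small block cache
        self.refetches = 0      # count of blocks fetched again after eviction


def on_remote_miss(page, was_refetch):
    """Handle a miss to a remote page; count refetches and maybe relocate."""
    if page.mode == "S-COMA":
        return  # page is already mapped into the local page cache
    if was_refetch:
        page.refetches += 1
        if page.refetches >= RELOCATION_THRESHOLD:
            # OS allocates a local page frame and remaps the page (S-COMA mode)
            page.mode = "S-COMA"
```

A communication page with few refetches never crosses the threshold and stays in cheap CC-NUMA mode; a reuse page quickly accumulates refetches and is relocated into the large page cache.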
R-NUMA (Figure: state diagram — each page starts in CC-NUMA mode and relocates to S-COMA mode once its refetch count exceeds the threshold)
Theoretical Results Worst-case analysis: R-NUMA performs no more than 3 times worse than the better of CC-NUMA and S-COMA.
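One way a factor-of-3 bound can arise is via ski-rental-style arithmetic. The specific cost model below is an assumption for illustration (per-refetch cost c, relocation cost R, threshold r chosen so that R = 2·r·c), not the paper's exact derivation:

```python
# Illustrative ski-rental-style arithmetic behind a 3x worst-case bound.
# Assumed cost model: each refetch costs c; relocating a page to S-COMA
# mode costs R; the threshold r is chosen so that R = 2 * r * c.
c = 1.0          # cost of one remote block refetch (illustrative units)
r = 64           # relocation threshold (refetches)
R = 2 * r * c    # assumed relocation cost, tied to the threshold

# Adversarial pattern: the page is refetched exactly r times, R-NUMA then
# relocates it, and the page is never accessed again.
r_numa = r * c + R   # pays the threshold's worth of refetches plus relocation
best = r * c         # staying in CC-NUMA would have paid only the refetches
ratio = r_numa / best  # = 3 under the stated assumption
```

If the page instead keeps being refetched forever, the relocation cost is amortized away and R-NUMA approaches the S-COMA optimum, so the adversarial pattern above is the bad case.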
Simulation Results Baselines: CC-NUMA with an infinite block cache; CC-NUMA with a 32 KB block cache; S-COMA with a 320 KB page cache; R-NUMA with a 128 B block cache, a 320 KB page cache, and relocation threshold 64
Simulation Results
R-NUMA is only sensitive to block cache size for applications whose reuse working set does not fit in the page cache (e.g., ocean) A large fraction of reuse pages in an application favors a smaller threshold value (e.g., cholesky, fmm, lu, and ocean) R-NUMA is not very sensitive to page-fault and TLB-invalidation overheads
Pros & Cons Pros + Flexible: per-page, per-node decisions + Exploits the best remote caching strategy without much extra hardware Cons - Fixed relocation threshold (64): should it adapt to the application?