1
Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures
Per Stenström, Truman Joe, and Anoop Gupta
Presented by Colleen Lewis
2
Overview
- Common Features
- CC-NUMA
- COMA
- Cache Misses
- Performance Expectations
- Simulation & Results
- COMA-F
3
Common Features
- Large-scale multiprocessors
- Single address space
- Distributed main memory
- Directory-based cache coherence
- Scalable interconnection network
Examples: CC-NUMA: DASH, Alewife; COMA: DDM, KSR1
4
CC-NUMA: Cache-Coherent Non-Uniform-Memory-Access Machines
- Network independent
- Write-invalidate cache coherence protocol
- Remote misses take 2 hops when the home node can supply the data, and 3 hops when a third node holds the block dirty (see the sketch below)
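The 2-hop vs. 3-hop distinction can be made concrete with a minimal sketch. This is only an illustration of the general directory read-miss path, not the paper's protocol; HOP_CYCLES and the two directory states are assumptions chosen for the example.

```c
/* Minimal sketch of CC-NUMA remote read-miss latency, assuming a
 * write-invalidate directory protocol and a hypothetical fixed per-hop
 * network cost (HOP_CYCLES is an illustrative number, not from the paper). */
#include <stdio.h>

#define HOP_CYCLES 100  /* assumed cost of one network traversal */

typedef enum { CLEAN_AT_HOME, DIRTY_AT_THIRD_NODE } dir_state_t;

/* A remote read miss takes 2 hops if the home node holds a clean copy
 * (requester -> home -> requester) and 3 hops if a third node holds the
 * block dirty (requester -> home -> owner -> requester). */
static int read_miss_latency(dir_state_t state)
{
    int hops = (state == CLEAN_AT_HOME) ? 2 : 3;
    return hops * HOP_CYCLES;
}

int main(void)
{
    printf("2-hop miss: %d cycles\n", read_miss_latency(CLEAN_AT_HOME));
    printf("3-hop miss: %d cycles\n", read_miss_latency(DIRTY_AT_THIRD_NODE));
    return 0;
}
```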
5
COMA: Cache-Only Memory Architecture
- Attraction memory: per-node memory acts as a secondary/tertiary cache (sketched below)
- Data is distributed and mobile
- Directory is dynamically distributed in a hierarchy
- Combining can optimize multiple reads (LU: 47%, Barnes-Hut: 6%, remaining apps: < 1%)
- Reduces the average cache latency
- Increased overhead for the directory structure
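A rough sketch of how an attraction-memory hit short-circuits the hierarchical directory on a miss. The latencies and the levels-climbed parameter are assumptions made for illustration, not measurements from the paper.

```c
/* Minimal sketch of a COMA miss, assuming a directory hierarchy and
 * illustrative latency constants (the numbers are assumptions). */
#include <stdio.h>
#include <stdbool.h>

#define ATTRACTION_HIT_CYCLES 30   /* assumed: miss satisfied in local attraction memory */
#define HIERARCHY_LEVEL_CYCLES 50  /* assumed: cost per directory level traversed */

/* On a processor-cache miss, COMA first probes the node's attraction memory.
 * Only if that misses does the request climb the directory hierarchy until a
 * directory covering the block is found, then descend to the holding node. */
static int coma_miss_latency(bool attraction_hit, int levels_climbed)
{
    if (attraction_hit)
        return ATTRACTION_HIT_CYCLES;
    /* climb 'levels_climbed' levels, then descend the same number */
    return 2 * levels_climbed * HIERARCHY_LEVEL_CYCLES;
}

int main(void)
{
    printf("capacity miss caught by attraction memory: %d cycles\n",
           coma_miss_latency(true, 0));
    printf("coherence miss traversing 3 directory levels: %d cycles\n",
           coma_miss_latency(false, 3));
    return 0;
}
```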
6
Cache Misses
- Cold miss
- Capacity miss
- Coherence miss
Which architecture, CC-NUMA or COMA, has lower miss latency? (A sketch of how these miss types can be told apart follows.)
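A simulator can classify the three miss types with simple per-block bookkeeping. The block_history_t scheme below is an assumed, simplified classifier, not the tool used in the paper.

```c
/* Minimal sketch of miss classification, assuming the simulator tracks, per
 * processor and per block, whether the block was ever cached and why the
 * last copy left the cache (replacement vs. invalidation). */
#include <stdio.h>
#include <stdbool.h>

typedef enum { MISS_COLD, MISS_CAPACITY, MISS_COHERENCE } miss_kind_t;

typedef struct {
    bool ever_cached;        /* has this processor cached the block before? */
    bool lost_to_invalidate; /* was the last copy removed by a remote write? */
} block_history_t;

static miss_kind_t classify_miss(const block_history_t *h)
{
    if (!h->ever_cached)
        return MISS_COLD;        /* first reference: cold miss */
    if (h->lost_to_invalidate)
        return MISS_COHERENCE;   /* copy was invalidated by another node's write */
    return MISS_CAPACITY;        /* copy was evicted for lack of space */
}

int main(void)
{
    block_history_t first = { false, false };
    block_history_t invalidated = { true, true };
    block_history_t evicted = { true, false };
    const char *names[] = { "cold", "capacity", "coherence" };
    printf("%s %s %s\n", names[classify_miss(&first)],
           names[classify_miss(&invalidated)], names[classify_miss(&evicted)]);
    return 0;
}
```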
7
[Figure 1 from the paper]
8
Performance Expectations
Application characteristics expected to favor each architecture (illustrated with a simple latency model below):
- CC-NUMA: low miss rates, mostly coherence misses, coarse-grained data access
- COMA: high miss rates, mostly capacity misses, fine-grained data access
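These expectations follow from a weighted-average argument: the average miss penalty is the miss-type mix weighted by per-type costs. The sketch below uses assumed penalties (chosen only for illustration, not the paper's measurements) to show how a rising coherence-miss fraction favors CC-NUMA, while a capacity-dominated mix favors COMA.

```c
/* Back-of-envelope model of why the miss mix decides the winner: a hedged
 * sketch with assumed per-miss penalties. CC-NUMA is modeled as a flat
 * 2-3 hop remote cost for both miss kinds; COMA capacity misses are mostly
 * caught by attraction memory (cheap) while its coherence misses traverse
 * the directory hierarchy (costly). */
#include <stdio.h>

#define NUMA_COHERENCE  250
#define NUMA_CAPACITY   250
#define COMA_COHERENCE  450
#define COMA_CAPACITY    60

static double avg_penalty(double coh_frac, int coh_cost, int cap_cost)
{
    return coh_frac * coh_cost + (1.0 - coh_frac) * cap_cost;
}

int main(void)
{
    /* coh_frac = fraction of misses that are coherence misses */
    for (double coh_frac = 0.0; coh_frac <= 1.0; coh_frac += 0.25) {
        printf("coherence fraction %.2f: CC-NUMA %.0f cycles, COMA %.0f cycles\n",
               coh_frac,
               avg_penalty(coh_frac, NUMA_COHERENCE, NUMA_CAPACITY),
               avg_penalty(coh_frac, COMA_COHERENCE, COMA_CAPACITY));
    }
    return 0;
}
```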
9
Simulation
- 16 processors
- Cache line size: 16 bytes
- Cache size: 4 KB (small, to force capacity misses; see the arithmetic below)
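As a quick sanity check on why such a small cache forces capacity misses:

```c
/* A 4 KB cache with 16-byte lines holds only 256 lines, far fewer than the
 * working sets of the benchmarks, which is why capacity misses dominate. */
#include <stdio.h>

int main(void)
{
    const int cache_bytes = 4 * 1024;
    const int line_bytes  = 16;
    printf("lines per cache: %d\n", cache_bytes / line_bytes); /* 256 */
    return 0;
}
```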
10
Results
11
Benchmark applications:
- MP3D: particle-based wind tunnel simulation
- PTHOR: distributed-time logic simulation
- LocusRoute: VLSI standard cell router
- Water: molecular dynamics code (water molecules)
- Cholesky: Cholesky factorization of a sparse matrix
- LU: LU decomposition of a dense matrix
- Barnes-Hut: O(N log N) N-body problem solver
- Ocean: ocean basin simulation
12
Page Migration and Page Size
- Migration introduces additional overhead
- Node hit rate increases as page size decreases: less false sharing, and fewer pages are accessed by multiple processors
- Likely won't help if data chunks are much smaller than a page (example: LU)
- NUMA-M performs better for Cholesky
(A sketch of a miss-driven migration policy follows.)
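A hedged sketch of what a miss-driven page-migration policy might look like. The threshold, per-node counters, and max_migrations cap are illustrative knobs rather than the paper's exact mechanism; setting the cap to 1 models the initial-placement scheme on the next slide.

```c
/* Minimal sketch of page migration, assuming the OS counts remote misses per
 * page and moves the page to the node that misses on it most once a
 * threshold is exceeded. All parameters are assumptions for illustration. */
#include <stdio.h>

#define NODES 16

typedef struct {
    int owner;                 /* node whose memory currently holds the page */
    int remote_misses[NODES];  /* misses by each node since the last move */
    int migrations;            /* how many times this page has moved */
} page_t;

static void note_miss(page_t *p, int node, int threshold, int max_migrations)
{
    if (node == p->owner)
        return;                               /* local miss: nothing to do */
    if (++p->remote_misses[node] >= threshold && p->migrations < max_migrations) {
        p->owner = node;                      /* migrate the page */
        p->migrations++;
        for (int i = 0; i < NODES; i++)       /* restart counting after a move */
            p->remote_misses[i] = 0;
    }
}

int main(void)
{
    page_t page = { .owner = 0 };
    for (int i = 0; i < 5; i++)
        note_miss(&page, 3, /*threshold=*/4, /*max_migrations=*/1);
    printf("page now owned by node %d after %d migration(s)\n",
           page.owner, page.migrations);
    return 0;
}
```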
13
Initial Placement
- Implemented as page migration with a limit of one migration per page
- LU does significantly better
- Ocean performs the same with single vs. multiple migrations
- Requires increased work from the compiler and programmer
14
Cache Size and Network Variations
- Cache size variations: increasing the cache size causes coherence misses to dominate; with a 64 KB cache, CC-NUMA (without migration) is better for everything except Ocean
- Network latency variations: even with aggressive implementations of the directory structure, COMA can't compensate in applications with significant coherence miss rates
15
COMA-F
- Directory information for each data block has a fixed home node (as in CC-NUMA)
- Supports replication and migration of data blocks (as in hierarchical COMA, COMA-H)
- Attempts to reduce the coherence miss penalty (see the comparison sketch below)
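To see why a fixed home for the directory shortens the coherence path, here is an illustrative comparison of the two miss paths. The hop and level costs are assumed numbers, and the code is a sketch of the idea rather than the COMA-F protocol itself.

```c
/* Hedged sketch contrasting miss paths in hierarchical COMA (COMA-H) and flat
 * COMA (COMA-F): in COMA-F the directory entry lives at a fixed home node, so
 * a miss goes requester -> home -> current master -> requester (at most 3
 * hops) instead of climbing a directory hierarchy. Costs are assumed. */
#include <stdio.h>

#define HOP_CYCLES 100
#define LEVEL_CYCLES 50

static int coma_h_miss(int levels_climbed)      /* hierarchical COMA */
{
    return 2 * levels_climbed * LEVEL_CYCLES;
}

static int coma_f_miss(int data_at_home)        /* flat COMA (COMA-F) */
{
    /* 2 hops if the home's attraction memory holds the block, else 3 hops
     * because the home forwards the request to the current master copy */
    return (data_at_home ? 2 : 3) * HOP_CYCLES;
}

int main(void)
{
    printf("COMA-H miss over 3 levels:   %d cycles\n", coma_h_miss(3));
    printf("COMA-F miss, data at home:   %d cycles\n", coma_f_miss(1));
    printf("COMA-F miss, data elsewhere: %d cycles\n", coma_f_miss(0));
    return 0;
}
```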
16
Conclusion
CC-NUMA and COMA perform well for different application characteristics:
- CC-NUMA: low miss rates, mostly coherence misses, coarse-grained data access
- COMA: high miss rates, mostly capacity misses, fine-grained data access