Published by Kristina Miles. Modified over 8 years ago.
1
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
Liqun Cheng, John B. Carter and Donglai Dai (cs.utah.edu)
by Evangelos Vlachos
2
Introduction
Shared memory multiprocessors
– Enterprise servers, Top500 supercomputers
Shared memory paradigm
– Producer-consumer relationships
– Suffer from remote misses
Solutions
– High-performance interconnects
– Sophisticated latency-hiding techniques
– Effective caching/coherence mechanisms
3
Motivation
Update vs. invalidate protocols
– Too much coherence traffic
Adaptive protocols to optimize for migratory sharing
– Identify dynamic sharing patterns during execution
– Adapt the coherence protocol
4
The Problem – 3-hop latency
5
Basic Idea (1/2)
Directory delegation
– Identify shared blocks
– Producer node becomes the home node
– Consumers send requests directly to the producer
Decrease latency
– Each read/write access completes after 2 hops
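The hop counts on the last two slides can be illustrated with a small model. This is only a sketch under stated assumptions: the function `read_miss_hops` and its parameters are hypothetical names, the block is assumed to be dirty in the producer's cache, and each network message counts as one hop.

```python
def read_miss_hops(consumer, home, producer, delegated):
    """Return the (src, dst) messages needed to satisfy a consumer's read miss.

    Assumption: the latest copy of the block lives in the producer's cache.
    """
    if consumer == producer:
        return []  # local hit, no network messages
    if delegated or home == producer:
        # The consumer asks the producer directly and the producer replies.
        return [(consumer, producer), (producer, consumer)]
    if consumer == home:
        # The directory is local, so only the fetch from the producer remains.
        return [(home, producer), (producer, home)]
    # Baseline: request to home, home forwards to the owner, owner replies.
    return [(consumer, home), (home, producer), (producer, consumer)]


baseline = read_miss_hops(consumer=2, home=0, producer=1, delegated=False)
delegated = read_miss_hops(consumer=2, home=0, producer=1, delegated=True)
print(len(baseline), len(delegated))
```

With the directory delegated to the producer, the intermediate stop at the original home node disappears, which is where the saving of one hop comes from.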
6
Basic Idea (2/2)
The producer node can identify sharers
– Sharer nodes are stored in the directory
– Speculate that new data will be requested
– Forward new data to sharers
Similar to
– Prefetching
– Last-write prediction
7
Architecture
8
RAC == Remote Access Cache
In the past
– Eliminated remote misses caused by small, low-associativity caches
– Not a problem today
In this work
– A location to push data to at a remote node
– A location to store delegated blocks
– A victim cache (for remote misses), as before
9
Sharing Pattern Detection
Track access history only for frequently used blocks
– Directory entries reside in the directory cache
Keep a last-writer id and saturating counters per directory entry
– last_writer (node id): 4 bits
– reader_count: 2 bits
– write_repeat: 2 bits
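A rough sketch of how such per-entry history might be used follows. The field names and widths come from the slide; the update rules and the detection threshold are assumptions made for illustration, not the authors' exact heuristic.

```python
from dataclasses import dataclass

COUNTER_MAX = 3  # a 2-bit saturating counter tops out at 3


def sat_inc(value, limit=COUNTER_MAX):
    """Increment a counter without letting it exceed its maximum."""
    return min(value + 1, limit)


@dataclass
class DirEntryHistory:
    last_writer: int = -1   # node id (4 bits on the slide); -1 means "none yet"
    reader_count: int = 0   # readers seen since the last write
    write_repeat: int = 0   # consecutive write epochs by the same node

    def on_read(self, node):
        if node != self.last_writer:
            self.reader_count = sat_inc(self.reader_count)

    def on_write(self, node):
        if node == self.last_writer and self.reader_count > 0:
            # Same writer again, with readers in between: looks like a producer.
            self.write_repeat = sat_inc(self.write_repeat)
        elif node != self.last_writer:
            # A different writer breaks the pattern (assumed reset rule).
            self.write_repeat = 0
            self.last_writer = node
        self.reader_count = 0

    def is_producer_consumer(self):
        # Assumed threshold: flag the block once write_repeat saturates.
        return self.write_repeat == COUNTER_MAX


h = DirEntryHistory()
h.on_write(1)
for _ in range(3):
    h.on_read(2)
    h.on_write(1)
print(h.is_producer_consumer())
```

Saturating counters keep the state small (eight bits per entry here) while still requiring the pattern to repeat before the protocol reacts.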
10
Producer/Consumer Tables
Maintain state for blocks that do not reside at their home node
– Producer table: the current node serves as producer for some cache blocks
– Consumer table: the current node is interested in some blocks held at a corresponding producer node
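One way to picture the two tables is as lookups consulted before a request leaves the node. The sketch below is hypothetical: the `Node` class, the modulo-based `home_of` mapping, and the table layouts are assumptions, not details given in the slides.

```python
class Node:
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.num_nodes = num_nodes
        # Block address -> set of sharer node ids (delegated directory state).
        self.producer_table = {}
        # Block address -> node id of the producer the block was delegated to.
        self.consumer_table = {}

    def home_of(self, addr):
        # Assumed address interleaving; real systems use their own mapping.
        return addr % self.num_nodes

    def route_request(self, addr):
        """Return the node that should handle a coherence request for addr."""
        if addr in self.producer_table:
            return self.node_id  # this node is the acting home for the block
        # Go straight to a known producer, otherwise fall back to the home node.
        return self.consumer_table.get(addr, self.home_of(addr))


n = Node(node_id=2, num_nodes=4)
print(n.route_request(8))   # no delegation known: goes to the home node
n.consumer_table[8] = 1
print(n.route_request(8))   # delegation known: goes to the producer
```

A miss in both tables costs nothing beyond the baseline protocol, since the request simply goes to the home node as before.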
11
Delegate – Undelegate
12
One Step Further – Speculative Updates
Eliminate remote misses
– Maintain the sharers list after invalidation
– Forward new data to sharers
– Downgrade local state to SHARED
Need to choose carefully which data to forward
– Do not want to modify the CPU core
– Delayed intervention
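The combination of speculative updates and delayed intervention can be sketched as a timer on the producer's block. Everything here is an illustrative assumption: the `ProducerBlock` class, the `EXCLUSIVE` state name, the cycle-based `delay`, and modelling each remote RAC as a dictionary entry.

```python
class ProducerBlock:
    def __init__(self, sharers, delay):
        self.sharers = set(sharers)  # kept even after the sharers are invalidated
        self.delay = delay           # hypothetical intervention delay, in cycles
        self.state = "SHARED"
        self.value = None
        self.last_write = None

    def write(self, value, now):
        # A write makes the local copy exclusive and restarts the quiet period.
        self.state = "EXCLUSIVE"
        self.value = value
        self.last_write = now

    def tick(self, now, racs):
        """Push the block once it has been quiet for `delay` cycles.

        racs maps node id -> value held in that node's RAC for this block.
        Returns the sorted list of nodes that received an update.
        """
        if self.state != "EXCLUSIVE" or now - self.last_write < self.delay:
            return []
        self.state = "SHARED"  # downgrade the local copy
        for node in self.sharers:
            racs[node] = self.value  # speculative push into the remote RAC
        return sorted(self.sharers)


racs = {}
block = ProducerBlock(sharers=[2, 3], delay=10)
block.write(1, now=0)
block.write(2, now=5)
print(block.tick(now=9, racs=racs))
print(block.tick(now=15, racs=racs))
```

Waiting for a quiet period means a burst of writes results in a single push of the final value, and the decision is made outside the processor, consistent with the slide's goal of leaving the CPU core unchanged.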
13
Delayed Intervention
14
Evaluation
15
Benchmarks
16
Results
17
Results
18
Results
19
Conclusions
Adaptive mechanisms to improve producer-consumer sharing
Eliminate remote misses
Directory delegation & speculative updates
Minor hardware cost
– 32-entry delegate cache & 32KB RAC: exec time ↓13%, remote misses ↓29%, network traffic ↓17%
– 1K-entry delegate cache & 1MB RAC: exec time ↓21%, remote misses ↓40%, network traffic ↓15%