Lecture 4: Update Protocol

Slides:



Advertisements
Similar presentations
4/16/2013 CS152, Spring 2013 CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering.
Advertisements

1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
Technical University of Lodz Department of Microelectronics and Computer Science Elements of high performance microprocessor architecture Shared-memory.
1 Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations.
1 Lecture 20: Coherence protocols Topics: snooping and directory-based coherence protocols (Sections )
1 Lecture 1: Introduction Course organization:  4 lectures on cache coherence and consistency  2 lectures on transactional memory  2 lectures on interconnection.
1 Lecture 3: Snooping Protocols Topics: snooping-based cache coherence implementations.
1 Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations.
1 Lecture 2: Intro and Snooping Protocols Topics: multi-core cache organizations, programming models, cache coherence (snooping-based)
1 Lecture 3: Directory-Based Coherence Basic operations, memory-based and cache-based directories.
1 Lecture 20: Protocols and Synchronization Topics: distributed shared-memory multiprocessors, synchronization (Sections )
CS492B Analysis of Concurrent Programs Coherence Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.
Presented By:- Prerna Puri M.Tech(C.S.E.) Cache Coherence Protocols MSI & MESI.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
Evaluating the Performance of Four Snooping Cache Coherency Protocols Susan J. Eggers, Randy H. Katz.
Lecture 9 ECE/CSC Spring E. F. Gehringer, based on slides by Yan Solihin1 Lecture 9 Outline  MESI protocol  Dragon update-based protocol.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
1 Lecture 3: Coherence Protocols Topics: consistency models, coherence protocol examples.
1 Lecture: Coherence Topics: snooping-based coherence, directory-based coherence protocols (Sections )
The University of Adelaide, School of Computer Science
1 Lecture 8: Snooping and Directory Protocols Topics: 4/5-state snooping protocols, split-transaction implementation details, directory implementations.
Outline Introduction (Sec. 5.1)
Lecture 8: Snooping and Directory Protocols
COSC6385 Advanced Computer Architecture
תרגול מס' 5: MESI Protocol
Cache Coherence in Shared Memory Multiprocessors
Lecture 19: Coherence and Synchronization
The University of Adelaide, School of Computer Science
Lecture 18: Coherence and Synchronization
A Study on Snoop-Based Cache Coherence Protocols
Cache Coherence for Shared Memory Multiprocessors
Lecture 9 Outline MESI protocol Dragon update-based protocol
The University of Adelaide, School of Computer Science
Krste Asanovic Electrical Engineering and Computer Sciences
Lecture 2: Snooping-Based Coherence
Chip-Multiprocessor.
CMSC 611: Advanced Computer Architecture
Lecture 5: Snooping Protocol Design Issues
Lecture 8: Directory-Based Cache Coherence
Bus-Based Coherent Multiprocessors
Lecture 7: Directory-Based Cache Coherence
Lecture 21: Synchronization and Consistency
Lecture: Coherence and Synchronization
Lecture 9: Directory Protocol Implementations
Lecture 25: Multiprocessors
Lecture 9: Directory-Based Examples
Lecture 10: Consistency Models
Lecture 4: Synchronization
Lecture 8: Directory-Based Examples
Lecture 25: Multiprocessors
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Lecture 24: Multiprocessors
Lecture: Coherence, Synchronization
Lecture: Coherence Topics: wrap-up of snooping-based coherence,
Lecture 3: Coherence Protocols
Lecture 8 Outline Memory consistency
Lecture 19: Coherence Protocols
Coherent caches Adapted from a lecture by Ian Watson, University of Machester.
Prof John D. Kubiatowicz
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture: Coherence and Synchronization
Lecture 19: Coherence and Synchronization
Lecture 18: Coherence and Synchronization
The University of Adelaide, School of Computer Science
Lecture 10: Directory-Based Examples II
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 11: Consistency Models
Presentation transcript:

Lecture 4: Update Protocol Topics: update protocol, evaluating coherence

Update Protocol (Dragon) 4-state write-back update protocol, first used in the Dragon multiprocessor (1984) Write-back update is not the same as write-through – on a write, only caches are updated, not memory Goal: writes may usually not be on the critical path, but subsequent reads may be

4 States No invalid state Modified and Exclusive-clean as before: used when there is a sole cached copy Shared-clean: potentially multiple caches have this block and main memory may or may not be up-to-date Shared-modified: potentially multiple caches have this block, main memory is not up-to-date, and this cache must update memory – only one block can be in Sm state In reality, one state would have sufficed – more states to reduce traffic

Design Issues If the update is also sent to main memory, the Sm state can be eliminated If all caches are informed when a block is evicted, the block can be moved from shared to M or E – this can help save future bus transactions The wire used to determine exclusivity is especially useful for an update protocol

Example P1 P2 MSI MESI Dragon MSI MESI Dragon P1: Rd X P1: Wr X Total transfers:

Evaluating Coherence Protocols There is no substitute for detailed simulation – high communication need not imply poor performance if the communication is off the critical path – for example, an update protocol almost always consumes more bandwidth, but can often yield better performance An easy (though, not entirely reliable) metric – simulate cache accesses and compute state transitions – each state transition corresponds to a fixed amount of interconnect traffic

State Transitions To From NP I E S M 1.25 0.96 1.68 0.64 1.87 0.002 1.25 0.96 1.68 0.64 1.87 0.002 0.20 14.0 0.02 1.00 0.42 2.5 134.7 2.24 2.63 2.3 843.6 NP – Not Present State transitions per 1000 data memory references for Ocean To From NP I E S M -- BusRd BusRdX Not possible BusUpgr BusWB Bus actions for each state transition

Cache Misses Coherence misses: cache misses caused by sharing of data blocks – true (two different processes access the same word) and false (processes access different words in the same cache line) False coherence misses are zero if the block size equals the word size An upgrade from S to M is a new type of “cache miss” as it generates (inexpensive) bus traffic

Block Size For most programs, a larger block size increases the number of false coherence misses, but significantly reduces most other types of misses (because of locality) – a very large block size will finally increase conflict misses Large block sizes usually result in high bandwidth needs in spite of the lower miss rate Alleviating false sharing drawbacks of a large block size: maintain state information at a finer granularity (in other words, prefetch multiple blocks on a miss) delay write invalidations reorganize data structures and decomposition

Update-Invalidate Trade-Offs The best performing protocol is a function of sharing patterns – are the sharers likely to read the newly updated value? Examples: locks, barriers Each variable in the program has a different sharing pattern – what can we do? Implement both protocols in hardware – let the programmer/hw select the protocol for each page/block For example: in the Dragon update protocol, maintain a counter for each block – an access sets the counter to MAX, while an update decrements it – if the counter reaches 0, the block is evicted

Title Bullet