Lecture 4: Update Protocol
Topics: update protocol, evaluating coherence
Update Protocol (Dragon)
• 4-state write-back update protocol, first used in the Dragon multiprocessor (1984)
• Write-back update is not the same as write-through – on a write, only caches are updated, not memory
• Goal: writes may usually not be on the critical path, but subsequent reads may be
4 States
• No Invalid state
• Modified and Exclusive-clean as before: used when there is a sole cached copy
• Shared-clean: potentially multiple caches have this block and main memory may or may not be up-to-date
• Shared-modified: potentially multiple caches have this block, main memory is not up-to-date, and this cache must eventually update memory – for a given block, only one cache can hold it in the Sm state
• In reality, a single shared state would have sufficed – the extra state exists to reduce traffic
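To make the state set concrete, here is a minimal C++ sketch (not from the lecture) of the four Dragon states and the writing cache's transition on a write hit, assuming a wired shared signal (`othersHaveCopy`) that tells the writer whether any other cache still holds the block:

```cpp
// The four Dragon states (no Invalid state: a block is either cached or not present).
enum class DragonState { ExclusiveClean, SharedClean, SharedModified, Modified };

// Simplified reaction of the *writing* cache to a processor write hit.
// `othersHaveCopy` stands for the shared signal sampled on the bus.
DragonState onProcessorWriteHit(DragonState s, bool othersHaveCopy) {
    switch (s) {
    case DragonState::ExclusiveClean:
    case DragonState::Modified:
        // Sole copy: write locally, no bus traffic, end up Modified.
        return DragonState::Modified;
    case DragonState::SharedClean:
    case DragonState::SharedModified:
        // Send a bus update with the new word; other caches refresh their copies.
        // If sharers remain, this cache becomes the single Sm owner responsible
        // for the eventual write-back; if no sharers remain, it becomes Modified.
        return othersHaveCopy ? DragonState::SharedModified
                              : DragonState::Modified;
    }
    return s;  // unreachable
}
```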
Design Issues
• If the update is also sent to main memory, the Sm state can be eliminated
• If all caches are informed when a block is evicted, the block can be moved from a shared state to M or E – this can help save future bus transactions
• The shared signal wire used to determine exclusivity is especially useful for an update protocol
Example

             |        P1          |        P2
Request      | MSI  MESI  Dragon  | MSI  MESI  Dragon
P1: Rd X     |                    |
P1: Wr X     |                    |
Total transfers:
Evaluating Coherence Protocols
• There is no substitute for detailed simulation – high communication need not imply poor performance if the communication is off the critical path – for example, an update protocol almost always consumes more bandwidth, but can often yield better performance
• An easy (though not entirely reliable) metric: simulate cache accesses and compute state transitions – each state transition corresponds to a fixed amount of interconnect traffic
State Transitions
[Table: state transitions per 1000 data memory references for Ocean – rows and columns are the From/To states NP (Not Present), I, E, S, M]
[Table: bus action generated by each state transition – none, BusRd, BusRdX, BusUpgr, BusWB, or "not possible"]
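As a rough illustration of the metric above, the sketch below multiplies per-transition counts by an assumed size for each bus action and sums the result; the byte costs and counts are hypothetical placeholders, not the Ocean data:

```cpp
#include <cstdio>
#include <map>
#include <string>

int main() {
    // Hypothetical bus-action sizes in bytes: a command/address packet plus,
    // where applicable, a 64-byte data block. Illustrative only, not measured.
    std::map<std::string, int> bytesPerAction = {
        {"BusRd", 6 + 64}, {"BusRdX", 6 + 64}, {"BusUpgr", 6}, {"BusWB", 6 + 64}};

    // State-transition counts per 1000 data references, grouped by the bus
    // action they trigger. Placeholder values; a real evaluation would use
    // simulator output such as the Ocean table above.
    std::map<std::string, double> transitionsPer1000 = {
        {"BusRd", 20.0}, {"BusRdX", 5.0}, {"BusUpgr", 2.5}, {"BusWB", 4.0}};

    double bytes = 0.0;
    for (const auto& [action, count] : transitionsPer1000)
        bytes += count * bytesPerAction[action];

    printf("Estimated traffic: %.1f bytes per 1000 data references\n", bytes);
    return 0;
}
```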
Cache Misses
• Coherence misses: cache misses caused by sharing of data blocks – true (two different processors access the same word) and false (processors access different words in the same cache line)
• False coherence misses are zero if the block size equals the word size
• An upgrade from S to M is a new type of “cache miss” as it generates (inexpensive) bus traffic
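One way a simulator can separate the two kinds of coherence miss (a sketch, under the assumption that it records which words of a line were written remotely): the miss is true sharing only if the word now being accessed is among those remotely written words.

```cpp
#include <bitset>

// One 64-byte cache line = 16 four-byte words (assumed sizes).
constexpr int WORDS_PER_BLOCK = 16;

// `remotelyWrittenWords` marks the words another processor wrote to this line
// since our copy was invalidated or last updated; `wordOffset` is the word this
// processor is now touching. If that word itself was written remotely, the
// communication was necessary (true sharing); if only *other* words in the line
// were written, the miss exists only because the words share a line (false sharing).
bool isTrueSharing(const std::bitset<WORDS_PER_BLOCK>& remotelyWrittenWords,
                   int wordOffset) {
    return remotelyWrittenWords.test(wordOffset);
}
```

With a one-word block the two notions coincide, which is why false coherence misses vanish when the block size equals the word size.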
Block Size
• For most programs, a larger block size increases the number of false coherence misses, but significantly reduces most other types of misses (because of locality) – a very large block size eventually increases conflict misses
• Large block sizes usually result in high bandwidth needs in spite of the lower miss rate
• Alleviating the false-sharing drawbacks of a large block size (see the padding sketch below):
   – maintain coherence state at a finer granularity than the block (in effect, a small coherence block, while still fetching multiple words on a miss)
   – delay write invalidations
   – reorganize data structures and the problem decomposition
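As one illustration of the "reorganize data structures" option, the hypothetical sketch below pads per-thread counters out to an assumed 64-byte line so that the two threads no longer write to the same cache line:

```cpp
#include <thread>
#include <vector>

// Without padding, the two counters would sit in one cache line and the line
// would ping-pong between the two caches (false sharing). alignas(64) gives
// each counter its own line; 64 bytes is an assumed line size.
struct alignas(64) PaddedCounter {
    long value = 0;
};

int main() {
    PaddedCounter counters[2];
    std::vector<std::thread> workers;
    for (int t = 0; t < 2; ++t)
        workers.emplace_back([&counters, t] {
            for (int i = 0; i < 1'000'000; ++i) counters[t].value++;
        });
    for (auto& w : workers) w.join();
    return 0;
}
```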
Update-Invalidate Trade-Offs
• The best performing protocol is a function of sharing patterns – are the sharers likely to read the newly updated value? Examples: locks, barriers
• Each variable in the program has a different sharing pattern – what can we do?
   – Implement both protocols in hardware – let the programmer/hardware select the protocol for each page/block
   – For example, in the Dragon update protocol, maintain a counter for each block – a local access sets the counter to MAX, while an incoming update decrements it – if the counter reaches 0, the block is evicted (a sketch follows)
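A minimal sketch of that counter heuristic, with an assumed threshold `MAX_HITS` (the lecture does not give a value): blocks that keep receiving remote updates without being used locally are dropped, which makes the protocol degrade gracefully toward invalidate-style behavior for those blocks.

```cpp
constexpr int MAX_HITS = 4;  // assumed threshold, not from the lecture

struct BlockCounter {
    int count = MAX_HITS;
    bool present = true;
};

void onLocalAccess(BlockCounter& b) {
    if (b.present) b.count = MAX_HITS;  // local use: keep accepting updates
}

void onRemoteUpdate(BlockCounter& b) {
    if (b.present && --b.count == 0)
        b.present = false;  // evict the block: future updates are no longer useful here
}
```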