Presentation is loading. Please wait.

Presentation is loading. Please wait.

COSC6385 Advanced Computer Architecture

Similar presentations


Presentation on theme: "COSC6385 Advanced Computer Architecture"— Presentation transcript:

1 COSC6385 Advanced Computer Architecture
Lecture 16. Cache Coherence Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston

2 Topics Today Cache Coherence

3 Memory Hierarchy in a Multiprocessor
Shared cache P $ Bus-based shared memory Memory P P P Cache Memory P $ Memory Fully-connected shared memory (Dancehall) Interconnection Network P $ Memory Interconnection Network Distributed shared memory

4 Cache Coherency Closest cache level is private
Multiple copies of cache line can be present across different processor nodes Local updates Lead to incoherent state Problem exhibits in both write-through and writeback caches Bus-based  globally visible Point-to-point interconnect  visible only to communicated processor nodes

5 Example (Writeback Cache)
Rd? Rd? Cache Cache Cache X= -100 X= -100 X= -100 X= 505 Memory X= -100

6 Example (Write-through Cache)
Rd? Cache Cache Cache X= -100 X= 505 X= -100 X= 505 X= 505 Memory X= -100

7 Memory Consistency Issue
What do you expect for the following codes? Initial values A=0 B=0 P1 P2 A=1; Flag = 1; while (Flag==0) {}; print A; Is it possible P2 prints A=0? P1 P2 A=1; B=1; print B; print A; Is it possible P2 prints A=0, B=1?

8 Memory Consistency Model
Programmers anticipate certain memory ordering and program behavior Become very complex When Running shared-memory programs A processor supports out-of-order execution A memory consistency model specifies the legal ordering of memory events when several processors access the shared memory locations

9 Implicit definition of coherence
Defining Coherence An MP is coherent if the results of any execution of a program can be reconstructed by a hypothetical serial order Implicit definition of coherence Write propagation Writes are visible to other processes Write serialization All writes to the same location are seen in the same order by all processes (to “all” locations called write atomicity) E.g., w1 followed by w2 seen by a read from P1, will be seen in the same order by all reads by other processors Pi

10 SC Example P1 P2 P3 P0 P1 P2 P3 P0 A=1 A=2 T=A Y=A U=A Z=A A=1 A=2 T=A
Sequentially Consistent Violating Sequential Consistency! (but possible in processor consistency model)

11 Maintain Program Ordering (SC)
Flag1 = Flag2 = 0 Dekker’s algorithm Only one processor is allowed to enter the CS P1 P2 Flag1 = 1 if (Flag2 == 0) enter Critical Section Flag2 = 1 if (Flag1 == 0) enter Critical Section Caveat: implementation fine with uni-processor,but violate the ordering of the above P0 P1 Flag2=0 Flag1=0 Flag1=1 Write Buffer Flag2=1 Write Buffer INCORRECT!! BOTH ARE IN CRITICAL SECTION! Flag1: 0 Flag2: 0

12 Atomic and Instantaneous Update (SC)
Update (of A) must take place atomically to all processors A read cannot return the value of another processor’s write until the write is made visible by “all” processors A = B = 0 P1 P2 P3 A = 1 if (A==1) B =1 if (B==1) R1=A

13 Atomic and Instantaneous Update (SC)
Update (of A) must take place atomically to all processors A read cannot return the value of another processor’s write until the write is made visible by “all” processors A = B = 0 P1 P2 P3 A = 1 if (A==1) B =1 if (B==1) R1=A Caveat when an update is not atomic to all … P0 P1 P2 P3 P4 A=1 A=1 A=1 A=1 B=1 B=1 R1=0?

14 Atomic and Instantaneous Update (SC)
Caches also make things complicated P3 caches A and B A=1 will not show up in P3 until P3 reads it in R1=A A = B = 0 P1 P2 P3 A = 1 if (A==1) B =1 if (B==1) R1=A

15 Another Example P0 P1 P2 P3 A=0 B=0 A=1 B=2 A=1 B=2 A=1 B=2 A=1 B=2 T1
See A’s update before B’s See B’s update before A’s

16 Bus Snooping based on Write-Through Cache
All the writes will be shown as a transaction on the shared bus to memory Two protocols Update-based Protocol Invalidation-based Protocol

17 Bus Snooping (Update-based Protocol on Write-Through cache)
X= 505 X= -100 X= 505 Memory Bus transaction X= -100 X= 505 Bus snoop Each processor’s cache controller constantly snoops on the bus Update local copies upon snoop hit

18 Bus Snooping (Invalidation-based Protocol on Write-Through cache)
Load X Cache Cache Cache X= -100 X= 505 X= 505 Memory Bus transaction X= -100 X= 505 Bus snoop Each processor’s cache controller constantly snoops on the bus Invalidate local copies upon snoop hit

19 A Simple Snoopy Coherence Protocol for a WT, No Write-Allocate Cache
PrWr / BusWr PrRd / --- Valid PrRd / BusRd BusWr / --- Invalid Observed / Transaction PrWr / BusWr Processor-initiated Transaction Bus-snooper-initiated Transaction

20 How about Writeback Cache?
WB cache to reduce bandwidth requirement The majority of local writes are hidden behind the processor nodes How to snoop? Write Ordering

21 Cache Coherence Protocols for WB caches
A cache has an exclusive copy of a line if It is the only cache having a valid copy Memory may or may not have it Modified (dirty) cache line The cache having the line is the owner of the line, because it must supply the block

22 Cache Coherence Protocol (Update-based Protocol on Writeback cache)
Store X Cache Cache Cache X= 505 X= -100 X= 505 X= -100 X= 505 X= -100 update Memory Bus transaction Update data for all processor nodes who share the same data For a processor node keeps updating the memory location, a lot of traffic will be incurred

23 Cache Coherence Protocol (Update-based Protocol on Writeback cache)
Store X Load X Cache Cache Cache X= 505 X= 333 X= 333 X= 505 X= 333 X= 505 Hit ! update update Memory Bus transaction Update data for all processor nodes who share the same data For a processor node keeps updating the memory location, a lot of traffic will be incurred

24 Cache Coherence Protocol (Invalidation-based Protocol on Writeback cache)
Store X Cache Cache Cache X= -100 X= -100 X= 505 X= -100 invalidate Memory Bus transaction Invalidate the data copies for the sharing processor nodes Reduced traffic when a processor node keeps updating the same memory location

25 Cache Coherence Protocol (Invalidation-based Protocol on Writeback cache)
Load X Cache Cache Cache X= 505 X= 505 Miss ! Snoop hit Memory Bus transaction Bus snoop Invalidate the data copies for the sharing processor nodes Reduced traffic when a processor node keeps updating the same memory location

26 Cache Coherence Protocol (Invalidation-based Protocol on Writeback cache)
Store X P P P Store X Store X Cache Cache Cache X= 505 X= 987 X= 333 X= 444 X= 505 Memory Bus transaction Bus snoop Invalidate the data copies for the sharing processor nodes Reduced traffic when a processor node keeps updating the same memory location

27 MSI Writeback Invalidation Protocol
Modified Dirty Only this cache has a valid copy Shared Memory is consistent One or more caches have a valid copy Invalid Writeback protocol: A cache line can be written multiple times before the memory is updated.

28 MSI Writeback Invalidation Protocol
Two types of request from the processor PrRd PrWr Three types of bus transactions post by cache controller BusRd PrRd misses the cache Memory or another cache supplies the line BusRd eXclusive (Read-to-own) PrWr is issued to a line which is not in the Modified state BusWB Writeback due to replacement Processor does not directly involve in initiating this operation

29 MSI Writeback Invalidation Protocol (Processor Request)
PrWr / BusRdX PrWr / --- PrRd / --- Modified Shared PrRd / --- PrWr / BusRdX PrRd / BusRd Invalid Processor-initiated

30 MSI Writeback Invalidation Protocol (Bus Transaction)
BusRd / Flush BusRd / --- Modified Shared BusRdX / Flush BusRdX / --- Flush data on the bus Both memory and requestor will grab the copy The requestor get data by Cache-to-cache transfer; or Memory Invalid Bus-snooper-initiated

31 Bus-snooper-initiated
MSI Writeback Invalidation Protocol (Bus transaction) Another possible implementation Another possible, valid implementation Anticipate no more reads from this processor A performance concern Save “invalidation” trip if the requesting cache writes the shared line later BusRd / Flush BusRd / --- Modified Shared BusRdX / Flush BusRdX / --- BusRd / Flush Invalid Bus-snooper-initiated

32 MSI Writeback Invalidation Protocol
PrWr / BusRdX PrWr / --- PrRd / --- BusRd / Flush BusRd / --- Modified Shared PrRd / --- BusRdX / Flush BusRdX / --- PrWr / BusRdX Invalid PrRd / BusRd Processor-initiated Bus-snooper-initiated

33 MSI Example P1 P2 P3 Bus BusRd MEMORY Processor Action State in P1
Cache Cache Cache X=10 S Bus BusRd MEMORY X=10 Processor Action State in P1 State in P2 State in P3 Bus Transaction Data Supplier P1 reads X S --- BusRd Memory

34 MSI Example P1 P2 P3 BusRd Bus MEMORY Processor Action State in P1
Cache Cache Cache X=10 S X=10 S BusRd Bus MEMORY X=10 Processor Action State in P1 State in P2 State in P3 Bus Transaction Data Supplier P1 reads X S --- BusRd Memory S --- BusRd Memory P3 reads X

35 Does not come from memory if having “BusUpgrade”
MSI Example P1 P2 P3 Cache Cache Cache --- I X=10 S X=10 X=-25 S M BusRdX Bus MEMORY X=10 Processor Action State in P1 State in P2 State in P3 Bus Transaction Data Supplier P1 reads X S --- BusRd Memory S --- BusRd Memory P3 reads X I --- M BusRdX Memory P3 writes X Does not come from memory if having “BusUpgrade”

36 MSI Example P1 P2 P3 BusRd Bus MEMORY Processor Action State in P1
Cache Cache Cache --- X=-25 S I X=-25 S M BusRd Bus MEMORY X=-25 X=10 Processor Action State in P1 State in P2 State in P3 Bus Transaction Data Supplier P1 reads X S --- BusRd Memory S --- BusRd Memory P3 reads X Memory P3 writes X I --- M BusRdX P1 reads X S --- BusRd P3 Cache

37 MSI Example P1 P2 P3 BusRd Bus MEMORY Processor Action State in P1
Cache Cache Cache X=-25 S X=-25 S X=-25 S M BusRd Bus MEMORY X=-25 X=10 Processor Action State in P1 State in P2 State in P3 Bus Transaction Data Supplier P1 reads X S --- BusRd Memory S --- BusRd Memory P3 reads X P3 writes X I --- M BusRdX Memory P1 reads X S --- BusRd P3 Cache P2 reads X S BusRd Memory


Download ppt "COSC6385 Advanced Computer Architecture"

Similar presentations


Ads by Google