A Systematic Methodology to Develop Resilient Cache Coherence Protocols. Konstantinos Aisopos (Princeton, MIT), Li-Shiuan Peh (MIT)
Motivation
CMP era is here…
Enabled by aggressive transistor scaling
shrinking transistor dimensions => unreliable silicon (10K-100K FITs, frequency of errors: months) [1,2]
[Figure: tiled CMP connected by routers (R); tile labels: P, P$, S$, CC, NIC.]
[1] R. Baumann (TI), IEEE Design & Test of Computers, vol. 22 (3), 2005
[2] J. Graham (MoSys), EE Times, 2002
Motivation (continued)
Goal: resilient cache coherence protocol
loss of a single coherence message: deadlock
[Figure: mesh of routers (R) with a data request in flight toward a sharer (S).]
Outline
Motivation
Methodology
– Walkthrough: a resilient transaction
– Defining resilience properties
– Enforcing resilience properties
Evaluation
– Overhead
– Performance
Conclusions
Walkthrough Example: transaction
1. initiator sends request to the directory
2. directory forwards request to the sharers
3. sharers invalidate their copy and acknowledge
4. request completes and initiator sends unblock to the dir
5. dir updates sharing vector and may now process succeeding requests
[Figure: message diagram between the initiator R (S -> SM -> M), the sharers S1 and S2 (S -> I), and the directory (S{R,S1,S2} -> B_M -> M{R}); messages: request (M), ack, unblock.]
Walkthrough Example: resilient transaction
1. initiator sends request to the directory
2. request is lost
3. initiator resends request after a timeout
4. directory forwards request to the sharers
(…transaction continues identically as before)
[Figure: initiator R (S -> SM) sends request (M) to the directory twice; the first copy is lost.]
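The timeout-and-resend step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: `LossyNetwork`, `run_transaction`, and `max_attempts` are hypothetical names, and a timeout is modelled simply as a missing response.

```python
from dataclasses import dataclass, field


@dataclass
class LossyNetwork:
    """Hypothetical network that loses the first `losses` messages it carries."""
    losses: int
    delivered: list = field(default_factory=list)

    def send(self, msg):
        if self.losses > 0:
            self.losses -= 1          # message is lost: no response will come back
            return None
        self.delivered.append(msg)
        return "response"             # stand-in for the rest of the transaction


def run_transaction(network, request, max_attempts=4):
    """Send `request`; on a timeout (modelled as a missing response) resend it."""
    for attempt in range(1, max_attempts + 1):
        if network.send(request) is not None:
            return attempt            # number of sends the transaction needed
    raise TimeoutError("request lost on every attempt")


network = LossyNetwork(losses=1)
attempts = run_transaction(network, ("request", "M"))
print(attempts, network.delivered)
```

With one loss the request is sent twice and delivered once, matching steps 1 to 4 of the walkthrough.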
Walkthrough Example: resilient transaction
1. initiator resends its request
[Figure: initiator R (S -> SM), sharers S1 and S2, and the directory (S{R,S1,S2} -> B_M); messages: request (M), ack.]
Walkthrough Example: resilient transaction
1. initiator resends its request
To tolerate a duplicate request, the directory must (1) transition to the same state and (2) generate the same messages.
[Figure: directory FSM fragment with states S, B_S, B_M, M and transitions on request (S), request (M), unblock; the duplicate request (M) arriving in B_M is marked "?".]
Walkthrough Example: resilient transaction
1. initiator resends its request
2. directory forwards the request to sharers (again)
[Figure: initiator R (S -> SM), sharers S1 and S2, and the directory (S{R,S1,S2} -> B_M); messages: request (M), ack.]
Walkthrough Example: resilient transaction
To tolerate a duplicate request, a sharer must (1) transition to the same state and (2) generate the same messages.
[Figure: sharers S1 and S2 move S -> I and send ack on request (M); a second request (M) produces the ack again.]
Walkthrough Example: resilient transaction
1. initiator resends its request
2. directory forwards the request to sharers (again)
3. sharers acknowledge (again)
(…transaction completes identically as before)
[Figure: initiator R (S -> SM -> M), sharers S1 and S2, and the directory; messages: request (M), ack.]
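The rule "same state, same messages" amounts to making the handlers idempotent. The sketch below is an assumption-laden illustration, not the protocol from the talk: the classes `Sharer` and `Directory`, the method `on_request_m`, and the `pending` field are hypothetical, and conflicting requests are simply ignored instead of queued.

```python
class Sharer:
    """Hypothetical sharer: handles request (M) idempotently."""

    def __init__(self):
        self.state = "S"

    def on_request_m(self):
        # S -> I on the original request, I -> I on a duplicate:
        # either way the sharer ends in I and emits the same ack.
        self.state = "I"
        return ["ack"]


class Directory:
    """Hypothetical directory: a duplicate request (M) re-forwards the same messages."""

    def __init__(self, sharers):
        self.state = "S"
        self.sharers = set(sharers)
        self.pending = None            # requestor of the transaction in flight

    def on_request_m(self, requestor):
        if self.state == "S":
            self.state, self.pending = "B_M", requestor
        elif self.state == "B_M" and self.pending == requestor:
            pass                       # duplicate: stay in B_M
        else:
            return []                  # conflicting request (would be queued)
        # Same outgoing messages for the original and for the duplicate.
        return [("request(M)", s) for s in sorted(self.sharers - {requestor})]


directory = Directory({"R", "S1", "S2"})
first = directory.on_request_m("R")
second = directory.on_request_m("R")   # resent after a timeout
print(first == second, directory.state)
```

The sharing vector is left untouched while the directory is blocked, which is what allows the duplicate to regenerate exactly the same forwards.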
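Defining the Resilience Properties
message loss => transaction suspended
the requestor (R) regenerates its request after a timeout
every node that receives the regenerated request, up to the one that sends the response, must perform the same state transition and generate the same outgoing messages
[Figure: chain of nodes from the requestor R, request travelling forward and response travelling back.]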
Defining the Resilience Properties
Property 1: initiator remains transient throughout the transaction (stable -> request -> transient … -> last message -> stable)
Property 2: replicate msgs roll back to the same earlier state
Property 3: retain information to regenerate msgs
[Figure: request/response chain from the requestor R through nodes X and Y, with messages msgA and msgB.]
Enforcing Property 1
Property 1: the initiator remains transient throughout a transaction, to be able to resend lost messages
[Figure: stable -> (request) -> transient … -> (last message) -> stable.]
Enforcing Property 1
Property 1: the initiator remains transient throughout a transaction, to be able to resend lost messages
Counter-example: the initiator receives the response, sends unblock to the dir, and moves to a stable state; if the unblock is lost, the initiator cannot resend it.
Enforcement:
- detect every outgoing message that moves the initiator to a stable state
- replace the stable state with a transient state, and wait for done
[Figure: stable -> (request) -> transient … -> (response / unblock) -> transient -> (done) -> stable.]
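The two enforcement steps can be read as a mechanical rewrite of the initiator's transition table. The following is a sketch under stated assumptions: the table layout `{(state, event): (next_state, out_msgs)}`, the function `enforce_property_1`, and the `timeout` and `done` event names are hypothetical; only the "Xd (X, waiting done)" naming follows the talk.

```python
def enforce_property_1(transitions, stable_states):
    """Rewrite transitions that send a message while entering a stable state.

    Each such transition is redirected to a transient "waiting done" state,
    from which the message can be resent on a timeout.
    """
    resilient = {}
    for (state, event), (nxt, out) in transitions.items():
        if out and nxt in stable_states:
            waiting = nxt + "d"                                # e.g. M -> Md
            resilient[(state, event)] = (waiting, out)
            resilient[(waiting, "timeout")] = (waiting, out)   # resend the lost message
            resilient[(waiting, "done")] = (nxt, [])           # now safe to become stable
        else:
            resilient[(state, event)] = (nxt, out)
    return resilient


# Hypothetical fragment of an initiator FSM.
base = {
    ("S", "store"): ("SM", ["request(M)"]),
    ("SM", "last_ack"): ("M", ["unblock"]),
}
resilient = enforce_property_1(base, stable_states={"M", "E", "S", "I"})
for key, value in sorted(resilient.items()):
    print(key, "->", value)
```

In this fragment only the `unblock` transition is rewritten; the transition that issues the request already ends in a transient state and is left as-is.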
Enforcing Property 2
Property 2: replicate messages roll back to the earlier state the original message transitioned to
[Figure: a node in state A receiving msgA.]
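Enforcing Property 2
Property 2: replicate messages roll back to the earlier state the original message transitioned to
Problem: when two FSM branches (one through T1, one through T2) merge into a common state TM, a replicate msgA received after the merging point is ambiguous: roll back to T1 or T2?
Enforcement: disassociate branches after the merging point (TM becomes TM1 and TM2)
[Figure: FSM starting at S with msgA transitions into T1 and T2, shown before and after splitting TM.]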
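The merging-point ambiguity can be checked mechanically on a small graph. This sketch rests on assumptions: the FSM is given as a list of `(source, message, destination)` edges, the source states `X` and `Y` and the messages `msgB`, `msgC` are invented for illustration, and "possible roll-back targets" is taken to mean every earlier state that was entered on the replicated message.

```python
from collections import defaultdict


def ancestors(edges, state):
    """All states from which `state` is reachable, including itself."""
    preds = defaultdict(set)
    for src, _msg, dst in edges:
        preds[dst].add(src)
    seen, stack = {state}, [state]
    while stack:
        for p in preds[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def rollback_targets(edges, state, msg):
    """States a replicate `msg` received in `state` could roll back to."""
    reach = ancestors(edges, state)
    return {dst for _src, m, dst in edges if m == msg and dst in reach}


# Two branches that merge into TM: the roll-back target is ambiguous.
merged = [("X", "msgA", "T1"), ("Y", "msgA", "T2"),
          ("T1", "msgB", "TM"), ("T2", "msgC", "TM")]
# Same branches, disassociated after the merging point.
split = [("X", "msgA", "T1"), ("Y", "msgA", "T2"),
         ("T1", "msgB", "TM1"), ("T2", "msgC", "TM2")]

print(sorted(rollback_targets(merged, "TM", "msgA")))
print(sorted(rollback_targets(split, "TM1", "msgA")))
```

A state with more than one roll-back target violates Property 2; after the split each state has exactly one.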
Enforcing Property 3
Property 3: retain info to regenerate every outgoing message, in case a replicate request is received
[Figure: a sharer holding unique data moves M -> I on request (M) from R; a replicate request (M) then arrives.]
Enforcing Property 3
Property 3: retain info to regenerate every outgoing message, in case a replicate request is received
[Figure: the sharer moves M -> T_I on request (M) from R and retains the unique data; after invalidate permission it sends invalidate ack and moves to I.]
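Retaining the data until it can no longer be requested again can be sketched as below. This is an illustrative reading of the slide, not the authors' code: the class `Owner`, its method names, and the exact point at which the data is dropped are assumptions.

```python
class Owner:
    """Hypothetical sharer that holds the only copy of a block (state M)."""

    def __init__(self, data):
        self.state, self.data = "M", data

    def on_request_m(self):
        # M -> T_I instead of M -> I: the data is retained, so a replicate
        # request (M) can regenerate exactly the same data message.
        if self.state in ("M", "T_I"):
            self.state = "T_I"
            return [("data", self.data)]
        return []

    def on_invalidate_permission(self):
        # Only now is it safe to drop the unique data.
        if self.state == "T_I":
            self.state, self.data = "I", None
        return ["invalidate ack"]


owner = Owner(data=42)
first = owner.on_request_m()
second = owner.on_request_m()          # replicate request after a lost response
acks = owner.on_invalidate_permission()
print(first == second, owner.state, acks)
```

Had the owner moved straight to I and discarded the block, the second call would have had nothing to send.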
Evaluation: Overhead

directory-based protocol (static directory node, MESI): 9 to 17 states (4 to 5 bits)
  base states, stable: Modified, Exclusive, Shared, Invalid
  base states, transient: IM (I -> M), IS (I -> S), SM (S -> M), ISI (IS -> I), MI (M -> I)
  resilient states: Md (M, waiting done), Ed (E, waiting done), Sd (S, waiting done), Id (I, waiting done), Sp (S, waiting permission), Ip (I, waiting permission), Ma (M, waiting ack), Sa (S, waiting ack)

broadcast-based protocol (AMD Hammer, MOESI): 12 to 22 cache states (4 to 5 bits)
  base states, stable: Modified, Owned, Exclusive, Shared, Invalid
  base states, transient: IM (I -> M), IS (I -> S), SM (S -> M), SE (S -> E), SS (S -> S), OM (O -> M), WB req
  resilient states: Md (M, waiting done), Ed (E, waiting done), Sd (S, waiting done), Id (I, waiting done), MId (MI, waiting done), Sp (S, waiting permission), Ip (I, waiting permission), Ma (M, waiting ack), Ea (E, waiting ack), Sa (S, waiting ack)

No state was introduced into the critical path of serving a request
Evaluation: Overhead
Miss Status Holding Register (MSHR), 4-32 entries
existing fields: PC, address, requestor, flags, state
added fields:
  timer (0 to 2^13): 13 bits
  state: 1 bit
  response bitvector: 64 bits
  trans ID: 6 bits
  total: 11 bytes
total storage overhead: < 0.5 KB / core (worst-case: 2 KB / core) (*)
(*) assuming a 64-node CMP with in-order cores
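The storage figures can be checked with a few lines of arithmetic. The field widths are taken from the slide; reading "11 bytes" as the per-entry total rounded up to whole bytes is an assumption, and the 2 KB worst case is not derived here.

```python
import math

# Bits added to each MSHR entry, as listed on the slide.
ADDED_FIELD_BITS = {
    "timer": 13,
    "state": 1,
    "response bitvector": 64,
    "trans ID": 6,
}

bits_per_entry = sum(ADDED_FIELD_BITS.values())
bytes_per_entry = math.ceil(bits_per_entry / 8)    # rounded up to whole bytes

# Overhead per core for the smallest and largest MSHR sizes on the slide.
overhead_bytes = {entries: entries * bytes_per_entry for entries in (4, 32)}

print(bits_per_entry, bytes_per_entry, overhead_bytes)
```

Under that reading, 84 bits round up to 11 bytes per entry, and a 32-entry MSHR adds 352 bytes, which is consistent with the "< 0.5 KB / core" figure.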
Evaluation: Performance
Simulator: Wisconsin Multifacet GEMS

Network-on-Chip
  Topology: 8x8 mesh
  Channels: 64-bit
  VNets: 5
  Routing: XY

System Configuration
  Processors: in-order SPARC cores
  L1 Caches: 64KB/node, 3 cycles, 4-way, 64Byte blk
  L2 Caches: 1MB/node, 6 cycles
  Memory: 4 controllers * 1GB, 160 cycles
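Evaluation: Performance (directory protocol)
metric: runtime overhead vs. non-resilient baseline (lower is better)
benchmarks: SPLASH (fft, fmm, lu, radix, water-nsq, water-sp), PARSEC (blackscholes, canneal, fluidanimate, swaptions, x264), AVERAGE
[Figure: bar chart of runtime overhead per benchmark; value labels that survive extraction: 7.4%, 11%, 1.4%, 1.8%, 1.1%, 3.5%.]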
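Evaluation: Performance (broadcast protocol)
metric: runtime overhead vs. non-resilient baseline
benchmarks: SPLASH (fft, fmm, lu, radix, water-nsq, water-sp), PARSEC (blackscholes, canneal, fluidanimate, swaptions, x264), AVERAGE
[Figure: bar chart of runtime overhead per benchmark; value labels that survive extraction: 2.4%, 5.1%, 0.5%, 20.4%, 51%, 56%.]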
Conclusions
We have presented a generic methodology: coherence protocol -> resilient coherence protocol
…by enforcing 3 properties
minimal hardware overhead (<2KB / node)
small performance overhead
– directory-based protocol: 1.4% (1 fault / msec)
– broadcast-based protocol: 2.4% (1 fault / msec)
Thank You! Questions?
BACKUP SLIDES
Why performance overhead?
transactions last longer => a request may have to wait for outstanding conflicting requests to complete
data remain in caches for longer (3-way handshake) => longer cache replacement duration
more messages are injected in the NoC => more network traffic => higher average NoC latency
Transaction Duration
B: baseline protocol, no faults
R: resilient protocol, 1 fault / 10 μsec
L1: transaction served by sharer's L1
L2: transaction served by directory (L2)
[Figure: transaction duration, B vs. R; value labels that survive extraction: +12%, +18%.]
Transaction Duration
B: baseline protocol, no faults
R: resilient protocol, 1 fault / 10 μsec
L1: transaction served by sharer's L1
L2: transaction served by directory (L2)
large working sets, shared data => high number of requests (high traffic)
(!) retransmissions saturate the network
[Figure: transaction duration, B vs. R; value labels that survive extraction: 11%, 24%.]
Network Traffic
[Figure: network traffic on the most congested link and averaged over all links.]
Enforcing the Resilience Properties
P2: a single message type transitions to a unique state in every FSM branch
Case 2: identical messages in same branch
counter-based states: msgA from X leads to T (count = 1), msgA from Y leads to T (count = 2)
example: R sends request (M) and waits in SM + acks = 0; each ack moves it to SM + acks = 1, SM + acks = 2, … and finally to M
Enforcing the Resilience Properties
P2: a single message type transitions to a unique state in every FSM branch
Case 2: identical messages in same branch
counter-based states: msgA from X leads to T (count = 1), msgA from Y leads to T (count = 2)
bitvector-based states: msgA from X leads to T [XYZ = 100], msgA from Y then leads to T [XYZ = 110]
Enforcing the Resilience Properties
P2: a single message type transitions to a unique state in every FSM branch
Case 2: identical messages in same branch
counter-based states: msgA from X leads to T (count = 1), msgA from Y leads to T (count = 2)
bitvector-based states: msgA from X leads to T [XYZ = 100]; a duplicate msgA from X leads to T [XYZ = 100] again
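The difference between counting identical messages and recording their senders shows up as soon as one ack is duplicated. The sketch below is an assumption-based illustration: `AckCounter` and `AckBitvector` are hypothetical classes, and a Python set stands in for the hardware bitvector.

```python
class AckCounter:
    """Counter-based tracking: a duplicate ack looks like a new one."""

    def __init__(self, expected):
        self.needed = len(expected)
        self.count = 0

    def on_ack(self, sender):
        self.count += 1
        return self.count >= self.needed       # True once "all" acks arrived


class AckBitvector:
    """Sender-indexed tracking: a duplicate ack lands in the same state."""

    def __init__(self, expected):
        self.expected = frozenset(expected)
        self.received = set()                   # one "bit" per sender

    def on_ack(self, sender):
        self.received.add(sender)               # setting a set bit changes nothing
        return self.expected <= self.received


counter = AckCounter(["X", "Y"])
bitvector = AckBitvector(["X", "Y"])

# X's ack is delivered twice (the original and a duplicate); Y's never arrives.
counter_done = [counter.on_ack("X"), counter.on_ack("X")]
bitvector_done = [bitvector.on_ack("X"), bitvector.on_ack("X")]
print(counter_done, bitvector_done)
```

The counter wrongly reports completion after the duplicate, while the bitvector stays in the state the original ack led to and completes only once Y's ack arrives.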