1 Token Tenure: PATCHing Token Counting Using Directory-Based Cache Coherence Arun Raghavan, Colin Blundell, Milo Martin University of Pennsylvania {arraghav, blundell, milom}@cis.upenn.edu

2 This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. You are free: to Share (to copy, distribute, display, and perform the work); to Remix (to make derivative works). Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to: http://creativecommons.org/licenses/by-sa/3.0/us/ Any of the above conditions can be waived if you get permission from the copyright holder. Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights. [ 2 ] PATCH - Arun Raghavan - MICRO 2008

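3 Why Yet Another Coherence Protocol? Properties compared: fast sharing, avoids broadcast, scalable interconnect. Snoopy: fast sharing ✔, avoids broadcast ✗, scalable interconnect ✗. Directory (tracks sharers): fast sharing ✗, avoids broadcast ✔, scalable interconnect ✔. Token Coherence (token counting): fast sharing ✔, avoids broadcast ✗, scalable interconnect ✔. Our goal: ✔ on all three. This work: combining directory and token counting.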

4 Overview. Begin with a standard directory protocol. Fast sharing misses? Direct requests. Ensure safety? Token counting. Broadcast-free forward progress? Token Tenure: the directory selects one requestor to retain tokens; requestors give up tokens after a timeout interval. PATCH: Predictive, Adaptive Token Counting Hybrid. Send request “hints” directly to predicted sharers. Retain scalability? Lowest-priority, best-effort delivery. Result: fast sharing misses, scales as a directory.

5 Directory Operation (diagram: processors P0, P1, P2 and the Directory)

6 Directory Operation (diagram: P0 in M, P1 and P2 in I; the directory records Owner: P0 and no sharers)

7 Directory Operation (diagram: store miss at P1. Step 1: GetM to the directory. Step 2: Fwd(P1), acks=1, to the owner. Step 3: Data, acks=1, to the requester. The old owner goes to I; the requester goes to M and sends Unblock.)

8 Directory Operation (diagram, end state: P0 in I, P1 in M; the directory records Owner: P1)

9 Directory with Direct Requests? (diagram: P0 in I, P1 in M, P2 in I; the directory records Owner: P1. Load miss at P2: a GetS goes to the directory and also directly to P1, which replies with Data. P1 moves to O, P2 to S.)

10 Directory with Direct Requests? (diagram: while P2's GetS to the directory is still in flight, a store miss at P0 sends GetM to the directory, which sends Fwd(P0), acks=1.)

11 Directory with Direct Requests? (diagram: P1 answers the forward with Data, acks=1, and goes to I.)

12 Directory with Direct Requests? (diagram: P0 reaches M while P2 still holds the block in S.) Incoherence!! Why? Direct requests break a key directory assumption.

13 Restoring Coherence. Coherence invariant: one writer or many readers. Directory: enforces the invariant implicitly through a distributed algorithm, which assumes complete state information at the directory. Alternative: encode permission with a token count (Token Coherence [ISCA ’03]). Fixed number of tokens per cache block. Need all tokens to write; one or more tokens to read. Explicitly enforces the coherence invariant, without regard to races or protocol details.
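The token-counting rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the protocol from the talk: the class name `BlockTokens`, the node names, and the choice of 4 tokens per block are assumptions made for the example, and details such as owner tokens and data validity are left out.

```python
from dataclasses import dataclass, field

TOTAL_TOKENS = 4  # assumed fixed token count per cache block (hypothetical value)


@dataclass
class BlockTokens:
    """Token holdings for one cache block; memory starts with every token."""
    holdings: dict = field(default_factory=lambda: {"memory": TOTAL_TOKENS})

    def can_read(self, node):
        # One or more tokens grant read permission.
        return self.holdings.get(node, 0) >= 1

    def can_write(self, node):
        # All tokens are needed for write permission.
        return self.holdings.get(node, 0) == TOTAL_TOKENS

    def transfer(self, src, dst, count):
        # Tokens are only moved, never created or destroyed.
        if count <= 0 or self.holdings.get(src, 0) < count:
            raise ValueError("cannot transfer tokens that src does not hold")
        self.holdings[src] -= count
        self.holdings[dst] = self.holdings.get(dst, 0) + count

    def invariant_holds(self):
        # One writer or many readers: a writer excludes every other reader.
        conserved = sum(self.holdings.values()) == TOTAL_TOKENS
        writers = [n for n in self.holdings if self.can_write(n)]
        readers = [n for n in self.holdings if self.can_read(n)]
        return conserved and (not writers or readers == writers)
```

Because permissions depend only on how many tokens a node holds, the invariant holds no matter in which order messages race, which is the point the slide makes.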

14 Directory with Direct Requests: Tokens (diagram: P0, P1, P2; the directory records Owner: P1. Load miss at P2: GetS.)

15 Directory with Direct Requests: Tokens (diagram: the GetS is answered with Data.)

16 Directory with Direct Requests: Tokens (diagram: store miss at P0; GetM goes to the directory, which sends Fwd(P0).)

17-19 Directory with Direct Requests: Tokens (diagram, three animation steps: Data is sent in response to Fwd(P0).)

20 Directory with Direct Requests: Tokens (diagram: the load miss and the store miss are both outstanding and a GetS is still in flight.) P0 starves.

21 Token Coherence Solution: Persistent Requests (diagram: racing load miss and store miss; GetS in flight). The persistent request is broadcast. Table at each processor: N^2 state.

22 Our Solution (diagram: racing load miss and store miss; GetS in flight). P0’s request reached the directory first, so the directory declares P0 the winner. P2 is a non-winner, which is inferred after a timeout. The directory forwards to P0.

23 Token Tenure

24 Token Tenure. Tokens can be tenured or untenured; tokens are untenured by default. Untenured tokens must be sent to the directory unless tenured within the timeout window. Active (winner) requestors tenure tokens. The directory activates one request at a time and explicitly informs the active requestor. Multiple processors can hold tenured tokens. Why does this ensure forward progress?
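The tenure rules on this slide can be illustrated with a small Python model of one cache block at one processor. It is a sketch under stated assumptions, not the authors' implementation: the class `CacheLine`, its method names, the single timer per block, and the token counts and timestamps in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Per-block token state at one processor (hypothetical model)."""
    tenured: int = 0
    untenured: int = 0
    untenured_since: float = 0.0
    active: bool = False  # set once the directory activates this requestor

    def receive(self, count, now):
        # Tokens arrive untenured by default; an active requestor tenures them.
        if self.active:
            self.tenured += count
        else:
            if self.untenured == 0:
                self.untenured_since = now  # start the timeout window
            self.untenured += count

    def activate(self):
        # The directory explicitly informs the winner, which tenures its tokens.
        self.active = True
        self.tenured += self.untenured
        self.untenured = 0

    def expire(self, now, timeout):
        # Tokens not tenured within the window must go back to the directory.
        if self.untenured and now - self.untenured_since >= timeout:
            bounced, self.untenured = self.untenured, 0
            return bounced
        return 0
```

For example, with 4 tokens per block, suppose a non-winner P2 picked up 1 token at time 0 and the winner P0 holds 3. After `p0.activate()`, the call `p2.expire(now=10.0, timeout=10.0)` returns 1; the directory would forward that token to P0, and `p0.receive(1, now=11.0)` brings P0 to 4 tenured tokens so its store can complete.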

25 Flow of Tokens to Active Requestor (diagram labels: Direct request, Racing request, Forwarded request, Directory, Active, Untenured, Tenured, Timeout, Bounce). Restores the directory’s ability to resolve races. Implementation: add a timeout.

26 Token Tenure: Implementation. No common-case performance impact: activation is off the critical path of a miss, and the token count still determines permissions. No additional traffic: activation is piggybacked on forwarded messages. Set the timeout to twice the average round-trip latency: long enough to avoid early timeouts, but short enough to minimize slowing down the winner in races.

27 Using Direct Requests. Direct requests to no, some or all processors (Dest. Set Prediction [ISCA ’03]). (chart: normalized runtime for jbb, oltp, apache, barnes, ocean; 64 processors, 16B/cycle; data labels 19%, 7%, 28%, 8%, 18%; average 14%) Direct requests improve performance. But at what cost?

28 Direct Requests: Runtime and Traffic. Direct requests to no, some or all processors. If successful, a two-hop miss; else, the directory forwards anyway. (charts: normalized runtime and normalized traffic for jbb, oltp, apache, barnes, ocean) Predictors see benefit using fewer direct requests. PATCH-NoDirect and Directory have identical traffic. PATCH-Broadcast has >100% overhead.

29 Best-Effort Direct Requests. Direct requests in PATCH: 1. Strictly in addition to directory requests. 2. Don’t need explicit acks. Therefore direct requests can be dropped arbitrarily. Best-effort delivery: lowest priority, delivered strictly on a “do-no-harm” basis. If queued up too long in switches or the controller: drop. Lower bound: PATCH-NoDirect performance. Adequate bandwidth? Drop no requests. Scarce bandwidth? Drop all requests. Never worse than directory.
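The best-effort policy described on this slide can be sketched as a two-class output port. This is a minimal illustration, assuming a hypothetical `BestEffortPort` class, a single `max_wait` threshold, and one message sent per cycle; the talk does not specify how switches implement the drop.

```python
from collections import deque


class BestEffortPort:
    """Output port where direct-request hints have lowest priority (hypothetical)."""

    def __init__(self, max_wait):
        self.max_wait = max_wait  # how long a hint may wait before being dropped
        self.normal = deque()     # directory requests, forwards, data, acks
        self.hints = deque()      # (enqueue_time, message) for direct requests
        self.dropped = 0

    def enqueue(self, message, now, is_hint=False):
        if is_hint:
            self.hints.append((now, message))
        else:
            self.normal.append(message)

    def send_next(self, now):
        """Return the message sent this cycle, or None if the port is idle."""
        # Drop hints that have been queued too long; this is safe because the
        # same request also travels to the directory.
        while self.hints and now - self.hints[0][0] > self.max_wait:
            self.hints.popleft()
            self.dropped += 1
        if self.normal:
            return self.normal.popleft()
        if self.hints:
            return self.hints.popleft()[1]
        return None
```

With an otherwise idle port, a hint is sent immediately (the adequate-bandwidth case). If regular traffic keeps the port busy for longer than `max_wait`, the hint is dropped and the miss is handled by the directory alone (the scarce-bandwidth case).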

30 Best-Effort Direct Requests (chart: normalized runtime versus number of processors, 4 to 512; data labels 29% and 20%). Broadcast performance with plentiful bandwidth. Converges with directory performance at 512. Adapts dynamically; one-size-fits-all. Better than both.

31 Enhancing Directory Scalability

32 Enhancing Directory Scalability (diagram labels: Req, Directory, Forward, Ack; cache states I, I, S, S; directory bits 0, 1)

33 Enhancing Directory Scalability. Coarse directories: 1 bit for k sharers. Fan-out delivery of forwards: worst case O(N) traffic. Requires acks from non-sharers too: multiple unicast messages (no ack combining), worst case O(N√N) on a 2D torus interconnect. (diagram labels, Directory-coarse: Req, Forward, Ack; cache states I, I, S, S; bits 1, 0, 1)

34 Enhancing Directory Scalability. With PATCH, only token holders need respond. Avoids “unnecessary acknowledgements”. When the number of sharers is small, prevents acks from dominating. Even more scalable than directory. (diagram labels: Directory-coarse with Req, Forward, Ack; PATCH-coarse with Req, Forward)
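The difference in acknowledgement counts between a coarse directory and PATCH on a coarse directory can be made concrete with a short Python sketch. The function names and the example numbers are assumptions for illustration, and the model treats every sharer as a token holder.

```python
def coarse_vector(sharers, num_procs, k):
    """One bit per group of k processors; set if any processor in the group shares."""
    groups = (num_procs + k - 1) // k
    bits = [False] * groups
    for p in sharers:
        bits[p // k] = True
    return bits


def forward_targets(bits, num_procs, k):
    # A set bit cannot say which processor shares, so the whole group gets a forward.
    return [p for p in range(num_procs) if bits[p // k]]


def ack_counts(sharers, num_procs, k):
    """Return (forwards sent, acks with a coarse directory, responses with PATCH)."""
    targets = forward_targets(coarse_vector(sharers, num_procs, k), num_procs, k)
    directory_acks = len(targets)                  # non-sharers must ack too
    patch_acks = len(set(sharers) & set(targets))  # only token holders respond
    return len(targets), directory_acks, patch_acks
```

For two sharers (processors 3 and 17) among 64 processors with k = 16, `ack_counts({3, 17}, 64, 16)` gives 32 forwards, 32 acks for the coarse directory, and 2 responses under PATCH. With an exact directory (k = 1) all three numbers are 2.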

35 Coarse Directory: Runtime and Traffic. PATCH has high tolerance to inexactness. (charts, microbenchmark @ 256 processors, Directory versus PATCH, coarseness from 1 to 256 sharers/bit: traffic comparison, normalized traffic, data labels 319% and 32%; runtime comparison at 2B/cycle, normalized runtime, data labels 142% and 3.6%)

36 Related Work. Token counting: Token Coherence [Martin+, ISCA ’03]; Priority Requests [Cuesta+, PDP ’07]; Virtual Hierarchies [Marty+, ISCA ’07]; Ring Order [Marty+, MICRO ’06]. Predictive direct requests: Multicast snooping [Bilir+, ISCA ’99]; Owner Prediction [Acacio+, SC ’02]; Producer-Consumer sharing [Cheng+, HPCA ’07]. Virtual Circuit Tree Multicast [Jerger+, ISCA ’08]. Bandwidth Adaptive Snooping [Martin+, HPCA ’02]. Embedded ring snooping; Uncorq [Strauss+, MICRO ’07].

37 Conclusion. PATCH: directory protocol foundation. Fast sharing? Direct requests. Safety? Token counting. Forward progress? Token tenure (broadcast-free). Retain scaling of directory? Best-effort delivery. Resulting properties: one-size-fits-all; opportunistically uses bandwidth for performance, yet scales no worse than directory.
