Speculative Sequential Consistency with Little Custom Storage Impetus Group Computer Architecture Lab (CALCM) Carnegie Mellon University


1 Speculative Sequential Consistency with Little Custom Storage
Chris Gniady and Babak Falsafi
Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University
http://www.ece.cmu.edu/~puma2

2 PACT 2002 Copyright 2002 Chris Gniady Speculative Sequential Consistency with Little Custom Storage
Distributed Shared Memory (DSM)
Logically shared but physically distributed memory
- Shared-memory programming
- Scalable
- Long shared-memory accesses can be a bottleneck!
[Diagram labels: CPU, Cache, Memory Bus, Memory, DSM Hardware, Network]

3 Programming DSM
To achieve high performance:
- Release Consistency (RC)
- Relaxes memory order
- Software annotation
What programmers want:
- Sequential Consistency (SC)
- Intuitive
- Memory order enforced → slow
Prior work: Speculative SC (SC++) [ISCA’99]
- Hardware speculatively relaxes order
- High performance & intuitive
- Large custom “speculative history” queue

4 This Talk’s Contributions
1. Characterize history size across apps
- Varies from 16 to 8K entries!
- Bursty: empty over 85% of the time
2. Propose SC++Lite
- Allocates history in memory hierarchy
- Enhances scalability across apps & systems
- Reduces custom storage from 51 KB to 2 KB
Result → Speculative SC (almost) for Free!

5 Outline
- Overview
- Memory Ordering in RC
- Memory Ordering in SC++
- SC++Lite: SC++ with Little Custom Storage
- Results
- Conclusions

6 Memory Ordering in RC
- “LD A” & “ST A” retire out of order
- Overlaps “ST X”, “LD Y” & “LD Z” misses
- Software guarantees overlap is ok!
[Diagram: instructions ST X, ST A, LD A, ALU marked “Retired Out of order”; Reorder Buffer holding LD Y, LD Z, ...; LD/ST Queue holding LD Z Miss, ST X Miss, LD Y Miss, ...]

7 SC++: Hardware Relaxes Memory Order [ISCA’99]
- Speculatively retires instructions in hardware
- Rolls back when coherence messages hit in history
[Diagram: Speculative Retirement of ST X, ST A, LD A, ALU; Reorder Buffer holding LD Y, LD Z, ...; Speculative History Queue; Coherence Messages looked up for potential rollback; LD/ST Queue holding LD Z Miss, ST X Miss, LD Y Miss, ...]
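The rollback rule on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the SC++ hardware: the class name `SpeculativeHistory`, its methods, and the (block address, ROB index) entry layout are hypothetical, and the model assumes that a coherence message matching any history entry squashes that entry and everything younger.

```python
from collections import deque


class SpeculativeHistory:
    """Toy model of a speculative history queue (all names are hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # oldest first: (block_addr, rob_index)

    def retire(self, block_addr, rob_index):
        """Record a speculatively retired access; False means the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((block_addr, rob_index))
        return True

    def commit_oldest(self):
        """The oldest speculative access is now safe; drop its entry."""
        return self.entries.popleft() if self.entries else None

    def on_coherence_message(self, block_addr):
        """Return the ROB index to roll back to, or None if there is no hit."""
        for i, (addr, rob_index) in enumerate(self.entries):
            if addr == block_addr:
                break
        else:
            return None
        # Discard the matching entry and every younger one.
        for _ in range(len(self.entries) - i):
            self.entries.pop()
        return rob_index


history = SpeculativeHistory(capacity=4)
history.retire(0xA0, rob_index=1)
history.retire(0xB0, rob_index=2)
history.retire(0xC0, rob_index=3)
print(history.on_coherence_message(0xB0))  # 2
print(len(history.entries))                # 1
```

The fixed `capacity` mirrors the point made on the next slide: a dedicated queue has to be sized for the worst case at design time.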

8 SC++’s Implementation Overhead
Speculative History Queue:
- On-chip custom storage
- Grows up to subsequent missing load
- Size is application & system dependent: must assume worst-case size at design time!
Can we (virtually) eliminate custom storage in SC++?

9 SC++Lite: SC++ with Little Custom Storage
Store history into memory hierarchy!
1. Queue allocated at boot time in physical memory
2. Use block buffer to pack history, ship to L2
3. Store ack updates head pointer (in LD/ST queue)
4. ROB retirement updates tail pointer
5. “Dead” history is not written back!
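The five steps above can be sketched as a small software model. This is an illustration under assumptions, not the SC++Lite design: the class `HistoryInMemory`, its method names, and the choice to tag each shipped block with the index just past its last entry are all hypothetical. A Python `deque` stands in for the queue region reserved in physical memory and cached in L2.

```python
from collections import deque


class HistoryInMemory:
    """Toy model: history packed into blocks, tracked by head and tail counters."""

    def __init__(self, entries_per_block):
        self.entries_per_block = entries_per_block
        self.block_buffer = []    # entries being packed into one cache block
        self.l2_blocks = deque()  # shipped blocks: (end_index, entries)
        self.head = 0             # entries no longer speculative
        self.tail = 0             # entries retired speculatively so far
        self.discarded_blocks = 0

    def retire(self, entry):
        """ROB retirement: append an entry and advance the tail."""
        self.block_buffer.append(entry)
        self.tail += 1
        if len(self.block_buffer) == self.entries_per_block:
            # Block is full: ship it to the (modeled) L2.
            self.l2_blocks.append((self.tail, self.block_buffer))
            self.block_buffer = []

    def store_ack(self, count):
        """Store ack: advance the head; dead blocks are dropped, not written back."""
        self.head = min(self.head + count, self.tail)
        while self.l2_blocks and self.l2_blocks[0][0] <= self.head:
            self.l2_blocks.popleft()
            self.discarded_blocks += 1
        # Entries still in the block buffer may be dead as well.
        buffer_start = self.tail - len(self.block_buffer)
        dead = max(0, self.head - buffer_start)
        del self.block_buffer[:dead]

    def live_entries(self):
        return self.tail - self.head


hist = HistoryInMemory(entries_per_block=4)
for i in range(10):
    hist.retire(i)
hist.store_ack(5)
print(hist.live_entries(), hist.discarded_blocks)  # 5 1
```

In this model only the two counters and one block buffer are dedicated state; everything else lives in ordinary storage, which is the intuition behind the small custom-storage figure quoted in the talk.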

10 Memory Ordering in SC++Lite
- Only history burst retires into L2
- History in L2 typically discarded
[Diagram labels: Reorder Buffer (LD Y, LD Z, ...); Speculative Retirement (ST A, LD A, ALU); Speculative Block Buffer; Cache block to L2; LD/ST Queue (LD Z Miss, ST Z Miss, LD Y Miss, ..., ST X Miss) with ROB Index and Location in L2; Head; Coherence Messages; Look up for potential rollback]

11 SC++Lite Design Requirements
Avoid perturbing application’s critical path!
SBB (Speculative Block Buffer):
- Size depends on L2 latency & retirement rate
- Large enough to filter store hits into L2
L2:
- Required bandwidth is proportional to retirement rate
- Large blocks help
- Small blocks may need multiporting
- Head & tail registers reduce history traffic
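A back-of-the-envelope reading of these two dependencies: the buffer must absorb whatever retires while a packed block is in flight to L2, and history write bandwidth scales with the retirement rate. The sketch below only encodes that arithmetic. The function names, the 4-cycle L2 latency, and the 4-byte entry size are assumptions chosen for illustration; the talk states none of them, and this is not presented as how the authors sized their design.

```python
import math


def sbb_entries(retire_per_cycle, l2_latency_cycles):
    """Entries that can retire while one block is in flight to L2."""
    return math.ceil(retire_per_cycle * l2_latency_cycles)


def history_bandwidth_gb_per_s(retire_per_cycle, entry_bytes, clock_ghz):
    """Peak history write bandwidth: entries/cycle * bytes/entry * Gcycles/s."""
    return retire_per_cycle * entry_bytes * clock_ghz


# Assumed inputs: 8 retirements per cycle at 1 GHz (peak for an 8-issue CPU),
# a hypothetical 4-cycle L2 latency, and a hypothetical 4-byte history entry.
print(sbb_entries(8, 4))                       # 32
print(history_bandwidth_gb_per_s(8, 4, 1.0))   # 32.0
```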

12 Outline
- Overview
- Memory Ordering in RC
- Memory Ordering in SC++
- SC++Lite: SC++ with Little Custom Storage
- Results
- Conclusions

13 Experimental Methodology
Using RSIM
- 16 nodes with 1 GHz, 8-issue CPU
- 128-entry ROB & LD/ST queue
- Average remote-to-local access ratio of ~2
- 32-Kbyte, direct-mapped L1 cache
- 512-Kbyte, 8-way L2 cache, 64 GB/s
- 256-entry Lookup Table
- 32-entry SBB

14 History Size Characterization
- System & application dependent: varies from 16 to 4K entries
- History is bursty: non-empty less than 15% of the time
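The two metrics on this slide, peak history size and fraction of time non-empty, could be computed from a per-cycle occupancy trace roughly as follows. This is a minimal sketch: the function name `characterize` is hypothetical and the trace is synthetic, made up purely to exercise the code; it is not data from the talk.

```python
def characterize(occupancy):
    """Return (peak size, fraction of samples with a non-empty history).

    `occupancy` holds the history queue length sampled once per cycle.
    """
    if not occupancy:
        return 0, 0.0
    peak = max(occupancy)
    non_empty = sum(1 for n in occupancy if n > 0)
    return peak, non_empty / len(occupancy)


# Synthetic trace: mostly empty, with one short burst.
trace = [0] * 18 + [3, 9]
peak, fraction = characterize(trace)
print(peak, fraction)  # 9 0.1
```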

15 Base RC, SC++ & SC++Lite
- Up to 80% gap between SC & RC
- 31% average speedup for SC++, 28% for SC++Lite

16 Sensitivity to 4x Network Latency
- SC++ requires 2x queue size to perform best
- SC++Lite’s performance remains stable

17 Custom Storage Requirements
SC++:
- ~51 KB of custom storage
- Doubles for 4x network latency
- Radix shows worst-case history
SC++Lite:
- ~2 KB of custom storage for all apps
- Performance insensitive to network latency

18 Conclusions
Previously showed [ISCA’99]:
- Speculative SC achieves RC’s performance
This talk:
- Proposed SC++Lite
- Allocates history in memory hierarchy
- Enhances scalability across apps & systems
Result → Speculative SC (almost) for Free!

19 For More Information
Please visit our web site at http://www.ece.cmu.edu/~puma2
Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University

