Published by Bruce Doyle. Modified over 9 years ago.
1
Speculative Sequential Consistency with Little Custom Storage
Chris Gniady and Babak Falsafi
Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University
http://www.ece.cmu.edu/~puma2
2
PACT 2002. Copyright 2002 Chris Gniady. Speculative Sequential Consistency with Little Custom Storage
Distributed Shared Memory (DSM)
Logically shared but physically distributed memory
Shared-memory programming
Scalable
Long shared-memory accesses can be a bottleneck!
[Diagram labels: CPU, Cache, Memory Bus, Memory, DSM Hardware, Network]
3
Programming DSM
To achieve high performance: Release Consistency (RC)
  Relaxes memory order
  Requires software annotation
What programmers want: Sequential Consistency (SC)
  Intuitive
  Memory order is enforced, which makes it slow
Prior work: Speculative SC (SC++) [ISCA’99]
  Hardware speculatively relaxes order
  High performance & intuitive
  Needs a large custom “speculative history” queue
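To make the SC versus relaxed-order contrast concrete, here is a minimal sketch of the classic store-buffering litmus test. It is not from the talk: the thread programs, the variable names, and the `sc_outcomes` helper are illustrative assumptions. The function enumerates every interleaving that preserves each thread's program order, which is what SC permits. Reversing each thread's two operations stands in for a model that lets a load bypass an earlier store to a different address.

```python
from itertools import combinations

# Hypothetical two-thread litmus test: each step is (op, variable, register).
T0 = [("st", "x", None), ("ld", "y", "r0")]
T1 = [("st", "y", None), ("ld", "x", "r1")]

def sc_outcomes(t0, t1):
    """Return all (r0, r1) results over interleavings that keep program order."""
    outcomes = set()
    total = len(t0) + len(t1)
    # Pick which positions of the merged sequence belong to thread 0.
    for slots in combinations(range(total), len(t0)):
        mem = {"x": 0, "y": 0}
        regs = {}
        i0 = i1 = 0
        for pos in range(total):
            if pos in slots:
                op, var, reg = t0[i0]
                i0 += 1
            else:
                op, var, reg = t1[i1]
                i1 += 1
            if op == "st":
                mem[var] = 1          # every store writes 1
            else:
                regs[reg] = mem[var]  # loads read the current memory value
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes

sc = sc_outcomes(T0, T1)
# Letting each load run ahead of the earlier store models a relaxed order.
relaxed = sc_outcomes(T0[::-1], T1[::-1])
```

Under SC the result (0, 0) never appears, while the reordered variant can produce it. That is the kind of outcome a programmer must rule out with annotations under a relaxed model.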
4
This Talk’s Contributions
1. Characterize history size across apps
  Varies from 16 to 8K entries!
  Bursty: empty over 85% of the time
2. Propose SC++Lite
  Allocates history in memory hierarchy
  Enhances scalability across apps & systems
  Reduces custom storage from 51 KB to 2 KB
Result: Speculative SC (almost) for free!
5
Outline
Overview
Memory Ordering in RC
Memory Ordering in SC++
SC++Lite: SC++ with Little Custom Storage
Results
Conclusions
6
Memory Ordering in RC
“LD A” & “ST A” retire out of order
Overlaps “ST X”, “LD Y” & “LD Z” misses
Software guarantees overlap is ok!
[Diagram labels: ST X, ST A, LD A, ALU (retired out of order); Reorder Buffer: LD Y, LD Z, ...; LD/ST Queue: LD Z Miss, ST X Miss, LD Y Miss, ...]
7
SC++: Hardware Relaxes Memory Order [ISCA’99]
Speculatively retires instructions in hardware
Rolls back when coherence messages hit in history
[Diagram labels: Speculative Retirement: ST X, ST A, LD A, ALU; Reorder Buffer: LD Y, LD Z, ...; Speculative History Queue: ... (looked up for potential rollback on Coherence Messages); LD/ST Queue: LD Z Miss, ST X Miss, LD Y Miss, ...]
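The rollback rule on this slide can be sketched as a toy software model. This is an illustration under assumptions, not the hardware design: the `SpeculativeHistory` class, its fields, and its method names are all hypothetical, and each entry records only a block address rather than the state real hardware would need to undo an instruction.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class HistoryEntry:
    seq: int    # speculative retirement order (hypothetical field)
    block: int  # cache block address touched by the instruction

class SpeculativeHistory:
    """Toy model of a history queue that is checked on coherence messages."""

    def __init__(self):
        self.entries = deque()
        self.next_seq = 0

    def retire_speculatively(self, block):
        """Record an instruction that retired before older misses completed."""
        entry = HistoryEntry(self.next_seq, block)
        self.entries.append(entry)
        self.next_seq += 1
        return entry.seq

    def commit_through(self, seq):
        """Drop entries up to and including seq once they stop being speculative."""
        while self.entries and self.entries[0].seq <= seq:
            self.entries.popleft()

    def on_coherence_message(self, block):
        """Return the seq to roll back to if the message hits, else None."""
        for entry in self.entries:
            if entry.block == block:
                # Discard the matching entry and everything younger than it.
                while self.entries and self.entries[-1].seq >= entry.seq:
                    self.entries.pop()
                return entry.seq
        return None

history = SpeculativeHistory()
for addr in (0x10, 0x20, 0x30):
    history.retire_speculatively(addr)
rollback_point = history.on_coherence_message(0x20)  # hits the second entry
miss = history.on_coherence_message(0x99)            # no match, no rollback
```

A message that matches no entry returns `None`, so speculation continues undisturbed. Only a hit forces the matching entry and all younger ones to be undone.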
8
SC++’s Implementation Overhead
Speculative History Queue:
  On-chip custom storage
  Grows up to subsequent missing load
  Size is application & system dependent
  Must assume worst-case size at design time!
Can we (virtually) eliminate custom storage in SC++?
9
SC++Lite: SC++ with Little Custom Storage
Store history into memory hierarchy!
1. Queue allocated at boot time in physical memory
2. Use block buffer to pack history, ship to L2
3. Store ack updates head pointer (in LD/ST queue)
4. ROB retirement updates tail pointer
5. “Dead” history is not written back!
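The five steps above can be sketched as a small software model. Everything here is an assumption made for illustration: the `HistoryRing` class, the dictionary standing in for L2, and the choice to count dead blocks instead of modelling real write-back traffic. It shows how the head and tail pointers bound the live history and how a block becomes dead once the head moves past it.

```python
class HistoryRing:
    """Toy model: history packed into blocks, bounded by head and tail."""

    def __init__(self, num_blocks, entries_per_block):
        self.num_blocks = num_blocks        # region reserved up front (step 1)
        self.per_block = entries_per_block
        self.head = 0                       # index of the oldest live entry
        self.tail = 0                       # index one past the youngest entry
        self.block_buffer = []              # partially filled block (step 2)
        self.l2 = {}                        # block number -> packed entries
        self.discarded_blocks = 0           # dead blocks never written back

    def retire(self, entry):
        """ROB retirement appends an entry and advances the tail (step 4)."""
        if self.tail - self.head >= self.num_blocks * self.per_block:
            raise OverflowError("history region full")
        self.block_buffer.append(entry)
        self.tail += 1
        if len(self.block_buffer) == self.per_block:
            # A full block is shipped to L2 as one unit.
            block_no = (self.tail - 1) // self.per_block
            self.l2[block_no] = list(self.block_buffer)
            self.block_buffer.clear()

    def store_ack(self, count):
        """A store acknowledgement advances the head (step 3)."""
        self.head = min(self.head + count, self.tail)
        # Blocks lying entirely below the head are dead: drop them (step 5).
        dead = [b for b in self.l2 if (b + 1) * self.per_block <= self.head]
        for b in dead:
            del self.l2[b]
            self.discarded_blocks += 1

ring = HistoryRing(num_blocks=4, entries_per_block=2)
for name in ("a", "b", "c"):
    ring.retire(name)
ring.store_ack(2)  # entries "a" and "b" are no longer speculative
```

In this run the first block is shipped to the modelled L2 and then discarded once both of its entries are acknowledged. The third entry is still waiting in the block buffer.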
10
Memory Ordering in SC++Lite
Only history bursts retire into L2
History in L2 typically discarded
[Diagram labels: Speculative Retirement: ST A, LD A, ALU; Speculative Block Buffer (cache block to L2); Location in L2; Reorder Buffer: LD Y, LD Z, ...; LD/ST Queue: LD Z Miss, ST Z Miss, LD Y Miss, ..., ST X Miss; ROB Index; Head; Look up for potential rollback; Coherence Messages]
11
SC++Lite Design Requirements
Avoid perturbing application’s critical path!
SBB (Speculative Block Buffer):
  Size depends on L2 latency & retirement rate
  Large enough to filter store hits into L2
L2:
  Required bandwidth is proportional to retirement rate
  Large blocks help
  Small blocks may need multiporting
Head & tail registers reduce history traffic
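One way to read the SBB sizing requirement is as a first-order product: the buffer must hold the entries that retire while a block is in flight to L2. The sketch below assumes that reading. The formula, the `sbb_entries` helper, and the sample inputs are illustrative guesses, not figures or methods from the talk.

```python
import math

def sbb_entries(retire_per_cycle: float, l2_latency_cycles: float) -> int:
    """Rough estimate: entries retired during one L2 access (an assumption)."""
    return math.ceil(retire_per_cycle * l2_latency_cycles)

# Placeholder inputs only; the 4-cycle latency is not a number from the talk.
estimate = sbb_entries(retire_per_cycle=8, l2_latency_cycles=4)
```

A real design would also have to account for burstiness and for the store hits the buffer is meant to filter, which this product ignores.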
12
Outline
Overview
Memory Ordering in RC
Memory Ordering in SC++
SC++Lite: SC++ with Little Custom Storage
Results
Conclusions
13
Experimental Methodology
Using RSIM
16 nodes with 1 GHz, 8-issue CPU
128-entry ROB & LD/ST queue
Average remote-to-local access ratio of ~2
32-Kbyte, direct-mapped L1 cache
512-Kbyte, 8-way L2 cache, 64 GB/s
256-entry Lookup Table
32-entry SBB
14
History Size Characterization
System & application dependent: varies from 16 to 4K entries
History is bursty: non-empty less than 15% of the time
15
Base RC, SC++ & SC++Lite
Up to 80% gap between SC & RC
31% average speedup for SC++, 28% for SC++Lite
16
Sensitivity to 4x Network Latency
SC++ requires 2x queue size to perform best
SC++Lite’s performance remains stable
17
Custom Storage Requirements
SC++: ~51 KB of custom storage
  Doubles for 4x network latency
  Radix shows worst-case history
SC++Lite: ~2 KB of custom storage for all apps
  Performance insensitive to network latency
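The storage figures on this slide imply a simple ratio, worked out below. The `storage_summary` helper is hypothetical, the inputs are the slide's approximate values, and holding SC++Lite's storage constant at 4x network latency is an assumption drawn from the "for all apps" wording rather than a stated result.

```python
def storage_summary(scpp_kb=51, lite_kb=2, scpp_growth_at_4x=2):
    """Arithmetic on the slide's approximate custom-storage figures."""
    return {
        "reduction_factor": scpp_kb / lite_kb,
        "scpp_kb_at_4x_latency": scpp_kb * scpp_growth_at_4x,
        # Assumption: SC++Lite's custom storage does not grow with latency.
        "lite_kb_at_4x_latency": lite_kb,
    }

summary = storage_summary()
```

With these inputs the reduction is roughly 25x at base latency, and SC++ would need about 102 KB once its queue doubles.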
18
Conclusions
Previously showed [ISCA’99]:
  Speculative SC achieves RC’s performance
This talk: Proposed SC++Lite
  Allocates history in memory hierarchy
  Enhances scalability across apps & systems
Result: Speculative SC (almost) for free!
19
For More Information
Please visit our web site:
Impetus Group, Computer Architecture Lab (CALCM), Carnegie Mellon University
http://www.ece.cmu.edu/~puma2