Is SC + ILP = RC? C. Gniady, B. Falsafi, and T.N. Vijaykumar - Purdue


1 Is SC + ILP = RC? C. Gniady, B. Falsafi, and T.N. Vijaykumar - Purdue
Presented by: Eric Carty-Fickes

2 Introduction
SC
  produces memory order with hardware
  easier to program
  worse performance due to conservatism
RC
  produces memory order with software
  harder to program
  better performance due to explicitness
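The programming-effort tradeoff on this slide can be sketched in software. The example below is only an analogy and is not from the presentation: it uses C++ `std::atomic` memory orders as a stand-in for hardware consistency models, and the function names `message_passing_labeled` and `message_passing_default` are hypothetical. The first version labels its one synchronizing access explicitly (release/acquire, in the spirit of RC); the second leaves every shared access sequentially consistent (in the spirit of SC).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing with explicitly labeled synchronization (release/acquire),
// used here as a software analogy for the RC programming model.
int message_passing_labeled() {
    int data = 0;                    // ordinary shared data, not labeled
    std::atomic<bool> ready{false};  // the only access the programmer labels
    int observed = -1;

    std::thread producer([&] {
        data = 42;                                     // ordinary store
        ready.store(true, std::memory_order_release);  // release: publishes the store above
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {  // acquire: pairs with the release
            std::this_thread::yield();
        }
        observed = data;  // ordered after the acquire, so it reads 42
    });

    producer.join();
    consumer.join();
    assert(ready.load());
    return observed;
}

// The same exchange with every shared access sequentially consistent
// (the default for std::atomic), an analogy for the SC programming model:
// no labels to get wrong, but every access carries the strongest ordering.
int message_passing_default() {
    std::atomic<int> data{0};
    std::atomic<bool> ready{false};
    int observed = -1;

    std::thread producer([&] {
        data.store(42);
        ready.store(true);
    });
    std::thread consumer([&] {
        while (!ready.load()) {
            std::this_thread::yield();
        }
        observed = data.load();
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Both functions return the published value. In the labeled version, dropping the release/acquire pair would make the read of `data` a data race, which is the kind of burden the slide means by "harder to program".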

3 Catching up to RC
SC limitation: no software guarantees
memory order is arbitrary, no devices such as fences
SC can allow loads and stores to bypass one another
processor state must be remembered, but rollbacks should be avoided (slow)
superscalar rollbacks are faster
rollbacks caused by data races, false sharing, cache conflicts
encourage load/store speculation but make it transparent
check for reading or replacement of speculative blocks
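What "transparent" has to mean can be illustrated with the classic store-buffering litmus test. This is a sketch that is not part of the slides: it assumes C++ sequentially consistent atomics as a stand-in for an SC machine, and the function name `count_forbidden_outcomes` is hypothetical. Each thread stores to one location and then loads the other. If a load visibly bypassed the earlier store, both threads could read 0; under SC that outcome must never be observed, however much reordering happens internally.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and counts how
// often both threads read 0, the one outcome that SC forbids.
int count_forbidden_outcomes(int iterations) {
    assert(iterations > 0);
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        // Default std::atomic operations are sequentially consistent.
        std::thread t1([&] { x.store(1); r1 = y.load(); });
        std::thread t2([&] { y.store(1); r2 = x.load(); });
        t1.join();
        t2.join();

        // In any single total order of the four accesses, at least one
        // load follows the other thread's store, so r1 == r2 == 0 cannot occur.
        if (r1 == 0 && r2 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

With relaxed orderings on the same accesses the both-zero outcome would be permitted, which is why any speculative bypassing in an SC machine has to be checked and undone before another processor can see it.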

4 SC++
ILP allows more speculation in SC, invisible to the outside world due to in-order retirement
branch predictors, superscalar, non-blocking caches
may be able to perform up to the level of RC
allows stores to bypass as well as loads
allows out-of-order operations to hide latency
quickly recovers from mis-speculation
assumes applications designed for MPs/DSM

5 SC++ Architecture
modelled after R10K
SHiQ allows for prefetching and non-blocking caches
other processors see SC
history buffer allows speculative retirement
unblocks RoB
stores: load/store queue takes stores from RoB
BLT has block addresses for SHiQ
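The interplay between a history queue and a block-address table can be sketched as a toy model. Everything below is an illustrative assumption, not the hardware from the paper: the class `ToySpeculativeCore`, its method names, the register-only state, and the policy of rolling back to the oldest entry that touched the lost block are all hypothetical. The idea it shows is the one on the slide: speculatively retired work is logged so it can be undone, and a table of block addresses gives a quick check for whether an invalidation or replacement hits speculative state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

// Toy model only: names and policies are illustrative assumptions.
class ToySpeculativeCore {
public:
    explicit ToySpeculativeCore(std::size_t num_regs) : regs_(num_regs, 0) {}

    // Speculatively retire an instruction that touched `block` and wrote
    // `value` to `reg`. The old register value is logged for rollback.
    void speculative_retire(std::size_t reg, int value, std::uint64_t block) {
        assert(reg < regs_.size());
        history_.push_back(Entry{reg, regs_[reg], block});
        ++block_refs_[block];  // block-table entry for the quick lookup
        regs_[reg] = value;
    }

    // The oldest entry is no longer speculative: drop its log record.
    void commit_oldest() {
        if (history_.empty()) return;
        release(history_.front().block);
        history_.pop_front();
    }

    // A speculative block was invalidated or replaced. Returns true if a
    // rollback happened, false if the block was not in the table.
    bool on_block_lost(std::uint64_t block) {
        if (block_refs_.find(block) == block_refs_.end()) return false;
        // Undo youngest-first until no logged entry refers to the block.
        while (!history_.empty()) {
            Entry e = history_.back();
            history_.pop_back();
            regs_[e.reg] = e.old_value;
            release(e.block);
            if (block_refs_.find(block) == block_refs_.end()) break;
        }
        return true;
    }

    int reg(std::size_t i) const { return regs_.at(i); }
    std::size_t pending() const { return history_.size(); }

private:
    struct Entry {
        std::size_t reg;
        int old_value;
        std::uint64_t block;
    };

    // Drop one reference to `block`, erasing it when none remain.
    void release(std::uint64_t block) {
        auto it = block_refs_.find(block);
        if (it != block_refs_.end() && --it->second == 0) block_refs_.erase(it);
    }

    std::vector<int> regs_;
    std::deque<Entry> history_;
    std::unordered_map<std::uint64_t, int> block_refs_;
};
```

For example, after speculatively retiring writes that touch blocks 0x100, 0x200, and 0x100 in that order, losing block 0x200 undoes the two youngest entries and leaves the first in place, while losing a block that is not in the table changes nothing.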

6 Simulations
using RSIM for 8-node DSM, 16 KB L1, 8 MB L2
all use non-blocking caches, prefetching, speculative loads
rollbacks = 1 cycle
SC++ rollbacks = 4 wide
SC blocks at stores
RC hides network latency with store overlaps
raytrace hurt by lock patterns, slow network

7 More Simulations
RoB increase = more prefetch time
unstructured causes many rollbacks for SC
SC++o = no speculative stores
radix and raytrace = store-intensive, full load/store queue

8 Another Simulation
L2 size reduced
less room for speculative state
lu sees many rollbacks caused by replacements

9 Conclusions/Questions
SC++ nearly up to snuff with RC with minor additional hardware
does this really matter: is it that much harder to program with RC?
does this add any significant risk of errors due to extra hardware and speculation?
do you buy their argument that applications causing rollback are not suited to DSM systems anyway?

