Is SC + ILP = RC?
Chris Gniady, Babak Falsafi, and T. N. Vijaykumar
Presented by Jacob Harer
Idea
- Exploit large amounts of memory ILP to speed up sequential consistency (SC)
- Speculatively relax all memory ordering within each core
- Appear non-speculative to all other cores
Implementation
- Needs:
  - Speculation on both loads and stores
  - Large speculative state
  - No additional overhead
  - Well-behaved programs
- Store all instructions in a Speculative History Queue (SHiQ)
- Roll back if speculative data is accessed before it commits
Roll Back
- On invalidation of speculatively loaded or stored data
- On a read of speculatively stored data
- On replacement due to a miss
- Speculatively accessed blocks are recorded in the Block Lookup Table (BLT)
- Roll back by restoring from the SHiQ
- No speculation until the store completes
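The rollback rules above can be sketched as a small software model. This is only an illustration under stated assumptions, not the hardware design from the paper: the class name `SpeculativeCore`, its method names, the dictionary used as memory, and the choice to undo the entire queue on any conflict (a conservative simplification) are all hypothetical.

```python
from collections import deque


class SpeculativeCore:
    """Toy model of one core's speculative state; not the paper's hardware."""

    def __init__(self, memory):
        self.memory = memory   # block -> value, as seen by this core's cache
        self.shiq = deque()    # (kind, block, old_value), oldest entry first
        self.blt = {}          # block -> [speculative loads, speculative stores]

    def _track(self, kind, block, old_value):
        self.shiq.append((kind, block, old_value))
        counts = self.blt.setdefault(block, [0, 0])
        counts[0 if kind == "load" else 1] += 1

    def spec_load(self, block):
        self._track("load", block, None)
        return self.memory[block]

    def spec_store(self, block, value):
        # Log the old value so the store can be undone on rollback.
        self._track("store", block, self.memory[block])
        self.memory[block] = value

    def commit_oldest(self):
        """Retire the oldest speculative access and drop it from the BLT."""
        kind, block, _ = self.shiq.popleft()
        counts = self.blt[block]
        counts[0 if kind == "load" else 1] -= 1
        if counts == [0, 0]:
            del self.blt[block]

    def on_event(self, event, block):
        """event is 'invalidate', 'read', or 'replace'. Returns True on rollback."""
        loads, stores = self.blt.get(block, (0, 0))
        if event == "read":
            conflict = stores > 0          # remote reads only conflict with stores
        else:
            conflict = loads + stores > 0  # invalidation or replacement
        if not conflict:
            return False
        # Simplification (assumption): undo the whole queue, youngest entry first.
        while self.shiq:
            kind, b, old_value = self.shiq.pop()
            if kind == "store":
                self.memory[b] = old_value
        self.blt.clear()
        return True
```

In this sketch the BLT lookup is what keeps the common case cheap: an event on a block with no speculative accesses returns immediately without touching the SHiQ.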
Example
- Processor 1 issues a speculative load to the block (Get Shared)
- Processor 1 does some other work
- Processor 1 issues a speculative store to the block (Get Exclusive)
- Processor 2 issues a load to the block
- Processor 1 rolls back from the SHiQ and sends the old non-speculative data
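The example can be replayed as a short trace for a single block. This is a hedged sketch: the function name `run_example`, the stored value 42, the initial value 0, and the MSI-style state names (including the downgrade to Shared after the rollback) are assumptions made for illustration, not details from the slides.

```python
def run_example():
    """Replay the scenario for one block; returns (log, value sent to P2)."""
    cache_value = 0     # committed (non-speculative) value of the block
    state = "Invalid"   # Processor 1's coherence state for the block
    shiq = []           # undo log of (operation, old_value)
    log = []

    # Processor 1 speculative load: obtains a shared copy.
    state = "Shared"
    shiq.append(("load", None))
    log.append(("P1 spec load", state, cache_value))

    # Processor 1 does other work; the block is unaffected.

    # Processor 1 speculative store: upgrades to exclusive, logs the old value.
    state = "Exclusive"
    shiq.append(("store", cache_value))
    cache_value = 42
    log.append(("P1 spec store", state, cache_value))

    # Processor 2 load hits speculatively stored data: Processor 1 rolls back,
    # undoing the youngest entries first, then supplies the old value.
    while shiq:
        op, old_value = shiq.pop()
        if op == "store":
            cache_value = old_value
    state = "Shared"    # assumed downgrade after supplying the data
    log.append(("P2 load, P1 rollback", state, cache_value))
    return log, cache_value
```

Processor 2 never observes the speculative value 42 in this trace, which is the sense in which the core appears non-speculative to others.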
Conclusions
- Good results
- Potential for many pathological cases, where blocks are loaded far ahead of time, reducing the effectiveness of speculation
- This is mitigated by speculatively storing only once
Questions?
- How many workloads are “well behaved”?
- Could RC benefit from the same ILP exploitation?
- Can you speculatively load across cores?
- Does the additional hardware slow down the processor?