1
A Scalable Approach to Thread-Level Speculation
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry
Carnegie Mellon University
2
Outline
Motivation
Thread-level speculation (TLS)
Coherence scheme
Optimizations
Methodology
Results
Conclusion
3
Motivation
Leading chip manufacturers are moving to multi-core architectures, which are usually used to increase throughput.
To exploit these parallel resources for higher single-program performance, we need to parallelize programs.
Integer programs are hard to parallelize.
Solution: use speculation – thread-level speculation (TLS)!
4
Thread level speculation (TLS)
5
Scalable Approach
The paper aims to design a scalable approach that applies to a wide variety of multiprocessor architectures.
The only requirement is that the architecture be shared-memory based.
TLS is implemented on top of an invalidation-based cache coherence protocol.
6
Example
Each cache line has two special bits:
SL – a speculative load has accessed the line
SM – the line has been speculatively modified
A thread is squashed when an incoming invalidation hits a line that is present, the line's SL bit is set, and the epoch number carried with the invalidation indicates a logically earlier thread – a true dependence violation (see the sketch below).
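To make the squash condition concrete, here is a minimal C++ sketch of how a cache controller might check an incoming invalidation against the speculative bits. The CacheLine structure and function names are hypothetical illustrations, not the paper's actual hardware interface.

#include <cstdint>

// Hypothetical per-line speculative state (illustration only).
struct CacheLine {
    bool valid = false;
    bool SL = false;   // a speculative load has read this line
    bool SM = false;   // this line has been speculatively modified
};

// Simplified stand-in: returns true if epoch 'a' is logically earlier than 'b'.
// (The real scheme precomputes this comparison; see the epoch-number slide.)
bool logicallyEarlier(uint64_t a, uint64_t b) { return a < b; }

// Called when an external invalidation arrives for 'line', carrying the
// epoch number of the writer that caused it.
bool mustSquash(const CacheLine& line, uint64_t writerEpoch, uint64_t myEpoch) {
    // True dependence violation: a logically earlier epoch wrote a location
    // that this epoch has already speculatively loaded.
    return line.valid && line.SL && logicallyEarlier(writerEpoch, myEpoch);
}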
7
Speculation level
We are concerned only with the speculation level – the level in the cache hierarchy at which the speculative coherence protocol operates.
All other levels of the hierarchy can be ignored.
8
Cache line states
In addition to the usual cache state bits, we need the SL and SM bits.
A cache line with speculative bits set cannot be replaced.
When such a replacement is attempted, either the thread is squashed or the operation is delayed.
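The replacement rule can be sketched the same way; whether to delay or squash is a design choice, and the fields and names below are illustrative assumptions.

// Hypothetical outcome of trying to evict a cache line during speculation.
enum class EvictAction { Replace, Delay, Squash };

struct CacheLine { bool SL = false; bool SM = false; };

// A speculative line cannot simply be dropped: its SL/SM state would be lost.
// Either the replacement is delayed (the epoch waits or is suspended) or the
// epoch is squashed and re-executed.
EvictAction onEviction(const CacheLine& victim, bool canSuspend) {
    if (!victim.SL && !victim.SM)
        return EvictAction::Replace;        // nothing speculative to lose
    return canSuspend ? EvictAction::Delay  // e.g. wait for the homefree token
                      : EvictAction::Squash;
}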
9
Basic cache coherence protocol
When a processor wants to load a value, it needs at least shared access to the line.
When it wants to write, it needs exclusive access.
The coherence mechanism issues invalidation messages when it receives a request for exclusive access.
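The sketch below illustrates, in simplified C++, the underlying invalidation-based protocol that TLS extends: loads need at least shared access, stores need exclusive access, and a request for exclusive access makes the coherence mechanism invalidate other copies. The MESI-style states and function names are assumptions for illustration only.

enum class State { Invalid, Shared, Exclusive, Modified };

struct Line { State state = State::Invalid; };

// Hypothetical hook: ask the directory/bus to invalidate all other copies.
void sendInvalidations(int lineAddr) { (void)lineAddr; /* coherence request */ }

void onLoad(Line& l) {
    if (l.state == State::Invalid)
        l.state = State::Shared;            // read miss: gain shared access
}

void onStore(Line& l, int lineAddr) {
    if (l.state != State::Exclusive && l.state != State::Modified) {
        sendInvalidations(lineAddr);        // a write needs exclusive access
        l.state = State::Exclusive;
    }
    l.state = State::Modified;              // the local copy is now dirty
}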
10
Coherence mechanism
11
Commit
When the homefree token arrives, there is no possibility of further squashes.
SpE is changed to E and SpS to S.
Lines with the SM bit set must have the dirty (D) bit set.
If a line was speculatively modified while shared, we must obtain exclusive access for that line.
The ownership required buffer (ORB) is used to track such lines.
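A rough C++ sketch of the commit step, assuming speculative states SpS/SpE and an ORB holding the addresses of lines that were speculatively modified while shared; all types and names are hypothetical illustrations.

#include <vector>

enum class State { Invalid, Shared, Exclusive, SpS, SpE };

struct Line {
    State state = State::Invalid;
    bool SL = false, SM = false;
    bool dirty = false;
};

void requestExclusive(int lineAddr) { (void)lineAddr; /* ownership upgrade */ }

// Called once the homefree token arrives: no further squashes are possible.
void commitEpoch(std::vector<Line>& cache, const std::vector<int>& orb) {
    for (Line& l : cache) {
        if (l.state == State::SpE) l.state = State::Exclusive;
        if (l.state == State::SpS) l.state = State::Shared;
        if (l.SM) l.dirty = true;   // speculative writes become ordinary dirty data
        l.SL = l.SM = false;
    }
    // Lines that were speculatively modified while only shared still need
    // exclusive ownership; the ORB remembers them.
    for (int lineAddr : orb) requestExclusive(lineAddr);
}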
12
Squash
All speculatively modified lines must be invalidated.
For the remaining speculative lines, SpE is changed to E and SpS to S.
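For contrast, a matching sketch of the squash path (re-declaring the same hypothetical types so it stands alone); the exact transitions here are an assumption based on the slide text.

#include <vector>

enum class State { Invalid, Shared, Exclusive, SpS, SpE };
struct Line { State state = State::Invalid; bool SL = false, SM = false; };

// Undo an epoch that violated a dependence.
void squashEpoch(std::vector<Line>& cache) {
    for (Line& l : cache) {
        if (l.SM) {
            l.state = State::Invalid;     // speculative modifications are discarded
        } else if (l.state == State::SpE) {
            l.state = State::Exclusive;   // only speculatively loaded: data is still valid
        } else if (l.state == State::SpS) {
            l.state = State::Shared;
        }
        l.SL = l.SM = false;
    }
}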
13
Performance Optimizations
Forwarding data between epochs: predictable data dependences are synchronized, and the value is forwarded from the earlier epoch to the later one (a software sketch follows below).
Dirty and speculatively loaded state: normally, when a dirty line is speculatively loaded it must first be flushed; a combined state avoids this.
Suspending violations: when we have to evict a speculative line, we don't need to squash – the epoch can be suspended instead.
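To illustrate the first optimization, here is a minimal software sketch of forwarding a predictable loop-carried value from one epoch to the next with explicit synchronization instead of speculation; the ForwardSlot structure and its functions are assumptions, not the paper's mechanism.

#include <atomic>
#include <thread>

// Hypothetical forwarding slot for one loop-carried scalar.
struct ForwardSlot {
    std::atomic<bool> ready{false};
    int value = 0;
};

// Producing (earlier) epoch: compute the value, then signal the next epoch.
void signalValue(ForwardSlot& slot, int v) {
    slot.value = v;
    slot.ready.store(true, std::memory_order_release);
}

// Consuming (later) epoch: wait until the earlier epoch has produced the value.
int waitForValue(ForwardSlot& slot) {
    while (!slot.ready.load(std::memory_order_acquire))
        std::this_thread::yield();   // spin; real hardware would stall or forward
    return slot.value;
}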
14
Multiple writers
If two epochs write to the same cache line, one would normally have to be squashed to avoid the multiple-writer problem.
This can be avoided by maintaining fine-grained disambiguation bits (see the sketch below).
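A sketch of what fine-grained disambiguation might look like: per-word SM bits record exactly which words an epoch wrote, so on commit only those words are merged and another epoch's writes to other words of the same line survive. The word granularity and all names are illustrative assumptions.

#include <cstdint>
#include <cstddef>

constexpr std::size_t kWordsPerLine = 8;

struct FineGrainLine {
    uint32_t words[kWordsPerLine] = {};
    bool smWord[kWordsPerLine] = {};   // per-word "speculatively modified" bits
};

void speculativeStore(FineGrainLine& line, std::size_t word, uint32_t value) {
    line.words[word] = value;
    line.smWord[word] = true;          // remember exactly which word was written
}

// On commit, merge only the words this epoch wrote into the committed copy.
void mergeOnCommit(const FineGrainLine& spec, FineGrainLine& committed) {
    for (std::size_t w = 0; w < kWordsPerLine; ++w)
        if (spec.smWord[w]) committed.words[w] = spec.words[w];
}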
15
Implementation
16
Epoch numbers
An epoch number has two parts: a thread ID (TID) and a sequence number.
To avoid a costly comparison on every access, the comparison result is precomputed and a logically-later mask is formed (see the sketch below).
Epoch numbers are maintained in one place per chip.
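A sketch of the logically-later mask idea: rather than comparing full epoch numbers (TID plus sequence number) on every access, the relationship to every other in-flight epoch is precomputed into a bitmask that a single AND can test. The encoding and names below are assumptions for illustration.

#include <cstdint>

constexpr int kMaxEpochs = 32;         // in-flight epochs tracked per chip

struct EpochNumber {
    uint32_t tid;        // which speculative thread
    uint32_t sequence;   // position in the original sequential order
};

// Full comparison, done rarely (e.g. when epochs are created).
bool logicallyLater(const EpochNumber& a, const EpochNumber& b) {
    return a.sequence > b.sequence;
}

// Precompute, for epoch 'me', one bit per slot saying "that epoch is
// logically later than me"; per-access checks then reduce to a single AND.
uint32_t buildLaterMask(const EpochNumber& me,
                        const EpochNumber (&inFlight)[kMaxEpochs]) {
    uint32_t mask = 0;
    for (int i = 0; i < kMaxEpochs; ++i)
        if (logicallyLater(inFlight[i], me)) mask |= (1u << i);
    return mask;
}

// Fast check used on every coherence action.
bool isLater(uint32_t laterMask, int epochSlot) {
    return (laterMask >> epochSlot) & 1u;
}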
17
Speculative state implementation
18
Multiple writers – implementation
False violations (apparent dependences caused only by sharing a cache line, not the same word) are also handled in the same way.
19
Correctness considerations
Speculation fails if the speculative state is lost.
Exceptions are handled only once the homefree token has been received.
System calls are also postponed until then.
20
Methodology
A detailed out-of-order simulation modeled on the MIPS R10000 is performed.
Fork and other synchronization overheads are 10 cycles.
21
Results Normalized execution cycles
22
Results
For buk and equake, memory performance is the bottleneck.
When scaled beyond 4 processors, ijpeg performance degrades: the number of available threads is small, and there are some conflicts in the cache.
23
Overheads
Violations are a significant source of overhead, so cache locality is important.
The ORB size can be further reduced through early release of ORB entries.
24
Communication overhead
Buk is insensitive to communication latency.
25
Multiprocessor performance
Advantage: more cache storage.
Disadvantage: increased communication latency.
26
Conclusion
By using TLS, even integer programs can be parallelized to obtain speedup.
The approach is scalable and can be applied to various other architectures that support multiple threads.
Some applications are insensitive to communication latency, so large-scale parallel architectures using TLS are possible.
27
Thanks!