
1 A Scalable Approach to Thread-Level Speculation
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry
Computer Science Department, Carnegie Mellon University

2 Multithreaded Machines Are Everywhere
Examples: SUN MAJC, IBM Power4, ALPHA 21464, Dual Pentium, SGI Origin
[Figure: several machine organizations, each built from processors (P) and caches (C) over shared memory and running multiple threads]
How can we use them? Parallelism!

3 Automatic Parallelization
Proving independence of threads is hard:
– complex control flow
– complex data structures
– pointers, pointers, pointers
– run-time inputs
How can we make the compiler's job feasible?
→ Thread-Level Speculation (TLS)

4 Example
    while (...) {
        x = hash[index1];
        …
        hash[index2] = y;
        ...
    }
[Figure: four iterations executing one after another on a single processor, along a time axis]
    = hash[3]  … hash[10] = …
    = hash[19] … hash[21] = …
    = hash[33] … hash[30] = …
    = hash[10] … hash[25] = …

5 Example of Thread-Level Speculation
[Figure: the four iterations run in parallel as epochs, one per processor, along a time axis]
Epoch 1: = hash[3]  … hash[10] = …
Epoch 2: = hash[19] … hash[21] = …
Epoch 3: = hash[33] … hash[30] = …
Epoch 4: = hash[10] … hash[25] = …

6 Example of Thread-Level Speculation
[Figure: same as slide 5. Epoch 4 loads hash[10], which Epoch 1 stores to: Violation!]

7 Example of Thread-Level Speculation
[Figure: same as slide 6. Each epoch ends with a "commit?" check; Epoch 4 is the one marked Violation!]

8 Example of Thread-Level Speculation
[Figure: same as slide 7. Epoch 4 is retried: = hash[10] … hash[25] = … commit?]
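
The violation in slides 5 to 8 can be reproduced with a small software check. This is only a sketch of the idea, not the hardware mechanism of the talk: it assumes every epoch is one iteration with a single load and a single store, that all epochs in the window run concurrently, and that any store by a logically earlier epoch to the index a later epoch loads counts as a violation. The names `Epoch`, `first_violated_epoch` and `slide_example` are hypothetical.

```c
#include <stddef.h>

/* One epoch = one loop iteration of the slide's example. */
typedef struct {
    int load_idx;   /* index read by  "x = hash[index1]" */
    int store_idx;  /* index written by "hash[index2] = y" */
} Epoch;

/* Returns the 0-based position of the first epoch whose load reads a
 * location that a logically earlier epoch stores to, or -1 if there is
 * none. Assumption: epochs overlap in time, so the earlier store may
 * arrive after the later load has already used a stale value. */
int first_violated_epoch(const Epoch *epochs, size_t n) {
    for (size_t later = 1; later < n; later++) {
        for (size_t earlier = 0; earlier < later; earlier++) {
            if (epochs[earlier].store_idx == epochs[later].load_idx) {
                return (int)later;
            }
        }
    }
    return -1;
}

/* The index trace shown on slides 4 to 8. */
int slide_example(void) {
    const Epoch epochs[] = { {3, 10}, {19, 21}, {33, 30}, {10, 25} };
    return first_violated_epoch(epochs, sizeof epochs / sizeof epochs[0]);
}
```

For the slide's trace, `slide_example()` reports position 3, that is Epoch 4, whose load of `hash[10]` depends on Epoch 1's store.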

9 Goals of Our Approach
1) Handle arbitrary memory accesses
   – i.e. not just array references
2) Preserve performance of non-speculative workloads
   – keep hardware support minimal and simple
3) Apply to any scale of multithreaded architecture
   – CMPs, SMT processors, more traditional MPs
→ effective, simple, and scalable TLS

10 Overview of Our Approach
System requirements:
1) Detect data dependence violations
   – extend invalidation-based cache coherence
2) Buffer speculative modifications
   – use the caches as speculative buffers
→ coherence already works at a variety of scales
→ hence our scheme is also scalable

11 Related Schemes
– Wisconsin (Multiscalar, Trace Processor)
– Stanford (Hydra)
– U.P. Catalunya (Speculative Multithreading)
– Intel/U. Portland (Dynamic Multithreading)
– Illinois at U.C. (I-ACOMA)
→ our approach seamlessly scales both up and down

12 Outline
→ Details of our Approach
   – life cycle of an epoch
   – speculative coherence
   – what happens at commit time
   – forwarding data between epochs
Performance
Conclusions

13 Life Cycle of an Epoch
[Figure: timeline of one epoch. Labels: Spawned; Becomes Speculative; Commit?; Init; Speculative Work; Wait to be Homefree?; Slow Commit; Fast Commit; Complete, Pass Homefree. Axis: Time]

14 Life Cycle of an Epoch
[Figure: the same timeline. Labels: Spawned; Becomes Speculative; Commit?; Speculative Coherence; Complete, Pass Homefree; Mechanisms to Squash or Commit. Axis: Time]

15 MESI Coherence Example
Shared Memory (X=2)
Thread A cache: Tag -, State Invalid, Data -
Thread B cache: Tag -, State Invalid, Data -

16 MESI Coherence Example
Thread B: Load X. A Read request goes to shared memory; both caches are still Invalid.

17 MESI Coherence Example
Fill. Thread B cache: Tag X, State Excl., Data 2. Thread A cache: Invalid.

18 MESI Coherence Example
Thread A: Store X=3. A Read-Exclusive request is issued.
→ read-exclusive invalidates all other copies

19 MESI Coherence Example
The Read-Exclusive triggers an Invalidation; Thread B's copy of X becomes Invalid.

20 MESI Coherence Example
Fill. Thread A cache: Tag X, State Dirty, Data 3. Thread B cache: Invalid. Shared Memory (X)
→ the state 'dirty' implies exclusiveness
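
The MESI walk-through on slides 15 to 20 can be mirrored in a few lines of C. The sketch below is an illustration under simplifying assumptions, not the simulator used in the talk: there are only two caches, each holds a single line for X, and messages are modelled as direct state changes. All type and function names are made up for this example.

```c
typedef enum { INVALID, SHARED, EXCLUSIVE, DIRTY } LineState;

typedef struct {
    LineState state;
    int data;
} CacheLine;  /* a one-line cache that can only hold X */

typedef struct {
    int memory_x;        /* value of X in shared memory */
    CacheLine cache[2];  /* index 0 = Thread A, index 1 = Thread B */
} System;

/* Load X. On a miss, a Read fetches the line; a dirty remote copy is
 * written back first, and any remote copy is downgraded to SHARED. */
int load_x(System *s, int me) {
    CacheLine *mine = &s->cache[me];
    CacheLine *theirs = &s->cache[1 - me];
    if (mine->state == INVALID) {
        if (theirs->state == DIRTY) {
            s->memory_x = theirs->data;  /* write back before sharing */
        }
        if (theirs->state != INVALID) {
            theirs->state = SHARED;
            mine->state = SHARED;
        } else {
            mine->state = EXCLUSIVE;     /* no other copy exists */
        }
        mine->data = s->memory_x;        /* Fill */
    }
    return mine->data;
}

/* Store X. Without ownership, a Read-Exclusive invalidates the other copy. */
void store_x(System *s, int me, int value) {
    CacheLine *mine = &s->cache[me];
    CacheLine *theirs = &s->cache[1 - me];
    if (mine->state != EXCLUSIVE && mine->state != DIRTY) {
        if (theirs->state == DIRTY) {
            s->memory_x = theirs->data;  /* keep memory up to date */
        }
        theirs->state = INVALID;         /* Invalidation */
    }
    mine->state = DIRTY;                 /* dirty implies exclusiveness */
    mine->data = value;
}

/* Replays slides 15 to 20; returns 1 if every intermediate state matches. */
int mesi_slide_scenario(void) {
    enum { A = 0, B = 1 };
    System s = { 2, { { INVALID, 0 }, { INVALID, 0 } } };
    int loaded = load_x(&s, B);
    int ok = (loaded == 2) && (s.cache[B].state == EXCLUSIVE);
    store_x(&s, A, 3);
    return ok && s.cache[A].state == DIRTY && s.cache[A].data == 3
              && s.cache[B].state == INVALID && s.memory_x == 2;
}
```

After the store, memory still holds the old value 2; only Thread A's dirty line holds 3.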

21 Speculative Coherence Example
Highlights of our scheme:
– detection of a data dependence violation
– speculatively modified and shared cache lines
Accesses used in the example: Epoch 4: Load X; Epoch 5: Store X=3; Epoch 6: Load X

22 Speculative Coherence Example
Shared Memory (X=2)
Epoch 5 cache: Tag -, State Invalid, Data -
Epoch 6 cache: Tag -, State Invalid, Data -
Epoch 6: Load X. A Read request goes to shared memory.

23 Speculative Coherence Example
Fill. Epoch 6 cache: Tag X, State Excl., Data 2, marked Spec. Loaded.
→ track which lines are speculatively loaded

24 Speculative Coherence Example
Epoch 5: Store X=3. Message sent: Sp Read-Ex (epoch5).
→ speculative msgs piggyback epoch number

25 Speculative Coherence Example
Epoch 6's cache receives Sp Inv (epoch5).
→ epoch5 < epoch6, and speculatively loaded

26 Speculative Coherence Example
Epoch 6 cache: Invalid. Speculation failed!
→ speculation fails for epoch 6

27 Speculative Coherence Example
Fill. Epoch 5 cache: Tag X, State Excl., Data 3, marked Spec. Modified. Epoch 6: speculation failed.
→ track which lines are speculatively modified
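
The violation check on slide 25 comes down to one comparison on the receiving cache. The following sketch assumes a one-line cache, plain integer epoch numbers without wrap-around, and a made-up function `receive_spec_invalidation`; the slides do not show what happens when the sender is logically later than the receiver, so that case is simply ignored here.

```c
#include <stdbool.h>

typedef struct {
    bool valid;
    bool spec_loaded;    /* SL: line was speculatively loaded */
    bool spec_modified;  /* SM: line was speculatively modified */
    int  data;
} SpecLine;

typedef struct {
    int epoch;       /* epoch number of the work using this cache */
    bool violated;   /* set when speculation has failed */
    SpecLine line;   /* single line, holding X */
} SpecCache;

/* Speculative load: fill on a miss and remember that the line was read. */
int spec_load(SpecCache *c, int memory_value) {
    if (!c->line.valid) {
        c->line.valid = true;
        c->line.data = memory_value;
    }
    c->line.spec_loaded = true;
    return c->line.data;
}

/* Handle a speculative invalidation that carries the sender's epoch number.
 * A violation occurs when the sender is logically earlier and this cache
 * has speculatively loaded the line (slide 25). */
void receive_spec_invalidation(SpecCache *c, int sender_epoch) {
    if (!c->line.valid || sender_epoch >= c->epoch) {
        return;  /* case not covered by the slides; ignored in this sketch */
    }
    if (c->line.spec_loaded) {
        c->violated = true;
    }
    c->line.valid = false;
    c->line.spec_loaded = false;
}

/* Replays slides 22 to 26: epoch 6 loads X, then epoch 5 stores to X. */
bool slide_violation_scenario(void) {
    SpecCache epoch6 = { .epoch = 6 };
    spec_load(&epoch6, 2);
    receive_spec_invalidation(&epoch6, 5);
    return epoch6.violated && !epoch6.line.valid;
}
```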

28 Speculative Coherence Example
Highlights of our scheme:
– detection of a data dependence violation
– speculatively modified and shared cache lines
Accesses used in the example: Epoch 4: Load X; Epoch 5: Store X=3; Epoch 6: Load X

29 Speculative Coherence Example
Shared Memory (X=2)
Epoch 4 cache: Tag -, State Invalid, Data -
Epoch 5 cache: Tag X, State Excl., Data 3, marked Spec. Modified (after Store X=3)

30 Speculative Coherence Example
Epoch 4: Load X. A Read request goes to shared memory.

31 Speculative Coherence Example
A "notify shared" message reaches Epoch 5's cache, whose line X (Data 3) becomes Shared while staying Spec. Modified.
→ both speculatively modified and shared!

32 Speculative Coherence Example
Fill. Epoch 4 cache: Tag X, State Shared, Data 2, marked Spec. Loaded.
Epoch 5 cache: Tag X, State Shared, Data 3, marked Spec. Modified.
→ multiple versions of the same cache line

33 Summary of New Speculative Line State
New cache line state:
– has it been speculatively loaded?
   – detect dependence violations
– has it been speculatively modified?
   – buffer speculative modifications
– is it in a speculative shared or exclusive state?
   – important performance optimizations
What if a speculative cache line is replaced?
– speculation fails for that epoch

34 Implementation of Speculative State
[Figure: a processor's cache drawn as a table with Tag, State and Data columns]

35 Implementation of Speculative State
[Figure: the same cache with two extra bits per line: SL (Speculatively Loaded) and SM (Speculatively Modified)]
→ modest amount of extra space

36 Life Cycle of an Epoch
[Figure: the epoch timeline from slide 14, with Squash highlighted under "Mechanisms to Squash or Commit"]

37 When Speculation Fails
Line  State  SL  SM
1     Sp Ex  1   0
2     Sp Sh  1   0
3     Sp Ex  0   1
4     Sp Sh  1   1
Step: Flash Reset (SL bits)

38 When Speculation Fails
Line  State   SL  SM
1     Excl    0   0
2     Shared  0   0
3     Sp Ex   0   1
4     Sp Sh   0   1
Step: If SM is set then Invalidate; Flash Reset (SM bits)

39 When Speculation Fails
Line  State    SL  SM
1     Excl     0   0
2     Shared   0   0
3     Invalid  0   0
4     Invalid  0   0
→ quick bit operation
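
The squash on slides 37 to 39 amounts to a couple of bitwise operations. The sketch below assumes a four-line cache whose valid, SL and SM bits are each packed into one byte (bit 0 for line 1, and so on); this packing and the names `SpecBits` and `squash` are illustrative choices, not taken from the talk.

```c
#include <stdint.h>

/* Per-cache bit vectors; bit i describes line i+1 of the slide's table. */
typedef struct {
    uint8_t valid;  /* line holds usable data */
    uint8_t sl;     /* Speculatively Loaded */
    uint8_t sm;     /* Speculatively Modified */
} SpecBits;

/* Squash: invalidate every speculatively modified line, then flash-reset
 * both speculative bit vectors. No per-line search is needed. */
void squash(SpecBits *b) {
    b->valid &= (uint8_t)~b->sm;  /* "if SM is set then invalidate" */
    b->sl = 0;                    /* flash reset */
    b->sm = 0;                    /* flash reset */
}

/* Slide 37's starting state: SL = 1 1 0 1 and SM = 0 0 1 1 for lines 1..4.
 * Returns the valid bits after the squash. */
uint8_t squash_slide_example(void) {
    SpecBits b = { 0x0F, 0x0B, 0x0C };
    squash(&b);
    return b.valid;
}
```

In the slide's example only lines 1 and 2 survive, so the returned valid mask is `0x03`.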

40 Life Cycle of an Epoch
[Figure: the epoch timeline from slide 14, with Commit highlighted under "Mechanisms to Squash or Commit"]

41 When Speculation Succeeds
Line  State  SL  SM
1     Sp Ex  1   0
2     Sp Sh  1   0
3     Sp Ex  0   1
4     Sp Sh  1   1
Step: Flash Reset (SL bits)

42 When Speculation Succeeds
Line  State   SL  SM
1     Excl    0   0
2     Shared  0   0
3     Sp Ex   0   1
4     Sp Sh   0   1
SM & Exclusive: Become Dirty

43 When Speculation Succeeds
Same state as slide 42.
SM & Shared: Need Exclusive Access
→ want to avoid searching entire cache

44 When Speculation Succeeds
Same state as slide 42; line 4 has tag X, and the ORB holds one entry: X.
→ ownership required buffer (ORB)

45 When Speculation Succeeds
Upgrade-Request (X) is sent for the ORB entry.

46 When Speculation Succeeds
Ack (X) arrives and the ORB entry is cleared.
Step: If SM, Become Dirty; Flash Reset

47 When Speculation Succeeds
Line  State   SL  SM
1     Excl    0   0
2     Shared  0   0
3     Dirty   0   0
4     Dirty   0   0
ORB: empty
→ flush the ORB, then quick bit operations
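
The commit sequence on slides 41 to 47 can be sketched as follows. Assumptions made for this example: a four-line cache, a four-entry ORB that stores line indices rather than tags, every stored-to line already present in the cache, and the Upgrade-Request/Ack round trip collapsed into a direct state change. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, DIRTY } State;

typedef struct {
    State state;
    bool sl;  /* Speculatively Loaded */
    bool sm;  /* Speculatively Modified */
} Line;

enum { NUM_LINES = 4, ORB_SIZE = 4 };

typedef struct {
    Line lines[NUM_LINES];
    size_t orb[ORB_SIZE];  /* lines that are SM but only held shared */
    size_t orb_count;
} Cache;

/* Speculative store to a line already in the cache. A shared line is
 * recorded in the ORB so commit need not search the whole cache.
 * Returns false on ORB overflow, in which case speculation must fail. */
bool spec_store(Cache *c, size_t i) {
    Line *l = &c->lines[i];
    if (!l->sm && l->state == SHARED) {
        if (c->orb_count == ORB_SIZE) {
            return false;
        }
        c->orb[c->orb_count++] = i;
    }
    l->sm = true;
    return true;
}

/* Commit: flush the ORB (one upgrade per entry), then turn every SM line
 * Dirty and flash-reset SL and SM. Returns the number of upgrades. */
size_t commit(Cache *c) {
    size_t upgrades = 0;
    for (size_t k = 0; k < c->orb_count; k++) {
        c->lines[c->orb[k]].state = EXCLUSIVE;  /* stands in for Upgrade-Request + Ack */
        upgrades++;
    }
    c->orb_count = 0;
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (c->lines[i].sm && c->lines[i].state != INVALID) {
            c->lines[i].state = DIRTY;
        }
        c->lines[i].sl = false;
        c->lines[i].sm = false;
    }
    return upgrades;
}

/* Replays slides 41 to 47: lines 3 and 4 are speculatively modified. */
bool commit_slide_scenario(void) {
    Cache c = { .lines = { { EXCLUSIVE, true,  false },
                           { SHARED,    true,  false },
                           { EXCLUSIVE, false, false },
                           { SHARED,    true,  false } } };
    if (!spec_store(&c, 2) || !spec_store(&c, 3)) {
        return false;
    }
    size_t upgrades = commit(&c);
    return upgrades == 1 && c.orb_count == 0
        && c.lines[0].state == EXCLUSIVE && c.lines[1].state == SHARED
        && c.lines[2].state == DIRTY && c.lines[3].state == DIRTY;
}
```

Only the shared line needs an upgrade, which is why a small ORB can be enough.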

48 Forwarding Data Between Epochs
– predictable dependences cause frequent violations
– compiler inserts wait-signal synchronization
[Figure: without forwarding, one epoch's Store X and a later epoch's Load X; with forwarding, the storing epoch does Store X then Signal, and the loading epoch does Wait then Load X]
→ synchronize to avoid violations

49 Outline
Details of our Approach
→ Performance
   – simulation infrastructure
   – single-chip multiprocessor performance
   – scaling beyond chip boundaries
Conclusions

50 Simulation Infrastructure
Compiler system and tools based on SUIF
– help analyze dependences, insert synchronization
– produce MIPS binaries containing TLS primitives
Benchmarks (all run to completion)
– buk, compress95, ijpeg, equake
Simulator
– superscalar, similar to MIPS R10K
– models all bandwidth and contention
→ detailed simulation!
[Figure: processors (P) with caches (C) connected by a crossbar]

51 Performance on a 4-Processor CMP
[Figure: performance chart; the plotted values are not recoverable from the transcript]
Parallel Coverage: 56.6%, 47.3%, 39.3%, 22.1%

52 Performance on a 4-Processor CMP
[Figure: same chart as slide 51]
Parallel Coverage: 56.6%, 47.3%, 39.3%, 22.1%
→ program speedups are limited by coverage

53 Varying the Number of Processors
[Figure: chart of Normalized Region Execution Time; data not recoverable from the transcript]
→ buk and equake are memory-bound
→ compress95 and ijpeg are computation-intensive

54 Varying the Number of Processors
[Figure: chart of Normalized Region Execution Time; data not recoverable from the transcript]
→ buk and equake scale well
→ passing the homefree token is not a bottleneck

55 Performance of the ORB (on a 4-CMP)
Application   Average Flush Latency (cycles)   ORB Size, Average (entries)   ORB Size, Maximum (entries)
buk           13.95                            2.38                          9
compress95    0.04                             0.01                          8
equake        0.13                             0.04                          12
ijpeg         1.06                             0.17                          5
→ a small ORB is sufficient

56 Tracking Dependences Per Cache Line
Problem:
– analogous to false sharing: false violations
– write-after-write dependences also cause violations
   – but not a true dependence!
Solution:
– track dependences at a word granularity
– have an SM and SL bit per word in each cache line
→ is per-word state worth the extra overhead?
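
The difference between per-line and per-word tracking on slide 56 can be shown with two predicates. This sketch assumes 8 words per line (the 32-byte line from the memory parameters with an assumed 4-byte word) and represents the set of words touched as a bit mask; both function names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum { WORDS_PER_LINE = 8 };  /* assumed: 32-byte line, 4-byte words */

/* Per-line tracking: any store by an earlier epoch to a line that the
 * later epoch has speculatively loaded counts as a violation, even when
 * different words were touched. */
bool violates_per_line(uint8_t loaded_words, uint8_t stored_words) {
    return loaded_words != 0 && stored_words != 0;
}

/* Per-word tracking: a violation requires that the earlier epoch stored
 * to a word the later epoch actually loaded. */
bool violates_per_word(uint8_t loaded_words, uint8_t stored_words) {
    return (loaded_words & stored_words) != 0;
}
```

With word 0 loaded (`0x01`) and word 1 stored (`0x02`), per-line tracking reports a false violation while per-word tracking does not.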

57 Tracking Dependences Per Cache Line
Does it do any good?
– not for our 4 benchmarks
– adding this support showed no improvement
Why not?
– buk and equake have random access patterns
– compress95 is heavily synchronized
– ijpeg is unrolled to avoid false sharing
→ existing techniques for avoiding false sharing can address this problem

58 Scaling Beyond Chip Boundaries
[Figure: nodes, each with processors (P) and caches (C) joined by a crossbar, connected through shared memory; latency between nodes: 200 cycles]
→ simulate architectures with 1, 2 and 4 nodes

59 Scaling Beyond Chip Boundaries
[Figure: chart of Normalized Region Execution Time; data not recoverable from the transcript]
→ multi-chip systems benefit from TLS

60 Scaling Beyond Chip Boundaries
[Figure: chart of Normalized Region Execution Time; data not recoverable from the transcript]
→ our scheme scales well

61 Conclusions
The overheads of our scheme are low:
– mechanisms to squash or commit are not a bottleneck
– per-word speculative state is not always necessary
It offers compelling performance improvements:
– program speedups from 8% to 46% on a 4-processor CMP
– program speedups up to 75% on multi-chip architectures
It is scalable:
– coherence provides elegant data dependence tracking
→ seamless TLS on a wide range of architectures

62 Backup Slides

63 The I-ACOMA Scalable Approach
The I-ACOMA approach is hierarchical
– Memory Disambiguation Table (MDT)
   – structure used to detect data dependence violations
– scalable hardware support using a hierarchy of MDTs
– hierarchical ordering of threads
   – one level inside each multiprocessor chip
   – another level across chips
Our approach is flat
– speculation occurs along a flat speculation level
→ our scheme has no thread placement constraints

64 Underlying Architecture
[Figure: processors (P), caches (C) and memories (M) joined by an interconnection network, with the speculation level marked]
→ focus on the level where coherence begins

65 Underlying Architecture
[Figure: processors (P) and caches (C) over shared memory, with the speculation level marked]
→ focus on the level where coherence begins

66 Speculation in a Shared Cache
Why?
1) Shared-cache multithreaded architectures
   – e.g., simultaneous multithreading
2) Context switch to another chain of speculation
3) Start new epoch while current epoch waits to commit
How?
→ replicate the speculative context

67 Support for Speculation in a Shared Cache
[Figure: a cache with Tag, State and Data columns plus one set of SL bits, SM bits and an ORB]
→ replicate the speculative context

68 Support for Speculation in a Shared Cache
[Figure: the same cache with a second, replicated set of SL bits, SM bits and ORB]
→ replicate the speculative context

69 Preserving Correctness
Speculation must fail whenever speculative state is lost
– e.g., replacement of a speculative line, ORB overflow
Any exceptions are suppressed until epoch is homefree
– e.g., divide by zero, segfault
Polling violation detection must avoid infinite looping
– requires a poll inside each loop
No system calls while speculative (for now)
→ ensures original sequential semantics are preserved

70 Epoch Numbers
[Figure: an epoch number drawn as two fields: Thread Identifier (TID) | Sequence Number]
Represent a partial ordering
– signed-compare sequence numbers if TIDs match
   – allows for wrap-around
– otherwise the epochs are unordered
   – from independent programs
   – from independent chains of speculation within one program
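
The partial ordering on slide 70 can be written as a small comparison function. The field widths are not given in the transcript, so the 16-bit TID and 16-bit sequence number below are assumptions, as are all identifiers. The signed comparison is expressed through the unsigned difference, which gives the same answer as a signed compare while staying within well-defined C arithmetic.

```c
#include <stdint.h>

typedef struct {
    uint16_t tid;  /* Thread Identifier (width assumed) */
    uint16_t seq;  /* Sequence Number (width assumed) */
} EpochNum;

typedef enum { EPOCH_UNORDERED, EPOCH_EARLIER, EPOCH_SAME, EPOCH_LATER } EpochOrder;

EpochNum make_epoch(uint16_t tid, uint16_t seq) {
    EpochNum e = { tid, seq };
    return e;
}

/* Order of a relative to b. Epochs with different TIDs are unordered.
 * Otherwise the wrapped 16-bit difference decides: a difference below
 * half the range means a is later, anything else means a is earlier.
 * This tolerates sequence-number wrap-around as long as live epochs are
 * less than half the range apart. */
EpochOrder epoch_compare(EpochNum a, EpochNum b) {
    if (a.tid != b.tid) {
        return EPOCH_UNORDERED;
    }
    uint16_t diff = (uint16_t)(a.seq - b.seq);
    if (diff == 0) {
        return EPOCH_SAME;
    }
    return (diff < 0x8000u) ? EPOCH_LATER : EPOCH_EARLIER;
}
```

For instance, sequence number 65535 compares as earlier than 2 within the same TID, because 2 is only three steps ahead once the counter wraps.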

71 Speculative Thread Model
Round-robin schedule of epochs to processors
– not a requirement of our scheme, just for convenience
Each epoch spawns the next
– through a lightweight fork instruction (10 cycles)
Violations detected through polling
– each epoch runs to completion before detecting failed speculation and restarting
Violation chaining
– if an epoch suffers a violation, we squash all logically-later epochs
→ many possibilities to be evaluated in future work

72 Multiple Writers Example
Version     SM[]   Data
Original    0000   ABCD
Epoch i     1001   EBCF
Epoch i+1   1010   GBHD
Committed   0000   GBHF
→ combine speculatively modified lines at commit time
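
The merge on slide 72 can be expressed as copying only the words whose SM bit is set, applying epochs in logical order. The sketch assumes four one-character words per line, as drawn on the slide, and a per-word SM array; `LineVersion` and `merge_at_commit` are illustrative names.

```c
#include <stdbool.h>
#include <string.h>

enum { WORDS_PER_LINE = 4 };  /* four words, as drawn on the slide */

typedef struct {
    bool sm[WORDS_PER_LINE];    /* per-word Speculatively Modified bits */
    char data[WORDS_PER_LINE];  /* this epoch's version of the line */
} LineVersion;

/* Fold one epoch's version into the committed line: only the words the
 * epoch speculatively modified overwrite the committed copy. */
void merge_at_commit(char committed[WORDS_PER_LINE], const LineVersion *v) {
    for (int w = 0; w < WORDS_PER_LINE; w++) {
        if (v->sm[w]) {
            committed[w] = v->data[w];
        }
    }
}

/* Replays the slide: ABCD, then epoch i (1001, EBCF), then epoch i+1
 * (1010, GBHD). Returns true if the committed line is GBHF. */
bool multiple_writers_slide_example(void) {
    char committed[WORDS_PER_LINE] = { 'A', 'B', 'C', 'D' };
    const LineVersion epoch_i  = { { true, false, false, true  }, { 'E', 'B', 'C', 'F' } };
    const LineVersion epoch_i1 = { { true, false, true,  false }, { 'G', 'B', 'H', 'D' } };
    merge_at_commit(committed, &epoch_i);   /* earlier epoch commits first */
    merge_at_commit(committed, &epoch_i1);  /* later epoch wins on word 0 */
    return memcmp(committed, "GBHF", WORDS_PER_LINE) == 0;
}
```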

73 Pipeline Parameters
Issue Width           4
Functional Units      2 Int, 2 FP, 1 Mem, 1 Bra
Reorder Buffer Size   32
Integer Multiply      12 cycles
Integer Divide        76 cycles
All Other Integer     1 cycle
FP Divide             15 cycles
FP Square Root        20 cycles
All Other FP          2 cycles
Branch Prediction     GShare (16KB, 8 history bits)

74 Memory Parameters
Cache Line Size                            32B
Instruction Cache                          32KB, 4-way set-assoc
Data Cache                                 32KB, 2-way set-assoc, 2 banks
Unified Secondary Cache                    2MB, 4-way set-assoc, 4 banks
Miss Handlers                              8 for data, 2 for insts
Crossbar Interconnect                      8B per cycle per bank
Minimum Miss Latency to Secondary Cache    10 cycles
Minimum Miss Latency to Local Memory       75 cycles
Main Memory Bandwidth                      1 access per 20 cycles
Intra-Chip Communication Latency           10 cycles
Inter-Chip Communication Latency           200 cycles

75 Benchmark Details: Regions and Epochs
Application   Unrolling Factor   Avg. Insts. per Epoch   Parallel Coverage
buk           8                  81.0                    22.8%
              8                  135.0                   33.8%
compress95    1                  196.7                   24.6%
              1                  240.4                   22.7%
ijpeg         32                 1467.9                  8.2%
              1                  80.8                    2.2%
              1                  84.0                    5.0%
              1                  100.3                   6.7%
equake        1                  2925.5                  39.3%

76 Performance on a 4-Processor CMP
Application   Overall Region Speedup   Parallel Coverage   Program Speedup
buk           2.26                     56.6%               1.46
compress95    1.27                     47.3%               1.12
equake        1.77                     39.3%               1.21
ijpeg         1.94                     22.1%               1.08
→ program speedups are limited by coverage
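
One way to see why coverage limits program speedup is an Amdahl-style estimate: the covered fraction of execution runs faster by the region speedup, and the rest is unchanged. This model is an assumption added here for illustration; the slide's numbers come from simulation, so the estimate need not reproduce every row of the table. The function name is hypothetical.

```c
/* Amdahl-style estimate of whole-program speedup.
 * coverage:       fraction of original execution time inside the
 *                 speculatively parallelized regions (0.0 to 1.0)
 * region_speedup: speedup achieved within those regions */
double program_speedup(double coverage, double region_speedup) {
    return 1.0 / ((1.0 - coverage) + coverage / region_speedup);
}
```

With buk's coverage of 0.566 and region speedup of 2.26, the estimate is about 1.46, in line with the table's buk row. Even an unbounded region speedup would cap buk at 1 / (1 - 0.566), roughly 2.3.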

77 TLS Overheads
Application   Dynamic Instruction Overhead   Misses to Other Caches
buk           5.3%                           34.47%
compress95    30.6%                          3.02%
equake        3.7%                           1.67%
ijpeg         7.0%                           65.00%
→ buk and ijpeg can benefit greatly from improved locality

78 Impact of Communication Latency
[Figure: chart of Normalized Region Execution Time; data not recoverable from the transcript]
→ speedups still possible with higher latencies

79 Speculative Invalidation of Non-Speculative Cache Lines
[Figure: chart of Normalized Region Execution Time; data not recoverable from the transcript]
→ a worthwhile enhancement of our baseline scheme

