Published by Alan Cole. Modified over 8 years ago.
1 A Scalable Approach to Thread-Level Speculation
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry
Computer Science Department, Carnegie Mellon University
2 Multithreaded Machines Are Everywhere
[Figure: threads running on processors (P) with caches (C) above shared memory, at several scales; examples: SUN MAJC, IBM Power4, Alpha 21464, Dual Pentium, SGI Origin]
How can we use them? Parallelism!
3 Automatic Parallelization
Proving independence of threads is hard:
– complex control flow
– complex data structures
– pointers, pointers, pointers
– run-time inputs
How can we make the compiler's job feasible? Thread-Level Speculation (TLS)
4 Example
while (...) { x = hash[index1]; … hash[index2] = y; ... }
[Figure: one processor executes the iterations one after another over time]
= hash[3] … hash[10] = …
= hash[19] … hash[21] = …
= hash[33] … hash[30] = …
= hash[10] … hash[25] = …
5 Example of Thread-Level Speculation
[Figure: the four loop iterations run in parallel as epochs, one per processor, over time]
Epoch 1: = hash[3] … hash[10] = …
Epoch 2: = hash[19] … hash[21] = …
Epoch 3: = hash[33] … hash[30] = …
Epoch 4: = hash[10] … hash[25] = …
6 Example of Thread-Level Speculation
Violation! Epoch 4 loads hash[10], which Epoch 1 stores.
7 Example of Thread-Level Speculation
Each epoch reaches a "commit?" check when its work is done; Epoch 4 has suffered the violation.
8 Example of Thread-Level Speculation
Epoch 4 retries: = hash[10] … hash[25] = … commit?
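The violation in this example can be reproduced with a small sketch. It assumes the worst-case interleaving shown on the slides (every epoch performs its load before any logically earlier epoch's store becomes visible), and the function name `find_violations` is made up for illustration; it is not part of the authors' system.

```python
def find_violations(epochs):
    """Return the 1-based numbers of epochs whose load reads a location
    that a logically earlier epoch stores to (read-after-write violation).

    Each epoch is a (load_index, store_index) pair into the hash table.
    Assumption: all epochs run concurrently, so a later epoch's load
    always happens before the earlier epoch's store is visible.
    """
    violated = []
    for i, (load_idx, _) in enumerate(epochs):
        earlier_stores = {store_idx for _, store_idx in epochs[:i]}
        if load_idx in earlier_stores:
            violated.append(i + 1)
    return violated


# The four iterations from the slides: (index read, index written).
epochs = [(3, 10), (19, 21), (33, 30), (10, 25)]
print(find_violations(epochs))
```

Only the violated epoch needs to be re-executed; the other epochs can commit in order.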
9 Goals of Our Approach
1) Handle arbitrary memory accesses
– i.e., not just array references
2) Preserve performance of non-speculative workloads
– keep hardware support minimal and simple
3) Apply to any scale of multithreaded architecture
– CMPs, SMT processors, more traditional MPs
effective, simple, and scalable TLS
10 Overview of Our Approach
System requirements:
1) Detect data dependence violations
– extend invalidation-based cache coherence
2) Buffer speculative modifications
– use the caches as speculative buffers
coherence already works at a variety of scales, hence our scheme is also scalable
11 Related Schemes
– Wisconsin (Multiscalar, Trace Processor)
– Stanford (Hydra)
– U.P. Catalunya (Speculative Multithreading)
– Intel/U. Portland (Dynamic Multithreading)
– Illinois at U.C. (I-ACOMA)
our approach seamlessly scales both up and down
12 Outline
Details of our Approach
– life cycle of an epoch
– speculative coherence
– what happens at commit time
– forwarding data between epochs
Performance
Conclusions
13 Life Cycle of an Epoch
[Figure: timeline of one epoch. Events: Spawned, Becomes Speculative, Commit?. Phases: Init, Speculative Work, Wait to be Homefree?, Fast Commit or Slow Commit, Complete, Pass Homefree]
14 Life Cycle of an Epoch
[Figure: the same timeline (Spawned, Becomes Speculative, Commit?, Complete, Pass Homefree), annotated with the topics covered next: Speculative Coherence, and Mechanisms to Squash or Commit]
15 MESI Coherence Example
Shared Memory: X=2
Thread A cache: Tag -, State Invalid, Data -
Thread B cache: Tag -, State Invalid, Data -
16 MESI Coherence Example
Shared Memory: X=2
Thread A cache: Tag -, State Invalid, Data -
Thread B cache: Tag -, State Invalid, Data -
Thread B: Load X (Read message sent)
17 MESI Coherence Example
Shared Memory: X=2
Thread A cache: Tag -, State Invalid, Data -
Thread B cache: Tag X, State Excl., Data 2 (Fill)
18 MESI Coherence Example
Shared Memory: X=2
Thread A cache: Tag -, State Invalid, Data -
Thread B cache: Tag X, State Excl., Data 2
Thread A: Store X=3 (Read-Exclusive message sent)
read-exclusive invalidates all other copies
19 MESI Coherence Example
Shared Memory: X=2
Thread A cache: Tag -, State Invalid, Data -
Thread B cache: Tag -, State Invalid, Data - (after Invalidation)
read-exclusive invalidates all other copies
20 MESI Coherence Example
Shared Memory: X (value now stale)
Thread A cache: Tag X, State Dirty, Data 3 (Fill)
Thread B cache: Tag -, State Invalid, Data -
the state 'dirty' implies exclusiveness
21 Speculative Coherence Example
Highlights of our scheme:
– detection of a data dependence violation
– speculatively modified and shared cache lines
Epoch 4: Load X. Epoch 5: Store X=3. Epoch 6: Load X.
22 Speculative Coherence Example
Shared Memory: X=2
Epoch 5 cache: Tag -, State Invalid, Data -
Epoch 6 cache: Tag -, State Invalid, Data -
Epoch 6: Load X (Read message sent)
23 Speculative Coherence Example
Shared Memory: X=2
Epoch 5 cache: Tag -, State Invalid, Data -
Epoch 6 cache: Tag X, State Excl., Data 2, Spec. Loaded (Fill)
track which lines are speculatively loaded
24 Speculative Coherence Example
Shared Memory: X=2
Epoch 5 cache: Tag -, State Invalid, Data -
Epoch 6 cache: Tag X, State Excl., Data 2, Spec. Loaded
Epoch 5: Store X=3 (Sp Read-Ex (epoch5) message sent)
speculative msgs piggyback epoch number
25 Speculative Coherence Example
Shared Memory: X=2
Epoch 5 cache: Tag -, State Invalid, Data -
Epoch 6 cache: Tag X, State Excl., Data 2, Spec. Loaded
Sp Inv (epoch5) arrives at the Epoch 6 cache
epoch5 < epoch6, and speculatively loaded
26 Speculative Coherence Example
Shared Memory: X=2
Epoch 5 cache: Tag -, State Invalid, Data -
Epoch 6 cache: Tag -, State Invalid, Data -
speculation fails for epoch 6
27 Speculative Coherence Example
Shared Memory: X=2
Epoch 5 cache: Tag X, State Excl., Data 3, Spec. Modified (Fill)
Epoch 6 cache: Tag -, State Invalid, Data - (speculation failed!)
track which lines are speculatively modified
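The violation check on these slides can be sketched as a tiny cache model. This is only an illustration under simplifying assumptions: the class and method names (`SpecCache`, `spec_load`, `receive_spec_invalidation`) are hypothetical, epoch numbers are plain integers, and an invalidation from a logically later epoch is simply ignored, which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    state: str = "Invalid"
    data: Optional[int] = None
    spec_loaded: bool = False  # the per-line SL bit


class SpecCache:
    """One epoch's cache, tracking only what the example needs."""

    def __init__(self, epoch):
        self.epoch = epoch
        self.lines = {}
        self.violated = False

    def spec_load(self, addr, memory):
        line = self.lines.setdefault(addr, Line())
        if line.state == "Invalid":
            line.state, line.data = "Exclusive", memory[addr]  # fill
        line.spec_loaded = True  # track speculatively loaded lines
        return line.data

    def receive_spec_invalidation(self, addr, sender_epoch):
        line = self.lines.get(addr)
        if line is None or line.state == "Invalid":
            return
        if sender_epoch < self.epoch:
            # A logically earlier epoch wrote the line.
            if line.spec_loaded:
                self.violated = True  # we read a stale value
            line.state, line.data, line.spec_loaded = "Invalid", None, False
        # Assumption: invalidations from later epochs are ignored here.


memory = {"X": 2}
epoch6 = SpecCache(epoch=6)
epoch6.spec_load("X", memory)               # Epoch 6: Load X
epoch6.receive_spec_invalidation("X", 5)    # Epoch 5: Store X=3
print(epoch6.violated)
```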
28 Speculative Coherence Example
Highlights of our scheme:
– detection of a data dependence violation
– speculatively modified and shared cache lines
Epoch 4: Load X. Epoch 5: Store X=3. Epoch 6: Load X.
29 Speculative Coherence Example
Shared Memory: X=2
Epoch 4 cache: Tag -, State Invalid, Data -
Epoch 5 cache: Tag X, State Excl., Data 3, Spec. Modified
30 Speculative Coherence Example
Shared Memory: X=2
Epoch 4 cache: Tag -, State Invalid, Data -
Epoch 5 cache: Tag X, State Excl., Data 3, Spec. Modified
Epoch 4: Load X (Read message sent)
31 Speculative Coherence Example
Shared Memory: X=2
Epoch 4 cache: Tag -, State Invalid, Data -
Epoch 5 cache: Tag X, State Shared, Data 3, Spec. Modified (notify shared)
both speculatively modified and shared!
32 Speculative Coherence Example
Shared Memory: X=2
Epoch 4 cache: Tag X, State Shared, Data 2, Spec. Loaded (Fill)
Epoch 5 cache: Tag X, State Shared, Data 3, Spec. Modified
multiple versions of the same cache line
33 Summary of New Speculative Line State
New cache line state:
– has it been speculatively loaded? (detect dependence violations)
– has it been speculatively modified? (buffer speculative modifications)
– is it in a speculative shared or exclusive state? (important performance optimizations)
What if a speculative cache line is replaced?
– speculation fails for that epoch
34 Implementation of Speculative State
[Figure: a processor's cache; each line holds Tag, State, and Data]
35 Implementation of Speculative State
[Figure: each cache line gains two bits: SL (Speculatively Loaded) and SM (Speculatively Modified)]
modest amount of extra space
36 Life Cycle of an Epoch
[Figure: the epoch timeline (Spawned, Becomes Speculative, Commit?, Complete, Pass Homefree) with Speculative Coherence and the Mechanisms to Squash or Commit; this part covers Squash]
37 When Speculation Fails
Tag  State  Data  SL  SM
*    Sp Ex  *     1   0
*    Sp Sh  *     1   0
*    Sp Ex  *     0   1
*    Sp Sh  *     1   1
Flash Reset
38 When Speculation Fails
Tag  State   Data  SL  SM
*    Excl    *     0   0
*    Shared  *     0   0
*    Sp Ex   *     0   1
*    Sp Sh   *     0   1
SM: If Set then Invalidate; Flash Reset
39 When Speculation Fails
Tag  State    Data  SL  SM
*    Excl     *     0   0
*    Shared   *     0   0
*    Invalid  *     0   0
*    Invalid  *     0   0
quick bit operation
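The squash sequence in these three tables amounts to a couple of per-line bit operations. The sketch below assumes a software model in which each line is a dictionary holding its base coherence state plus the SL and SM bits (a line counts as "Sp Ex" or "Sp Sh" while either bit is set); in hardware the reset would apply to all lines at once rather than in a loop.

```python
def squash(lines):
    """Apply the failed-speculation operations to every cache line in place."""
    for line in lines:
        if line["sm"]:
            # Speculatively modified data must be discarded.
            line["state"] = "Invalid"
        # Flash reset: the line is no longer speculative.
        line["sl"] = 0
        line["sm"] = 0
    return lines


# The four lines from the slide-37 table.
cache = [
    {"state": "Exclusive", "sl": 1, "sm": 0},
    {"state": "Shared", "sl": 1, "sm": 0},
    {"state": "Exclusive", "sl": 0, "sm": 1},
    {"state": "Shared", "sl": 1, "sm": 1},
]
squash(cache)
print([line["state"] for line in cache])
```

Lines that were only speculatively loaded keep their data, since it was never modified.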
40 Life Cycle of an Epoch
[Figure: the epoch timeline (Spawned, Becomes Speculative, Commit?, Complete, Pass Homefree) with Speculative Coherence and the Mechanisms to Squash or Commit; this part covers Commit]
41 When Speculation Succeeds
Tag  State  Data  SL  SM
*    Sp Ex  *     1   0
*    Sp Sh  *     1   0
*    Sp Ex  *     0   1
*    Sp Sh  *     1   1
Flash Reset
42 When Speculation Succeeds
Tag  State   Data  SL  SM
*    Excl    *     0   0
*    Shared  *     0   0
*    Sp Ex   *     0   1
*    Sp Sh   *     0   1
SM & Exclusive: Become Dirty
43 When Speculation Succeeds
Same table as slide 42.
SM & Shared: Need Exclusive Access
want to avoid searching entire cache
44 When Speculation Succeeds
Tag  State   Data  SL  SM
*    Excl    *     0   0
*    Shared  *     0   0
*    Sp Ex   *     0   1
X    Sp Sh   *     0   1
ORB: X
ownership required buffer (ORB)
45 When Speculation Succeeds
Same table as slide 44. ORB: X
Upgrade-Request (X) is sent
46 When Speculation Succeeds
Same table as slide 44. ORB: empty
Upgrade-Request (X) is answered by Ack (X)
If SM, Become Dirty; Flash Reset
47 When Speculation Succeeds
Tag  State   Data  SL  SM
*    Excl    *     0   0
*    Shared  *     0   0
*    Dirty   *     0   0
X    Dirty   *     0   0
ORB: empty
flush the ORB, then quick bit operations
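The commit sequence (flush the ORB, then the bit operations) can be sketched in the same style. Everything named here is an assumption for illustration: the dictionary-based line model, the tags W, Y, and Z, and the `request_upgrade` callback, which stands in for sending Upgrade-Request and waiting for its Ack.

```python
def commit(lines, orb, request_upgrade):
    """Commit a successful epoch.

    lines: dict mapping tag -> {"state", "sl", "sm"}
    orb: list of tags that are speculatively modified but only Shared
    request_upgrade: callable that obtains exclusive ownership of a tag
    """
    # Flush the ORB: gain ownership of each modified-but-shared line,
    # without searching the whole cache for them.
    while orb:
        tag = orb.pop()
        request_upgrade(tag)  # Upgrade-Request, returns once acked
        lines[tag]["state"] = "Exclusive"
    # Quick bit operations: modified lines become Dirty, then flash reset.
    for line in lines.values():
        if line["sm"]:
            line["state"] = "Dirty"
        line["sl"] = 0
        line["sm"] = 0


lines = {
    "W": {"state": "Exclusive", "sl": 1, "sm": 0},
    "Y": {"state": "Shared", "sl": 1, "sm": 0},
    "Z": {"state": "Exclusive", "sl": 0, "sm": 1},
    "X": {"state": "Shared", "sl": 1, "sm": 1},
}
orb = ["X"]
upgrades = []
commit(lines, orb, upgrades.append)
print({tag: line["state"] for tag, line in lines.items()}, upgrades)
```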
48 Forwarding Data Between Epochs
– predictable dependences cause frequent violations
– compiler inserts wait-signal synchronization
[Figure: without forwarding, one epoch's Store X and the next epoch's Load X are unordered; with forwarding, the storing epoch performs Store X then Signal, and the loading epoch performs Wait then Load X]
synchronize to avoid violations
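As a software analogy only, the wait-signal pattern can be sketched with Python threads. The slides describe compiler-inserted TLS primitives, not `threading.Event`; the function name `run_forwarding` and the shared dictionary are assumptions made for this illustration.

```python
import threading


def run_forwarding():
    """Run two 'epochs' as threads; the later one waits for the forwarded X."""
    forwarded = {}
    ready = threading.Event()
    loaded = []

    def earlier_epoch():
        forwarded["X"] = 3  # Store X
        ready.set()         # Signal

    def later_epoch():
        ready.wait()                   # Wait
        loaded.append(forwarded["X"])  # Load X, guaranteed to see the store

    # Start the consumer first to show that ordering comes from the
    # synchronization, not from thread start order.
    consumer = threading.Thread(target=later_epoch)
    producer = threading.Thread(target=earlier_epoch)
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return loaded[0]


print(run_forwarding())
```

Because the load is ordered after the store, the later epoch never reads a stale value and no violation is triggered for this dependence.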
49 Outline
Details of our Approach
Performance
– simulation infrastructure
– single-chip multiprocessor performance
– scaling beyond chip boundaries
Conclusions
50 Simulation Infrastructure
Compiler system and tools based on SUIF
– help analyze dependences, insert synchronization
– produce MIPS binaries containing TLS primitives
Benchmarks (all run to completion)
– buk, compress95, ijpeg, equake
Simulator
– superscalar, similar to MIPS R10K
– models all bandwidth and contention
detailed simulation!
[Figure: processors (P) with caches (C) connected by a crossbar]
51 Performance on a 4-Processor CMP
[Chart: per-benchmark results] Parallel Coverage: 56.6%, 47.3%, 39.3%, 22.1%
52 Performance on a 4-Processor CMP
[Chart: per-benchmark results] Parallel Coverage: 56.6%, 47.3%, 39.3%, 22.1%
program speedups are limited by coverage
53 Varying the Number of Processors
[Chart: Normalized Region Execution Time]
buk and equake are memory-bound
compress95 and ijpeg are computation-intensive
54 Varying the Number of Processors
[Chart: Normalized Region Execution Time]
buk and equake scale well
passing the homefree token is not a bottleneck
55 Performance of the ORB (on a 4-CMP)
Application  Average Flush Latency (cycles)  ORB Size, Average (entries)  ORB Size, Maximum (entries)
buk          13.95                           2.38                         9
compress95   0.04                            0.01                         8
equake       0.13                            0.04                         12
ijpeg        1.06                            0.17                         5
a small ORB is sufficient
56 Tracking Dependences Per Cache Line
Problem:
– analogous to false sharing: false violations
– write-after-write dependences also cause violations (but not a true dependence!)
Solution:
– track dependences at a word granularity
– have an SM and SL bit per word in each cache line
is per-word state worth the extra overhead?
57 Tracking Dependences Per Cache Line
Does it do any good?
– not for our 4 benchmarks
– adding this support showed no improvement
Why not?
– buk and equake have random access patterns
– compress95 is heavily synchronized
– ijpeg is unrolled to avoid false sharing
existing techniques for avoiding false sharing can address this problem
58 Scaling Beyond Chip Boundaries
[Figure: nodes, each with processors (P) and caches (C) on a crossbar, connected to shared memory; 200 cycles between nodes]
simulate architectures with 1, 2 and 4 nodes
59 Scaling Beyond Chip Boundaries
[Chart: Normalized Region Execution Time]
multi-chip systems benefit from TLS
60 Scaling Beyond Chip Boundaries
[Chart: Normalized Region Execution Time]
our scheme scales well
61 Conclusions
The overheads of our scheme are low:
– mechanisms to squash or commit are not a bottleneck
– per-word speculative state is not always necessary
It offers compelling performance improvements:
– program speedups from 8% to 46% on a 4-processor CMP
– program speedups up to 75% on multi-chip architectures
It is scalable:
– coherence provides elegant data dependence tracking
seamless TLS on a wide range of architectures
62 Backup Slides
63 The I-ACOMA Scalable Approach
The I-ACOMA approach is hierarchical
– Memory Disambiguation Table (MDT): structure used to detect data dependence violations
– scalable hardware support using a hierarchy of MDTs
– hierarchical ordering of threads: one level inside each multiprocessor chip, another level across chips
Our approach is flat
– speculation occurs along a flat speculation level
our scheme has no thread placement constraints
64 Underlying Architecture
[Figure: processors (P) with cache hierarchies (C) and memories (M) joined by an Interconnection Network; the speculation level is marked]
focus on the level where coherence begins
65 Underlying Architecture
[Figure: four processors (P) sharing caches (C) above Shared Memory; the speculation level is marked]
focus on the level where coherence begins
66 Speculation in a Shared Cache
Why?
1) Shared-cache multithreaded architectures
– e.g., simultaneous multithreading
2) Context switch to another chain of speculation
3) Start new epoch while current epoch waits to commit
How? replicate the speculative context
67 Support for Speculation in a Shared Cache
[Figure: a cache whose lines hold Tag, State, Data, SL, and SM, plus one ORB]
replicate the speculative context
68 Support for Speculation in a Shared Cache
[Figure: the same cache with a second set of SL and SM bits per line and a second ORB]
replicate the speculative context
69 Preserving Correctness
Speculation must fail whenever speculative state is lost
– e.g., replacement of a speculative line, ORB overflow
Any exceptions are suppressed until epoch is homefree
– e.g., divide by zero, segfault
Polling violation detection must avoid infinite looping
– requires a poll inside each loop
No system calls while speculative (for now)
ensures original sequential semantics are preserved
70 Epoch Numbers
[Layout: Thread Identifier (TID) | Sequence Number]
Represent a partial ordering
– signed-compare sequence numbers if TIDs match (allows for wrap-around)
– otherwise the epochs are unordered: from independent programs, or from independent chains of speculation within one program
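A minimal sketch of this partial ordering follows. The 8-bit sequence-number width (`SEQ_BITS`) and the function name `compare_epochs` are assumptions chosen for illustration; the slide gives no field widths.

```python
SEQ_BITS = 8  # assumed width of the sequence-number field


def compare_epochs(a, b):
    """Compare two epoch numbers given as (tid, sequence_number) pairs.

    Returns -1 if a is logically earlier than b, 1 if later, 0 if equal,
    and None if the epochs are unordered (different TIDs).
    """
    tid_a, seq_a = a
    tid_b, seq_b = b
    if tid_a != tid_b:
        return None  # independent programs or chains of speculation
    # Signed comparison of the difference modulo 2**SEQ_BITS, so the
    # ordering survives wrap-around of the sequence counter.
    diff = (seq_a - seq_b) & ((1 << SEQ_BITS) - 1)
    if diff == 0:
        return 0
    if diff >= 1 << (SEQ_BITS - 1):
        return -1  # difference is negative when read as a signed value
    return 1


print(compare_epochs((1, 5), (1, 6)))    # earlier
print(compare_epochs((1, 255), (1, 0)))  # still earlier after wrap-around
print(compare_epochs((1, 5), (2, 6)))    # unordered
```

The signed comparison is only meaningful while the live epochs of one chain span less than half of the sequence-number range.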
71 Speculative Thread Model
Round-robin schedule of epochs to processors
– not a requirement of our scheme, just for convenience
Each epoch spawns the next
– through a lightweight fork instruction (10 cycles)
Violations detected through polling
– each epoch runs to completion before detecting failed speculation and restarting
Violation chaining
– if an epoch suffers a violation, we squash all logically-later epochs
many possibilities to be evaluated in future work
72 Multiple Writers Example
Line       SM[]  Data
Original   0000  ABCD
Epoch i    1001  EBCF
Epoch i+1  1010  GBHD
Committed  0000  GBHF
combine speculatively modified lines at commit time
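The merge in this table can be sketched as follows, assuming one SM bit per word, one character per word of data, and epochs committing in logical order. The function name `merge_at_commit` is made up for this illustration.

```python
def merge_at_commit(original, epochs):
    """Combine speculatively modified versions of one cache line.

    original: committed data, one character per word (e.g. "ABCD")
    epochs: (sm_bits, data) pairs in commit order, where sm_bits is a
            string of '0'/'1' flags marking the words that epoch modified
    """
    words = list(original)
    for sm_bits, data in epochs:
        for i, bit in enumerate(sm_bits):
            if bit == "1":
                words[i] = data[i]  # only modified words overwrite
    return "".join(words)


# Epoch i modified words 0 and 3; epoch i+1 modified words 0 and 2.
print(merge_at_commit("ABCD", [("1001", "EBCF"), ("1010", "GBHD")]))  # GBHF
```

Word 0 was written by both epochs, so the logically later epoch's value wins.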
73 Pipeline Parameters
Issue Width: 4
Functional Units: 2 Int, 2 FP, 1 Mem, 1 Bra
Reorder Buffer Size: 32
Integer Multiply: 12 cycles
Integer Divide: 76 cycles
All Other Integer: 1 cycle
FP Divide: 15 cycles
FP Square Root: 20 cycles
All Other FP: 2 cycles
Branch Prediction: GShare (16KB, 8 history bits)
74 Memory Parameters
Cache Line Size: 32B
Instruction Cache: 32KB, 4-way set-assoc
Data Cache: 32KB, 2-way set-assoc, 2 banks
Unified Secondary Cache: 2MB, 4-way set-assoc, 4 banks
Miss Handlers: 8 for data, 2 for insts
Crossbar Interconnect: 8B per cycle per bank
Minimum Miss Latency to Secondary Cache: 10 cycles
Minimum Miss Latency to Local Memory: 75 cycles
Main Memory Bandwidth: 1 access per 20 cycles
Intra-Chip Communication Latency: 10 cycles
Inter-Chip Communication Latency: 200 cycles
75 Benchmark Details: Regions and Epochs
Application  Unrolling Factor  Avg. Insts. per Epoch  Parallel Coverage
buk          8                 81.0                   22.8%
buk          8                 135.0                  33.8%
compress95   1                 196.7                  24.6%
compress95   1                 240.4                  22.7%
ijpeg        32                1467.9                 8.2%
ijpeg        1                 80.8                   2.2%
ijpeg        1                 84.0                   5.0%
ijpeg        1                 100.3                  6.7%
equake       1                 2925.5                 39.3%
76 Performance on a 4-Processor CMP
Application  Overall Region Speedup  Parallel Coverage  Program Speedup
buk          2.26                    56.6%              1.46
compress95   1.27                    47.3%              1.12
equake       1.77                    39.3%              1.21
ijpeg        1.94                    22.1%              1.08
program speedups are limited by coverage
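One way to see why coverage limits program speedup is an Amdahl's-law estimate from the table's first two columns. This is an idealized model, not the authors' measurement method: it assumes code outside the speculative regions runs at unchanged speed, so it approximates the reported program speedups rather than reproducing every one of them.

```python
def estimated_program_speedup(region_speedup, coverage):
    """Amdahl's-law estimate: only the covered fraction gets faster."""
    return 1.0 / ((1.0 - coverage) + coverage / region_speedup)


# (region speedup, parallel coverage, reported program speedup)
results = {
    "buk": (2.26, 0.566, 1.46),
    "compress95": (1.27, 0.473, 1.12),
    "equake": (1.77, 0.393, 1.21),
    "ijpeg": (1.94, 0.221, 1.08),
}
for name, (region, coverage, reported) in results.items():
    estimate = estimated_program_speedup(region, coverage)
    print(f"{name}: estimated {estimate:.2f}, reported {reported:.2f}")
```

Even an unbounded region speedup would cap buk at 1 / (1 - 0.566), roughly 2.3 times faster overall, which is the sense in which coverage is the limit.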
77 TLS Overheads
Application  Dynamic Instruction Overhead  Misses to Other Caches
buk          5.3%                          34.47%
compress95   30.6%                         3.02%
equake       3.7%                          1.67%
ijpeg        7.0%                          65.00%
buk and ijpeg can benefit greatly from improved locality
78 Impact of Communication Latency
[Chart: Normalized Region Execution Time]
speedups still possible with higher latencies
79 Speculative Invalidation of Non-Speculative Cache Lines
[Chart: Normalized Region Execution Time]
a worthwhile enhancement of our baseline scheme