1
Circuit-Level Timing Speculation: The Razor Latch
Developed by Trevor Mudge’s group at the University of Michigan, 2003
2
We’ve Already Encountered Speculation in ECE 568
– Branch prediction: when a branch is encountered, guess whether it is taken or not.
  – If the guess is correct, we have gained time.
  – If the guess is incorrect, we must undo any incorrectly executed instructions and move on.
– Multi-word cache lines: when a cache miss is encountered, we bring in the entire cache line, not just the word we’re looking for.
  – If the access pattern shows spatial locality, we are prefetching other words that the program will soon ask for, thereby saving time.
  – If the speculation is too aggressive (i.e., the cache lines are too long), we’ll fetch many words uselessly.
3
Speculation (contd.)
– Value prediction (not covered in this course)
  – The idea is to predict what the value of a variable will be and use the predicted value.
  – If the predicted value was right, we gain some time; if it was wrong, we did some useless execution.
  – If this execution changed processor state, these changes will have to be undone.
  – Not used in practice (to my knowledge): mainly an academic exercise so far.
4
Speculating on Time
– The pipeline clock cycle is the time by which each stage is guaranteed to complete its assigned operation.
– This time is a function of:
  – Actual hardware parameters: gate and wire delays vary within the same die, from one die to another, and from one wafer to another.
  – Data involved in the computation. Example: ripple-carry adder. The worst-case execution time is the time it takes to ripple a carry from the carry-in of the least significant stage to the carry-out of the most significant stage. Actual execution times may vary considerably.
– Requiring the worst-case delays to be accounted for often forces designers to be overly conservative in setting clock rates.
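The data dependence of the ripple-carry adder can be made concrete with a small sketch. It assumes, as a simplification, that the adder's delay for a given pair of operands grows with the longest run of bit positions a single carry ripples through; the function name longest_carry_chain is hypothetical and not taken from the slides.

```python
def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Longest run of consecutive bit positions that one carry ripples through.

    Simplified model (an assumption, not a gate-level timing analysis):
    a carry is generated where both operand bits are 1, propagated where
    exactly one bit is 1, and killed where both bits are 0.
    """
    carry = 0      # carry entering the current bit position
    run = 0        # length of the carry chain ending at this position
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:                    # generate: a new chain starts here
            run = 1
            carry = 1
        elif (x ^ y) and carry:      # propagate: the incoming carry moves on
            run += 1
        else:                        # kill, or nothing to propagate
            run = 0
            carry = 0
        longest = max(longest, run)
    return longest


# Worst case for a 32-bit adder: the carry ripples through every stage.
print(longest_carry_chain(0xFFFFFFFF, 1))
# A far more benign operand pair: the carry dies after one stage.
print(longest_carry_chain(1, 1))
```

Under this model only operand pairs like the first one need the full worst-case clock period; most pairs settle much earlier, which is the slack that timing speculation tries to exploit.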
5
Timing Speculation: Basic Idea
– Suppose F is the frequency at which the pipeline is guaranteed to function correctly.
– Run the pipeline at a somewhat higher rate, f.
  – Much of the time, this clock period, t_p = 1/f, will be sufficient for all pipeline stages, and we’ll gain in execution speed.
  – Some of the time, we may need more time, so we need to discover when this is the case.
– When this is the case, provide additional time by allowing the pipeline stages additional cycles to complete their operation.
6
Implementation
– Recall that pipeline stages are separated by latches.
– Duplicate each pipeline latch by introducing a shadow latch.
– Consider any stage of the pipeline. Suppose it starts some activity at time 0.
  – At time t_p = 1/f, latch the output of that stage into the regular pipeline latch.
  – At time T_p = 1/F, latch the output of the stage into the shadow latch.
  – Compare the results of the regular and shadow latches.
  – If they agree, do nothing: running at a higher speed has paid off.
  – If they don’t agree:
    – Use the result of the shadow latch as the correct one.
    – Squelch the computation that the following stage began on the basis of the incorrect regular latch result.
    – Restart the computation in the following stage using the correct result, as stored in the shadow latch.
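The sampling-and-compare step can be sketched as a behavioural model. This is only an illustration under stated assumptions: the stage output switches cleanly from its previous value to the new one after a single delay, metastability is ignored, and the delay never exceeds T_p. The names razor_sample and RazorSample are hypothetical.

```python
from typing import NamedTuple


class RazorSample(NamedTuple):
    main: int      # value captured by the regular latch at t_p
    shadow: int    # value captured by the shadow latch at T_p
    error: bool    # True when the two latches disagree


def razor_sample(delay: float, t_p: float, T_p: float,
                 old_value: int, new_value: int) -> RazorSample:
    """Model one stage whose output changes from old_value to new_value
    `delay` time units after the stage starts at time 0."""
    if t_p > T_p:
        raise ValueError("the speculative period t_p must not exceed T_p")
    if delay > T_p:
        raise ValueError("delay exceeds T_p, so the shadow latch is not guaranteed correct")
    # The regular latch sees the new value only if the stage finished in time;
    # otherwise it captures the stale previous value.
    main = new_value if delay <= t_p else old_value
    # By assumption the stage has always settled by T_p.
    shadow = new_value
    return RazorSample(main, shadow, main != shadow)


print(razor_sample(delay=0.8, t_p=1.0, T_p=1.5, old_value=0, new_value=1))
print(razor_sample(delay=1.2, t_p=1.0, T_p=1.5, old_value=0, new_value=1))
```

In this model a late result that happens to equal the stale value raises no error, because the regular latch already holds the correct result.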
7
Unless otherwise stated, all figures are from Ernst, et al., MICRO-36, 2003.
8
Issues to Consider
– How aggressive should we be?
  – If f is too high, a large fraction of the results will require correction with the shadow latch and we’ll actually lose time.
  – If f is too low, the clock will be unnecessarily slow and we won’t gain much.
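A first-order model makes this trade-off explicit. It is a sketch that assumes every detected error costs a fixed number of extra cycles and that the error rate at a given f is known; the function names and the numbers in the example are illustrative, not measurements from the Razor work.

```python
def effective_throughput(f: float, error_rate: float,
                         penalty_cycles: float = 1.0) -> float:
    """Operations completed per unit time when running at frequency f.

    error_rate is the fraction of cycles needing correction (assumed known);
    penalty_cycles is the assumed recovery cost of each correction.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    # Each operation takes, on average, 1 + error_rate * penalty_cycles cycles.
    return f / (1.0 + error_rate * penalty_cycles)


def speedup(f: float, F: float, error_rate: float,
            penalty_cycles: float = 1.0) -> float:
    """Throughput at f relative to error-free operation at the safe frequency F."""
    return effective_throughput(f, error_rate, penalty_cycles) / F


# Mild overclocking with rare errors: a net gain (value above 1).
print(speedup(f=1.2, F=1.0, error_rate=0.05))
# Aggressive overclocking with frequent errors: a net loss (value below 1).
print(speedup(f=1.5, F=1.0, error_rate=0.6))
```

Speculation pays off only while f/F exceeds 1 + error_rate × penalty_cycles. Since the error rate itself rises with f, there is a best operating point between the two extremes.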
9
Issues to Consider (contd.)
– What about F?
  – An upper bound on F (equivalently, a lower bound on T_p = 1/F) is given by the worst-case path (for the worst-case inputs).
  – What happens if F is too small? [This is one of the few instances in design when being too conservative at one level affects correctness of functioning!]
  – F may be so small that the results of the next computation propagate through the stage and arrive at the shadow latch before it has sampled.
  – We’d then be comparing the results of two different operations!
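The two constraints on F can be written down together. The symbols t_max and t_min (the longest and shortest path delays through the stage) are introduced here for illustration and do not appear in the slides. Setup and hold times and clock skew are ignored, and the next operation is assumed to enter the stage at time t_p.

```latex
% Shadow latch must sample after the slowest path has settled,
% but before the fastest path of the NEXT operation can arrive:
t_{\max} \;\le\; T_p \;<\; t_p + t_{\min}
\qquad\Longleftrightarrow\qquad
\frac{1}{t_p + t_{\min}} \;<\; F \;\le\; \frac{1}{t_{\max}}
```

Under these assumptions the window between the regular and shadow sampling instants, T_p − t_p, must be shorter than the shortest path through the stage.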
10
Metastability
– If the input data is not stable when the clock transition happens, the output of the latch may float at a voltage that is in neither the 0 nor the 1 logic range.
  – The duration of the metastable state is not bounded.
  – Different gates may interpret such indeterminate voltages differently (in terms of logic values).
– We cannot reduce the probability of metastability to zero: all we can do is keep it sufficiently low for all practical purposes.
11
Recovery Technique 1: Global Clock Gating
– If any stage detects a timing problem:
  – Stall the entire pipeline for one clock cycle.
  – Use this additional clock cycle to recompute using the correct shadow-latch values.
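The cost of this scheme can be estimated with a simple cycle count. The sketch assumes an in-order pipeline with no other stalls, and that one stall cycle covers every stage that flags an error in the same cycle; the function name and the example numbers are hypothetical.

```python
from typing import Iterable


def cycles_with_global_gating(n_ops: int, depth: int,
                              error_cycles: Iterable[int]) -> int:
    """Total cycles to run n_ops operations through a depth-stage pipeline.

    error_cycles lists the cycle indices in which at least one stage
    detected a timing error; repeated indices (several stages erring in
    the same cycle) share a single whole-pipeline stall.
    """
    if n_ops <= 0:
        return 0
    base = n_ops + depth - 1          # fill the pipeline, then one op per cycle
    stalls = len(set(error_cycles))   # one extra cycle per erring cycle
    return base + stalls


# 100 operations, 5 stages, errors flagged in two distinct cycles.
print(cycles_with_global_gating(100, 5, [10, 10, 42]))
```

Because the whole pipeline stalls, every error costs all stages one cycle, including stages that met timing.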
14
Recovery Technique 2: Counterflow Pipelining
– When a mismatch (between regular and shadow latch contents) is detected:
  – Assert a bubble signal, to specify that the erring pipeline slot is now to be considered a bubble.
  – In the subsequent cycle, inject the shadow latch value into the next stage, allowing the errant operation to continue with the correct values.
  – Trigger a flush train, traveling backwards from the errant stage, flushing operations at each stage it visits.
– (Question: Is this flush operation necessary? Can we do something else to avoid it?)
17
Power Consumption: Using a Processor to Fry an Egg
From: www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html
18
Power Density
From: Hsu and Feng, “A Power-Aware Real-Time System…”, 2005
19
Power Implications: Dynamic Power
From: Krishna & Lee, IEEE Trans. Computers, 2003.
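For reference, the standard first-order model of CMOS dynamic power is given below; it may differ in detail from the cited figure. The symbols are the usual textbook ones rather than the slide's: α is the switching activity factor, C the switched capacitance, V_dd the supply voltage, and f the clock frequency.

```latex
P_{\text{dyn}} \;=\; \alpha \, C \, V_{dd}^{2} \, f
```

Because of the quadratic dependence on V_dd, lowering the supply voltage saves more power than lowering the frequency by the same factor.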
20
Static Power
– Even when there is no switching, transistors leak current.
– Leakage power is a strongly increasing function of temperature and supply voltage; it falls as the threshold voltage rises (roughly exponentially in the case of subthreshold leakage).
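These dependences follow from the usual textbook approximation for subthreshold current, reproduced here as a sketch. The notation is not from the slides: n is the subthreshold slope factor, V_T = kT/q the thermal voltage, V_GS the gate-source voltage, and V_th the threshold voltage.

```latex
I_{\text{sub}} \;\propto\; \exp\!\left(\frac{V_{GS} - V_{th}}{n\,V_T}\right),
\qquad V_T = \frac{kT}{q},
\qquad P_{\text{static}} \;\approx\; V_{dd}\, I_{\text{leak}}
```

For an off transistor (V_GS = 0) the current scales as exp(−V_th / (n V_T)), so it grows as V_th is lowered or as temperature raises V_T.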
21
Subthreshold Leakage vs. Temperature
From: Do et al., Tech Report 2007-06, Dept of CSE, Chalmers Instt of Tech
22
Leakage Current vs. Vdd
From: Do et al., op. cit.
23
Voltage Control for Razor Latch System