Circuit-Level Timing Speculation: The Razor Latch
Developed by Trevor Mudge’s group at the University of Michigan, 2003.

We’ve Already Encountered Speculation in ECE 568

Branch prediction – when a branch is encountered, guess whether it is taken or not.
  – If the guess is correct, we have gained time.
  – If the guess is incorrect, we must undo any incorrectly executed instructions and move on.

Multi-word cache lines – when a cache miss occurs, we bring in the entire cache line, not just the word we are looking for.
  – If the access pattern shows spatial locality, we are prefetching words the program will soon ask for, thereby saving time.
  – If the speculation is too aggressive (i.e., the cache lines are too long), we will fetch many words uselessly.
(A toy sketch of the guess/verify/recover pattern follows below.)
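
As a reminder of that pattern, here is a toy 2-bit saturating-counter branch predictor; this sketch is my own illustration, not something from the original slides:

```python
# Hypothetical sketch (not from the slides): a 2-bit saturating-counter
# branch predictor, showing the same guess / verify / recover pattern
# that Razor applies at the circuit level.

class TwoBitPredictor:
    """Counter states 0-1 predict not-taken; states 2-3 predict taken."""
    def __init__(self):
        self.counter = 1  # start weakly not-taken

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)

p = TwoBitPredictor()
for actual in [True, True, False, True, True]:  # actual branch outcomes
    if p.predict() != actual:
        pass  # mispredict: squash the wrong-path instructions, pay a penalty
    p.update(actual)
```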

Speculation (contd.)

Value prediction – (not covered in this course) the idea is to predict what the value of a variable will be and use the predicted value.
  – If the predicted value was right, we gain some time; if it was wrong, we did some useless execution.
  – If this execution changed processor state, those changes will have to be undone.
  – Not used in practice (to my knowledge): mainly an academic exercise so far.

Speculating on Time

The pipeline clock cycle is the time by which each stage is guaranteed to complete its assigned operation. This time is a function of:
  – Actual hardware parameters: gate and wire delays vary within the same die, from one die to another, and from one wafer to another.
  – The data involved in the computation. Example: a ripple-carry adder. The worst-case execution time is the time it takes the carry to ripple from the carry-in of the least significant stage to the carry-out of the most significant stage; actual execution times may vary considerably.

Requiring the worst-case delays to be accounted for often forces designers to be overly conservative in setting the clock rate. (The simulation below illustrates how pessimistic the worst case is.)
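
To see just how pessimistic, the following sketch (my illustration, using the standard longest-propagate-run proxy for ripple-carry settling time) measures carry chains for random operands. For 64-bit operands the average lands near log2(64) ≈ 6 stages, an order of magnitude below the 64-stage worst case:

```python
import random

def longest_carry_chain(a, b, width):
    """Longest run of 'propagate' positions (a_i XOR b_i = 1): a standard
    proxy for how long a ripple-carry adder takes to settle."""
    best = run = 0
    for i in range(width):
        if ((a >> i) & 1) ^ ((b >> i) & 1):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

WIDTH, TRIALS = 64, 10_000
chains = [longest_carry_chain(random.getrandbits(WIDTH),
                              random.getrandbits(WIDTH), WIDTH)
          for _ in range(TRIALS)]
print("worst case:", WIDTH)                 # 64 stages
print("average   :", sum(chains) / TRIALS)  # typically ~ log2(WIDTH)
```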

Timing Speculation: Basic Idea

Suppose F is the frequency at which the pipeline is guaranteed to function correctly. Run the pipeline at a somewhat higher frequency, f.
  – Much of the time, this clock period, t_p = 1/f, will be sufficient for all pipeline stages, and we will gain in execution speed.
  – Some of the time, we may need more time. We must discover when this is the case and, when it is, provide additional time by allowing the pipeline stages extra cycles to complete their operation.

Implementation

Recall that pipeline stages are separated by latches. Duplicate each pipeline latch by introducing a shadow latch. Now consider any stage of the pipeline, and suppose it starts some activity at time 0:
  – At time t_p = 1/f, latch the output of that stage into the regular pipeline latch.
  – At time T_p = 1/F, latch the output of the stage into the shadow latch.
  – Compare the contents of the regular and shadow latches.
  – If they agree, do nothing: running at the higher speed has paid off.
  – If they disagree:
      – Use the result in the shadow latch as the correct one.
      – Squelch the computation that the following stage began on the basis of the incorrect regular-latch result.
      – Restart the computation in the following stage using the correct result, as stored in the shadow latch.
(A toy cycle-level model of this check appears below.)
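
A minimal cycle-level model of the check; the delays and values here are made up, and this is a sketch of the mechanism rather than the actual Razor circuit:

```python
def razor_cycle(stage_delay, t_p, T_p, correct_value, stale_value):
    """One stage, one cycle: the main latch samples at t_p = 1/f, the
    shadow latch at T_p = 1/F, chosen to always capture the right value."""
    main = correct_value if stage_delay <= t_p else stale_value
    shadow = correct_value if stage_delay <= T_p else None
    assert shadow is not None, "F violates the worst-case bound"
    error = (main != shadow)
    if error:
        # Squelch the next stage's work and restart it from the shadow
        # value; in this toy model that is just replacing the result.
        main = shadow
    return main, error

value, err = razor_cycle(stage_delay=1.2, t_p=1.0, T_p=1.5,
                         correct_value=0xBEEF, stale_value=0xDEAD)
print(hex(value), err)  # 0xbeef True: speculation failed, shadow used
```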

Unless otherwise stated, all figures are from Ernst et al., “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation,” MICRO-36, 2003.

Issues to Consider

How aggressive should we be?
  – If f is too high, a large fraction of the results will require correction via the shadow latch, and we will actually lose time.
  – If f is too low, the clock will be unnecessarily slow and we will not gain much.
(The model below makes this tradeoff concrete.)
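
A back-of-envelope model of the tradeoff, under the assumption that each timing error costs a fixed recovery penalty (as in the global-stall scheme described later):

```python
def net_speedup(f, F, error_rate, penalty_cycles=1):
    """Rough model: each operation at frequency f pays an expected
    error_rate * penalty_cycles extra recovery cycles."""
    return (f / F) / (1 + error_rate * penalty_cycles)

F = 1.0  # guaranteed-safe frequency (normalized)
for f, p_err in [(1.1, 0.001), (1.3, 0.02), (1.6, 0.8)]:
    print(f, p_err, round(net_speedup(f, F, p_err), 3))
# Raising f helps until the error rate explodes; past that point the
# recovery cost outweighs the faster clock (last line is below 1.0).
```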

Issues to Consider (contd.)

What about F?
  – The shadow-latch clock period 1/F is bounded below by the worst-case path delay (for the worst-case inputs).
  – What happens if F is too low, i.e., the period 1/F is too long? [This is one of the few instances in design where being too conservative at one level affects the correctness of functioning!] The period may be so long that the results of the next computation propagate through the stage and arrive at the shadow latch before it samples – we would then be comparing the results of two different operations!
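
This constraint can be written as a timing inequality. Assuming new inputs enter the stage every t_p, and writing t_min for the stage's shortest path delay, the next operation's result can reach the shadow latch as early as t_p + t_min. The formalization below is mine, not from the slides:

```python
def shadow_sample_safe(t_min, t_p, T_p, hold_margin=0.0):
    """New inputs enter the stage every t_p, so the next operation's
    fastest result can reach the stage output at t_p + t_min. The shadow
    latch, sampling at T_p, must beat it: require t_p + t_min > T_p."""
    return t_p + t_min > T_p + hold_margin

print(shadow_sample_safe(t_min=0.7, t_p=1.0, T_p=1.5))  # True: safe
print(shadow_sample_safe(t_min=0.3, t_p=1.0, T_p=1.5))  # False: pad the
# short paths with buffers (or raise F) to restore the inequality
```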

Metastability

If the input data is not stable when the clock transition happens, the output of the latch may float at a voltage that lies in neither the logic-0 nor the logic-1 range.
  – The duration of the metastable state is not bounded.
  – Different gates may interpret such an indeterminate voltage as different logic values.
We cannot reduce the probability of metastability to zero; all we can do is keep it sufficiently low for all practical purposes. (The model below quantifies “sufficiently low”.)
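
The standard synchronizer mean-time-between-failures model makes this quantitative: the failure rate decays exponentially with the time allowed for resolution but never reaches zero. The parameters below are made up for illustration:

```python
import math

def mtbf_seconds(t_resolve, tau, T_w, f_clk, f_data):
    """Classic model: failures become exponentially rarer as the time
    allowed for metastability resolution grows, but never reach zero."""
    return math.exp(t_resolve / tau) / (T_w * f_clk * f_data)

# Illustrative, made-up device parameters for a ~1 GHz design:
print(mtbf_seconds(t_resolve=500e-12, tau=25e-12,
                   T_w=20e-12, f_clk=1e9, f_data=1e8))
```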

Recovery Technique 1: Global Clock Gating

If any stage detects a timing problem:
  – Stall the entire pipeline for one clock cycle.
  – Use the additional clock cycle to recompute using the correct shadow-latch values.
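
A toy sketch of the control policy; real hardware gates clock signals rather than shuffling lists, so take this only as the decision logic:

```python
def advance(pipeline, new_input, errors):
    """pipeline: ops in flight, youngest first; errors: per-stage flags."""
    if any(errors):
        return pipeline, True                  # hold every latch one cycle
    return [new_input] + pipeline[:-1], False  # normal shift down the pipe

pipe = ["op3", "op2", "op1"]
pipe, stalled = advance(pipe, "op4", errors=[False, True, False])
print(pipe, stalled)  # unchanged pipeline, True: the extra cycle is used
                      # to recompute from the shadow-latch value
```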

Recovery Technique 2: Counterflow Pipelining

When a mismatch between the regular and shadow latch contents is detected:
  – Assert a bubble signal, to specify that the erring pipeline slot is now to be treated as a bubble.
  – In the subsequent cycle, inject the shadow-latch value into the next stage, allowing the errant operation to continue with the correct values.
  – Trigger a flush train, traveling backwards from the errant stage and flushing the operation at each stage it visits.
(Question: Is this flush operation necessary? Can we do something else to avoid it?)
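
A sketch of the recovery, flattened to a single step for clarity; in the actual scheme the flush train moves backwards one stage per cycle:

```python
def counterflow_recover(pipeline, err_stage):
    """pipeline[0] is the youngest slot (fetch end). The errant slot and
    every younger slot behind it become bubbles; older ops run on."""
    fixed = list(pipeline)
    for s in range(err_stage + 1):
        fixed[s] = "bubble"
    return fixed

print(counterflow_recover(["op5", "op4", "op3", "op2", "op1"], err_stage=2))
# ['bubble', 'bubble', 'bubble', 'op2', 'op1'] -- op3 restarts in the next
# stage from the shadow-latch value; op4 and op5 must be refetched.
```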

Power Consumption: Using a Processor to Fry an Egg
(Figure; source missing from the transcript.)

Power Density
From: Hsu and Feng, “A Power-Aware Real-Time System…”, 2005.

Power Implications: Dynamic Power
From: Krishna & Lee, IEEE Trans. Computers, 2003.
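
The figure's message follows from the standard dynamic-power equation, P_dyn = α·C·Vdd²·f: power scales quadratically with supply voltage, which is why Razor-style voltage scaling pays off. The numbers below are invented for illustration:

```python
def dynamic_power(alpha, C, Vdd, f):
    """P_dyn = activity factor * switched capacitance * Vdd^2 * frequency."""
    return alpha * C * Vdd**2 * f

base   = dynamic_power(alpha=0.2, C=1e-9, Vdd=1.2, f=2e9)  # made-up values
scaled = dynamic_power(alpha=0.2, C=1e-9, Vdd=1.0, f=2e9)  # lower Vdd only
print(round(scaled / base, 3))  # 0.694: a ~17% Vdd cut saves ~31% power
```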

Static Power

Even when there is no switching, transistors leak current. Leakage power is a strongly increasing function of temperature and supply voltage, and it falls off rapidly (roughly exponentially) as the threshold voltage rises. (A rough model follows below.)
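
A rough model of the subthreshold trend; it keeps only the standard exponential term and uses illustrative constants:

```python
import math

K_OVER_Q = 8.617e-5  # Boltzmann constant over electron charge, V/K

def leakage_scale(V_th, T_kelvin, n=1.5):
    """Relative subthreshold leakage ~ exp(-V_th / (n * kT/q)): the
    exponential term only, with illustrative constants."""
    return math.exp(-V_th / (n * K_OVER_Q * T_kelvin))

cool, hot = leakage_scale(0.3, 300), leakage_scale(0.3, 360)
print(round(hot / cool, 1))  # leakage grows several-fold from 300 K to 360 K
```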

Subthreshold Leakage vs. Temperature
From: Do et al., Tech Report, Dept. of CSE, Chalmers University of Technology.

Leakage Current vs. Vdd
From: Do et al., op. cit.

Voltage Control for Razor Latch System