Elementary Microarchitecture Algebra


John Matthews and John Launchbury, Oregon Graduate Institute

Hawk Goals: Develop specifications that are clear and concise. Simulate the specifications, both concretely and symbolically. Formally verify specifications at the source-code level.

Algebraic Verification: We developed a domain-specific algebra for microarchitectures and proved equational laws that hold between microarchitecture components. We simplify pipelines using these laws while preserving functional (cycle-accurate) behavior. But the clock cycle period may change!

Transactions: Group data and control information together. Transactions, containing destinations, sources, and operations, flow through the model. Decide control locally whenever possible. Example: R3 <- Add R1 R2, carrying the values 16 (R3), 5 (R1), and 11 (R2).
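A minimal Python sketch of the transaction idea, not Hawk's own representation: the names Operand, Transaction, read_operands, and execute are hypothetical, and None stands for a value slot that has not been filled in yet. It shows how the destination, operation, and sources travel together and are filled in step by step.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Operand:
    reg: str                   # register name, e.g. "R1"
    val: Optional[int] = None  # None models a value slot that is still empty

@dataclass(frozen=True)
class Transaction:
    dest: Operand
    op: str
    srcs: Tuple[Operand, ...]

# Hypothetical operation table for this sketch.
OPS = {"Add": lambda a, b: a + b, "Sub": lambda a, b: a - b}

def read_operands(t, regs):
    """Fill in the source values from a register file (a plain dict here)."""
    return replace(t, srcs=tuple(replace(s, val=regs[s.reg]) for s in t.srcs))

def execute(t):
    """Fill in the destination value from the (already filled) sources."""
    result = OPS[t.op](*(s.val for s in t.srcs))
    return replace(t, dest=replace(t.dest, val=result))

def show(t):
    def fmt(o):
        return f"{o.reg}={'-' if o.val is None else o.val}"
    return f"{fmt(t.dest)} <- {t.op} " + " ".join(fmt(s) for s in t.srcs)

t0 = Transaction(Operand("R3"), "Add", (Operand("R1"), Operand("R2")))
t1 = read_operands(t0, {"R1": 5, "R2": 11})
t2 = execute(t1)
for t in (t0, t1, t2):
    print(show(t))
```

Because each transaction carries its own register names and values, a later stage can decide what to do by inspecting the transaction alone, which is one way to read "decide control locally".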

Example: The SuperSimple Pipeline. Reference machine (Reg, ALU): Each transaction is completed in one (long) clock cycle. Results are written back to the register file on the next clock cycle.

(Figure sequence: the transaction R3 <- Add R1 R2 moves through the reference machine. Its value slots start empty (- - -), the source operands are then filled in with 5 and 11, and finally the result 16 is filled in.)

Example: The SuperSimple Pipeline. (Figure: the reference machine and the pipelined machine, each built from Reg and ALU.)

Verifying SuperSimple: The pipelined machine should behave the same as the reference machine, except that the pipelined machine has one more cycle of latency.
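The correctness statement can be checked on a toy model. The following Python sketch is an assumption-laden simulation, not the Hawk specification: instructions are (dest, op, src1, src2) tuples, the register file applies a write before serving reads in the same cycle, and the pipelined machine has a single latch between the Reg and ALU stages plus one level of forwarding.

```python
OPS = {"Add": lambda a, b: a + b, "Sub": lambda a, b: a - b}

def reference(program, init_regs):
    """One long cycle per transaction; each result is visible to the next read."""
    regs, out = dict(init_regs), []
    for dest, op, a, b in program:
        result = OPS[op](regs[a], regs[b])
        out.append((dest, result))
        regs[dest] = result
    return out

def pipelined(program, init_regs):
    """Two stages (Reg, ALU) with forwarding of the in-flight ALU result."""
    regs, out = dict(init_regs), []
    latch = None  # transaction held between the Reg and ALU stages
    wb = None     # ALU result from the previous cycle, written back this cycle
    for instr in list(program) + [None]:   # one extra cycle drains the pipe
        if wb is not None:                  # write-back happens before reads
            regs[wb[0]] = wb[1]
        new_wb = None
        if latch is not None:               # ALU stage
            dest, op, a, va, b, vb = latch
            if wb is not None:              # forward the previous ALU result
                va = wb[1] if wb[0] == a else va
                vb = wb[1] if wb[0] == b else vb
            new_wb = (dest, OPS[op](va, vb))
        out.append(new_wb)
        latch = None
        if instr is not None:               # Reg stage
            dest, op, a, b = instr
            latch = (dest, op, a, regs[a], b, regs[b])
        wb = new_wb
    return out

program = [("R3", "Add", "R1", "R2"), ("R4", "Add", "R3", "R3"),
           ("R5", "Add", "R4", "R3"), ("R1", "Sub", "R5", "R4")]
regs0 = {"R1": 5, "R2": 11, "R3": 0, "R4": 0, "R5": 0}

# Pipelined output equals the reference output delayed by one cycle.
print(pipelined(program, regs0) == [None] + reference(program, regs0))
```

A test like this only samples one program; the approach in the slides instead rewrites the pipelined circuit into the delayed reference circuit using laws that hold for all inputs.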

Verifying SuperSimple: We incrementally simplify the pipeline, using local algebraic laws, each proved by induction over time.

Circuit Duplication Law: We can always duplicate a circuit without changing its functional behavior. (Figure: one circuit F with a shared output, and the equivalent form with two copies of F.)
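As an illustration of the duplication law, here is a small Python sketch under the assumption that a circuit is a deterministic state machine stepped once per cycle. The Accumulator class and the helper names are hypothetical.

```python
class Accumulator:
    """A small stateful circuit: outputs the running sum of its input."""
    def __init__(self):
        self.total = 0

    def step(self, x):
        self.total += x
        return self.total

def run(circuit, inputs):
    """Drive a circuit with one input per clock cycle."""
    return [circuit.step(x) for x in inputs]

def shared(inputs):
    """One copy of the circuit whose output wire fans out to two consumers."""
    ys = run(Accumulator(), inputs)
    return ys, ys

def duplicated(inputs):
    """Two independent copies of the circuit fed by the same input wire."""
    return run(Accumulator(), inputs), run(Accumulator(), inputs)

print(shared([1, 2, 3]) == duplicated([1, 2, 3]))
```

The two forms agree because both copies start in the same state and see the same inputs, so they stay in lockstep on every cycle.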

Retiming the Pipeline: We first move delay circuits forward, using the circuit duplication law. (Figure sequence: the Reg and ALU pipeline as the delays are moved.)

Time-Invariance Laws: Delay circuits can be moved across time-invariant circuits without changing behavior. (Figure: a delay moved across the ALU.)
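A hedged Python sketch of one such law, assuming signals are finite lists (one element per cycle) and the time-invariant circuit is a stateless, pointwise function. The names delay, pointwise, and alu are hypothetical. Note that the initial value of the delay has to be pushed through the circuit as well.

```python
def delay(init, xs):
    """Unit delay: emit `init` in cycle 0, then the input shifted by one cycle."""
    return ([init] + list(xs))[:len(xs)]

def pointwise(f, xs):
    """A stateless circuit applies the same function in every cycle."""
    return [f(x) for x in xs]

def alu(operands):
    """Toy ALU: adds a pair of operands."""
    a, b = operands
    return a + b

xs = [(1, 2), (3, 4), (5, 6)]
init = (0, 0)

before = pointwise(alu, delay(init, xs))       # delay first, then ALU
after = delay(alu(init), pointwise(alu, xs))   # ALU first, then delay
print(before == after)
```

Moving the delay from the ALU's input to its output changes where the pipeline register sits, which is why the clock period may change even though cycle-level behavior does not.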

Retiming the Pipeline: Apply time-invariance laws to continue moving delay circuits. (Figure sequence: the Reg and ALU pipeline as the delays are moved.)

Removing Forwarding Logic: The register-bypass laws allow us to remove a bypass circuit on the output of a registerFile. (Figure: register files with and without the bypass.)
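The exact statement of the register-bypass laws is in the paper; the Python sketch below shows only one plausible instance, under assumptions of this sketch's own: a register file that applies a write before serving the read in the same cycle, a write stream that reaches the register file one cycle late, and a bypass that patches the read value from the write still in flight. All function names are hypothetical.

```python
def delay(init, xs):
    """Unit delay on a finite signal (one list element per cycle)."""
    return ([init] + list(xs))[:len(xs)]

def regfile(writes, reads, init):
    """Each cycle: apply that cycle's write (if any), then serve the read."""
    regs, out = dict(init), []
    for w, r in zip(writes, reads):
        if w is not None:
            regs[w[0]] = w[1]
        out.append(regs[r])
    return out

def bypass(writes, reads, values):
    """Use the in-flight write's value when it targets the register being read."""
    return [w[1] if w is not None and w[0] == r else v
            for w, r, v in zip(writes, reads, values)]

def delayed_regfile_with_bypass(writes, reads, init):
    stale = regfile(delay(None, writes), reads, init)  # writes arrive a cycle late
    return bypass(writes, reads, stale)                # bypass repairs stale reads

writes = [("R1", 7), None, ("R2", 9), ("R1", 3)]
reads = ["R1", "R1", "R2", "R1"]
init = {"R1": 0, "R2": 0}

print(delayed_regfile_with_bypass(writes, reads, init) == regfile(writes, reads, init))
```

In this model the bypass and the write delay cancel out, so both can be replaced by a plain register file with undelayed writes.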

Removing Forwarding Logic: Apply the register-bypass law to remove the bypass circuit. (Figure sequence: the Reg and ALU pipeline before and after the bypass is removed.)

Removing Forwarding Logic: Repositioning components. (Figure sequence: the Reg and ALU components are rearranged step by step.)

Simplification Complete! The pipeline has been reduced to the reference machine, but delayed by one clock cycle.

Simplifying Stalling Pipelines: More complex pipelines often have to stall to resolve hazards or mis-speculation. A stalling pipeline won't be cycle-accurate with respect to a reference machine. We still simplify as much as possible, then use other verification techniques on the simplified pipeline. The simplified pipeline should be easier to verify.

The SomewhatSimple Pipeline: Resolves mem-alu data hazards by stalling. Resolves branch mispredictions by squashing. (Figure: pipeline with ICache, Reg, ALU, Mem, and Kill components, plus "misp?" and "hazard?" logic.)

(Figure sequence: starting from the original pipeline, various retiming laws are applied step by step to simplify it.)

Projection Laws: Projections are circuits that reset selected transaction fields to default values. They are used to indicate that only a portion of a transaction is needed, and also to capture constraints holding on a wire. Projections can express conditional laws. (Figure: a law relating ICache and the br projection.)
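A minimal Python sketch of projections, assuming transactions are dictionaries with the hypothetical fields dest, op, srcs, and branch. Whether the br and ctrl projections in the slides keep or reset the named fields is not stated, so this sketch assumes that each projection keeps the named fields and resets all others.

```python
# Hypothetical default value for every transaction field.
DEFAULTS = {"dest": None, "op": "NOP", "srcs": (), "branch": None}

def project(keep):
    """Build a projection that keeps the fields in `keep` and resets the rest."""
    def proj(t):
        return {k: (t[k] if k in keep else DEFAULTS[k]) for k in DEFAULTS}
    return proj

br = project({"branch"})          # assumed: keeps only branch information
ctrl = project({"op", "branch"})  # assumed: keeps only control information

t = {"dest": "R3", "op": "Add", "srcs": ("R1", "R2"), "branch": "taken"}

print(br(t))
print(br(br(t)) == br(t))    # projecting twice is the same as projecting once
print(br(ctrl(t)) == br(t))  # composing keeps the intersection of the field sets
```

Placing such a projection on a wire documents that downstream logic depends only on the kept fields, which is what lets a law be applied under that condition.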

More Projection Laws. (Figure: laws involving the br and ctrl projections and the "misp?" and "hazard?" circuits.)

(Figure sequence: various projection laws are applied, and br projections appear in the pipeline.)

Conditional Laws: Many components never modify branch info. This is expressed with branch projections. (Figure: a law relating Mem and br projections.)
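One way to read "never modifies branch info" is as an equation between a component followed by a branch projection and the projection alone. The Python sketch below assumes a toy memory stage and a br projection that keeps only a hypothetical branch field; neither is taken from the Hawk sources.

```python
def br(t):
    """Assumed branch projection: keep the branch field, reset everything else."""
    return {"dest": None, "op": "NOP", "val": None, "branch": t["branch"]}

def mem_stage(t, memory):
    """Toy memory stage: a Load replaces its address with the loaded data."""
    if t["op"] == "Load":
        return {**t, "val": memory[t["val"]]}
    return t

memory = {100: 42}
stream = [
    {"dest": "R1", "op": "Load", "val": 100, "branch": None},
    {"dest": "R2", "op": "Add", "val": 16, "branch": "taken"},
]

after_mem = [br(mem_stage(t, memory)) for t in stream]
without_mem = [br(t) for t in stream]
print(after_mem == without_mem)  # Mem is invisible through the br projection
```

An equation of this shape lets a br projection slide past the Mem stage during simplification.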

(Figure sequence: simplification of the pipeline continues using the br projections.)

Hazard Projection: Kill logic guarantees no data hazards on its output wire. H is a sequential circuit projecting out all hazards. (Figure: the "hazard?" and Kill logic, with and without H on the output.)

Hazard-Bypass Law: A conditional law that allows us to remove forwarding logic between pipeline stages, but only if no hazards occur on the input. Applicable to any two "execution-unit like" stages. (Figure: Exec1 and Exec2 fed by H, with and without forwarding.)
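The interplay between H and forwarding can be sketched in Python under simplifying assumptions that are this sketch's own: a hazard is any transaction that reads the register written by the transaction immediately before it, H replaces such a transaction with a NOP, and forwarding patches a source value from the preceding transaction's result.

```python
NOP = {"dest": None, "op": "NOP", "srcs": (), "val": None}

def hazard(prev, cur):
    """Assumed hazard: `cur` reads the register that `prev` writes."""
    return prev["dest"] is not None and prev["dest"] in [r for r, _ in cur["srcs"]]

def H(stream):
    """Sequential projection: replace every hazardous transaction with a NOP."""
    out, prev = [], NOP
    for t in stream:
        t = NOP if hazard(prev, t) else t
        out.append(t)
        prev = t
    return out

def forward(stream):
    """Forwarding: patch each source with the previous transaction's result."""
    out, prev = [], NOP
    for t in stream:
        srcs = tuple((r, prev["val"] if r == prev["dest"] else v)
                     for r, v in t["srcs"])
        out.append({**t, "srcs": srcs})
        prev = t
    return out

a = {"dest": "R1", "op": "Load", "srcs": (("R2", 4),), "val": 7}
b = {"dest": "R3", "op": "Add", "srcs": (("R1", 0), ("R2", 4)), "val": 4}
c = {"dest": "R4", "op": "Add", "srcs": (("R3", 4), ("R2", 4)), "val": 8}
stream = [a, b, c]

safe = H(stream)
print(forward(safe) == safe)      # forwarding does nothing on hazard-free input
print(forward(stream) == stream)  # but it does change a stream with hazards
```

On hazard-free input the forwarding logic never fires, so it can be deleted without changing behavior; that is the conditional shape of the law, with H recording the condition.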

(Figure sequence: the H projection is inserted and the hazard-bypass law is applied, ctrl projections appear in later frames, and the register-bypass law is applied, ending with the final pipeline.)

Finishing the Verification: The pipeline is as close to the reference machine as possible without breaking cycle-accurate behavior. Use other techniques to finish the verification. Removal of forwarding and delay logic makes verification simpler.

Related Work: Recursive signal definitions (Johnson). Transactions (Aagaard & Leeser). Retiming (Leiserson, Saxe et al.). Ruby (Sheeran et al.); Lustre (Halbwachs). Term-rewriting systems (Arvind et al.). Much work on state-machine-based verification (Burch & Dill, McMillan, Hosabettu). Unpipelining (Levitt & Olukotun).

Future Work: Perform complete verification algebraically: create a "remove-NOP" component and discover appropriate simplification laws. Extend verification to superscalar and out-of-order microarchitectures: add sequence numbers to transactions and create a "reorder-transactions" component.

Conclusions: Algebraic verification can be used to simplify microarchitectures prior to verification. We can reason about microarchitectures at the source-code level. Laws can be expressed visually. Using laws doesn't require theorem-prover expertise. Proving laws does; perhaps use decision procedures. Discovering laws can be challenging, but laws tend to be reusable across similar pipelines.

Further Reading: Most of these laws and transformations are described in the following paper: Elementary Microarchitecture Algebra, by John Matthews and John Launchbury, in CAV '99. We have several other papers introducing Hawk and describing microarchitecture verification based on transactions. All of these papers can be found at: http://www.cse.ogi.edu/PacSoft/Hawk