Automatic Pipelining during Sequential Logic Synthesis Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona Joint work with Marc Galceran-Oms.

Presentation transcript:

Automatic Pipelining during Sequential Logic Synthesis. Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona. Joint work with Marc Galceran-Oms (eSilicon) and Mike Kishinevsky (Intel).

Synthesis and Verification
Dec 10, 2015 · Automatic pipelining · 2
Synthesis flow: Behavior (SystemC, Matlab, …), HLS (loop unrolling, common expressions, scheduling, binding), RTL (Verilog), logic synthesis (combinational and sequential), netlist.
Verification (= ?): simulation (testbenches); equivalence checking (mostly combinational). Sequential? ABC, Calypto’s SLEC.

How far can Logic Synthesis go? (Figure labels: Unpipelined, Pipelined.)

Combinational logic synthesis. The sequential elements are unmovable. Combinational logic synthesis preserves the cycle-by-cycle behavior of all sequential elements. (Figure labels: Combinational, CLK.)

Why is verification easy? Verification is reduced to combinational equivalence.

Retiming: the sequential elements are movable! But the observable timing behavior is preserved (external cycle accuracy). (Figure labels: A, B, Data, CLK.)

The future ahead. Time becomes more unpredictable. (Figure sources: Davide Sacchetto (EPFL); Franz Kreupl (TUM).)

Introducing time elasticity

Elasticity has been known for a long time. Examples: VME bus, AMBA AHB.

Rigid vs. Elastic timing. (Figure: a stage S with In and Out, drawn three ways over time: clocked by CLK only; with req/ack handshakes; clocked by CLK with valid/stop handshakes.)

Elastic timing
- Can we elasticize time automatically?
- What are the benefits?
- Can we check elastic equivalence?

Transforming sync into elastic

Generalization: bounded FIFOs. Bounded Dataflow Networks. (Figure labels: In, Out, B1, B2, B3.)

Transforming sync into elastic. Behavioral equivalence is preserved.

(Figure, slides 16 to 19: a control layer of V/S handshake signals, driven by CLK, generates filtered (gated) clocks for the data path.)

Behavioral equivalence
Synchronous: D: a b c d e f g h i j k …
Elastic: D: a a b b b c d e e f g g h i i i j k …
Elastic: V: …
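The comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: it assumes that an elastic trace is a data stream D paired with a valid stream V, and that two designs are behaviorally equivalent when the values transferred in valid cycles match the synchronous stream in order. The function names and the short traces are hypothetical; the valid bits on the slide are not recoverable from the transcript, so a made-up V is used.

```python
from typing import List, Sequence


def valid_stream(data: Sequence[str], valid: Sequence[int]) -> List[str]:
    """Keep only the data items of cycles in which the valid bit is 1."""
    if len(data) != len(valid):
        raise ValueError("data and valid traces must have the same length")
    return [d for d, v in zip(data, valid) if v]


def elastically_equivalent(sync_data: Sequence[str],
                           elastic_data: Sequence[str],
                           elastic_valid: Sequence[int]) -> bool:
    """True if the valid items of the elastic trace equal the synchronous trace."""
    return list(sync_data) == valid_stream(elastic_data, elastic_valid)


# Hypothetical traces (assumption: not the ones drawn on the slide).
sync_trace = ["a", "b", "c"]
elastic_trace = ["a", "a", "b", "c", "c"]
valid_bits = [0, 1, 1, 0, 1]

print(elastically_equivalent(sync_trace, elastic_trace, valid_bits))
```

Cycles with V = 0 are ignored, so repeated or stale data in those cycles does not affect the result.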

Elastic transformations

We can insert and retime bubbles
Figure annotation: … registers, 4 tokens
Retiming  Bubbles  Cycle Period  Throughput  Effective Period
no        no       21            1
yes       no       16            1
yes       yes      12            4/5         15
“Bubble insertion + Retiming” can be solved optimally using MILP (Bufistov et al., 2007).
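The trade-off in the table can be checked with a small calculation. The sketch below assumes the usual definition, effective period = cycle period / throughput, which agrees with the last row (12 / (4/5) = 15). The cycle periods and throughputs are the values as read from the slide; the function and configuration names are hypothetical.

```python
from fractions import Fraction


def effective_period(cycle_period, throughput):
    """Average time per token: cycle period divided by throughput (tokens per cycle)."""
    throughput = Fraction(throughput)
    if not 0 < throughput <= 1:
        raise ValueError("throughput must be in the interval (0, 1]")
    return Fraction(cycle_period) / throughput


# (cycle period, throughput) per configuration, as read from the slide.
configurations = {
    "original": (21, Fraction(1)),
    "retiming": (16, Fraction(1)),
    "bubble insertion + retiming": (12, Fraction(4, 5)),
}

for name, (period, throughput) in configurations.items():
    print(name, effective_period(period, throughput))
```

Inserting a bubble lowers the throughput below 1, but the shorter cycle period it enables more than compensates in this example.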

Early evaluation. Example: mux for next-PC calculation (data inputs: PC+4 and branch target address; select: Jump?). Only wait for required inputs. Late-arriving tokens are cancelled by anti-tokens. (Case shown in the figure: no jump.)
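A minimal behavioral sketch of the early-evaluation mux follows. It is an illustration under stated assumptions, not the circuit from the talk: `None` models an input whose token has not arrived yet, and the names `early_eval_mux` and `MuxResult` are hypothetical. The mux fires as soon as the select and the selected input are available, and sends an anti-token to the other input if its token is still missing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MuxResult:
    output: Optional[int]                                  # next PC, or None if the mux must wait
    anti_tokens: List[str] = field(default_factory=list)   # inputs that receive an anti-token


def early_eval_mux(jump: Optional[bool],
                   pc_plus_4: Optional[int],
                   target: Optional[int]) -> MuxResult:
    """Fire once the select and the selected input have arrived."""
    if jump is None:
        return MuxResult(None)  # select not known yet: wait
    if jump:
        selected, other_name, other = target, "pc_plus_4", pc_plus_4
    else:
        selected, other_name, other = pc_plus_4, "target", target
    if selected is None:
        return MuxResult(None)  # required input missing: wait
    # The unselected token is not needed. If it has already arrived it is
    # simply discarded; otherwise an anti-token is sent to cancel it later.
    anti = [other_name] if other is None else []
    return MuxResult(selected, anti)


# No jump: PC+4 is available, the branch target is still being computed.
print(early_eval_mux(jump=False, pc_plus_4=104, target=None))
```

In the no-jump case the mux does not wait for the branch target address, which is what removes the late input from the critical path.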

How to implement anti-tokens? (Figure labels: Valid+ and Stop+ on the + channel; Valid− and Stop− on the − channel.)

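Memory bypass. (Figure: a memory with registers R0 to R3, write address wa, write data wd, read address ra and read data rd, shown before and after the bypass; the bypassed version adds an equality comparator (=).)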
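One way to read the bypass figure is sketched below. This is an assumption about the transformation, not a transcription of the slide: the write is delayed by one cycle, and a comparator forwards the pending write data when the read address equals the pending write address. The class names are hypothetical, and the check is a random simulation, not a proof.

```python
import random
from typing import List, Optional, Tuple


class RegFile:
    """Reference memory: the read sees the old contents, then the write is applied."""

    def __init__(self, size: int = 4) -> None:
        self.regs: List[int] = [0] * size

    def step(self, wa: int, wd: int, ra: int) -> int:
        rd = self.regs[ra]
        self.regs[wa] = wd
        return rd


class BypassedRegFile:
    """Write delayed by one cycle, with forwarding of the pending write."""

    def __init__(self, size: int = 4) -> None:
        self.regs: List[int] = [0] * size
        self.pending: Optional[Tuple[int, int]] = None  # (wa', wd') from the previous cycle

    def step(self, wa: int, wd: int, ra: int) -> int:
        if self.pending is not None and self.pending[0] == ra:
            rd = self.pending[1]           # comparator hit: forward wd'
        else:
            rd = self.regs[ra]
        if self.pending is not None:       # commit the delayed write
            pending_wa, pending_wd = self.pending
            self.regs[pending_wa] = pending_wd
        self.pending = (wa, wd)
        return rd


def same_read_data(cycles: int = 1000, seed: int = 0) -> bool:
    """Drive both memories with the same random accesses and compare rd every cycle."""
    rng = random.Random(seed)
    reference, bypassed = RegFile(), BypassedRegFile()
    for _ in range(cycles):
        wa, ra, wd = rng.randrange(4), rng.randrange(4), rng.randrange(256)
        if reference.step(wa, wd, ra) != bypassed.step(wa, wd, ra):
            return False
    return True


print(same_read_data())
```

Removing the forwarding branch makes the two models diverge as soon as a read follows a write to the same address.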

Elastic pipelining. (Figure labels: READ with ra and rd; WRITE with wa and wd; units A, B1, B2.) Sequential execution: R B1 B2, R A, R B1 B2, R A. Reference: Kam et al., Correct-by-construction Microarchitectural Pipelining, ICCAD 08.

Elastic pipelining. (Figure labels: READ, WRITE, ra, rd, wa, wd, A, B1, B2.)

Elastic pipelining: 2 bypasses. (Figure labels: A, B1, B2, READ, WRITE, ra, rd, wa, wd, wa’, wd’, wa’’, wd’’, and an equality comparator (=).)

Elastic pipelining: forwarding. (Figure labels: A, B1, B2, READ, WRITE, ra, rd, wa, wa’, wa’’, and an equality comparator (=).)

Elastic pipelining: retiming. (Figure labels: A, B1, B2, READ, WRITE, ra, rd, wa, wa’, wa’’, and an equality comparator (=).)

Elastic pipelining: retiming with anti-tokens. Anti-token insertion allows retiming combinations that are not possible in a conventional synchronous circuit. (Figure labels: A, B1, B2, READ, WRITE, ra, rd, wa, wa’, wa’’, and an equality comparator (=).)

Elastic pipelining. (Figure: pipelined execution of the R, A, B1 and B2 operations, including a stall.)

Micro-architectural exploration
- Apply Memory Bypass iteratively to RF and MEM
- Insert bubbles and retime
- Evaluate performance (effective cycle time)

Micro-architectural exploration

Marc Galceran-Oms et al., Microarchitectural transformations using elasticity. JETCS, Dec 2011.

The Achilles’ heel: equivalence checking
- Combinational equivalence checking is easy (structural + SAT)
- Sequential equivalence checking is hard (the time dimension appears)
- Even retiming is hard to verify!
- A possible way to go: logs
  - Create logs of tiny transformations
  - Incremental (step by step) verification

Incremental verification. (Figure: synthesis produces a chain of netlists N0, N1, N2, N3, …, Nn-1, Nn through transformations T1, T2, T3, …, Tn-1, Tn; the transformations are recorded in a LOG that is passed to verification.)
A standard language for sequential transformations:
- Retime
- Add bubble
- Inject anti-token
- Add bypass to Regfile
- …
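The log-based idea can be sketched as a loop that checks each small step against its predecessor instead of comparing N0 with Nn directly. Everything below is a toy model under stated assumptions: a "netlist" is a Python function from an input stream to an output trace, `None` models a bubble, the two transformations are hypothetical stand-ins for log entries, and each step is checked by simulation on a few stimuli (a real flow would use a formal check per transformation).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Trace = List[Optional[int]]                    # None models a bubble (no valid data)
Netlist = Callable[[Sequence[int]], Trace]
Transformation = Callable[[Netlist], Netlist]


def n0(inputs: Sequence[int]) -> Trace:
    """Toy initial design N0: increments every input."""
    return [x + 1 for x in inputs]


def add_bubble(design: Netlist) -> Netlist:
    """Legal step: delays the output by one empty slot."""
    return lambda inputs: [None] + design(inputs)


def drop_first_token(design: Netlist) -> Netlist:
    """Buggy step (for illustration): loses the first valid output."""
    def transformed(inputs: Sequence[int]) -> Trace:
        trace = design(inputs)
        for i, value in enumerate(trace):
            if value is not None:
                return trace[:i] + trace[i + 1:]
        return trace
    return transformed


def tokens(trace: Trace) -> List[int]:
    """Valid data items of a trace, in order."""
    return [v for v in trace if v is not None]


def verify_log(design: Netlist,
               log: Sequence[Tuple[str, Transformation]],
               stimuli: Sequence[Sequence[int]]) -> Optional[str]:
    """Check N(i-1) against N(i) for every logged step; return the first failing step name."""
    current = design
    for name, transform in log:
        candidate = transform(current)
        if any(tokens(current(s)) != tokens(candidate(s)) for s in stimuli):
            return name
        current = candidate
    return None


stimuli = [[1, 2, 3], [7]]
print(verify_log(n0, [("add bubble", add_bubble), ("add bubble", add_bubble)], stimuli))
print(verify_log(n0, [("add bubble", add_bubble), ("drop token", drop_first_token)], stimuli))
```

Because every step is checked in isolation, a failure points at the single transformation that introduced it.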

Conclusions
- Rigid systems preserve timing equivalence (data always valid at every cycle)
- Elastic systems waive timing equivalence to enable more concurrency (bubbles decrease throughput, but reduce cycle time)
- A new avenue of performance optimizations can emerge to build general-purpose, correct-by-construction pipelines