Automatic Pipelining during Sequential Logic Synthesis
Jordi Cortadella
Universitat Politècnica de Catalunya, Barcelona
Joint work with Marc Galceran-Oms (eSilicon) and Mike Kishinevsky (Intel)
Synthesis and Verification
Dec 10, 2015, Automatic pipelining
Flow: Behavior (SystemC, Matlab, …) → HLS → RTL (Verilog) → Logic Synth. → Netlist
HLS: loop unrolling, common expr., scheduling, binding
Logic Synth.: combinational & sequential
Are the descriptions equivalent (= ?)
Simulation (testbenches)
Equivalence checking (mostly combinational)
Sequential? ABC, Calypto’s SLEC
How far can Logic Synthesis go?
Unpipelined → Pipelined
Combinational logic synthesis
The sequential elements are unmovable.
Combinational logic synthesis preserves the cycle-by-cycle behavior of all sequential elements.
[Figure: combinational blocks between sequential elements driven by CLK]
Why is verification easy?
Verification is reduced to combinational equivalence
Retiming
The sequential elements are movable!
But the observable timing behavior is preserved: external cycle accuracy
[Figure: blocks A and B, Data, CLK]
The future ahead
Time becomes more unpredictable
[Figures. Source: Davide Sacchetto (EPFL). Source: Franz Kreupl (TUM)]
Introducing time elasticity
Elasticity has been known for a long time
Examples: VME bus, AMBA AHB
Rigid vs. Elastic timing
Rigid: In → S → Out, driven by CLK
Elastic (asynchronous): In → S → Out, with req/ack handshakes
Elastic (synchronous): In → S → Out, driven by CLK, with valid/stop handshakes
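The valid/stop handshake of the synchronous elastic variant can be sketched in a few lines of Python. This is a minimal model, not the circuit from the talk: the function name `run_channel`, the per-cycle `stop_pattern` list and the T/R/I labels (transfer, retry, idle) are assumptions made for illustration. A data item moves only in a cycle where valid is high and stop is low.

```python
def run_channel(items, stop_pattern):
    """Send `items` over one valid/stop channel (illustrative model).

    stop_pattern[t] is the receiver's stop signal at cycle t.
    Returns the received items and a per-cycle trace of channel states.
    """
    received = []
    trace = []
    idx = 0  # index of the next item the sender wants to transfer
    for stop in stop_pattern:
        valid = idx < len(items)  # sender asserts valid while it has data
        if valid and not stop:
            received.append(items[idx])  # transfer: data moves this cycle
            idx += 1
            trace.append("T")
        elif valid and stop:
            trace.append("R")  # retry: sender holds the same data
        else:
            trace.append("I")  # idle: nothing to send
    return received, trace


received, trace = run_channel(list("abc"), [0, 1, 0, 1, 1, 0, 0])
print(received, "".join(trace))
```

Whatever stall pattern the receiver applies, the items arrive in order and without duplication; only the cycles at which they arrive change.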
Elastic timing
Can we elasticize time automatically?
What are the benefits?
Can we check elastic equivalence?
Transforming sync into elastic
Generalization: bounded FIFOs
Bounded Dataflow Networks
[Figure: network from In to Out through FIFOs B1, B2, B3]
Transforming sync into elastic
Behavioral equivalence is preserved
Control layer: generation of filtered (gated) clocks
Data path: driven by the gated clocks
[Figure: chain of controllers exchanging V (valid) and S (stop) signals, clocked by CLK]
[Figure: CLK and gated clocks]
Behavioral equivalence
Synchronous:
D: a b c d e f g h i j k …
Elastic:
D: a a b b b c d e e f g g h i i i j k …
V: …
The elastic data stream, restricted to the cycles in which V is high, matches the synchronous stream.
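A finite-trace version of this comparison can be written directly. The sketch below is illustrative only: the function names and the example valid bits are assumptions (the slide's V stream is not reproduced here), and a real check would reason about unbounded behavior rather than one recorded trace.

```python
def valid_stream(data, valid):
    """Keep only the data items sampled in cycles where valid is 1."""
    if len(data) != len(valid):
        raise ValueError("data and valid traces must have the same length")
    return [d for d, v in zip(data, valid) if v]


def elastically_equivalent(sync_data, elastic_data, elastic_valid):
    """True if the valid items of the elastic trace equal the synchronous trace."""
    return valid_stream(elastic_data, elastic_valid) == list(sync_data)


# Hypothetical traces: the elastic channel repeats data while it is not valid.
sync = list("abcde")
elastic_d = list("aabbbcdee")
elastic_v = [0, 1, 0, 0, 1, 1, 1, 0, 1]

print(elastically_equivalent(sync, elastic_d, elastic_v))
```

The data values seen in invalid cycles are irrelevant to the comparison; only the order of valid items matters.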
Elastic transformations
We can insert and retime bubbles
[Figure: ring with registers, 4 tokens]
Columns: Retiming, Bubbles, Cycle Period, Throughput, Effective Period
Cycle period 21, throughput 1
Cycle period 16, throughput 1
Cycle period 12, throughput 4/5, effective period 15
Bubble insertion + Retiming
“Bubble insertion + Retiming” can be solved optimally using MILP (Bufistov et al., 2007).
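The effective period is the cycle period divided by the throughput, so a slower token rate can still pay off if the clock gets fast enough. The sketch below reads the slide's figures as cycle periods 21, 16 and 12 with throughputs 1, 1 and 4/5; that reading and the configuration labels are assumptions, and the function name is hypothetical.

```python
from fractions import Fraction


def effective_period(cycle_period, throughput):
    """Average time per valid token: cycle period divided by throughput."""
    return Fraction(cycle_period) / Fraction(throughput)


# (cycle period, throughput) pairs; labels are assumed, not from the slide.
configs = {
    "original": (21, Fraction(1)),
    "retiming": (16, Fraction(1)),
    "bubble insertion + retiming": (12, Fraction(4, 5)),
}

results = {name: effective_period(c, t) for name, (c, t) in configs.items()}
best = min(results, key=results.get)

for name, value in results.items():
    print(f"{name}: {value}")
print("best:", best)
```

With these numbers the bubble lowers throughput to 4/5 yet yields the smallest effective period, 12 / (4/5) = 15.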
Early evaluation
Example: mux for next-PC calculation
[Figure: mux selecting between PC+4 and the branch target address, controlled by Jump?; case shown: no jump]
Only wait for required inputs
Late-arriving tokens are cancelled by anti-tokens
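The firing rule of an early-evaluation mux can be sketched as a pure function. This is a simplified model under assumptions: an input is either a value or `None` (token not yet arrived), the function and channel names are hypothetical, and the returned list only names the channels that would receive an anti-token; it does not model how anti-tokens travel.

```python
def early_eval_mux(jump, pc_plus_4, branch_target):
    """Early-evaluation mux for next-PC selection (illustrative model).

    Each argument is a value, or None if its token has not arrived yet.
    Returns (output, anti_tokens): output is None if the mux cannot fire;
    anti_tokens lists the input channels whose late token must be cancelled.
    """
    if jump is None:
        return None, []  # the select token is always required
    if jump:
        chosen, other, other_name = branch_target, pc_plus_4, "pc_plus_4"
    else:
        chosen, other, other_name = pc_plus_4, branch_target, "branch_target"
    if chosen is None:
        return None, []  # the required data input has not arrived
    # Fire without waiting for the unselected input; if it is still
    # missing, send an anti-token to cancel it when it shows up.
    anti_tokens = [other_name] if other is None else []
    return chosen, anti_tokens


# No jump: PC+4 is enough, the branch target is not awaited.
print(early_eval_mux(False, 0x104, None))
```

A conventional (late-evaluation) mux would return nothing in this example, because it waits for all three inputs.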
How to implement anti-tokens?
[Figure: channels with Valid+ and Stop+ for tokens (+) and Valid– and Stop– for anti-tokens (–)]
Memory bypass
[Figure: register file R0 to R3 with write port (wa, wd) and read port (ra, rd), before and after adding a bypass with a comparator (=) between the read and write addresses]
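A small behavioral model shows why the bypass keeps reads correct when the write is delayed by one cycle. This is a sketch under assumptions, not the circuit on the slide: the class names are hypothetical, each cycle performs one read and one write, and a read sees only writes from earlier cycles.

```python
class RegFile:
    """Reference model: the write lands at the end of the same cycle."""

    def __init__(self, size=4):
        self.regs = [0] * size

    def step(self, wa, wd, ra):
        rd = self.regs[ra]   # read sees writes from earlier cycles only
        self.regs[wa] = wd
        return rd


class BypassedRegFile:
    """Write delayed by one cycle; a comparator forwards the pending data."""

    def __init__(self, size=4):
        self.regs = [0] * size
        self.pending = None  # (address, data) of the delayed write

    def step(self, wa, wd, ra):
        if self.pending is not None and self.pending[0] == ra:
            rd = self.pending[1]          # bypass: ra matches delayed wa
        else:
            rd = self.regs[ra]
        if self.pending is not None:
            addr, data = self.pending
            self.regs[addr] = data        # commit the delayed write
        self.pending = (wa, wd)
        return rd


ops = [(0, 5, 0), (1, 7, 0), (0, 9, 1), (2, 3, 0), (2, 4, 2)]  # (wa, wd, ra)
ref, byp = RegFile(), BypassedRegFile()
ref_out = [ref.step(*op) for op in ops]
byp_out = [byp.step(*op) for op in ops]
print(ref_out, byp_out)
```

Both models return the same read data on every cycle, which is the property that lets the delayed write be moved further into the pipeline.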
Elastic pipelining
[Figure: READ port (ra, rd) and WRITE port (wa, wd) with blocks A, B1, B2]
Sequential execution: R B1 B2 R A R B1 B2 R A
Kam et al., Correct-by-construction Microarchitectural Pipelining, ICCAD 08
Elastic pipelining
[Figure: READ port (ra, rd) and WRITE port (wa, wd) with blocks A, B1, B2]
Elastic pipelining: 2 bypasses
[Figure: blocks A, B1, B2; READ and WRITE ports; signals ra, wa, rd, wd, wd’, wa’, wd’’, wa’’; comparators (=)]
Elastic pipelining: Forwarding
[Figure: blocks A, B1, B2; READ and WRITE ports; signals ra, wa, rd, wa’, wa’’; comparators (=)]
Elastic pipelining: Retiming
[Figure: blocks A, B1, B2; READ and WRITE ports; signals ra, wa, rd, wa’, wa’’; comparators (=)]
Elastic pipelining: Retiming with anti-tokens
[Figure: blocks A, B1, B2; READ and WRITE ports; signals ra, wa, rd, wa’, wa’’; comparators (=)]
Anti-token insertion allows retiming combinations that are not possible in a conventional synchronous circuit
Elastic pipelining
[Figure: pipelined schedules of the R, A, B1 and B2 operations, including a stall]
Micro-architectural exploration
Apply Memory Bypass iteratively to RF and MEM
Insert bubbles and retime
Evaluate performance (effective cycle time)
Marc Galceran-Oms et al., Microarchitectural transformations using elasticity. JETCS, Dec 2011.
The Achilles’ heel: equivalence checking
Combinational equivalence checking is easy (structural + SAT)
Sequential equivalence checking is hard (the time dimension appears)
Even retiming is hard to verify!
A possible way to go: logs
– Create logs of tiny transformations
– Incremental (step by step) verification
Incremental verification
Synthesis: N0 → N1 → N2 → N3 → … → Nn-1 → Nn, through transformations T1, T2, T3, …, Tn-1, Tn written to a LOG
Verification: reads the LOG
A standard language for sequential transformations:
Retime
Add bubble
Inject anti-token
Add bypass to Regfile
…
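The log-based flow can be pictured as a loop that checks one small step at a time. The sketch below is a toy under loudly stated assumptions: a "netlist" is just a tuple of register counts on the edges of a single ring, and the per-step checkers only test a register-count invariant, which is far weaker than real sequential equivalence checking. All names (`verify_log`, `check_retime`, `check_add_bubble`) are hypothetical.

```python
def check_retime(before, after):
    """Toy check: retiming keeps the register count of the ring unchanged."""
    return len(before) == len(after) and sum(before) == sum(after)


def check_add_bubble(before, after):
    """Toy check: adding a bubble adds exactly one register to the ring."""
    return len(before) == len(after) and sum(after) == sum(before) + 1


CHECKERS = {"retime": check_retime, "add_bubble": check_add_bubble}


def verify_log(n0, log, checkers=CHECKERS):
    """Verify a log of (kind, resulting_netlist) steps, one step at a time.

    Returns None if every step passes, else the 1-based index of the
    first step that fails its local check.
    """
    current = n0
    for index, (kind, result) in enumerate(log, start=1):
        if not checkers[kind](current, result):
            return index
        current = result
    return None


n0 = (2, 0, 1)  # registers on the three edges of a ring
log = [
    ("retime", (1, 1, 1)),
    ("add_bubble", (1, 2, 1)),
    ("retime", (2, 2, 0)),
]
print(verify_log(n0, log))
```

The point of the structure is that each check compares two neighboring netlists that differ by one known transformation, so a failure is pinned to a single step.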
Conclusions
Rigid systems preserve timing equivalence (data always valid at every cycle)
Elastic systems waive timing equivalence to enable more concurrency (bubbles decrease throughput, but reduce cycle time)
A new avenue of performance optimizations can emerge to build general-purpose, correct-by-construction pipelines