Advanced Digital Design. Asynchronous Design: FSL, by A. Steininger and M. Delvai, Vienna University of Technology.

Presentation transcript:

1 Advanced Digital Design Asynchronous Design: FSL by A. Steininger and M. Delvai Vienna University of Technology

2 Outline
  Introduction
  Principles
  Basic gates
  Design flow and tools
  Circuit design with FSL
  Pipeline
  Data Paths
  Conclusion
  Research plans

3 Fundamental Design Problem
  Ensure a lossless data flow.
  [Diagram: SRC -> f(x) -> SNK, signal A]
  Issue condition: new data may be issued only after the previous data has been consumed.
  Capture condition: only valid and consistent data may be consumed.

4 Synchronous Approach
  [Diagram: SRC -> f(x) -> SNK, signal A, clock Clk with period T]
  Global time reference => indirect conclusion

5 Asynchronous Circuits
  [Diagram: SRC -> f(x) -> SNK with Request and Acknowledge signals]
  Local handshake protocol

6 Four State Logic
  [Diagram: SRC -> f(x) -> SNK with delay ∆t; bit counts shown along the path: 1x "1" | 2x "0", 3x "1" | 2x "0", 2x "1" | 1x "0"]
  => delay insensitive
  => additional information required
  => SNK must be able to recognize when data is valid and consistent

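7 FSL encoding
  Use 2 codes per logic value, so a signal X needs two-rail coding (X.a, X.b).
  Codes (a,b), as also listed on slides 15 and 16: L = (0,0), H = (1,1), h = (1,0), l = (0,1)
  A direct transition L => H, i.e. (0,0) => (1,1), would pass through (1,0) or (0,1).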
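
The encoding can be sketched as a small executable model. Python is used only as a modelling language here; it is not the VHDL FSL_logic package from the slides. The sketch assumes the (a,b) codes listed on slides 15 and 16, takes rail a as the logic value and a XOR b as a phase index. Which index the slides call φ0 is not modelled, and the names `FSL`, `value`, `phase` and `rails_changed` are hypothetical.

```python
# Illustrative model of the FSL code words; all names are hypothetical.
# Codes are (rail a, rail b) as listed on slides 15 and 16.
FSL = {
    "L": (0, 0),
    "H": (1, 1),
    "l": (0, 1),
    "h": (1, 0),
}

def value(sym):
    """Logic value carried by a symbol: rail a."""
    return FSL[sym][0]

def phase(sym):
    """Phase index, assumed to be rail a XOR rail b."""
    a, b = FSL[sym]
    return a ^ b

def rails_changed(src, dst):
    """Number of rails that toggle when a signal goes from src to dst."""
    return sum(x != y for x, y in zip(FSL[src], FSL[dst]))

# A change of value within the same phase toggles both rails.
print("L -> H", rails_changed("L", "H"))
# Changing the phase with every new data word toggles exactly one rail.
for src in "LH":
    for dst in "lh":
        print(src, "->", dst, rails_changed(src, dst))
```

Because every legal transition toggles a single rail, a receiver never sees an intermediate code word that could be mistaken for data.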

8 Completion Detection
  [Diagram: SRC -> SNK with a completion detection block (CMPD)]
  Symbol sequence shown: H l L h H h L l
  CMPD signals consistent data.
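
A minimal sketch of completion detection, assuming that a data word counts as consistent when all of its signals carry the same phase and that the phase index is rail a XOR rail b. The function names and the example words are hypothetical.

```python
# Hypothetical completion detector: a vector is consistent when every
# signal is in the same phase (phase index assumed to be a XOR b).
def phase(sig):
    a, b = sig
    return a ^ b

def completion(vector, expected_phase):
    """True once all signals of the vector have reached expected_phase."""
    return all(phase(sig) == expected_phase for sig in vector)

H, L, h, l = (1, 1), (0, 0), (1, 0), (0, 1)

old_word = [H, L, L]     # previous data word, phase index 0
in_flight = [h, L, l]    # middle signal still carries the old phase
new_word = [h, l, l]     # all signals have arrived, phase index 1

for word in (old_word, in_flight, new_word):
    print(word, completion(word, expected_phase=1))
```

Only the last word is reported as complete, so a sink gated by this signal would not capture the partially updated vector.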

9 Phase Transition
  [Diagram: f(x) with inputs and outputs alternating between φ0 and φ1; output during the transition marked "?"]
  We have to ensure that:
    inconsistent input vectors are not processed
    f(x) is a monotonic function

10 Combinational Functions
  Variant 1:
    hazard-free implementation of f(x)
    consistency detector (CD)
    processing of inconsistent inputs is unavoidable due to internal skew
  Variant 2:
    each basic gate processes only consistent inputs
    the function of each basic gate is monotonic
    local intelligence => hardware overhead

11 (Combinational) Basic Gates
  And, Or, Inv, (MUX), (XOR), …
  Truth table FSL-AND (inputs E1, E2; output Y; * = keep old value):
    E2 \ E1   H   L   h   l
    H         H   L   *   *
    L         L   L   *   *
    h         *   *   h   l
    l         *   *   l   l
  The two diagonal blocks are the input combinations consistent in φ0 and consistent in φ1.
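
The truth table can be sketched as a behavioural model. This is an illustration under assumptions, not the authors' gate: the codes are the (a,b) pairs from slides 15 and 16, the phase index is taken as a XOR b, and the class name `FslAnd` and helper `encode` are hypothetical.

```python
# Behavioural sketch of an FSL AND gate; names are hypothetical.
# Codes are (a, b); the phase index is assumed to be a XOR b.
H, L, h, l = (1, 1), (0, 0), (1, 0), (0, 1)

def encode(val, ph):
    """Build the code word for a logic value in a given phase."""
    return (val, val ^ ph)

class FslAnd:
    def __init__(self, initial=L):
        self.y = initial              # memory element holding the output

    def update(self, e1, e2):
        p1, p2 = e1[0] ^ e1[1], e2[0] ^ e2[1]
        if p1 == p2:                  # consistent inputs: compute AND
            self.y = encode(e1[0] & e2[0], p1)
        return self.y                 # inconsistent inputs: keep old value

gate = FslAnd()
print(gate.update(H, H))   # consistent inputs, result H
print(gate.update(h, H))   # inconsistent inputs, old value H is kept
print(gate.update(h, l))   # consistent inputs in the other phase, result l
```

The stored output is what makes the `*` entries possible, which matches the remark in the conclusion that even combinational gates need a memory element.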

12 FSL Gate Implementation
  [Diagram: inputs E1 (E1.a, E1.b) and E2 (E2.a, E2.b); blocks f_a(x), f_b(x), CD and Mem with enable; output Y (Y.a, Y.b)]
  Separation of control and data path => not delay insensitive

13 Further FSL Gates
  φ-Inverter
  φ-Detector
  φ-Converter
  Register/Latch
  Memories

14 φ-Inverter
  [Diagram: rail a and rail b. Table as shown, codes (a,b): HIGH: H (1,0), h (1,1); LOW: L (0,1), l (0,0); one column per phase (φ1, φ0)]
  => simple inversion of rail b

15 φ-Converter
  Std -> FSL: rail a = Std signal; rail b = Std signal XOR requested φ
  FSL -> Std: [Diagram: FSL rails a and b in, Std signal out]
  Std logic: 1 = HIGH, 0 = LOW
  FSL logic, codes (a,b): HIGH: H (1,1), h (1,0); LOW: L (0,0), l (0,1); one column per phase (φ1, φ0)

16 φ-Detector
  Codes (a,b): HIGH: H (1,1), h (1,0); LOW: L (0,0), l (0,1); one column per phase (φ1, φ0)
  XOR of rail a and rail b: '0' for H and L, '1' for h and l
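
The three φ gates of slides 14 to 16 can be sketched together on (a,b) pairs. The function names are hypothetical, the phase index is taken as a XOR b, and the FSL -> Std direction is assumed to forward rail a, which the slides suggest but do not spell out.

```python
# Sketch of the φ-inverter, φ-converter and φ-detector on (a, b) pairs.
# Function names are hypothetical; phase index assumed to be a XOR b.
def phi_det(sig):
    """φ-detector: XOR of both rails."""
    a, b = sig
    return a ^ b

def phi_inv(sig):
    """φ-inverter: invert rail b only; the value on rail a is unchanged."""
    a, b = sig
    return (a, 1 - b)

def std_to_fsl(bit, requested_phase):
    """φ-converter Std -> FSL: rail a = bit, rail b = bit XOR requested φ."""
    return (bit, bit ^ requested_phase)

def fsl_to_std(sig):
    """φ-converter FSL -> Std: assumed to forward rail a."""
    return sig[0]

for bit in (0, 1):
    for ph in (0, 1):
        sig = std_to_fsl(bit, ph)
        print(bit, ph, sig, phi_det(sig), phi_inv(sig), fsl_to_std(sig))
```

In this model the inverter changes the detected phase but never the logic value, which is why it can be inserted into a data path without altering the data.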

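17 FSL Register
  [Diagram: latch between data in and data out; control block (Ctrl) with inputs φ in and φ c-done and output pass]
  Pass when the input data is consistent and valid and the output data has already been consumed.
  When the input data has been consumed, freeze the latches again.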
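
A possible reading of the register control, written as a sketch. It assumes that "valid" means the input word is in the phase opposite to the stored word, and that the φ c-done input reports the phase of the word the consumer captured last. Both assumptions and all names are hypothetical.

```python
# Hypothetical control rule for an FSL register (latch plus control block).
def phase(sig):
    return sig[0] ^ sig[1]

class FslRegister:
    def __init__(self, word):
        self.word = list(word)            # latched data word (consistent)

    def stored_phase(self):
        return phase(self.word[0])

    def step(self, data_in, consumer_phase):
        """Open the latch once if allowed; return True when data passed."""
        phases = {phase(s) for s in data_in}
        new_and_consistent = phases == {1 - self.stored_phase()}
        consumed = consumer_phase == self.stored_phase()
        if new_and_consistent and consumed:
            self.word = list(data_in)     # pass, then freeze again
            return True
        return False

H, L, h, l = (1, 1), (0, 0), (1, 0), (0, 1)
reg = FslRegister([L, L])                     # holds a word with phase index 0
print(reg.step([h, L], consumer_phase=0))     # input still inconsistent
print(reg.step([h, l], consumer_phase=1))     # stored word not yet consumed
print(reg.step([h, l], consumer_phase=0))     # both conditions met
print(reg.word)
```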

18 Memory
  [Diagram: FSL_Logic -> CONV -> STD_Logic -> standard RAM -> STD_Logic -> CONV -> FSL_Logic, with a φ-detector (φ-det)]
  Alternative: store the entire data in φ0 and φ1 => huge overhead (4x)

19 Design Flow and tools
  Requirements:
    Standard tools (Synopsys/Quartus)
    Modelling on RTL level
    Support for simulation and synthesis
    Target platform FPGA

20 Adaptation: VHDL
  Definition of an FSL_logic type
  Redefinition of the std_1164 package
  Additional functions:
    φ_det, φ_inv, conversion functions
    stable
  => Modelling FSL circuits on RTL level

21 Example: Program Counter
  stable_signals <= AddrInc_conv & JmpExe & JmpAddr;
  pc_next_SM: process
  begin
    stable(stable_signals);
    if JmpExe = JMP_EXE_I1 or JmpExe = JMP_EXE_I0 then
      AddrNxt <= JmpAddr;
    else
      AddrNxt <= AddrInc_conv;
    end if;
  end process pc_next_SM;

22 Adaptation: Synthesis (1)
  FSL target library:
    FSL AND, FSL OR, FSL INV, FSL Register, φ-Detector, …
    Synthesis with the FSL target library => netlist
  Package FSL_Rail:
    Definition of FSL_rail_logic: record (a,b)
    FSL AND, FSL OR, FSL INV, FSL Register, φ-Detector, …
    Netlist: replace FSL with FSL_rail
    Synthesis with the FPGA target library
  Example: L & H => (0,0) & (1,1)

23 Adaptation: Synthesis (2)
  [Diagram: conventional design flow compared with FSL design flow]

24 Adaptation: Simulation
  Same testbench for FSL_logic and FSL_rail_logic circuits
  => Verification of FSL circuits
  [Diagram: testbench with FSL stimuli and FSL response; FSL logic circuit; FSL rail logic circuit with conversion]

25 Outline
  √ Introduction
  √ Principles
  √ Basic gates
  √ Design flow and tools
  Circuit design
  Pipeline
  Data Paths
  Modeling complex circuits
  Open points
  Conclusion

26 (Linear) Pipeline
  [Diagram: latches K1, K2, K3, K4 with f(x) between them]
  Full initialized: φ0 φ1 φ0 φ1 φ0
  Empty initialized: φ0 φ0 φ0 φ0 φ0

27 Bubble Concept (1)
  Progress is possible when a circuit contains at least one bubble.
  [Diagram: stages K1 to K4 with phases φ0 φ1 φ0 φ1 φ0, and with phases φ1 φ1 φ0 φ1 φ0; bubble: identical values]

28 Bubble Concept (2)
  Initialization => ensure that the circuit contains at least one bubble
  More than one bubble => higher processing speed
  [Diagram: stages K1 to K7 with phases φ1 φ0 φ1 φ0 φ0 φ1 φ1 φ0]
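
Both bubble statements can be checked with a token/bubble abstraction. The sketch below is not an FSL simulation: it models a closed ring of registers in which a stage either holds unconsumed data or is a bubble, and data moves forward only into a bubble. In FSL terms a bubble corresponds to neighbouring stages holding identical phase values. Phase inverters are ignored and all names are hypothetical.

```python
# Token/bubble abstraction of a closed register pipeline (ring).
# True = stage holds unconsumed data, False = bubble. Names are hypothetical.
def step(ring):
    """Fire every stage whose successor is a bubble; return (ring, moves)."""
    n = len(ring)
    nxt = list(ring)
    moves = 0
    for i in range(n):
        j = (i + 1) % n
        if ring[i] and not ring[j]:   # data in stage i, bubble in stage j
            nxt[i], nxt[j] = False, True
            moves += 1
    return nxt, moves

def moves_in(ring, steps):
    """Total number of data moves over a number of simultaneous steps."""
    total = 0
    for _ in range(steps):
        ring, m = step(ring)
        total += m
    return total

full = [True] * 6
one_bubble = [True] * 5 + [False]
two_bubbles = [True, True, False, True, True, False]

for name, ring in (("no bubble", full), ("one bubble", one_bubble),
                   ("two bubbles", two_bubbles)):
    print(name, moves_in(ring, steps=12))
```

For these three rings the model counts 0, 12 and 24 moves in 12 steps: no bubble means deadlock, and a second bubble doubles the number of moves per step.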

29 Non-linear Pipeline: Forward Path (1)
  [Diagram: SRC, stages K1, K2, K3, SNK; φ-inverter (φ-inv); request signal; phase annotations φ0/φ1]

30 Non-linear Pipeline: Forward Path (2)
  [Diagram: stages K1, K2, K3 with φ-inv; operation of K3 in steps 1 to 4; phase annotations φ0/φ1]

31 Non-linear Pipeline: Forward Path (2)
  [Diagram: stages K1, K2, K3 with φ-inv; operation of K3, step 5; phase annotations φ0/φ1]

32 Non-linear Pipeline: Feedback Path (1)
  [Diagram: stages K1, K2, K3 with φ-inv; phase annotations φ0/φ1]

33 Non-linear Pipeline: Feedback Path (2)
  [Diagram: stages K1, K2, K3 with φ-inv; operation of K3 in steps 1 to 3; phase annotations φ0/φ1]

34 First Conclusion: Non-linear Pipeline
  [Diagram: forward path and feedback path examples with stages K1, K2, K3 and φ-inv]
  Phase inverter placement:
    Forward path: eliminate inconsistent inputs in the initial state
    Feedback path: generate inconsistent inputs in the initial state
  => true when the pipeline is initialized full

35 Empty Non-linear Pipeline: Forward Path (1)
  [Diagram: stages K1, K2, K3 with φ-inv; phase annotations φ0/φ1]
  Empty initialized => no phase inverter required

36 Empty Non-linear Pipeline: Forward Path (2)
  [Diagram: stages K1, K2, K3; operation of K3 in steps 1 to 4; phase annotations φ0/φ1]
  => different phase inverter placement for full and empty initialized circuits

37 “Invalid” Feedback Path
  [Diagram: single stage K1 with feedback]
  φ-inverter always required
  No inversion of the request signal
  Definition: a valid feedback path must contain at least two register nodes

38 Phase Inverter Placement
  Phase inverters are required to avoid deadlocks
  Placement of phase inverters depends on:
    Topology of the circuit
    Type and number of components inside valid feedbacks
    Dynamic behavior
    Initialization
  Handshake signals have to be considered
  Processing speed depends on initialization
  More configurations are possible

39 Identification of Phase Inverters: Full Initialised Circuits
  Systematic approach:
  1. Identify combinational logic and registers/memories
  2. Generate a graph representation of the circuit based on registers/memories
  3. Apply a coloring algorithm
  4. Identify feedback paths (=> loops)
  5. Eliminate invalid feedbacks
  6. Add a phase inverter to each remaining loop (phase inverters can be shared)
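
Steps 2, 4 and 5 of the approach above can be sketched with a plain depth-first loop search. This is not the authors' tool: the colouring algorithm of step 3 and the placement and sharing of inverters in step 6 are not modelled, and the example circuit and all names are hypothetical. A loop counts as valid when it contains at least two register nodes, following the definition on slide 37.

```python
# Sketch of steps 2, 4 and 5: register graph, loop search, removal of
# invalid loops. Colouring and inverter placement are not modelled.
def find_loops(graph):
    """Return all simple cycles, each listed from its smallest node."""
    loops = []

    def walk(start, node, path):
        for succ in graph.get(node, ()):
            if succ == start:
                loops.append(list(path))
            elif succ > start and succ not in path:
                walk(start, succ, path + [succ])

    for start in sorted(graph):
        walk(start, start, [start])
    return loops

def valid_loops(graph):
    """Keep only feedback paths with at least two register nodes."""
    return [loop for loop in find_loops(graph) if len(loop) >= 2]

# Edges point from a register to the registers it feeds through
# combinational logic (hypothetical example circuit).
circuit = {
    "K1": ["K2"],
    "K2": ["K3", "K2"],      # K2 -> K2: one-register loop, removed in step 5
    "K3": ["K1", "K4"],      # K3 -> K1 closes a loop with three registers
    "K4": [],
}

for loop in valid_loops(circuit):
    print("loop needing a phase inverter:", " -> ".join(loop))
```

Here only the loop K1 -> K2 -> K3 remains after step 5, so it is the one that step 6 would equip with a phase inverter.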

40 Data Paths
  [Diagram: registers (Reg) and f(x) blocks joined by a deselecting node (DESEL. NODE: DEMUX, FORK) and a selecting node (SEL. NODE: MUX, MERGE)]

41 Example: Merging Data Paths
  [Diagram: inputs In1 and In2, each via Reg and f(x), feed a MUX; Ack; data words DW1 (φ0), DW2 (φ1), DW3 (φ0), DW4 (φ1); delays ∆1, ∆3]
  Assumption: Acknowledge is activated when the selected data is available
  Different delays for both data paths
  Step 1: In1 selected
  Step 2: In1 selected
  Step 3: In2 selected

42 Example: Merging Data Paths (2)
  Depending on the circuit functionality:
  a) both inputs have to be consumed in each processing step
     Ensure that the difference in processing speed in all data paths is small enough
     Wait until all inputs are available (even the unused ones)
  b) all inputs have to be processed and consumed
     Insert synchronizer circuits to adjust the phase encoding of the input signals
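
Why merging needs care can be illustrated with a small phase-bookkeeping model. It assumes that each MUX input delivers data words whose phase index alternates 0, 1, 0, …, and it models an eager MUX as one that acknowledges only the selected input, while a non-eager MUX consumes a word from every input in every step. The select sequence follows steps 1 to 3 of the merging example plus an assumed fourth step; all names are hypothetical.

```python
# Sketch of phase drift at a MUX; model and names are hypothetical.
# Each input delivers data words whose phase index alternates 0, 1, 0, ...
def merge_phases(selects, eager):
    """Return the phase seen at the MUX output for each processing step."""
    consumed = {1: 0, 2: 0}            # words consumed per input so far
    out = []
    for sel in selects:
        out.append(consumed[sel] % 2)  # phase of the next word on input sel
        for inp in consumed:
            if inp == sel or not eager:
                consumed[inp] += 1     # acknowledge and consume this word
    return out

def alternates(phases):
    """True when consecutive output words always differ in phase."""
    return all(p != q for p, q in zip(phases, phases[1:]))

selects = [1, 1, 2, 1]                 # In1, In1, In2, then In1 again
for eager in (True, False):
    phases = merge_phases(selects, eager)
    print("eager" if eager else "non-eager", phases, alternates(phases))
```

In the eager case the fourth output word repeats the phase of the third, so the sink could not tell it apart from old data; consuming all inputs in every step, or re-encoding the phase with a synchronizer, avoids that.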

43 DEMUX Example
  [Diagram: Reg and f(x) feed a DEMUX with two output paths, each with f(x) and Reg; data words DW1 (φ0), DW2 (φ1), DW3 (φ0), DW4 (φ1)]
  Avoid loss of synchronization:
    Dummy data approach
    Synchronizer circuits
  Performance considerations:
    Wait on ack. of data paths
    Wait only on required ack.

44 Asynchronous ASPEAR: Asynchronous SPEAR

45 Our Current Status
  semi-automated design flow (based on Synopsys)

46 Our Current Status
  semi-automated design flow (based on Synopsys)
  theoretical investigations

47 Our Current Status
  semi-automated design flow (based on Synopsys)
  theoretical investigations
  working 16-bit processor (on FPGA platform)
  APEX 20KC

48 Our Current Status
  semi-automated design flow (based on Synopsys)
  working 16-bit processor (on FPGA platform)
  investigation of DI
  experimental robustness assessment (fault injection: synchronous versus asynchronous design)

49 Conclusion
  FSL: Four State Logic
    Delay insensitive logic
    Two representations for Low/High => dual-rail encoding
    Even combinational gates require a memory element
  Circuit design with FSL
    Non-linear pipeline
      Additional phase inverters required
      Placement depends on circuit topology, initialization, …
    Data paths
      Splitting: synchronizer circuits or dummy data required
      Merging: non-eager MUX or timing assumptions required