1 Advanced Digital Design Asynchronous Design: FSL by A. Steininger and M. Delvai Vienna University of Technology.

Presentation transcript:

1 Advanced Digital Design Asynchronous Design: FSL by A. Steininger and M. Delvai Vienna University of Technology

2 Outline: Introduction; Principles; Basic gates; Design flow and tools; Circuit design with FSL; Pipeline; Data paths; Current status; Conclusion

3 Fundamental Design Problem: ensure a lossless data flow from SRC through f(x) to SNK (signal A). Capture condition: only valid and consistent data may be consumed. Issue condition: new data can be issued only when the previous data has already been consumed.

4 Synchronous Approach: SRC → f(x) → SNK, signal A, clock Clk with period T. Global time reference => indirect conclusion

5 Asynchronous Circuits: SRC → f(x) → SNK with a local handshake protocol (Request, Acknowledge)

6 Four State Logic: SRC → f(x) → SNK with delay ∆t => delay insensitive. [Figure labels: 1x „1“ | 2x „0“; 3x „1“ | 2x „0“; 2x „1“ | 1x „0“] => additional information required => SNK must be able to recognize when data is valid and consistent

7 FSL encoding: use 2 codes per logic value (states L, H, h, l) => need two-rail coding: signal X is carried on rails X.a and X.b. Example L => H: (0,0) => (1,1) is not possible directly; between (0,0) and (1,1) lies (1,0) or (0,1).
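The point of using two codes per logic value can be checked with a small Python sketch. It assumes the rail table shown on the φ–Detector slide (H = (1,1), h = (1,0), L = (0,0), l = (0,1)) and, as a further assumption, treats the XOR of the two rails as the phase marker. The names FSL_CODES, phase_bit and rails_changed are hypothetical helpers, not part of the authors' tooling.

```python
# Assumed rail table (taken from the phi-detector slide): state -> (rail a, rail b).
FSL_CODES = {"H": (1, 1), "h": (1, 0), "L": (0, 0), "l": (0, 1)}


def logic_value(state):
    """Logical value carried by a state: 1 for H/h, 0 for L/l."""
    return 1 if state in ("H", "h") else 0


def phase_bit(state):
    """Assumed phase marker: XOR of the two rails."""
    a, b = FSL_CODES[state]
    return a ^ b


def rails_changed(src, dst):
    """Number of rails that toggle when a signal moves from src to dst."""
    return sum(x != y for x, y in zip(FSL_CODES[src], FSL_CODES[dst]))


def phase_alternating_transitions():
    """Map every (src, dst) pair that changes phase to the number of rails toggled."""
    return {
        (s, d): rails_changed(s, d)
        for s in FSL_CODES
        for d in FSL_CODES
        if phase_bit(s) != phase_bit(d)
    }
```

Under these assumptions every transition that alternates the phase toggles exactly one rail, whereas a direct L => H step would toggle both rails and pass through an ambiguous intermediate code.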

8 Completion Detection: SRC → SNK. [Figure: FSL values HlLhHhLl between SRC and SNK, annotated "consistent data"]

9 FSL Gates: Combinational Gates (AND, OR, INV, …)

10 Phase Transition: [Figure: f(x) with inputs changing between φ0 and φ1, output marked "?"] We have to ensure that: inconsistent input vectors are not processed; f(x) is a monotonic function

11 Combinational Functions. Variant 1: hazard-free implementation plus consistency detector (CD); processing of inconsistent inputs is inevitable due to internal skew. Variant 2: each basic gate processes only consistent inputs; the function of each basic gate is monotonic; local intelligence in f(x) => hardware overhead

12 Combinational Gates: And, Or, Inv, (MUX), (XOR), …
Truth table FSL-AND (inputs E1, E2; output Y; * = keep old value):
  E1 \ E2   H   L   h   l
  H         H   L   *   *
  L         L   L   *   *
  h         *   *   h   l
  l         *   *   l   l
The two diagonal blocks are the input pairs labeled "consistent in φ0" and "consistent in φ1".
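The FSL-AND truth table can be restated as a small behavioral model in Python. This is only a sketch: the phase index 0 for upper-case states and 1 for lower-case states is an arbitrary labeling chosen here, and fsl_and is a hypothetical name rather than a cell from the authors' library.

```python
# Arbitrary labeling assumed for this sketch: upper case = phase index 0, lower case = 1.
PHASE = {"H": 0, "L": 0, "h": 1, "l": 1}
VALUE = {"H": 1, "L": 0, "h": 1, "l": 0}
STATE = {(1, 0): "H", (0, 0): "L", (1, 1): "h", (0, 1): "l"}  # (value, phase) -> state


def fsl_and(e1, e2, old_y):
    """Behavioral FSL-AND: combine consistent inputs, otherwise keep the old output."""
    if PHASE[e1] != PHASE[e2]:
        # Inputs are in different phases ('*' entries in the table): hold the output.
        return old_y
    # Inputs are consistent: logical AND, emitted in the phase of the inputs.
    return STATE[(VALUE[e1] & VALUE[e2], PHASE[e1])]
```

The old_y argument stands in for the memory element that the gate needs in order to hold its output while the inputs are inconsistent.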

13 Gate Template: inputs E1 (E1.a, E1.b) and E2 (E2.a, E2.b); rail functions f_a(x) and f_b(x); memory element (Mem); output Y (Y.a, Y.b). Challenge: preserve delay insensitivity in the implementation

14 Inverter: code table, rails (a,b), columns headed φ1 and φ0 in the slide: HIGH: H (1,0), h (1,1); LOW: L (0,1), l (0,0). Is the inverter delay insensitive?

15 FSL Gates: √ Combinational Gates (AND, OR, INV, …); Register

16 Completion Detection: SRC → SNK. [Figure: FSL values HlLhHhLl between SRC and SNK, annotated "consistent data"]

17 Completion Detection: SRC → SNK. [Figure: FSL values HlLhHhLl, annotated "consistent data"] Register: Latch with CMPD driving the latch enable

18 FSL Register: [Figure: chain of LATCH stages with f(x) in between, each latch with its own CMPD] => additional handshake signals are required

19 FSL Register (Latch, CMPD, CTRL; input data, output data): Is the output data already consumed? => handshake signal from the next register required

20 FSL Register (Latch, CMPD, CTRL; input data, output data): When do we close the latch again? => when the input data has been taken over

21 FSL Register (Latch, CMPD, CTRL; input data, output data): Input data is ready to be consumed when all input signals carry the same phase. Input data is consumed when input and output carry the same phase. Output data has been consumed when the output of the next register carries the same phase as the current output data. Only phase detectors are required to generate all handshake signals => phase detector
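The three phase conditions can be combined into one enable condition for the latch. The Python sketch below is one possible reading, assuming phases are represented as bits 0/1; input_ready and latch_pass are hypothetical names, and the real CTRL block may be structured differently.

```python
def input_ready(in_phases):
    """Input data is ready when all input signals carry the same phase."""
    return len(set(in_phases)) == 1


def latch_pass(in_phases, out_phase, next_out_phase):
    """Return True when the latch may open and take over the input data.

    in_phases      -- phase bit of every input signal
    out_phase      -- phase bit of this register's current output
    next_out_phase -- phase bit at the output of the next register
    """
    if not input_ready(in_phases):
        return False  # inconsistent (or missing) input word
    new_data = in_phases[0] != out_phase            # input not yet consumed
    output_consumed = next_out_phase == out_phase   # next register took our data
    return new_data and output_consumed
```

Once the latch has taken over the data, the input and output phases match, so latch_pass returns False again, which corresponds to closing the latch.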

22 φ–Detector: code table, rails (a,b), columns headed φ1 and φ0 in the slide: HIGH: H (1,1), h (1,0); LOW: L (0,0), l (0,1). XOR of the two rails: '0' for H and L, '1' for h and l.
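A phase detector of this kind reduces to an XOR per signal, and a word is consistent when all of its signals report the same XOR result. A minimal Python sketch, assuming rails are given as (a, b) bit pairs; phi_det and word_phase are hypothetical names.

```python
def phi_det(a, b):
    """Phase detector for one dual-rail signal: XOR of rail a and rail b."""
    return a ^ b


def word_phase(word):
    """Return the common phase bit of a word, or None if the word is inconsistent.

    word -- iterable of (a, b) rail pairs
    """
    phases = {phi_det(a, b) for a, b in word}
    return phases.pop() if len(phases) == 1 else None
```

For example, word_phase([(1, 1), (0, 0)]) reports one common phase, while a word that mixes (1, 1) and (0, 1) is reported as inconsistent.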

23 FSL Register: Ctrl, Latch, φ in, φ out; data in, data out; signals c-done and pass. Control conditions: input data consistent and valid; output data already consumed; input data consumed => freeze the latches again

24 FSL Gates: √ Combinational Gates (AND, OR, INV, …); √ Register; √ Latch; √ φ–Detector; Memory

25 Memory. Two options: Store FSL signals directly: 4 bits per logical value => huge overhead, but delay insensitive (in theory). Store only logical information: 1 bit per logical value => low overhead, but not delay insensitive

26 Memory: [Figure: standard RAM (STD_Logic) between two converters (CONV), with a φ-det; signal types FSL_Logic → STD_Logic → FSL_Logic]

27 φ–Converter. Std → FSL: Std 1 => HIGH, Std 0 => LOW; rails a and b are produced from the Std signal and the requested φ (Sig. XOR requested φ). FSL → Std: Std is derived from rails a and b. Code table, rails (a,b), columns headed φ1 and φ0 in the slide: HIGH: H (1,1), h (1,0); LOW: L (0,0), l (0,1).
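With the code table on this slide, both conversion directions can be sketched in a few lines of Python. The sketch assumes that rail a carries the logical value and that rail b is the signal XORed with the requested phase bit, which reproduces the four listed codes; std_to_fsl and fsl_to_std are hypothetical names.

```python
def std_to_fsl(bit, requested_phase):
    """Std -> FSL: rail a is the signal, rail b is the signal XOR the requested phase bit."""
    return (bit, bit ^ requested_phase)


def fsl_to_std(rails):
    """FSL -> Std: under the assumed table, rail a alone carries the logical value."""
    a, _b = rails
    return a
```

Here requested_phase is the XOR value of the rails (0 for H/L, 1 for h/l in the assumed table), so std_to_fsl(1, 0) yields (1, 1) and std_to_fsl(1, 1) yields (1, 0).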

28 FSL Gates: √ Combinational Gates (AND, OR, INV, …); √ Register; √ Latch; √ φ–Detector; √ Memory; √ φ–Converter (FSL→Std, Std→FSL); φ–Inverter

29 φ–Inverter: code table, rails (a,b), columns headed φ1 and φ0 in the slide: HIGH: H (1,0), h (1,1); LOW: L (0,1), l (0,0) => simple inversion of rail b
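That inverting rail b changes the phase but not the logical value can be checked exhaustively. The Python sketch below assumes only that rail a carries the logical value and that the phase is marked by the XOR of the rails, which holds for both code tables that appear in the slides; phi_inv is a hypothetical name.

```python
def phi_inv(rails):
    """Phase inverter: keep rail a, invert rail b."""
    a, b = rails
    return (a, 1 - b)


def check_phi_inv():
    """True if, for all four codes, rail a is unchanged and the rail XOR flips."""
    for a in (0, 1):
        for b in (0, 1):
            na, nb = phi_inv((a, b))
            if na != a or (na ^ nb) == (a ^ b):
                return False
    return True
```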

30 FSL Gates: √ Combinational Gates (AND, OR, INV, …); √ Register; √ Latch; √ φ–Detector; √ Memory; √ φ–Converter (FSL→Std, Std→FSL); √ φ–Inverter

31 Design Flow and Tools. Requirements: standard tools (Synopsys/Quartus); modelling on RTL level; support for simulation and synthesis; target platform FPGA

32 Adaptation: VHDL. Definition of an FSL_logic type; redefinition of the std_1164 package; additional functions: φ_det, φ_inv, conversion functions, stable => modelling FSL circuits on RTL level

33 Example: Program Counter (inputs JmpExe, JmpAddr, AddrInc; output AddrNxt = f(x))
    stable_signals <= AddrInc & JmpExe & JmpAddr;
    pc_next: process
    begin
      stable(stable_signals);
      if JmpExe = 'H' or JmpExe = 'l' then
        AddrNxt <= JmpAddr;
      else
        AddrNxt <= AddrInc;
      end if;
    end process pc_next;

34 Adaptation: Synthesis (1). FSL Target Library: FSL AND, FSL OR, FSL INV, FSL Register, φ-Detector, …; synthesis with FSL Target Library => netlist. Package FSL_Rail: definition FSL_rail_logic : record (a,b); FSL AND, FSL OR, FSL INV, FSL Register, φ-Detector, …; netlist: replace FSL with FSL_rail; synthesis with FPGA Target Library. [Figure labels: L & L H; (0,0) & (1,1)]

35 Adaptation: Synthesis (2). [Figure: conventional design flow compared with FSL design flow]

36 Adaptation: Simulation. Same testbench for FSL_logic and FSL_rail_logic circuits => verification of FSL circuits. [Figure: testbench with FSL stimuli and FSL response; FSL Logic; FSL Rail Logic with conversion]

37 Outline: √ Introduction; √ Principles; √ Basic gates; √ Design flow and tools; Circuit design; Pipeline; Data Paths; Current status; Conclusion

38 (Linear) Pipeline: [Figure: LATCH stages K1 to K4 with f(x) in between] Full initialized: 00 11 00 11 00. Empty initialized: 00 00 00 00 00.

39 Bubble Concept (1): Progress is possible when a circuit contains at least one bubble. [Figure: K1 to K4 with values 00 11 00 11 00, and K1 to K3 with values 11 11 00 11 00; labels: "bubble", "identical values"]

40 Bubble Concept (2): Initialization => ensure that the circuit contains at least one bubble. More than one bubble => higher processing speed. [Figure: K1 to K7 with values 11 00 11 00 00 11 11 00]
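The role of bubbles can be illustrated with a toy model of a linear pipeline in Python. It rests on several modelling assumptions that are not taken from the slides: each register is reduced to one phase bit, a bubble is a pair of neighbouring registers with identical phase, and a register fires when its predecessor carries a different phase (new data) and its successor carries the same phase (output consumed). The names fireable, step and bubbles are hypothetical.

```python
def fireable(phases, src_phase, snk_phase):
    """Indices of registers that may take over new data in the current state."""
    n = len(phases)
    result = []
    for i, p in enumerate(phases):
        pred = src_phase if i == 0 else phases[i - 1]
        succ = snk_phase if i == n - 1 else phases[i + 1]
        # New data is waiting and the own output has already been consumed.
        if pred != p and succ == p:
            result.append(i)
    return result


def step(phases, src_phase, snk_phase):
    """Fire every fireable register once (each one adopts the phase of its input)."""
    fired = set(fireable(phases, src_phase, snk_phase))
    return [1 - p if i in fired else p for i, p in enumerate(phases)]


def bubbles(phases):
    """Positions i where register i and register i+1 hold identical phase."""
    return [i for i in range(len(phases) - 1) if phases[i] == phases[i + 1]]
```

In this model a pipeline with alternating phases and a sink that has not consumed the last word cannot make progress, while a single bubble lets one register fire and moves the bubble one stage towards the source.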

41 Bubble Concept (3): Bubbles can be consumed: slow SRC → empty pipeline; slow SNK → full pipeline. [Figure: SRC, K2 to K6, SNK with values 11 00 11 00 11 11; label: "bubble"]

42 Non-linear Pipeline. Definition: A non-linear pipeline is a pipeline which contains at least one feedback or forward path. [Figure: K1 to K6 with a forward path and a feedback path] Consequences: internal regulation; bubble cannot be consumed; potential sources of deadlocks

43 Non-linear Pipeline: Forward Path (1). [Figure: SRC, K1 to K4, SNK with a forward path containing a φ-inv; request signal]

44 Non-linear Pipeline: Forward Path (2). [Figure: operation steps 1 to 4 on K1 to K4 with φ-inv]

45 Non-linear Pipeline: Forward Path (2). [Figure: operation step 5 on K1 to K4 with φ-inv]

46 Empty Non-linear Pipeline: Forward Path (1). [Figure: K1 to K4 with φ-inv] Empty initialized => no phase inverter is required

47 Empty Non-linear Pipeline: Forward Path (2). [Figure: operation steps 1 to 4 on K1 to K4] => different phase inverter placement for full and empty initialized circuit

48 Conclusion: Feedforward Paths. Ensure consistent inputs. Phase inverter placement depends on initialisation. [Figure: full initialized circuit with φ-inv and empty initialized circuit, K1 to K4, with switching sequence]

49 Non-linear Pipeline: Feedback Path (1). [Figure: K1 to K4 with a feedback path containing a φ-inv]

50 Non-linear Pipeline: Feedback Path (2). [Figure: operation steps 1 to 3 on K1 to K4 with φ-inv]

51 Non-linear Pipeline: Feedback Path (2). [Figure: operation on K1 to K4, continued]

52 Non-linear Pipeline: Feedback Path (2). [Figure: operation on K1 to K4 with φ-inv, continued]

53 Conclusion: Feedback Paths. Ensure inconsistent inputs. Phase inverter placement depends on initialisation. [Figure: full initialized and empty initialized circuit, K1 to K4, with φ-inv and switching sequence]

54 Conceptual Difference: Feedback and Forward. Feedback path: Init: ensure inconsistent inputs; only one input switches before K1 can fire. Forward path: Init: ensure consistent inputs; either both or no inputs switch before K4 can fire. => a well-defined event sequence! [Figure: K1 to K4, full and empty initialized, with switching sequences]

55 Conceptual Difference: Feedback and Forward. Feedback path (data flow fed back): ensure inconsistent inputs; only one input switches before K1 can fire. Forward path (data flow forwarded): ensure consistent inputs; either both or no inputs switch before K4 can fire. => a well-defined event sequence! [Figure: K1 to K4, full and empty initialized, with switching sequences and data flow]

56 “Invalid” Feedback Path. [Figure: K1 with a feedback path onto itself; φ-Inv always required; no inversion of the request signal] Definition: A valid feedback path must contain at least two register nodes

57 Phase Inverter Placement. Phase inverters are required to avoid deadlocks. Placement of phase inverters depends on: topology of the circuit; type and number of components inside valid feedbacks; dynamic behavior; initialization. Handshake signals have to be considered. Processing speed depends on initialization. More configurations are possible.

58 Identification of Phase Inverter: Generic Approach. Systematic approach:
1. Identify combinational logic and registers/memories
2. Generate a graph representation of the circuit based on registers/memories
3. Eliminate inconsistent inputs by phase inverter insertion
4. Identify feedback paths (=> loops)
5. Eliminate invalid feedbacks
6. Add a phase inverter to each remaining loop (phase inverters can be shared among feedback paths)
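Steps 4 to 6 can be sketched on a register graph in Python. This is an illustration under stated assumptions, not the authors' tool: the circuit is given as a dictionary from each register name to the list of registers it feeds, loops are found by a plain depth-first enumeration of simple cycles, and a loop counts as valid only if it contains at least two register nodes. The function names and the example graph are hypothetical.

```python
def simple_cycles(graph):
    """Enumerate simple cycles; each cycle is reported once, starting at its smallest node."""
    cycles = []

    def dfs(start, node, path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.append(list(path))          # closed a loop back to the start node
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])      # only extend through larger, unvisited nodes

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


def loops_needing_phase_inverter(graph):
    """Steps 4 to 6: find loops, drop invalid ones (< 2 registers), keep the rest."""
    return [c for c in simple_cycles(graph) if len(c) >= 2]


# Hypothetical example: K3 feeds back to K1, K2 has an invalid self-loop,
# and K1 -> K4 is a forward path that creates no loop.
EXAMPLE = {"K1": ["K2", "K4"], "K2": ["K3", "K2"], "K3": ["K4", "K1"], "K4": []}
```

For EXAMPLE, only the loop K1 → K2 → K3 → K1 remains after the invalid self-loop is dropped. Where inside that loop the phase inverter goes, and whether it can be shared with another loop, is not decided by this sketch.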

59 Data Paths: [Figure: registers (Reg) and functions f(x) connected through a deselection node (DEMUX, FORK) and a selection node (MUX, MERGE)]

60 Example: Merging Data Paths. [Figure: two data paths Reg → f(x) feeding a MUX with inputs In1 and In2, followed by f(x); Ack; data words DW1 (φ0), DW2 (φ1), DW3 (φ0), DW4 (φ1); path delays ∆1 and ∆3] Assumption: Acknowledge is activated when the selected data is available. Different delays for the two data paths. Step 1: In1 selected. Step 2: In1 selected. Step 3: In2 selected.

61 Example: Merging Data Paths (2). Depending on the circuit functionality: a) both inputs have to be consumed in each processing step => ensure that the difference in processing speed between all data paths is small enough; wait until all inputs are available (even the unused ones). b) all inputs have to be processed and consumed => insert synchronizer circuits to adjust the phase encoding of the input signals

62 DEMUX Example. [Figure: Reg → f(x) → DEMUX feeding two paths f(x) → Reg; data words DW1 (φ0), DW2 (φ1), DW3 (φ0), DW4 (φ1)] Avoid loss of synchronization: dummy data approach; synchronizer circuits. Performance considerations: wait on acknowledge of all data paths; wait only on the required acknowledge.

63 Outline: √ Introduction; √ Principles; √ Basic gates; √ Design flow and tools; √ Circuit design; √ Pipeline; √ Data Paths; Current status; Conclusion

64 ASPEAR: Asynchronous SPEAR

65 Our Current Status: semi-automated design flow (based on Synopsys)

66 Our Current Status: semi-automated design flow (based on Synopsys); theoretical investigations

67 Our Current Status: semi-automated design flow (based on Synopsys); theoretical investigations; working 16-bit processor (on FPGA platform, APEX 20KC)

68 Our Current Status: semi-automated design flow (based on Synopsys); working 16-bit processor (on FPGA platform); investigation of DI; experimental robustness assessment (fault injection: synchronous design versus asynchronous)

69 Outline: √ Introduction; √ Principles; √ Basic gates; √ Design flow and tools; √ Circuit design; √ Pipeline; √ Data Paths; √ Current status; Conclusion

70 Conclusion. Four State Logic (FSL): delay insensitive logic; two representations for each of Low/High => dual-rail encoding; even combinational gates require memory elements. Circuit design with FSL: pipelines: linear and non-linear; data paths: split and merge. FSL-based processor (SPEAR) available.