Advanced Digital Design. Asynchronous Design: FSL, by A. Steininger and M. Delvai, Vienna University of Technology


1 Advanced Digital Design. Asynchronous Design: FSL, by A. Steininger and M. Delvai, Vienna University of Technology

2 Outline: Introduction; Principles; Basic gates; Design flow and tools; Circuit design with FSL (Pipeline, Data paths); Current status; Conclusion

3 Fundamental Design Problem: ensure a lossless data flow from SRC through f(x) to SNK. Capture condition: only valid and consistent data may be consumed. Issue condition: new data may be issued only when the previous data has already been consumed.

4 Synchronous Approach. [Figure: SRC, f(x), SNK, clock Clk with period T] Global time reference => indirect conclusion

5 Asynchronous Circuits. [Figure: SRC, f(x), SNK, Request, Acknowledge] Local handshake protocol

6 Four State Logic. [Figure: SRC, f(x), SNK with skew ∆t; observed words 1x "1" | 2x "0", 3x "1" | 2x "0", 2x "1" | 1x "0"] Delay insensitive => additional information required => SNK must be able to recognize when data is valid and consistent

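7 FSL encoding. Use 2 codes per logic value: H and h for HIGH, L and l for LOW. Each signal X needs two-rail coding (X.a, X.b). [Figure: L => H shown as (0,0) => (1,1); the direct transition (0,0) ? (1,1) is annotated with the codes (1,0) and (0,1)]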
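The encoding can be made concrete with a small sketch. It assumes the rail assignment shown on the φ-detector slide (L = (0,0), H = (1,1), l = (0,1), h = (1,0)) and assumes that the upper-case states belong to phase 0; the names `FSL_CODE`, `encode` and `rails_changed` are illustrative and not part of the original material.

```python
# Assumed mapping (state -> (rail a, rail b)), taken from the phase-detector slide.
FSL_CODE = {"L": (0, 0), "H": (1, 1), "l": (0, 1), "h": (1, 0)}

def encode(bit, phi):
    """Return the FSL state for logic value `bit` in phase `phi` (0 or 1)."""
    return ("L", "H", "l", "h")[2 * phi + bit]

def rails_changed(s, t):
    """Number of rails that differ between two FSL states."""
    return sum(x != y for x, y in zip(FSL_CODE[s], FSL_CODE[t]))

# Every step between a phase-0 state and a phase-1 state flips exactly one rail.
for old in "LH":
    for new in "lh":
        assert rails_changed(old, new) == 1 and rails_changed(new, old) == 1
```

Under this assumed mapping, a direct L to H step would change both rails at once, whereas alternating the phase with every data word keeps each step to a single rail transition.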

8 Completion Detection. [Figure: SRC, SNK, signal states HlLhHhLl, consistent data]

9 FSL Gates: Combinational gates (AND, OR, INV, …)

10 Phase Transition. [Figure: f(x) with input and output phases φ0, φ1, φ0, φ1] We have to ensure that: inconsistent input vectors are not processed; f(x) is a monotonic function.

11 Combinational Functions. Variant 1: hazard-free implementation of f(x) plus a consistency detector (CD); processing of inconsistent inputs is inevitable due to internal skew. Variant 2: each basic gate processes only consistent inputs, and the function of each basic gate is monotonic; local intelligence => hardware overhead.

12 Combinational Gates: AND, OR, INV, (MUX), (XOR), …
Truth table FSL-AND (inputs E1, E2; output Y; * = keep old value):
  E1\E2  H  L  h  l
  H      H  L  *  *
  L      L  L  *  *
  h      *  *  h  l
  l      *  *  l  l
The two filled blocks are the input combinations that are consistent in φ0 and in φ1.
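A behavioural sketch of the FSL-AND truth table follows. It is a software model only, not the authors' gate implementation, and it assumes that the upper-case states H and L belong to phase 0; the class name `FslAnd` and the lookup tables are hypothetical.

```python
# Assumption: upper-case states are phase 0, lower-case states are phase 1.
PHASE = {"H": 0, "L": 0, "h": 1, "l": 1}
VALUE = {"H": 1, "L": 0, "h": 1, "l": 0}
STATE = {(0, 0): "L", (1, 0): "H", (0, 1): "l", (1, 1): "h"}  # (value, phase)

class FslAnd:
    """AND gate that only evaluates phase-consistent inputs."""

    def __init__(self, init="L"):
        self.y = init  # stored output, kept while the inputs are inconsistent

    def update(self, e1, e2):
        if PHASE[e1] == PHASE[e2]:
            # Consistent inputs: compute the AND in the common phase.
            self.y = STATE[(VALUE[e1] & VALUE[e2], PHASE[e1])]
        # Inconsistent inputs: fall through and keep the old value ('*').
        return self.y
```

The stored output `self.y` is the reason the gate template needs a memory element: while one input has already moved to the next phase and the other has not, the gate has to hold its previous result.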

13 Gate Template. [Figure: inputs E1 (E1.a, E1.b) and E2 (E2.a, E2.b), rail functions f_a(x) and f_b(x), memory element Mem, output Y (Y.a, Y.b)] Challenge: preserve delay insensitivity in the implementation.

14 Inverter. Encoding table, codes as (a,b), one column per phase: HIGH: H (1,0), h (1,1); LOW: L (0,1), l (0,0). [Figure: rail a, rail b] Is the inverter delay insensitive?

15 FSL Gates: √ Combinational gates (AND, OR, INV, …); Register

16 Completion Detection. [Figure: SRC, SNK, signal states HlLhHhLl, consistent data]

17 Completion Detection. [Figure: SRC, SNK, signal states HlLhHhLl, consistent data; register built from a latch whose enable is driven by a completion detector (CMPD)]

18 FSL Register. [Figure: chain of latches with f(x) between stages; each register consists of a latch and a CMPD] Additional handshake signals are required.

19 FSL Register. [Figure: latch, CMPD and CTRL between input data and output data, within a chain of latches around f(x)] Is the output data already consumed? A handshake signal from the next register is required.

20 FSL Register. [Figure: latch, CMPD, CTRL, input data, output data] When do we close the latch again? When the input data has been taken over.

21 FSL Register. [Figure: latch, CMPD, CTRL, input data, output data] Input data is ready to be consumed when all input signals carry the same phase. Input data is consumed when input and output carry the same phase. Output data has been consumed when the output of the next register carries the same phase as the current output data. Only phase detectors are required to generate all handshake signals => phase detector
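The three phase conditions can be combined into a single latch-enable decision, sketched below. This is one reading of the slide rather than the authors' control circuit; data words are written as strings of FSL states, upper-case states are assumed to be phase 0, and the names `word_phase` and `latch_pass` are hypothetical.

```python
def word_phase(word):
    """Common phase of a word of FSL states, or None if the word is inconsistent."""
    phases = {0 if s in "HL" else 1 for s in word}
    return phases.pop() if len(phases) == 1 else None

def latch_pass(inp, out, next_out):
    """True when the register may open its latch and take over `inp`."""
    phi_in, phi_out, phi_next = word_phase(inp), word_phase(out), word_phase(next_out)
    # New data: all inputs carry the same phase, and it differs from the output phase.
    input_ready = phi_in is not None and phi_in != phi_out
    # Old data consumed: the next register's output carries our output phase.
    output_consumed = phi_next is not None and phi_next == phi_out
    return input_ready and output_consumed
```

Once the latch has taken over the input, input and output carry the same phase, `input_ready` becomes false, and the latch closes again without any further signal.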

22 φ-Detector. Encoding table, codes as (a,b), one column per phase: HIGH: H (1,1), h (1,0); LOW: L (0,0), l (0,1). The XOR of rail a and rail b is '0' for H and L and '1' for h and l.
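With the table on this slide, the detector reduces to a single XOR per signal. The sketch below checks that against all four codes; the names `RAILS` and `phi_detect` are illustrative only.

```python
# Encoding from this slide: state -> (rail a, rail b).
RAILS = {"H": (1, 1), "h": (1, 0), "L": (0, 0), "l": (0, 1)}

def phi_detect(a, b):
    """Phase of one dual-rail signal: XOR of its two rails."""
    return a ^ b

# H and L yield 0, h and l yield 1.
phases = {state: phi_detect(*rails) for state, rails in RAILS.items()}
```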

23 FSL Register. [Figure: Ctrl and latch; signals data in, data out, φ in, φ out, c-done, pass] Open the latch when the input data is consistent and valid and the output data has already been consumed. When the input data has been consumed, freeze the latches again.

24 FSL Gates: √ Combinational gates (AND, OR, INV, …); √ Register (√ Latch, √ φ-Detector); Memory

25 Memory. Two options: (1) Store FSL signals directly: 4 bits per logical value; huge overhead, but delay insensitive (in theory). (2) Store only the logical information: 1 bit per logical value; low overhead, but not delay insensitive.

26 Memory. [Figure: standard RAM with φ-det and two converters (CONV) between FSL_Logic and STD_Logic]

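27 φ-Converter. Std → FSL: the standard-logic signal (1 = HIGH, 0 = LOW) and the requested φ are combined by an XOR to produce rail a and rail b. FSL → Std: rail a and rail b are converted back to a standard-logic signal. Encoding table, codes as (a,b), one column per phase: HIGH: H (1,1), h (1,0); LOW: L (0,0), l (0,1).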
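One way to realise the two conversions with the table on this slide is sketched below. It assumes that rail a carries the logic value and that rail b is the value XORed with the requested phase, with upper-case states taken as phase 0; this is inferred from the table, and the function names are hypothetical.

```python
# Encoding from this slide: (rail a, rail b) -> state.
NAMES = {(1, 1): "H", (1, 0): "h", (0, 0): "L", (0, 1): "l"}

def std_to_fsl(bit, phi):
    """Std -> FSL: rail a is the logic value, rail b is the value XOR the requested phase."""
    return (bit, bit ^ phi)

def fsl_to_std(rails):
    """FSL -> Std: under this encoding the logic value is simply rail a."""
    return rails[0]
```

For example, `NAMES[std_to_fsl(1, 1)]` is `"h"` and `NAMES[std_to_fsl(0, 0)]` is `"L"`, matching the table.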

28 FSL Gates: √ Combinational gates (AND, OR, INV, …); √ Register (√ Latch, √ φ-Detector); √ Memory (√ φ-Converter: FSL→Std, Std→FSL); φ-Inverter

29 φ-Inverter. Encoding table, codes as (a,b), one column per phase: HIGH: H (1,0), h (1,1); LOW: L (0,1), l (0,0). [Figure: rail a, rail b] => simple inversion of rail b
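The claim that inverting rail b changes the phase but not the logic value can be checked on the rails directly. The sketch below does not depend on which phase number is assigned to which code; it only assumes that rail a carries the logic value, as in the tables of this deck. The function names are illustrative.

```python
def phase_invert(rails):
    """Invert rail b only; rail a (the logic value) is left untouched."""
    a, b = rails
    return (a, 1 - b)

def rail_parity(rails):
    """XOR of the two rails, which distinguishes the two phases."""
    a, b = rails
    return a ^ b

for code in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    flipped = phase_invert(code)
    assert flipped[0] == code[0]                       # same logic value
    assert rail_parity(flipped) != rail_parity(code)   # opposite phase
    assert phase_invert(flipped) == code               # inverting twice restores the code
```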

30 FSL Gates: √ Combinational gates (AND, OR, INV, …); √ Register (√ Latch, √ φ-Detector); √ Memory (√ φ-Converter: FSL→Std, Std→FSL); √ φ-Inverter

31 Design Flow and Tools. Requirements: standard tools (Synopsys/Quartus); modelling on RTL level; support for simulation and synthesis; target platform FPGA.

32 Adaptation: VHDL. Definition of an FSL_logic type; redefinition of the std_1164 package; additional functions: φ_det, φ_inv, conversion functions, stable. => Modelling FSL circuits on RTL level

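33 Example: Program Counter. [Figure: JmpExe, JmpAddr and AddrInc feed f(x), which produces AddrNxt]
    stable_signals <= AddrInc & JmpExe & JmpAddr;
    pc_next: process
    begin
      -- stable is one of the additional FSL functions (slide 32)
      stable(stable_signals);
      -- take the jump when JmpExe is HIGH in either phase
      if JmpExe = 'H' or JmpExe = 'h' then
        AddrNxt <= JmpAddr;
      else
        AddrNxt <= AddrInc;
      end if;
    end process pc_next;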

34 Adaptation: Synthesis (1). Step 1: FSL target library (FSL AND, FSL OR, FSL INV, FSL Register, φ-Detector, …); synthesis with the FSL target library produces a netlist. Step 2: package FSL_Rail with the definition FSL_rail_logic: record (a,b) and the same cells (FSL AND, FSL OR, FSL INV, FSL Register, φ-Detector, …); in the netlist, replace FSL with FSL_rail; then run synthesis with the FPGA target library. Example: L & H => (0,0) & (1,1)

35 Adaptation: Synthesis (2). [Figure: conventional design flow compared with the FSL design flow]

36 Adaptation: Simulation. Same testbench for FSL_logic and FSL_rail_logic circuits => verification of FSL circuits. [Figure: testbench, FSL stimuli, FSL response, FSL logic, FSL rail logic, conversion]

37 Outline: √ Introduction; √ Principles; √ Basic gates; √ Design flow and tools; Circuit design (Pipeline, Data paths); Current status; Conclusion

38 (Linear) Pipeline. [Figure: registers K1 to K4 (latches) with f(x) between stages] Fully initialized: phases 0 1 0 1 0. Empty initialized: phases 0 0 0 0 0.

39 Bubble Concept (1). Progress is possible when a circuit contains at least one bubble. [Figure: K1 to K4 with phases 0 1 0 1 0; K1 to K3 with phases 1 1 0 1 0; annotation: bubble = identical values]

40 Bubble Concept (2). Initialization => ensure that the circuit contains at least one bubble. More than one bubble => higher processing speed. [Figure: K1 to K7 with phases 1 0 1 0 0 1 1 0]
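The bubble concept can be tried out on a phase-only model of a linear pipeline. The firing rule below is an assumption derived from the register conditions earlier in the deck (a register fires when its predecessor carries a different phase and its successor carries the same phase); `fireable` and `step` are hypothetical names, and `src` and `snk` stand for the phases presented by the source and the sink.

```python
def fireable(phases, src, snk):
    """Indices of registers that may fire, given the phases of SRC and SNK."""
    ext = [src] + list(phases) + [snk]
    return [i - 1 for i in range(1, len(ext) - 1)
            if ext[i - 1] != ext[i]      # predecessor offers new data
            and ext[i + 1] == ext[i]]    # successor has consumed the old data

def step(phases, src, snk):
    """Fire every enabled register once; a firing register toggles its phase."""
    enabled = set(fireable(phases, src, snk))
    return [1 - p if i in enabled else p for i, p in enumerate(phases)]

# Alternating phases and a stalled sink: no bubble, so nothing can fire.
assert fireable([0, 1, 0, 1], src=1, snk=0) == []
# Identical values in K2 and K3 form a bubble: K2 may fire.
assert fireable([0, 1, 1, 0], src=1, snk=1) == [1]
# After the firing, K1 and K2 hold identical values: the bubble moved upstream.
assert step([0, 1, 1, 0], src=1, snk=1) == [0, 0, 1, 0]
```

Two neighbouring registers can never be enabled at the same time under this rule, so firing all enabled registers in one step is safe in the model.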

41 Bubble Concept (3). Bubbles can be consumed: slow SRC → empty pipeline; slow SNK → full pipeline. [Figure: SRC, K2 to K6, SNK with phases 1 0 1 0 1 1; bubble]

42 Non-linear Pipeline. Definition: a non-linear pipeline is a pipeline which contains at least one feedback or forward path. [Figure: K1 to K6 with a forward path and a feedback path] Consequences: internal regulation (bubbles cannot be consumed); potential sources of deadlocks.

43 Non-linear Pipeline: Forward Path (1). [Figure: SRC, K1 to K4, SNK, request, φ-inv in the forward path; phases 0 1 0 1 1 0 1 0]

44 Non-linear Pipeline: Forward Path (2). [Figure: operation sequence, steps 1 to 4, for K1 to K4 with φ-inv in the forward path]

45 Non-linear Pipeline: Forward Path (2), continued. [Figure: operation sequence, step 5, for K1 to K4 with φ-inv]

46 Empty Non-linear Pipeline: Forward Path (1). [Figure: K1 to K4 with forward path and φ-inv] Empty initialized => no phase inverter is required

47 47 00 00 K1 K2K3 00 00 Operation K4 11 1 φ1φ1 2 11 3 φ1φ1 4 00  φ1 00 Empty Non-linear Pipeline: Forward Path (2) => different phase inverter placement for full and empty initialized circuit

48 Conclusion: Feedforward Paths. Ensure consistent inputs. Phase inverter placement depends on initialisation. [Figures: fully initialized K1 to K4 with φ-inv; empty initialized K1 to K4; switching sequence]

49 Non-linear Pipeline: Feedback Path (1). [Figure: K1 to K4 with φ-inv in the feedback path]

50 Non-linear Pipeline: Feedback Path (2). [Figure: operation sequence, steps 1 to 3, for K1 to K4 with φ-inv]

51 Non-linear Pipeline: Feedback Path (2), continued. [Figure: operation sequence for K1 to K4]

52 Non-linear Pipeline: Feedback Path (2), continued. [Figure: operation sequence for K1 to K4 with φ-inv]

53 Conclusion: Feedback Paths. Ensure inconsistent inputs. Phase inverter placement depends on initialisation. [Figures: fully initialized K1 to K4; empty initialized K1 to K4 with φ-inv; switching sequence]

54 Conceptual Difference: Feedback and Forward. Feedback path: initialization ensures inconsistent inputs; only one input switches before K1 can fire. Forward path: initialization ensures consistent inputs; either both inputs or no input switch before K4 can fire. => a well-defined event sequence! [Figures: full and empty K1 to K4 with switching sequences]

55 Conceptual Difference: Feedback and Forward. Feedback path: ensure inconsistent inputs; only one input switches before K1 can fire. Forward path: ensure consistent inputs; either both inputs or no input switch before K4 can fire. => a well-defined event sequence! [Figures: full and empty K1 to K4 with switching sequences; data flow, data flow feedback, data flow forward]

56 "Invalid" Feedback Path. [Figure: K1 with feedback path] φ-Inv always required; no inversion of the request signal. Definition: a valid feedback path must contain at least two register nodes.

57 Phase Inverter Placement. Phase inverters are required to avoid deadlocks. Placement of phase inverters depends on: topology of the circuit; type and number of components inside valid feedbacks; dynamic behavior; initialization. Handshake signals have to be considered. Processing speed depends on initialization. More configurations are possible.

58 Identification of Phase Inverter: Generic Approach. Systematic approach:
1. Identify combinational logic and registers/memories
2. Generate a graph representation of the circuit based on registers/memories
3. Eliminate inconsistent inputs by phase inverter insertion
4. Identify feedback paths (=> loops)
5. Eliminate invalid feedbacks
6. Add a phase inverter to each remaining loop (a phase inverter can be shared among feedback paths)
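Steps 4 to 6 can be sketched on a small register graph. The code below only enumerates loops and separates valid from invalid feedbacks using the two-register rule from slide 56; it does not cover step 3 or the sharing of inverters between loops. The graph, the node names and the function `simple_cycles` are made up for illustration.

```python
def simple_cycles(graph):
    """Enumerate simple cycles of a small directed graph {node: [successors]}."""
    cycles = []

    def dfs(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path[:])            # closed a loop back to the start
            elif nxt not in path and nxt > start:
                dfs(start, nxt, path + [nxt])     # only visit nodes ordered after start

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles

# Hypothetical register graph: K4 feeds back to K2, and K3 feeds back to itself.
graph = {"K1": ["K2"], "K2": ["K3"], "K3": ["K4", "K3"], "K4": ["K2"]}
loops = simple_cycles(graph)
valid = [c for c in loops if len(c) >= 2]     # step 6: each of these needs a phase inverter
invalid = [c for c in loops if len(c) < 2]    # step 5: fewer than two register nodes
```

Restricting the search to nodes ordered after the start node reports each loop exactly once, which keeps the list of candidate inverter positions free of duplicates.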

59 Data Paths. [Figure: registers and f(x) blocks connected through a deselect node (DEMUX, FORK) and a select node (MUX, MERGE)]

60 Example: Merging Data Paths. [Figure: two paths of registers and f(x) blocks, In1 and In2, feeding a MUX with Ack; data words DW1 (φ0), DW2 (φ1), DW3 (φ0), DW4 (φ1); path delays ∆1 and ∆3] Assumptions: Acknowledge is activated when the selected data is available; the two data paths have different delays. Step 1: In1 selected. Step 2: In1 selected. Step 3: In2 selected.

61 Example: Merging Data Paths (2). Depending on the circuit functionality: a) both inputs have to be consumed in each processing step: ensure that the difference in processing speed between all data paths is small enough; wait until all inputs are available (even the unused ones). b) all inputs have to be processed and consumed: insert synchronizer circuits to adjust the phase encoding of the input signals.

62 DEMUX Example. [Figure: f(x) and register feeding a DEMUX with two output paths; data words DW1 (φ0), DW2 (φ1), DW3 (φ0), DW4 (φ1)] Avoid loss of synchronization: dummy data approach; synchronizer circuits. Performance considerations: wait on ack. of data paths; wait only on required ack.

63 Outline: √ Introduction; √ Principles; √ Basic gates; √ Design flow and tools; √ Circuit design (√ Pipeline, √ Data paths); Current status; Conclusion

64 ASPEAR: Asynchronous SPEAR

65 Our Current Status: semi-automated design flow (based on Synopsys)

66 Our Current Status: semi-automated design flow (based on Synopsys); theoretical investigations

67 Our Current Status: semi-automated design flow (based on Synopsys); theoretical investigations; working 16-bit processor (on FPGA platform, APEX 20KC)

68 Our Current Status: semi-automated design flow (based on Synopsys); working 16-bit processor (on FPGA platform); investigation of DI; experimental robustness assessment (fault injection: synchronous design versus asynchronous design)

69 Outline: √ Introduction; √ Principles; √ Basic gates; √ Design flow and tools; √ Circuit design (√ Pipeline, √ Data paths); √ Current status; Conclusion

70 Conclusion. Four State Logic (FSL): delay insensitive logic; two representations each for Low and High => dual-rail encoding; even combinational gates require memory elements. Circuit design with FSL: pipelines (linear and non-linear); data paths (split and merge). FSL based processor (SPEAR) available.

