1
Synthesis of synchronous elastic architectures
Jordi Cortadella (Universitat Politècnica de Catalunya)
Mike Kishinevsky (Intel Corp.)
Bill Grundmann (Intel Corp.)
2
Network of Computing Units
[Figure: network of blocks B1, B2 and B3 between In and Out]
5
Latency-insensitive (elastic) system
[Figure: network of blocks B1, B2 and B3 between In and Out]
Each block takes a step only when all of its inputs are valid
6
Why
Scalable
Modular (Plug & Play)
Tolerance to variable latency
– Communication
– Computation
Not asynchronous
– Use existing design paradigms
– CAD tools
7
Outline
The cost of elasticity
SELF: an elastic protocol
– Basic implementation (linear pipelines)
– General netlists (forks and joins)
– Formal models and verification
Synthesis of elastic architectures
Related work
8
Elastic block
[Figure: Core and Control blocks; signals Data, Valid, Stop, CLK and gated clock]
What’s the cost of elasticity?
9
Communication channel
[Figure: sender and receiver connected by a Data wire]
Long wires: slow transmission
10
Pipelined communication
[Figure: sender and receiver connected by a pipelined Data channel]
What if the sender does not always send valid data?
13
The Valid bit
[Figure: sender and receiver; every pipeline stage carries Data and Valid]
What if the receiver is not always ready?
18
The Stop bit
[Figure: sender and receiver; every pipeline stage carries Data, Valid and Stop. The animation frames show the bit patterns 00000, 11000, 11100 and 11111, then 10000.]
Back-pressure
Long combinational path
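The back-pressure idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit from the slides: `stop_chain` is a hypothetical helper, and it assumes that a stage forwards Stop upstream only when it holds valid data, so an empty stage absorbs the stall. Because every stop bit depends on the one downstream within the same cycle, the loop also shows why the path from receiver to sender is one long combinational chain.

```python
from typing import List


def stop_chain(valid: List[bool], receiver_stop: bool) -> List[bool]:
    """Return, per stage, whether the stage is stalled in this cycle.

    Index 0 is the stage next to the sender; the last index is next to
    the receiver. Assumption (not from the slides): a stage is stalled
    only if it holds valid data and the stage after it is stalled.
    """
    stalled = [False] * len(valid)
    downstream = receiver_stop
    # Walk from the receiver back to the sender: this loop models the
    # combinational chain that the stop signal has to traverse.
    for i in reversed(range(len(valid))):
        stalled[i] = valid[i] and downstream
        downstream = stalled[i]
    return stalled


if __name__ == "__main__":
    # Full pipeline: the stall reaches the sender in the same cycle.
    print(stop_chain([True] * 5, receiver_stop=True))
    # A bubble in stage 1 absorbs the stall before it reaches the sender.
    print(stop_chain([True, False, True, True, True], receiver_stop=True))
```

In this model the sender has to retry whenever `stalled[0]` is true.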
23
Carloni’s relay stations (double storage)
[Figure: sender and receiver, each a pearl inside a shell, connected through relay stations with main and aux registers and V / S handshake signals]
Handshakes with short wires
Double storage required
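Why double storage is needed can be illustrated with a small behavioural model. The sketch below is an assumption-laden simplification, not Carloni's actual design: the class name `RelayStation`, the use of `None` for "no valid data", and the exact update order are all hypothetical. The key point it tries to capture is that the stop sent upstream is registered (it depends only on stored state), so the sender reacts one cycle late and the token already in flight has to land in the auxiliary register.

```python
from typing import Optional


class RelayStation:
    """Behavioural sketch of a two-slot relay station (hypothetical model)."""

    def __init__(self) -> None:
        self.main: Optional[str] = None  # token offered downstream
        self.aux: Optional[str] = None   # spare slot used while stop travels upstream
        self.stop_out = False            # registered stop seen by the sender

    def step(self, token_in: Optional[str], stop_in: bool) -> Optional[str]:
        """Advance one clock cycle and return the token delivered downstream."""
        stop_seen = self.stop_out  # value visible to the sender during this cycle
        delivered = None
        if self.main is not None and not stop_in:
            delivered = self.main
            self.main, self.aux = self.aux, None
        # The sender's token is accepted only if it was not told to stop.
        if token_in is not None and not stop_seen:
            if self.main is None:
                self.main = token_in
            else:
                self.aux = token_in
        # Stop upstream is a function of stored state only (short wires).
        self.stop_out = self.aux is not None
        return delivered


if __name__ == "__main__":
    rs = RelayStation()
    stimulus = [("a", False), ("b", True), ("c", True),
                ("c", False), ("c", False), (None, False)]
    print([rs.step(tok, stop) for tok, stop in stimulus])
```

In this run the receiver stalls for two cycles; token `b` is parked in `aux`, the sender keeps retrying `c`, and all three tokens come out in order.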
32
Proposal: an elastic protocol
SELF (Synchronous ELastic Flow)
Simple and provably correct
Data-path with no overhead in:
– Area
– Latency
– Energy
Negligible control overhead
Fine-grain elasticity
33
Flip-flops vs. latches
[Figure: sender and receiver one cycle apart; each flip-flop (FF) is drawn as a pair of latches, H and L]
Flip-flops already have a double storage capability, but …
Not allowed in conventional FF-based design!
Let’s make the master/slave latches independent
[Figure: latches H, L, H, L spaced ½ cycle apart]
Only half of the latches (H or L) can move tokens
44
Elastic buffer keeps data while stop is in flight
[Figure: buffer states W1R1, W2R1, W1R2, W2R2]
Cannot be done with single-edge flops without double pumping
Use latches inside MS
Carloni’s relay station belongs to this class
45
Shorthand notation (clock lines not shown)
[Figure: latch with D, Q, clk and En …]
46
SELF (linear communication)
[Figure: sender and receiver connected through a chain of latches; every stage has Data, Valid (V) and Stop (S), and each data latch has an enable (En). The animation frames show the bit pairs 1 1, 1 0, 0 0, 1 1 and 1 0 in turn.]
74
The protocol
[Figure: Sender and Receiver connected by Data, Valid and Stop]
Idle cycle: Valid = 0
Transfer cycle: Valid = 1, Stop = 0
Retry cycle: Valid = 1, Stop = 1
Persistency: $\mathbf{G}\,[\,V \wedge S \wedge (\mathit{Data}=D) \Rightarrow \mathrm{Next}\,(V \wedge \mathit{Data}=D)\,]$
77
The protocol
[Figure: Sender and Receiver connected by Data, Valid and Stop, with Retry and Transfer cycles marked on the trace]
Data:  *  D  D  *  C  C  C  B  *  A
Valid: 0  1  1  0  1  1  1  1  0  1
Stop:  0  0  1  0  0  1  1  0  0  0
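The three cycle types and the persistency rule can be checked mechanically on the trace from this slide. The sketch below is illustrative only: the helper names `classify`, `check_persistency` and `transferred` are hypothetical, and it assumes the slide's trace is drawn with the earliest cycle on the right, which is the only reading under which every retry is followed by the same valid data.

```python
from typing import List, Tuple

Cycle = Tuple[str, int, int]  # (data, valid, stop)


def classify(valid: int, stop: int) -> str:
    """Name the cycle type defined by the protocol."""
    if not valid:
        return "idle"
    return "retry" if stop else "transfer"


def check_persistency(trace: List[Cycle]) -> bool:
    """After a retry (V and S), the next cycle must offer the same valid data."""
    for (d, v, s), (d_next, v_next, _) in zip(trace, trace[1:]):
        if v and s and not (v_next and d_next == d):
            return False
    return True


def transferred(trace: List[Cycle]) -> List[str]:
    """Data items that actually cross the channel (transfer cycles only)."""
    return [d for d, v, s in trace if v and not s]


# Rows as printed on the slide; reversed below to get time order
# (assumption: the rightmost column is the earliest cycle).
data = "* D D * C C C B * A".split()
valid = [int(x) for x in "0 1 1 0 1 1 1 1 0 1".split()]
stop = [int(x) for x in "0 0 1 0 0 1 1 0 0 0".split()]
trace = list(zip(data, valid, stop))[::-1]

if __name__ == "__main__":
    print([classify(v, s) for _, v, s in trace])
    print(check_persistency(trace), transferred(trace))
```

Under that reading the receiver observes A, B, C, D exactly once each, even though C is offered three times and D twice.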
78
Elastic Half Buffer (EHB)
[Figure: data latch with enable En_i and control signals V_{i-1}, S_{i-1}, V_i, S_i]
79
Join
[Figure: join controller with channels (V1, S1), (V2, S2) and (V, S), attached to an EHB]
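The slide gives the join only as a schematic, so the following is a behavioural guess rather than the gate-level controller. It assumes the usual reading of a join: the output is valid only when every input is valid, and every input is told to stop unless the joint transfer really happens in this cycle. The function name `join` and its signature are hypothetical.

```python
from typing import List, Tuple


def join(valid_in: List[bool], stop_out: bool) -> Tuple[bool, List[bool]]:
    """Combinational join sketch: returns (valid_out, stop_in per input).

    Assumption: an input's stop value is irrelevant while that input is
    not valid (an idle cycle), so the same stop is sent to all inputs.
    """
    valid_out = all(valid_in)
    transfer = valid_out and not stop_out
    stop_in = [not transfer for _ in valid_in]
    return valid_out, stop_in


if __name__ == "__main__":
    print(join([True, True], stop_out=False))   # joint transfer
    print(join([True, False], stop_out=False))  # input 0 has to retry
    print(join([True, True], stop_out=True))    # downstream stall
```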
80
Lazy Fork
[Figure: fork controller with channels (V, S), (V1, S1) and (V2, S2)]
81
Eager Fork
[Figure: fork controller with channels (V, S), (V1, S1) and (V2, S2)]
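The difference between the two forks is easier to see in a behavioural model. The sketch below is an assumption, not the controllers drawn on the slides: `lazy_fork` and `EagerFork` are hypothetical names, the lazy fork is modelled as "offer the token to a branch only when all other branches are ready", and the eager fork as "offer the token to every branch that has not taken it yet, and remember which ones have".

```python
from typing import List, Tuple


def lazy_fork(valid_in: bool, stop_out: List[bool]) -> Tuple[List[bool], bool]:
    """Lazy fork sketch: a branch sees Valid only if no other branch is stopped."""
    valid_out = [
        valid_in and not any(s for j, s in enumerate(stop_out) if j != i)
        for i in range(len(stop_out))
    ]
    stop_in = any(stop_out)
    return valid_out, stop_in


class EagerFork:
    """Eager fork sketch: each branch takes the token as soon as it can."""

    def __init__(self, branches: int) -> None:
        # pending[i] is True while branch i still has to receive the token.
        self.pending = [True] * branches

    def step(self, valid_in: bool, stop_out: List[bool]) -> Tuple[List[bool], bool]:
        valid_out = [valid_in and p for p in self.pending]
        taken = [v and not s for v, s in zip(valid_out, stop_out)]
        remaining = [p and not t for p, t in zip(self.pending, taken)]
        # The sender retries until every branch has received the token.
        stop_in = valid_in and any(remaining)
        if valid_in:
            self.pending = remaining if any(remaining) else [True] * len(remaining)
        return valid_out, stop_in


if __name__ == "__main__":
    print(lazy_fork(True, [False, True]))
    fork = EagerFork(2)
    print(fork.step(True, [False, True]))   # branch 0 takes the token early
    print(fork.step(True, [False, False]))  # branch 1 catches up
```

With branch 1 stalled, the lazy fork delivers nothing in the first cycle, while the eager fork lets branch 0 proceed and only re-offers the token to branch 1.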
82
Elastic combinational paths
[Figure: elastic buffers (EB) connected through Fork, Join, Join / Fork and Wire control blocks]
Enable signal to data latches
85
Elastic buffer: formal model
[Figure: Buffer [ 0.. ] with cells i, i+1, …, i+k and pointers rd and wr; input channel Din, Vin, Sin and output channel Dout, Vout, Sout]
Initial state: rd = wr = 0
Invariant: $wr \geq rd$
86
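Elastic buffer: formal model
[Figure: buffer cells i, i+1, …, i+k with pointers rd and wr; input channel Din, Vin, Sin and output channel Dout, Vout, Sout]
Liveness properties (finite unbounded latencies)
Finite forward latency: $\mathbf{G}\,(rd \neq wr \Rightarrow \mathbf{F}\,\mathit{Vout})$
Finite backward latency: $\mathbf{G}\,(\neg\mathit{Sout} \Rightarrow \mathbf{F}\,\neg\mathit{Sin})$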
87
Formal verification
[Figure: the abstract buffer model (cells i, i+1, …, i+k with pointers rd and wr; ports Din, Vin, Sin, Dout, Vout, Sout) next to an Implementation with the same ports]
88
Formal verification
The abstract FSM model is appropriate for compositional verification
Verification of implementations with model checking (1-bit abstractions of the datapath)
– LTL specs + NuSMV
– Buffer is a refinement of the spec
– In-order data transmission
– Correct synchronization of fork/join structures
– Absence of deadlocks
89
Formal verification
[Figure: blocks with ports Din, Vin, Sin, Dout, Vout, Sout, each represented by an abstract model (NFSM)]
Assuming the same initial contents (e.g. empty)
92
Observational equivalence
Synchronous:
D:  a b c d e f g h i j k …
Elastic:
D:  a a b b b c d e e f g g h i i i j k …
En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1 …
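The equivalence shown on this slide can be checked directly: keeping only the elastic data values from cycles where En = 1 should give back the synchronous stream. The snippet below is a minimal sketch of that projection using the exact sequences from the slide; the function name `project` is hypothetical.

```python
from typing import List


def project(elastic_data: List[str], enable: List[int]) -> List[str]:
    """Keep only the data values observed in cycles where En = 1."""
    return [d for d, en in zip(elastic_data, enable) if en]


synchronous = "a b c d e f g h i j k".split()
elastic = "a a b b b c d e e f g g h i i i j k".split()
enable = [int(x) for x in "1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1".split()]

if __name__ == "__main__":
    print(project(elastic, enable) == synchronous)
```

The repeated values in the elastic stream correspond to cycles with En = 0, and they disappear in the projection.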
93
Elasticization
[Figure: a synchronous design and its elastic counterpart]
94
[Figure: elasticization of a CLK-driven pipeline with registers PC, IF/ID, ID/EX, EX/MEM and MEM/WB. JOIN and FORK controllers are placed, every register gets a V / S pair, and the frames show the values 1 0 (and later 0 0) on the channels.]
Elastic control layer
Generation of gated clocks
100
Variable-latency units
[Figure: unit taking [0 - k] cycles, with V / S channels and go / done signals]
101
Variable-latency units
Telescopic units:
– 1 cycle for fast operations
– 2 cycles for slow operations
Examples:
– Short / long additions (carry propagation)
– A × 0, A / 1
– Dynamic changes in latency (fast if cold, slow if hot)
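The short / long addition example can be sketched as follows. This is a hypothetical model, not a design from the slides: the function names, the 16-bit width and the `fast_limit` threshold are all assumptions, and the longest run of propagate bits (positions where the operands differ) is used as a conservative stand-in for the carry chain length.

```python
from typing import Tuple


def longest_propagate_run(a: int, b: int, width: int) -> int:
    """Longest run of consecutive bit positions that would propagate a carry."""
    propagate = (a ^ b) & ((1 << width) - 1)
    longest = run = 0
    for i in range(width):
        if (propagate >> i) & 1:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def telescopic_add(a: int, b: int, width: int = 16, fast_limit: int = 4) -> Tuple[int, int]:
    """Return (sum modulo 2**width, latency in cycles).

    Assumption: the fast path is correct whenever no carry has to ripple
    through more than `fast_limit` propagate positions.
    """
    result = (a + b) & ((1 << width) - 1)
    cycles = 1 if longest_propagate_run(a, b, width) <= fast_limit else 2
    return result, cycles


if __name__ == "__main__":
    print(telescopic_add(3, 4))            # short carry chain: fast
    print(telescopic_add(0x00FF, 0x0001))  # long carry chain: slow
```

In an elastic system the extra cycle would simply show up as one more cycle before the unit's output becomes valid.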
102
Microarchitectural exploration
Bubble insertion + variable-latency units
– May improve performance: more bubbles, but reduced cycle time
– Reduce power: units designed for the most frequent input data
Exploration at fine granularity
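The trade-off between bubbles and cycle time can be made concrete with a back-of-the-envelope calculation. The numbers below are purely hypothetical and only illustrate the arithmetic: the average time per token is the cycle time divided by the throughput in tokens per cycle, so a design with more bubbles still wins if its cycle time shrinks by a larger factor.

```python
def time_per_token(cycle_time_ns: float, tokens_per_cycle: float) -> float:
    """Average time between tokens: cycle time divided by throughput."""
    return cycle_time_ns / tokens_per_cycle


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    baseline = time_per_token(cycle_time_ns=1.0, tokens_per_cycle=1.0)
    with_bubbles = time_per_token(cycle_time_ns=0.8, tokens_per_cycle=0.9)
    print(f"baseline: {baseline:.3f} ns/token, with bubbles: {with_bubbles:.3f} ns/token")
```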
103
Some related work
Asynchronous design
– Micropipelines (Sutherland)
– Rings (Williams, Sparso)
– CHP and slack-elasticity (Martin, Burns, Manohar et al.)
Latency-insensitive design
– Carloni and a few follow-ups (large overhead)
– Wire pipelining: Svensson, Nookala, Casu, …
Interlock pipelines (H. Jacobson et al.)
De-synchronization
– J. Cortadella et al.
– V. Varshavsky
Synchronous implementations of CSP
– J. O’Leary et al.
– A. Peeters et al.
104
Summary
SELF: a specific protocol and implementation for elastic systems with very small overhead buffering
Compositional theory proving correctness (Krstic et al., FMCAD’06)
Library of controllers has been designed and their correctness verified
Elasticization CAD in progress
New micro-architectural opportunities based on bubbles and variable-latency units