De-synchronization: from synchronous to asynchronous. Based on the paper: Blunno, Cortadella, Kondratyev, Lavagno, Lwin, Sotiriou, "Handshake protocols for de-synchronization", ASYNC 2004.
Outline: What is de-synchronization? Behavioral equivalence. 4-phase protocols for de-synchronization. Concurrency. Correctness. An example.
[Figure: a synchronous circuit driven by CLK is de-synchronized into an asynchronous circuit]
Synchronous circuit. [Figure: master-slave (MS) flip-flops drawn as pairs of latches (L), all driven by CLK]
De-synchronization. [Figure: the same latches (L), with CLK replaced by a controller (C)]
De-synchronization: distributed controllers (C) substitute the clock network. The data path remains intact!
Design flow: Think synchronous. Design synchronous: one clock and edge-triggered flip-flops. De-synchronize (automatically). Run it asynchronously.
Prior work: Micropipelines (Sutherland, 1989). Local generation of clocks (Varshavsky et al., 1995; Kol and Ginosar, 1996). Theseus Logic (Ligthart et al., 2000): commercial HDL synthesis tools, direct translation and special registers. Phased logic (Linder and Harden, 1996; Reese, Thornton, Traver, 2003): conceptually similar, different handshake protocol (2 phase vs. 4 phase).
Automatic de-synchronization: Devise an automatic method for de-synchronization. Identify a subclass of synchronous circuits suitable for de-synchronization. Formally prove correctness.
Synchronous flow
De-synchronized flow
Flow equivalence [Le Guernic, Talpin, Le Lann, 2003]
Flow equivalence.
Synchronous behavior (one value per CLK cycle): A = 1 3 0 2 1 5 3 1 6 0; B = 5 1 2 3 1 4 2 4 3 1.
De-synchronized behavior (no common clock): A = 1 3 0 2 1 5 3 1 6 0; B = 5 1 2 3 1 4 2 4 3 1.
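Flow equivalence compares, latch by latch, the sequence of values stored and ignores when each value was stored. The following is a minimal Python sketch of that check, assuming each behavior has been recorded as a mapping from latch name to the list of values it stored, over prefixes of equal length. The function name `flow_equivalent` and the trace representation are illustrative assumptions, not taken from the paper.

```python
def flow_equivalent(sync_trace, desync_trace):
    """Return True if every latch stores the same sequence of values in both runs.

    Each trace maps a latch name to the ordered list of values it stored.
    Timing is deliberately absent: only the per-latch order of values matters.
    """
    if sync_trace.keys() != desync_trace.keys():
        return False
    return all(sync_trace[name] == desync_trace[name] for name in sync_trace)


# Values copied from the slide: both behaviors show the same per-latch sequences.
synchronous = {
    "A": [1, 3, 0, 2, 1, 5, 3, 1, 6, 0],
    "B": [5, 1, 2, 3, 1, 4, 2, 4, 3, 1],
}
desynchronized = {
    "A": [1, 3, 0, 2, 1, 5, 3, 1, 6, 0],
    "B": [5, 1, 2, 3, 1, 4, 2, 4, 3, 1],
}

print(flow_equivalent(synchronous, desynchronized))
```

A run in which latch B skipped or repeated a value would fail the check even if latch A were unchanged.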
[Figure, several animation frames: pipeline of latches A, B, C, D with events A+ B- C+ D- A- B+ C- D+] A latch cannot read another data item until the successor has captured the current one.
[Figure, several animation frames: pipeline of latches A, B, C, D with events A+ B- C+ D- A- B+ C- D+] A latch cannot become opaque before having captured the data item from its predecessor.
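[Figure: resulting model for latches A, B, C, D over the events A+ B+ C+ D+ A- B- C- D-, and its restriction to two adjacent latches A and B]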
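The two rules stated on the preceding slides can be played as a token game. The sketch below is an assumption-laden reconstruction, not the paper's figure: it reads X+ as "latch X becomes transparent" and X- as "latch X becomes opaque", encodes the rules as the arcs P+ -> S- and S- -> P+ for every predecessor P and successor S, and assumes that A and C start opaque while B and D start transparent. All function names are hypothetical.

```python
import random

LATCHES = ["A", "B", "C", "D"]


def build_model(latches):
    """Return (arcs, tokens) for a linear pipeline of latches."""
    arcs, tokens = [], {}
    for i, x in enumerate(latches):
        opaque = (i % 2 == 0)  # assumption: A and C start opaque, B and D transparent
        arcs += [(x + "+", x + "-"), (x + "-", x + "+")]
        tokens[(x + "+", x + "-")] = 0 if opaque else 1
        tokens[(x + "-", x + "+")] = 1 if opaque else 0
    for i, (p, s) in enumerate(zip(latches, latches[1:])):
        # s cannot become opaque before p has opened:      p+ -> s-
        # p cannot read new data before s has captured it: s- -> p+
        arcs += [(p + "+", s + "-"), (s + "-", p + "+")]
        p_opaque = (i % 2 == 0)
        tokens[(p + "+", s + "-")] = 0 if p_opaque else 1
        tokens[(s + "-", p + "+")] = 1 if p_opaque else 0
    return arcs, tokens


def enabled(event, arcs, tokens):
    """An event may fire when every incoming arc holds a token."""
    return all(tokens[a] > 0 for a in arcs if a[1] == event)


def fire(event, arcs, tokens):
    """Consume one token per incoming arc, produce one per outgoing arc."""
    for a in arcs:
        if a[1] == event:
            tokens[a] -= 1
        if a[0] == event:
            tokens[a] += 1


def simulate(latches, steps, seed=0):
    """Fire randomly chosen enabled events and return the resulting trace."""
    rng = random.Random(seed)
    arcs, tokens = build_model(latches)
    events = sorted({e for a in arcs for e in a})
    trace = []
    for _ in range(steps):
        ready = [e for e in events if enabled(e, arcs, tokens)]
        if not ready:
            break  # deadlock: nothing can fire
        chosen = rng.choice(ready)
        fire(chosen, arcs, tokens)
        trace.append(chosen)
    return trace


def alternates(trace, x, y):
    """True if occurrences of x and y strictly alternate in the trace."""
    seq = [e for e in trace if e in (x, y)]
    return all(a != b for a, b in zip(seq, seq[1:]))


trace = simulate(LATCHES, 40)
print(" ".join(trace[:12]))
print(all(alternates(trace, p + "+", s + "-") for p, s in zip(LATCHES, LATCHES[1:])))
```

Whatever interleaving the random choice produces, the opening of a latch and the closing of its successor keep alternating, which is exactly what the two rules demand.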
Can we increase concurrency? [Figure: marked graphs over A+ B+ A- B- with added concurrency] Result: not flow-equivalent.
[Figure: latches A and B under excess concurrency] Two failure modes: data overrun and data lost.
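A data-level replay makes these failures concrete. The sketch below is a simplified model under stated assumptions: two latches A then B with no logic between them, a sender that offers the next value as soon as A has stored the current one, and X+ / X- read as "X becomes transparent / opaque". The function `run` and the event strings are hypothetical, and the example does not try to decide which broken trace the slide calls "overrun" and which "lost".

```python
def run(trace, source):
    """Replay latch events and return the values stored by latches A and B."""
    a_open, a_val, k = False, None, 0
    stored = {"A": [], "B": []}
    for event in trace.split():
        if event == "A+":
            a_open = True
        elif event == "A-":
            a_val = source[k]          # A stores the value currently at its input
            stored["A"].append(a_val)
            a_open = False
            k += 1                     # assumption: the sender then offers the next value
        elif event == "B-":
            # B stores what A drives: A's input if A is transparent, else A's stored value
            stored["B"].append(source[k] if a_open else a_val)
        # "B+" only makes B transparent again; nothing is stored
    return stored


source = [10, 20, 30, 40]

ok = run("A+ B- A- B+ A+ A- B- B+", source)        # respects both ordering rules
skipped = run("A+ A- A+ A- B- B+", source)         # A cycles twice before B captures
repeated = run("A+ A- B- B+ B- B+", source)        # B captures twice without a new A+

print(ok)
print(skipped)
print(repeated)
```

In the first trace A and B store the same stream. In the second B never sees the value 10, and in the third B stores 10 twice, so neither is flow-equivalent to the synchronous behavior.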
Can we reduce concurrency? How much? [Figure: marked graph over A+ B+ A- B-]
[Figure: marked graphs over A+ B+ A- B- with decreasing concurrency: 8 states, 6 states, 5 states, 4 states]
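State counts like these can be obtained by enumerating the reachable markings of each marked graph. The sketch below assumes a two-latch model built only from the two ordering rules given earlier (arcs A+ -> B- and B- -> A+, plus the open/close alternation of each latch) and an assumed initial marking with A opaque and B transparent. The extra arc A- -> B- is a purely hypothetical constraint chosen to show the effect of reducing concurrency; it is not claimed to be one of the named protocols.

```python
from collections import deque


def reachable_markings(arcs, initial):
    """Breadth-first enumeration of the markings reachable in a marked graph."""
    events = sorted({e for arc in arcs for e in arc})
    start = tuple(initial.get(arc, 0) for arc in arcs)
    seen, queue = {start}, deque([start])
    while queue:
        marking = queue.popleft()
        for e in events:
            inputs = [i for i, arc in enumerate(arcs) if arc[1] == e]
            if not all(marking[i] > 0 for i in inputs):
                continue  # event e is not enabled in this marking
            nxt = list(marking)
            for i, arc in enumerate(arcs):
                if arc[1] == e:
                    nxt[i] -= 1
                if arc[0] == e:
                    nxt[i] += 1
            nxt = tuple(nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


MODEL_ARCS = [
    ("A+", "A-"), ("A-", "A+"),   # latch A alternates opening and closing
    ("B+", "B-"), ("B-", "B+"),   # latch B alternates opening and closing
    ("A+", "B-"),                 # B closes only after A has opened
    ("B-", "A+"),                 # A reopens only after B has closed
]
# Assumed initial marking: A opaque, B transparent, A allowed to open next.
MODEL_TOKENS = {("A-", "A+"): 1, ("B+", "B-"): 1, ("B-", "A+"): 1}

# Hypothetical extra constraint: B may close only after A has closed.
REDUCED_ARCS = MODEL_ARCS + [("A-", "B-")]

print(len(reachable_markings(MODEL_ARCS, MODEL_TOKENS)))
print(len(reachable_markings(REDUCED_ARCS, MODEL_TOKENS)))
```

Under these assumptions the first graph has 8 reachable markings, which agrees with the count shown for the most concurrent graph, and the extra arc brings the count down to 6. Every added arc can only remove behaviors, so the count never grows.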
Protocols compared (timing diagrams for latches A and B, and the corresponding marked graphs over A+ B+ A- B-): de-synchronization model; fully decoupled (Furber & Day); GasP, IPCMOS; semi-decoupled (Furber & Day); simple 4-phase; non-overlapping.
4-phase latch controllers (Furber and Day, IEEE Trans. VLSI, June 1996). [Figure: two chained latch controllers, each with request/acknowledge pairs Rin/Ain and Rout/Aout and a latch control signal Lt] Implementation note: Lt=0 (transparent), Lt=1 (opaque).
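On each channel, a 4-phase (return-to-zero) handshake repeats the order request up, acknowledge up, request down, acknowledge down. A minimal sketch of a trace checker for the input channel follows; it uses the signal names Rin and Ain from the slide, while the function name and the list-of-strings trace format are assumptions. Events on other signals, such as Lt, are ignored because their position differs from one controller to another.

```python
FOUR_PHASE = ["Rin+", "Ain+", "Rin-", "Ain-"]


def is_four_phase(trace, cycle=FOUR_PHASE):
    """Check that the channel events in `trace` repeat the 4-phase order.

    Events not listed in `cycle` are skipped, so the same trace may also
    contain latch-control or output-channel events.
    """
    channel = [e for e in trace if e in cycle]
    return all(e == cycle[i % len(cycle)] for i, e in enumerate(channel))


good = ["Rin+", "Lt+", "Ain+", "Rin-", "Lt-", "Ain-", "Rin+", "Ain+"]
bad = ["Rin+", "Rin-", "Ain+", "Ain-"]  # request withdrawn before the acknowledge

print(is_four_phase(good))
print(is_four_phase(bad))
```

Passing `cycle=["Rout+", "Aout+", "Rout-", "Aout-"]` applies the same check to the output channel.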
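4-phase latch controllers. [Figure: latch controller with signals Rin, Ain, Rout, Aout, Lt and the events Rin+ Rout+ Lt+ Ain+ Aout+ Rin- Rout- Lt- Ain- Aout-; the ordering between them is left open (?)]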
4-phase latch controllers: simple 4-phase controller. [Figure: signal transition graph over Rin+ Rout+ Ain+ Lt+ Aout+ Rin- Rout- Ain- Lt- Aout-]
4-phase latch controllers: semi-decoupled controller. [Figure: signal transition graph over Rin+ Rout+ Ain+ Lt+ Aout+ Rin- Rout- Ain- Lt- Aout-, with the additional signal A (A+, A-)]
4-phase latch controllers: fully decoupled controller. [Figure: signal transition graph over Rin+ Rout+ Ain+ Lt+ Aout+ Rin- Rout- Ain- Lt- Aout-, with the additional signals A (A+, A-) and B (B+, B-)]
4-phase latch controllers (state graphs). [Figure: state graphs of the semi-decoupled controller and the fully decoupled controller]
Semi-decoupled 4-phase protocol. [Figure, several animation frames: two controllers (cntrl) connected through the handshake pairs Ri/Ai, Rx/Ax and Ro/Ao; the full graph over Ri+ A- Rx+ B- Ro+ Ai+ Ax+ Ao+ Ri- A+ Rx- B+ Ro- Ai- Ax- Ao- is reduced step by step to the latch events A- B- A+ B+]
Which protocols are valid for de-synchronization?
Theorem: the de-synchronization protocol preserves flow-equivalence. [Figure: marked graph over A+ B+ A- B-] Proof: by induction on the length of the traces. Base case: same latch values at reset. Induction step: same values at cycle i imply same values at cycle i+1.
Theorem: any reduction in concurrency preserves flow-equivalence. [Figure: the less concurrent marked graphs over A+ B+ A- B-]
Any hybrid approach preserves flow-equivalence! [Figure: a circuit mixing semi-decoupled, fully decoupled and non-overlapping controllers]
[Figure: latches A, B, C, D over the events A+ B+ C+ D+ A- B- C- D-, with semi-decoupled, non-overlapping and fully decoupled protocols between adjacent pairs] Flow-equivalence is preserved, … but …
Liveness. Preservation of flow-equivalence: all the generated traces are equivalent. Are all traces generated? (Is the marked graph live?) Not always!
Semi-decoupled 4-phase handshake protocol. [Figure: marked graph over A+ B+ C+ D+ A- B- C- D-] Liveness: all cycles have at least one token [Commoner 1971].
Simple 4-phase handshake protocol. [Figure: marked graph over A+ B+ C+ D+ A- B- C- D-]
Results about liveness: At least three latches in a ring are required with only one data token circulating [Muller 1962]. Theorem (this paper): any hybrid combination of protocols is live if the simple 4-phase protocol is not used. Proof: any cycle has at least one token.
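The token criterion used in the proof can be checked mechanically: every directed cycle carries a token exactly when the arcs that hold no token form an acyclic graph. The sketch below tests that with Kahn's topological ordering. The arc-list representation, the function name `is_live` and the three-event ring used as an example are assumptions made for illustration.

```python
def is_live(arcs, tokens):
    """Return True if every directed cycle of the marked graph holds a token.

    Equivalent test: the subgraph made of token-free arcs must be acyclic,
    which is checked here by repeatedly removing nodes with no incoming
    token-free arc (Kahn's algorithm).
    """
    empty = [(u, v) for (u, v) in arcs if tokens.get((u, v), 0) == 0]
    nodes = {n for arc in arcs for n in arc}
    indegree = {n: 0 for n in nodes}
    for _, v in empty:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for u, v in empty:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == len(nodes)  # leftover nodes lie on a token-free cycle


# Hypothetical ring of three events x -> y -> z -> x.
ring = [("x", "y"), ("y", "z"), ("z", "x")]

print(is_live(ring, {("z", "x"): 1}))  # one token on the only cycle
print(is_live(ring, {}))               # the cycle has no token
```

The same function can be applied to the marked graph of any protocol combination once its arcs and initial marking are written down.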
Valid for de-synchronization. [Figure: marked graphs over A+ B+ A- B- for the model, fully decoupled (Furber & Day), GasP and IPCMOS, semi-decoupled (Furber & Day), simple 4-phase and non-overlapping, marking which of them are valid]
Async DLX block diagram
Results, starting from the same synchronous RTL:
Synchronous: cycle 4.4 ns, power 70.9 mW, area 372,656 µm².
De-synchronized: cycle 4.45 ns, power 71.2 mW, area 378,058 µm².
All numbers are after placement and routing. Total of 1500 flip-flops, 3000 latches. The de-synchronized design includes 5 controllers, each driving 2 clock trees. Power numbers include the clock tree. Technology: UMC/Virtual Silicon 0.18 µm.
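The relative cost of de-synchronization follows directly from these figures. The short Python calculation below uses only the numbers on the slide; the dictionary keys and the helper name are illustrative.

```python
sync = {"cycle_ns": 4.4, "power_mW": 70.9, "area": 372656}
desync = {"cycle_ns": 4.45, "power_mW": 71.2, "area": 378058}


def overhead_percent(base, new):
    """Relative increase of each metric, in percent, rounded to two decimals."""
    return {k: round(100.0 * (new[k] - base[k]) / base[k], 2) for k in base}


print(overhead_percent(sync, desync))
```

The de-synchronized design is therefore about 1.1% slower, uses about 0.4% more power and occupies about 1.4% more area than the synchronous one.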
Discussion The de-synchronization model provides an abstraction of the timing behavior
Exploration of the design space. [Figure: timing analysis of a graph with nodes A to G whose arcs carry delay intervals such as [0,0], [1,2], [2,3], [2,4], [3,5], [5,7], [8,9]]
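Timing analysis with delay intervals can be sketched as a propagation of earliest and latest arrival times. The example below is a minimal illustration on a hypothetical acyclic graph: the arcs are invented, since the structure of the figure is not recoverable, and only the interval notation [min, max] follows the slide. It assumes that an event fires once all of its predecessors have arrived.

```python
def arrival_intervals(arcs, order, source):
    """Propagate (earliest, latest) arrival times through an acyclic graph.

    arcs:   dict mapping (u, v) to a (min_delay, max_delay) pair.
    order:  all nodes in topological order, starting with `source`.
    A node fires when its last predecessor has arrived, so both bounds
    are the maximum over the incoming arcs.
    """
    arrival = {source: (0, 0)}
    for v in order:
        if v == source:
            continue
        incoming = [(arrival[u], d) for (u, w), d in arcs.items() if w == v]
        earliest = max(a[0] + d[0] for a, d in incoming)
        latest = max(a[1] + d[1] for a, d in incoming)
        arrival[v] = (earliest, latest)
    return arrival


# Hypothetical graph; the intervals reuse values that appear on the slide.
arcs = {
    ("A", "B"): (2, 3),
    ("A", "C"): (1, 2),
    ("B", "D"): (3, 5),
    ("C", "D"): (8, 9),
}
times = arrival_intervals(arcs, ["A", "B", "C", "D"], "A")
print(times["D"])
```

Here D fires between 9 and 11 time units after A, and the path through C is critical for both bounds. A matched delay for this path would have to cover the upper bound.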
Conclusions: EDA tools require formal support (they must work for all circuits). A complete characterization of 4-phase protocols has been presented (partial order based on concurrency). Design flow developed at Cadence Berkeley Labs: automated from gate netlist; static timing analysis to derive matched delays; constrained P&R to meet timing constraints.