Published by Ulysses Smail. Modified over 9 years ago.
1
Elastic circuits
Jordi Cortadella
Universitat Politècnica de Catalunya, Barcelona
EMicro 2013
2
Goals
Convince ourselves that:
– designing an asynchronous circuit is easy
– synchronous and asynchronous circuits are similar
– asynchronous circuits bring new advantages
Not to cover exotic asynchronous schemes
Elasticity can also be synchronous
3
Clocking
Nvidia Kepler™ GK110: 28 nm, 7.1B transistors, 550 mm², 2688 CUDA cores, base clock 836 MHz, memory clock 6 GHz
How to distribute the clock?
How to determine the clock frequency?
How to implement robust communications?
How to reduce and manage energy?
5
Outline
Synchronous and Source-synchronous circuits
Completion detection
Handshaking
Performance analysis
Why asynchronous?
Design automation
Synchronous elasticity
Globally-asynchronous Locally-synchronous
6
Synchronous and Source-Synchronous
7
Synchronous circuit
[Figure: synchronous circuit clocked from a PLL]
8
Synchronous circuit
[Figure: PLL, clock tree, and combinational logic (CL) between registers 1 and 2]
Two competing paths:
– Launching path
– Capturing path
Launching path < Capturing path + Period
CLKtree + CL < CLKtree + Period
CL < Period (no clock skew)
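The inequality above can be checked numerically. The following is a minimal sketch, assuming delays are given as integers in picoseconds; the function and parameter names are hypothetical and not part of the slides.

```python
def setup_ok(launch_clk_delay, logic_delay, capture_clk_delay, period):
    """Check the slide's constraint: launching path < capturing path + period.

    launch_clk_delay / capture_clk_delay: clock-tree delay to the launching
    and capturing registers; logic_delay: combinational logic (CL) delay.
    All values are assumed to be in the same unit (e.g. picoseconds).
    """
    launching_path = launch_clk_delay + logic_delay
    capturing_path = capture_clk_delay
    return launching_path < capturing_path + period


def max_logic_delay(launch_clk_delay, capture_clk_delay, period):
    """Upper bound (exclusive) on CL delay: period plus the clock skew."""
    skew = capture_clk_delay - launch_clk_delay
    return period + skew
```

With equal clock-tree delays the skew is zero and the bound reduces to CL < Period, as on the slide; a later-arriving capture clock relaxes the bound.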
9
Source-synchronous
[Figure: CLK gen with a matched delay; launching path and capturing path]
No global clock required
More tolerance to PVT variations
Period > longest combinational path
Good for acyclic pipelines
10
Source-synchronous with forks and joins
[Figure: CLK gen feeding a pipeline with forks and joins]
How to synchronize incoming events?
11
C element (Muller 1959)
A  B  | C
0  0  | 0
0  1  | C (holds its value)
1  0  | C (holds its value)
1  1  | 1
[Figure: implementation with a majority gate (MAJ)]
(many implementations exist)
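The truth table can be modelled behaviourally in a few lines. This is only a sketch: the class and function names are hypothetical, and the majority-gate version assumes the usual arrangement in which the gate's own output is fed back as its third input.

```python
class CElement:
    """Behavioural model of a two-input Muller C element."""

    def __init__(self, state=0):
        self.state = state

    def update(self, a, b):
        # The output follows the inputs only when they agree;
        # otherwise the previous value is held.
        if a == b:
            self.state = a
        return self.state


def majority(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)


def c_element_via_majority(a, b, previous_output):
    # Assumption: the MAJ output is fed back as the third input.
    return majority(a, b, previous_output)
```

For every combination of inputs and previous output, both models give the same next value.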
13
Completion detection
14
[Figure: CLKgen with a fixed delay]
The fixed delay must be longer than the worst-case logic delay (plus variability)
Q: could we detect when a computation has completed, as soon as possible?
15
Delay-insensitive codes: Dual Rail
Dual rail: every bit encoded with two signals
A.t  A.f | A
0    0   | Spacer (SP)
0    1   | 0
1    0   | 1
1    1   | Not used
[Waveform: A = 1, SP, 0, SP, 1, SP, 1, SP on rails A.t and A.f]
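As a small illustration of the table, the sketch below encodes and decodes one bit as an (A.t, A.f) pair. The function names and the use of `None` for the spacer are assumptions made for this example.

```python
SPACER = (0, 0)  # (A.t, A.f) = 00: no data present


def dr_encode(bit):
    """Encode one bit as the pair (A.t, A.f)."""
    return (1, 0) if bit else (0, 1)


def dr_decode(rails):
    """Return 0 or 1 for valid data, None for the spacer."""
    if rails == SPACER:
        return None
    if rails == (1, 1):
        raise ValueError("11 is not a valid dual-rail codeword")
    return 1 if rails == (1, 0) else 0
```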
16
Dual Rail AND gate
A   B   | C
SP  SP  | SP
0   -   | 0
-   0   | 0
1   1   | 1
[Figure: gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
17
Dual Rail Inverter
A   | Z
SP  | SP
0   | 1
1   | 0
[Figure: inputs A.t, A.f and outputs Z.t, Z.f]
18
Dual Rail AND/OR gate
[Figure: AND and OR gates drawn on the rails A.t, A.f, B.t, B.f, C.t, C.f; the OR gate uses the same structure with the true and false rails exchanged]
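The three dual-rail gates can be sketched on (t, f) pairs. The rail equations below are the common minimal ones (C.t = A.t AND B.t, C.f = A.f OR B.f) and are an assumption, since the gate-level drawings are not recoverable from the slide text; all names are hypothetical.

```python
SPACER, ZERO, ONE = (0, 0), (0, 1), (1, 0)  # (t, f) rail pairs


def dr_and(a, b):
    """Dual-rail AND: true rail needs both inputs true, false rail needs one false."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def dr_not(a):
    """Dual-rail inverter: exchange the two rails (no gates needed)."""
    t, f = a
    return (f, t)


def dr_or(a, b):
    """Dual-rail OR built from AND by De Morgan, i.e. with rails exchanged."""
    return dr_not(dr_and(dr_not(a), dr_not(b)))
```

Note that `dr_and(ZERO, SPACER)` already yields `ZERO`, which matches the "0 - 0" row of the AND table: a single 0 input is enough to produce the output.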
19
Dual rail: completion detection
[Figure: the outputs of the dual-rail logic feed a completion detection tree whose final C element produces the signal done]
20
Multi-input C element
[Figure: tree of two-input C elements with inputs a1 to a7 and output c]
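A completion detector can be sketched by combining one validity bit per dual-rail signal with a multi-input C element. The model below is behavioural only (it does not build the tree of two-input C elements) and assumes each bit's validity is the OR of its two rails; the class names are hypothetical.

```python
class MultiInputCElement:
    """Output rises when all inputs are 1, falls when all are 0, else holds."""

    def __init__(self):
        self.state = 0

    def update(self, inputs):
        if all(inputs):
            self.state = 1
        elif not any(inputs):
            self.state = 0
        return self.state


class CompletionDetector:
    """done = C element over the per-bit validity of a dual-rail word."""

    def __init__(self):
        self._c = MultiInputCElement()

    def update(self, word):
        # word is a list of (t, f) rail pairs; a bit is valid when one rail is 1.
        valid = [t | f for t, f in word]
        return self._c.update(valid)
```

Because the C element holds its value, done stays high until every bit has returned to the spacer, so both the compute and the reset phases are detected.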
21
Dual rail: completion detection
[Figure: dual-rail netlist with AND, OR and INV gates driven by CLKgen; the next step adds C elements (C) that detect completion]
23
Dual rail: operation
[Figure: the same netlist alternating Reset and Compute phases]
For correct operation, all internal signals should be reset before the compute phase:
– Use a more complex implementation of dual-rail (e.g., DIMS), or
– Have internal completion detection, or
– Use timing assumptions
24
Other DI codes
There are many DI codes:
– k-out-of-n, Berger, Knuth, …
Example: 1-out-of-4
– 2 bits with 4 wires
– Same wire efficiency as DR
– Less power consuming
– Good for communication
– Bad for logic
Wires   | Value
0000    | Spacer
0001    | 0
0010    | 1
0100    | 2
1000    | 3
others  | not used
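The "same wires, less power" claim can be illustrated by counting how many wires rise when two bits leave the spacer. This is a sketch with hypothetical function names; the dual-rail wire order (t rail first, most significant bit first) is an assumption.

```python
def one_of_four(value):
    """Encode a 2-bit value on 4 wires following the slide's table (0 -> 0001)."""
    if not 0 <= value <= 3:
        raise ValueError("value must fit in 2 bits")
    return format(1 << value, "04b")


def dual_rail(value):
    """Encode the same 2 bits as two (t, f) pairs, also 4 wires."""
    if not 0 <= value <= 3:
        raise ValueError("value must fit in 2 bits")
    bits = [(value >> 1) & 1, value & 1]
    return "".join("10" if bit else "01" for bit in bits)


def rising_wires(codeword):
    """Number of wires that switch when moving from the all-zero spacer."""
    return codeword.count("1")
```

Both codes use four wires per two bits, but 1-out-of-4 raises one wire per codeword where dual rail raises two, which is the source of the lower switching activity.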
25
Single rail data vs. dual rail
Some back-of-the-envelope estimations:
               | Single rail | Dual rail
Area           | 1           | 2
Delay          | 1           | << 1
Static power   | 1           | 2
Dynamic power  | < 0.2       | 2
Dual rail:
– Good for speed
– Large area
– High power consumption
26
Handshaking
27
[Figure: CLKgen with an unknown delay]
Assume that the source module can provide data at any rate:
When should the CLK generator send an event if the internal delays of the circuit are unknown?
Solution: handshaking
28
Handshaking
[Figure: Data travels with a Request ("I have data") and is answered by an Acknowledge ("I want data")]
29
Asynchronous elastic pipeline
[Figure: chain of C elements between ReqIn/AckIn and ReqOut/AckOut]
David Muller's pipeline (late 50's)
Sutherland's Micropipelines (Turing Award lecture, 1989)
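A Muller pipeline can be simulated by letting every stage be a C element driven by its left neighbour and by the inverted output of its right neighbour. The sketch below is a zero-delay behavioural model that sweeps the stages until nothing changes; the function names and the list representation of the stages are assumptions for illustration.

```python
def c_element(a, b, previous):
    """Two-input C element: follow the inputs when equal, else hold."""
    return a if a == b else previous


def settle(stages, req_in, ack_out):
    """Propagate events through the control chain until it is stable.

    stages[i] is the output of the i-th C element. Stage i sees its left
    neighbour (or req_in) and the inverted right neighbour (or ack_out).
    """
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = req_in if i == 0 else stages[i - 1]
            right = ack_out if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages
```

Raising `req_in` on an empty three-stage chain sends a wavefront to the output; lowering it again resets every stage except the last, which waits for `ack_out` from the receiver.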
30
Multiple inputs and outputs
[Figure sequence over three slides: a block with multiple inputs and outputs; a matched delay is added; C elements combine the Req and Ack signals of the channels]
33
Channel-based communication
A channel contains data and handshake wires
[Figure: a single-rail channel (Data, Req, Ack) and a dual-rail channel (Data, Ack)]
34
Push/pull channels
Push: the sender initiates the communication
Pull: the receiver initiates the communication
[Figure: Sender and Receiver connected by single-rail Data; push channel: Req from the sender, Ack from the receiver; pull channel: Req from the receiver, Ack from the sender]
35
Four-phase protocol
Valid data on the active edge of Req
Req/Ack must return to zero before the next transfer
Different variations of the 4-phase protocol exist
[Waveform: Req, Ack and Data for the transfers of Data 1, Data 2 and Data 3]
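One possible event ordering of a four-phase push channel can be written as a trace generator. This is a sketch of a single variation (data set up before Req rises), chosen for illustration; the function name and event tuples are hypothetical.

```python
def four_phase_trace(items):
    """List the events of a push channel using a return-to-zero handshake."""
    trace = []
    for item in items:
        trace.append(("data", item))  # data valid before the active edge
        trace.append(("req", 1))      # active edge of Req
        trace.append(("ack", 1))      # receiver has taken the data
        trace.append(("req", 0))      # return to zero
        trace.append(("ack", 0))      # channel ready for the next transfer
    return trace


def handshake_transitions(trace):
    """Count Req/Ack transitions (data events excluded)."""
    return sum(1 for name, _ in trace if name in ("req", "ack"))
```

Every transfer costs four handshake transitions, two of which only restore the idle state.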
36
Two-phase protocol
Every edge is active
It may require double-edge triggered flip-flops or pulse generators
[Waveform: Req, Ack and Data for the transfers of Data 1, Data 2 and Data 3]
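In the two-phase protocol each transfer toggles Req once and Ack once, with no return-to-zero phase. The sketch below generates such a trace; the function name and the tuple layout are assumptions made for this example.

```python
def two_phase_trace(items):
    """List (item, signal, new_level) events of a two-phase push channel."""
    req = ack = 0
    trace = []
    for item in items:
        req ^= 1  # any edge of Req, rising or falling, announces new data
        trace.append((item, "req", req))
        ack ^= 1  # the matching edge of Ack completes the transfer
        trace.append((item, "ack", ack))
    return trace
```

Each transfer needs only two transitions, but consecutive transfers alternate between rising and falling edges, which is why double-edge triggered storage or pulse generators may be needed.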
37
How to memorize?
[Figure: combinational logic between latches (L), with a matched delay and C-element controllers (C)]
2-phase or 4-phase?
38
How to memorize?
[Same figure] 2-phase: a pulse generator drives the latches
39
How to memorize?
[Same figure] 4-phase
40
Performance analysis
41
Ring oscillators
[Figure: network of C elements with stages numbered 1 to 7, forming several rings]
Every ring requires an odd number of inverters
The cycle period is determined by the slowest ring
The cycle period is adapted to the operating conditions (temperature, voltage)
43
Global Rings
[Figure: ring of six C-element stages]
44
Global Rings
Th = 1 / 6
References:
– Ramamoorthy and Ho, Performance evaluation of asynchronous concurrent systems with Petri nets, 1980.
– T. Williams et al., A self-timed chip for division, 1987.
– Greenstreet and Steiglitz, Bubbles can make self-timed pipelines fast, 1990.
– Manohar and Martin, Slack elasticity in concurrent computing, 1998.
45
Global Rings
Th = 2 / 6
46
Global Rings
Th = 3 / 6
47
Global Rings
Th = 1 / 6
48
Global Rings
[Figure: throughput (Th) versus number of tokens, from 0 to N]
The ring is token limited below N/2 tokens and bubble limited above; the maximum throughput, Th = 1/2, is reached with N/2 tokens
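The throughput values on the Global Rings slides are consistent with a simple normalized model in which a ring of N stages holding k tokens delivers min(k, N - k) / N. The sketch below encodes that model; it is an assumption inferred from the slide values (equal stage delays, throughput in tokens per stage delay), and the function name is hypothetical.

```python
from fractions import Fraction


def ring_throughput(tokens, stages):
    """Normalized throughput of a ring with `tokens` tokens in `stages` stages.

    Tokens can only advance into bubbles (empty stages), so whichever is
    scarcer limits the rate.
    """
    if not 0 <= tokens <= stages:
        raise ValueError("tokens must be between 0 and stages")
    bubbles = stages - tokens
    return Fraction(min(tokens, bubbles), stages)
```

For a six-stage ring this gives 1/6, 2/6 and 3/6 for one, two and three tokens, and drops back to 1/6 with five tokens, where only one bubble is left.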
49
A latch-based view of synchronous circuits
Flip-flop = Master + Slave
50
Multiple Rings
[Figure labels on three rings: 2 / 4, 2 / 5, 5 / 7 ?, 2 / 7]
It's bubble limited!
51
Slack matching
[Figure labels on three rings: 2 / 4, 2 / 5, 2 / 7 ?, 4 / 9]
We can add as many bubbles as we want (but not tokens!)
Slack matching can be solved optimally in polynomial time
Slack matching is conceptually equivalent to buffer (FIFO) sizing or recycling
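The numbers on the Multiple Rings and Slack matching slides can be reproduced with a toy model. It assumes that each label reads "tokens / stages", that a ring's throughput is min(tokens, bubbles) / stages, and that the system runs at the rate of its slowest ring; these readings are interpretations of the figure labels, and all names are hypothetical.

```python
from fractions import Fraction


def ring_throughput(tokens, stages):
    """Throughput limited by tokens or by bubbles, whichever is scarcer."""
    return Fraction(min(tokens, stages - tokens), stages)


def system_throughput(rings):
    """The slowest ring determines the throughput of the whole system."""
    return min(ring_throughput(tokens, stages) for tokens, stages in rings)


# (tokens, stages) for the three rings before slack matching.
before = [(2, 4), (2, 5), (5, 7)]
# Two bubbles (empty stages) added to the third ring: 5 tokens in 9 stages.
after = [(2, 4), (2, 5), (5, 9)]
```

In this reading the 5-token, 7-stage ring has only two bubbles and limits the system to 2/7; after adding two bubbles it reaches 4/9 and the 2/5 ring becomes the bottleneck.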
52
Performance analysis
[Figure: ring of C-element stages]
(Mean Cycle Ratio)
53
Latch-based design
[Figure: latches L1 to L4 and their enable waveforms, with the launching path and the capturing path marked]
54
Matched delays can be adjustable
[Figure: latches L1 to L4 with delay selection]
Delays can be adjusted:
– At testing/boot time (to adjust to static variability)
– At runtime (to compensate dynamic variability)
55
Why asynchronous?
56
Exploiting elasticity
[Figure labels: CLK, Rigid clock, High performance, Low energy]
57
Exploiting elasticity
[Figure: performance versus voltage. Voltage axis: 1 V (rigid clock), 0.9 V, 0.8 V, 0.7 V. Performance axis: 2 GHz, 1 GHz, 500 MHz. Voltage scaling moves the operating point between high performance and low energy]
58
Voltage scaling and power savings
3 ARM926 cores on the same die
[Figure labels: -24%, -14%]
59
Tracking variability
[Figure: circuit with a matched delay]
60
Tracking variability
[Figure: delay of the multi-corner matched delay versus the critical paths at the best, typ and worst corners]
Good correlation for:
– Process variability (systematic)
– Global voltage fluctuations
– Temperature
– Aging (partially)
61
Margins
Rigid clocks, cycle period components: gate and wire delays (typ), P, V, T, aging, PLL jitter, skew
Elastic clocks, cycle period components: gate and wire delays (typ), P, V, T, aging, skew
Margin reduction: speed-up / power savings
62
Clock elasticity
Rigid clock: cycle period = computation time + wasted time
Elastic clock: cycle period = computation time
63
Design Automation
64
Design automation paradigms
Synthesis of asynchronous controllers
– Logic synthesis from Petri nets or asynchronous FSMs
Syntax-directed translation
– Correct-by-construction composition of handshake components
De-synchronization
– Automatic transformation from synchronous to asynchronous
65
Synthesis of asynchronous controllers
[Figure: VME Bus Controller between the Bus (DSr, DSw, DTACK) and the Device (LDS, LDTACK); signal D goes to the Data Transceiver]
[Timing diagram, Read Cycle: DSr, LDS, LDTACK, D, DTACK]
66
Synthesis of asynchronous controllers
[Figure: Signal Transition Graph of the VME Bus Controller (signals DSr, LDS, LDTACK, D, DTACK) with transitions LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+]
67
Synthesis of asynchronous controllers
[Figure: the Signal Transition Graph (LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+) and the synthesized circuit with signals DTACK, D, DSr, LDS, LDTACK]
Cortadella et al., Petrify
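One basic property that a synthesis tool checks on a Signal Transition Graph is consistency: the rising and falling transitions of every signal must alternate. The sketch below checks this on one cyclic ordering of the read-cycle transitions; that ordering is just the flattened sequence from the slide rotated to start at DSr+, which ignores any concurrency in the real graph. The function and constant names are hypothetical, and this is not how Petrify itself is implemented.

```python
def alternates_cyclically(transitions):
    """Check that '+' and '-' alternate for every signal around the cycle."""
    signs = {}
    for transition in transitions:
        signal, sign = transition[:-1], transition[-1]
        signs.setdefault(signal, []).append(sign)
    for sequence in signs.values():
        for i in range(len(sequence)):
            # The index wraps around because the behaviour is cyclic.
            if sequence[i] == sequence[(i + 1) % len(sequence)]:
                return False
    return True


# One interleaving of the read cycle, starting at DSr+.
READ_CYCLE = ["DSr+", "LDS+", "LDTACK+", "D+", "DTACK+",
              "DSr-", "D-", "DTACK-", "LDS-", "LDTACK-"]
```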
68
Syntax-directed translation
int = type [0..255]
& gcd: main proc (in? chan <<int,int>> & out! chan int)
begin
  x, y: var int
| forever do
    in? <<x,y>>
  ; do x <> y then
      if x < y then y := y-x
      else x := x-y
      fi
    od
  ; out!x
  od
end
Sources:
– J. Kessels and A. Peeters. DESCALE: A Design Experiment for a Smart Card Application Consuming Low Energy, in Principles of Asynchronous Circuit Design: A Systems Perspective, Eds. J. Sparso and S. Furber, Kluwer Academic Publishers, 2001.
– P.A. Beerel, R.O. Ozdag and M. Ferretti. A Designer's Guide to Asynchronous VLSI, Cambridge University Press, 2010.
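For readers unfamiliar with the source language, the same subtractive algorithm can be sketched in Python. The channel communication is replaced by function arguments and a return value, and the rejection of non-positive inputs is an addition for this example (the loop would not terminate if one operand were zero).

```python
def gcd_subtractive(x, y):
    """Greatest common divisor by repeated subtraction."""
    if x <= 0 or y <= 0:
        raise ValueError("both operands must be positive")
    while x != y:          # do x <> y then ...
        if x < y:
            y = y - x      # y := y-x
        else:
            x = x - y      # x := x-y
    return x               # out!x
```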
69
De-synchronization
Strategy: substitute the clock tree by local clocks and handshakes
Combinational logic and latches are not modified
More tolerance to variability
– Similar area, less power and/or more speed
Cortadella, Kondratyev, Lavagno and Sotiriou. Desynchronization: Synthesis of asynchronous circuits from synchronous specifications. IEEE TCAD, Oct 2006.
70
Synchronous operation
[Figure: circuit driven by a single CLK gen]
Transforming a synchronous circuit into asynchronous (automatically)
72
De-synchronization
[Figure: the same circuit after de-synchronization]
Transforming a synchronous circuit into asynchronous (automatically)
74
System-level de-synchronization
[Figure sequence: a system driven by CLK is de-synchronized step by step]
78
Synchronous elasticity
79
Different flavors of elasticity
[Figure: an adder consuming the same data streams in three styles: Rigid, Elastic and Synchronous Elastic]
Carloni et al., Latency-insensitive systems.
80
Asynchronous elasticity
[Figure: stages connected by req and ack handshake signals]
81
Synchronous elasticity
[Figure: stages connected by valid and stop signals; the CLK comes from a PLL or a ring oscillator]
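The valid/stop pair can be read cycle by cycle: data moves only in a cycle where valid is high and stop is low. The sketch below classifies cycles with the state names commonly used for this kind of protocol in the elastic-circuits literature (transfer, idle, retry); those names and the trace format are assumptions, not taken from the slide.

```python
def channel_state(valid, stop):
    """Classify one clock cycle of a valid/stop channel."""
    if not valid:
        return "idle"        # nothing offered by the sender
    if stop:
        return "retry"       # offered but refused: the sender must hold the data
    return "transfer"        # offered and accepted in this cycle


def transferred(trace):
    """Return the data items that actually cross the channel.

    trace is a list of (data, valid, stop) tuples, one per clock cycle.
    """
    return [data for data, valid, stop in trace
            if channel_state(valid, stop) == "transfer"]
```

For example, an item stalled for one cycle appears twice in the trace but is transferred only once.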
82
Latch-based elasticity
[Figure: sender and receiver connected by Data, Valid and Stop; control latches (V) generate the enable (En) of the data latches]
83
Elastic netlists
[Figure: elastic buffers (EB) connected through Fork, Join and Join/Fork controllers]
Enable signal to data latches
84
Variable Latency Units
[Figure: a unit taking [0 - k] cycles, with go, done and clear signals and a V/S interface]
85
Globally-asynchronous Locally-synchronous GALS
86
SoC design with GALS
Most IPs are synchronous
Different components may have different operating frequencies
Some components have variable latencies (e.g., cache hit/miss latency)
Multiple clock domains are essential
[Figure: DSP, processors (P) and Mem on a Fast Bus and a Slow Bus, connected through Bridges and CDC blocks across clock domains CLK1, CLK2 and CLK3]
87
Multiple clock domains
Single clock (mesochronous): CLK
Rational clock frequencies: f1/f0, f2/f0, f3/f0 derived from CLK (f0)
Independent clocks: CLK0, CLK1, CLK2, CLK3
(controllable skew)
88
Synchronous handshakes
[Figure: Sender (CLK1) and Receiver (CLK2) connected by Data, Valid and Ack]
The arrival of data is unpredictable
Handshakes solve the problem
89
The problem: metastability
[Figure: a flip-flop clocked by Φ_R samples data D launched by a flip-flop clocked by Φ_T; when D changes inside the setup/hold window, Q is undetermined (?)]
90
How long does it take to resolve metastability?
[Figure: metastability]
MTBF: Mean Time Between Failures
91
Classical synchronous solution
[Figure: chain of flip-flops; the first is clocked by Φ_T, the following ones by Φ_R]
Mean Time Between Failures:
MTBF = e^(t_r / τ) / (W · f_Φ · f_D)
f_Φ: frequency of the clock
f_D: frequency of the data
t_r: resolve time available
W: metastability window
τ: resolve time constant
Example:
# FFs | MTBF
1 FF  | 15 min
2 FF  | 9 days
3 FF  | 23 years
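The effect of adding synchronizer flip-flops follows from the exponential term of the standard MTBF expression, MTBF = e^(t_r / τ) / (W · f_Φ · f_D). The sketch below evaluates it; the function name is hypothetical and the parameter values used in the test are invented for illustration, not those behind the slide's example table.

```python
import math


def mtbf(t_r, tau, w, f_clk, f_data):
    """Mean time between failures of a synchronizer, in seconds.

    t_r: resolve time available (s); tau: resolve time constant (s);
    w: metastability window (s); f_clk, f_data: clock and data rates (Hz).
    """
    return math.exp(t_r / tau) / (w * f_clk * f_data)
```

If each additional flip-flop adds one clock period T to the available resolve time, the MTBF is multiplied by e^(T / τ) per stage, which is why the example grows from minutes to years with only three flip-flops.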
92
Handshake with synchronizers
[Figure: Sender (CLK1) and Receiver (CLK2) connected by Data, Valid and Ack, with synchronizers on the handshake signals]
Simple solution
Throughput can be highly degraded: a long round trip for every transaction
93
Asynchronous FIFOs
[Figure: circular buffer with FIFO control between Clk In and Clk Out, with Valid, Ack and Data]
Ack is issued as soon as data has been delivered
No impact on throughput (1 token/cycle)
Min latency determined by the internal synchronizers
Some tricky structures for the FIFO pointers (e.g. Gray encoding)
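Gray encoding is used for FIFO pointers because consecutive values differ in exactly one bit, so a pointer sampled in the other clock domain is off by at most one position. A minimal sketch of the conversion, with hypothetical function names:

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Inverse conversion: fold the shifted codeword back with XOR."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

For a pointer whose range is a power of two, the single-bit property also holds when the counter wraps around from the last value to zero.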
94
SoC design with GALS
[Figure: DSP, processors (P) and Mem on a Fast Bus and a Slow Bus, connected through Bridges and CDC blocks across clock domains CLK1, CLK2 and CLK3]
Bridges for Clock Domain Crossing usually contain asynchronous FIFOs
Latency cost only when interfacing with synchronous domains
No latency penalty between asynchronous domains
95
Conclusions
Elasticity offers flexibility in time
– Modularity
– Dynamic adaptability
– Tolerance to variability
Better optimization of power/performance
Why isn't it an important trend in circuit design?
– Lack of commercial EDA support (timing sign-off)
– Designers do not feel comfortable with "unpredictable" timing
– Other aspects: testing, verification, …
De-synchronization might be a viable solution
96
Bibliography
– Carmona, Cortadella, Kishinevsky and Taubin, Elastic Circuits, IEEE Trans. on CAD, Oct. 2009.
– Beerel, Ozdag and Ferretti, A Designer's Guide to Asynchronous VLSI, Cambridge University Press, 2010.
– Sparso and Furber, Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer, 2001.
– Myers, Asynchronous Circuit Design, John Wiley & Sons, 2001.