
Advanced Digital Design GALS Design Andreas Steininger Vienna University of Technology.


1 Advanced Digital Design GALS Design Andreas Steininger Vienna University of Technology

2 Lecture "Advanced Digital Design"© A. Steininger & M. Delvai / TU Vienna 2 Outline: global synchrony & clock distribution; types of synchrony; the GALS approach; communication; synchronization; Muller C-Element, Mutex & Arbiter; data-driven clock & pausable clock; TMR example with pausable clock.

3 Even/Odd Synchronizer. Works only for two periodic clocks whose frequency ratio lies within a certain range. Avoids the performance penalty of synchronizers. Largely eliminates the potential for metastability. For details see [Dally & Tell, The Even/Odd Synchronizer, ASYNC 2010].

4 Types of Synchrony. Synchronous: identical frequency, constant phase relation; the classical synchronous system driven by one clock source. Mesochronous (= multisynchronous): identical frequency (no accumulating drift) but an unknown, possibly varying (bounded) phase relationship; example: different PLLs driven by the same source. Plesiochronous: same nominal clock frequency, low mutual drift; independent clock sources with the same nominal frequency. Heterochronous (= multisynchronous): clocks totally unrelated; independent clock sources with different nominal frequencies.
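The four categories can be told apart by asking whether two clocks share a source and a nominal frequency. The sketch below assumes a hypothetical `Clock` record with a source identifier, a nominal and an actual frequency. It deliberately ignores clocks derived from one source at different ratios. It also computes how quickly plesiochronous clocks drift apart: the phase difference grows by |f1 - f2| cycles per second, so one full period has slipped after 1/|f1 - f2|.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    source: str          # hypothetical ID of the oscillator that ultimately drives this clock
    nominal_hz: float    # nominal (specified) frequency
    actual_hz: float     # true frequency; differs slightly for independent oscillators


def classify(a: Clock, b: Clock, phase_known: bool) -> str:
    """Coarse classification into the four categories of the slide.

    Simplification (assumption): clocks derived from a common source at
    different frequencies (e.g., PLL multiples) are not treated separately.
    """
    if a.source == b.source and a.nominal_hz == b.nominal_hz:
        # Common source => no accumulating drift; only the phase may be unknown.
        return "synchronous" if phase_known else "mesochronous"
    if a.nominal_hz == b.nominal_hz:
        return "plesiochronous"      # independent sources, same nominal frequency
    return "heterochronous"          # independent sources, different frequencies


def slip_time_s(a: Clock, b: Clock) -> float:
    """Time until two clocks have drifted apart by one full clock period."""
    diff = abs(a.actual_hz - b.actual_hz)
    return float("inf") if diff == 0 else 1.0 / diff


if __name__ == "__main__":
    f = 100e6
    a = Clock("xo1", f, f)
    b = Clock("xo2", f, f * (1 + 50e-6))          # assumed 50 ppm offset
    print(classify(a, b, phase_known=False))      # plesiochronous
    print(f"{slip_time_s(a, b) * 1e6:.1f} us")     # 200.0 us
```

With a 50 ppm offset at 100 MHz, the two clocks slip by a full period every 200 µs. This is why plesiochronous links need synchronizers or periodic slip handling, even though the nominal frequencies match.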

5 Global Synchrony? Problem 1: clock distribution. Low-skew clock distribution becomes difficult for large chips and high frequencies, and clock networks consume a considerable share of the power. Problem 2: clock selection. An SoC contains many IPs, each specified for its own frequency; specific frequencies are required for some functions (e.g., interface standards); and there are dynamic local changes due to voltage & frequency scaling and clock & power gating.

6 Clock Distribution. [Figure: source register TRG src launching data after t_CO, propagation delay t_pd to sink register TRG snk, data-valid windows] Synchronous approach: clock skew (1) causes a setup violation.

7 Clock Distribution. [Figure: source register TRG src, sink register TRG snk, delays t_CO and t_pd, data-valid windows] Synchronous approach: clock skew (2) causes a hold violation.
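Slides 6 and 7 can be made concrete with the standard register-to-register timing checks. The sketch assumes integer picosecond delays and defines skew as the sink clock arrival minus the source clock arrival. The transcript does not state which sign the slides' cases (1) and (2) use. In this convention, a sink clock that arrives early (negative skew) eats into the setup margin. A sink clock that arrives late (positive skew) eats into the hold margin.

```python
def check_transfer(t_co, t_pd, t_su, t_h, period, skew):
    """Check one register-to-register transfer (all times in ps, integers).

    skew = clock arrival time at the sink minus clock arrival time at the source.
    Returns (setup_ok, hold_ok).
    """
    arrival = t_co + t_pd                        # new data reaches the sink input
    setup_ok = arrival + t_su <= period + skew   # stable before the *next* sink edge
    hold_ok = arrival >= t_h + skew              # old data held long enough after the *current* sink edge
    return setup_ok, hold_ok


# Hypothetical values: t_CO = 300 ps, t_pd = 600 ps, setup = hold = 100 ps, 1 GHz clock.
for skew in (0, -200, 900):
    print(skew, check_transfer(300, 600, 100, 100, 1000, skew))
# 0 (True, True); -200 (False, True): setup violation; 900 (True, False): hold violation
```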

8 Clock Distribution. [Figure: source register TRG src (t_CO, t_pd), completion detection generating REQ, ACK returned from sink register TRG snk, data-valid windows] Asynchronous approach: REQ delay.

9 Clock Distribution. [Figure: source register TRG src (t_CO, t_pd), completion detection generating REQ, ACK returned from sink register TRG snk, data-valid windows] Asynchronous approach: ACK delay.

10 Clock Distribution. [Figure: source register TRG src (t_CO, t_pd), completion detection generating REQ, ACK returned from sink register TRG snk, data-valid windows] Asynchronous approach: data delay.
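Slides 8 to 10 make one point: a handshake absorbs extra REQ, ACK or data delay, whereas a fixed clock does not. Below is a deliberately simplified timing model, assuming a 4-phase bundled-data protocol in which REQ+ is raised only after completion detection and each wire delay is a constant. It shows that a longer path only stretches the handshake cycle. The same path breaks a fixed-period transfer.

```python
def sync_transfer_ok(path_delay, t_su, period):
    """Fixed clock: data must settle t_su before the next edge, or the transfer fails."""
    return path_delay + t_su <= period


def four_phase_cycle(path_delay, t_req, t_ack):
    """Simplified 4-phase bundled-data handshake (all delays in ps).

    REQ+ is issued only after completion detection reports valid data, then
    ACK+, REQ- and ACK- follow, each after its wire delay. A larger delay
    stretches the cycle but cannot corrupt the transferred word.
    """
    return path_delay + 2 * t_req + 2 * t_ack


for d in (700, 900, 1200):   # hypothetical data path delays in ps
    print(d, sync_transfer_ok(d, t_su=100, period=1000), four_phase_cycle(d, t_req=50, t_ack=50))
```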

11 The GALS Approach. An SoC is clearly structured into IPs anyway: run each at its desired individual frequency => synchronous islands, which are efficient and well understood. Communication between IPs has to bridge clock boundaries and may run over larger distances => the asynchronous (handshake-based) paradigm is better suited for composition. Globally Asynchronous Locally Synchronous (GALS): first mentioned in the PhD thesis by Chapiro, Stanford, 1984.

12 A GALS Example. [Figure: four synchronous islands: CPU (2 GHz), PCI interface (533 MHz), DSP (2.7 GHz), USB interface (24 MHz)]

13 Communication in GALS. Shared memory: the producer writes to memory and the consumer reads from there; pro: the control flows stay independent; variants: shared single-port memory or true dual-port memory. Direct messages (data words): move a data word from the producer's output register to the consumer's input register; non-buffered or buffered (FIFO queues); fixed, data-driven or pausable clock.

14 Shared Memory. Perfect decoupling of the data path; potential metastability problems at the arbitration logic; potential blocking through arbitration. [Figure: CPU (2 GHz) and DSP (2.7 GHz) accessing a shared memory (e.g., address 0xff14) through arbitration logic]

15 Shared Memory. Decoupling of the clock domains by the memory acting as a third party => high area overhead => unusual. The memory must be asynchronous; otherwise the direct message model applies (producer => memory and memory => consumer). For single-port memory, arbitration is required: the arbitration problem (unbounded delay…), and one side may block the other at the arbiter. For multiport memory, the problems are confined to accesses to the same cell: the busy flag may become metastable, and blocking is still possible for one specific address.

16 Direct Messages. Data moving over a clock domain boundary => metastability problems => need to insert a handshake… with synchronizers… and (optional) buffers. [Figure: data word 0xff14 passed from CPU (2 GHz) to DSP (2.7 GHz) through synchronizers S]

17 Direct Messages. The clock domain boundary lies between the producer's output register and the consumer's input register. In general a synchronizer is needed at the consumer's input: definitely for a conventional (fixed) clock; it can be avoided by data-driven or pausable clocking. The control flows of producer and consumer are strongly coupled: if one party does not service its output/input register, it blocks the other. Buffers/queues/FIFOs can mitigate, but not avoid, this problem (full/empty): they compensate variations in the data rate on both sides, but not different average data rates.
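The last point, that a FIFO absorbs rate variations but not a difference in average rate, can be checked with a small cycle-based model. This is an abstract sketch under stated assumptions. Both sides are counted in one common cycle index, which ignores the two-clock nature of a real GALS FIFO. Words the producer cannot write are kept in a backlog and retried.

```python
def simulate_fifo(depth, produce, consume):
    """Cycle-based model of a bounded FIFO between producer and consumer.

    produce[k] / consume[k]: words the producer wants to write / the consumer
    wants to read in cycle k. Returns (cycles in which the producer was blocked
    by a full FIFO, cycles in which the consumer was blocked by an empty FIFO,
    words still waiting at the producer at the end).
    """
    fill = 0        # words currently in the FIFO
    backlog = 0     # words the producer could not write yet (retried later)
    full_cycles = empty_cycles = 0
    for p, c in zip(produce, consume):
        backlog += p
        pushed = min(backlog, depth - fill)
        fill += pushed
        backlog -= pushed
        if backlog:
            full_cycles += 1       # producer blocked: FIFO full
        popped = min(c, fill)
        fill -= popped
        if popped < c:
            empty_cycles += 1      # consumer blocked: FIFO empty
    return full_cycles, empty_cycles, backlog


bursty = [4, 0, 0, 0] * 25   # average 1 word/cycle, delivered in bursts of 4
steady = [1] * 100           # consumer reads 1 word/cycle
print(simulate_fifo(4, bursty, steady))      # (0, 0, 0): deep enough to absorb the bursts
print(simulate_fifo(2, bursty, steady))      # (50, 0, 0): too shallow, producer blocks
print(simulate_fifo(8, [2] * 100, steady))   # (93, 0, 93): average rates differ, backlog grows
```

In the third case, any finite depth only postpones the point at which the producer starts blocking.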

18 Muller C-Element. [Figure: C-element symbol with inputs a, b and output y, and an equivalent RS-latch (set/reset) realization] Function: IF a = b THEN y = a ELSE hold y. David Eugene Muller (1924 – 2008), Professor at Univ. of Illinois: Muller, D. E.; Bartky, W. S. (1959), "A Theory of Asynchronous Circuits", Proc. Int'l Symp. Theory of Switching, Part 1 (Harvard Univ. Press): 204–243.
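The function on the slide translates directly into a behavioral model. In this minimal sketch the output starts at 0, which is an assumption, since a real C-element needs a defined reset state.

```python
class CElement:
    """Behavioral Muller C-element: y follows the inputs when they agree, else holds."""

    def __init__(self, y: int = 0):
        self.y = y   # assumed reset value

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.y = a   # both inputs agree: output takes their value
        return self.y    # inputs differ: output holds its previous value


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, "->", c.update(a, b))
# 1 0 -> 0 (hold), 1 1 -> 1, 0 1 -> 1 (hold), 0 0 -> 0
```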

19 Muller C-Element: Circuit. [Figure: three transistor-level implementations, after Sutherland, Martin, and van Berkel]
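The slide's transistor-level circuits are not reproduced here. As an illustration, a commonly used gate-level equivalent is a majority-of-three gate whose output is fed back as the third input: y_next = ab + y(a + b). It is not necessarily one of the three circuits credited on the slide. The sketch checks this equation exhaustively against the IF/ELSE specification from the previous slide.

```python
from itertools import product


def c_element_next(a: int, b: int, y: int) -> int:
    """Majority-of-three with feedback: y_next = a*b + y*(a + b)."""
    return (a & b) | (y & (a | b))


def c_element_spec(a: int, b: int, y: int) -> int:
    """Specification: IF a = b THEN y = a ELSE hold y."""
    return a if a == b else y


# Compare implementation and specification over all 8 (a, b, y) states.
mismatches = [s for s in product((0, 1), repeat=3) if c_element_next(*s) != c_element_spec(*s)]
print("mismatches:", mismatches)  # mismatches: []
```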

20 Mutual Exclusion. Purpose: decide the order of asynchronous events. Function: handle pairs of request_in/grant_out; requests may arrive in any order; the MUTEX must activate only one grant_out at a time (respond to the first requester). Problem: resolving concurrent requests => metastability problem. [Figure: MUTEX with inputs r1, r2 and outputs g1, g2]

21 MUTEX: Circuit. [Figure: SR latch with inputs r1, r2 and internal outputs g1', g2', followed by a "metastability filter" (e.g., a lo-threshold inverter) producing g1, g2; plot of V_out,latch and V_out,inv over t relative to V_meta and V_th,inv; from D. J. Kinniment, "Synchronization and Arbitration in Digital Systems", Wiley] BUT: Doesn't a lo-threshold inverter produce glitches?
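The role of the metastability filter can be illustrated with an idealized numerical model. It assumes active-low latch outputs that leave V_meta exponentially, with latch time constant tau and initial imbalance delta0. It also assumes a filter threshold V_th below V_meta; both voltages are hypothetical values. While the latch hovers near V_meta, neither grant is released. The resolution time grows like tau·ln(1/delta0) as the requests become more nearly concurrent. The model is monotonic and noise-free, so it cannot address the glitch question raised on the slide.

```python
import math

V_META = 0.5   # assumed metastable voltage of the latch (V)
V_TH = 0.3     # assumed threshold of the lo-threshold inverter (V), below V_META


def latch_outputs(t, delta0, tau):
    """Idealized active-low latch outputs g1', g2' diverging exponentially from V_META.

    delta0 > 0 is the initial imbalance in favor of r1; tau is the latch time
    constant. Outputs are clamped to the assumed 0..1 V supply rails.
    """
    d = delta0 * math.exp(t / tau)
    g1n = min(max(V_META - d, 0.0), 1.0)
    g2n = min(max(V_META + d, 0.0), 1.0)
    return g1n, g2n


def filtered_grants(g1n, g2n):
    """Lo-threshold inverters: a grant goes high only once its latch output is clearly low."""
    return int(g1n < V_TH), int(g2n < V_TH)


def resolution_time(delta0, tau):
    """Time until g1' crosses V_TH, i.e. until the filter releases grant g1."""
    return tau * math.log((V_META - V_TH) / delta0)


tau = 10e-12   # assumed 10 ps latch time constant
for delta0 in (1e-3, 1e-6, 1e-9):   # smaller imbalance = more nearly concurrent requests
    print(f"delta0={delta0:g} V -> g1 after {resolution_time(delta0, tau) * 1e12:.0f} ps")
```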

22 MUTEX: Operation. [Figure: the MUTEX circuit (r1, r2, g1', g2', g1, g2) with waveforms of r1, g1, r2, g2 and V_out,FF relative to V_meta and V_th,inv] 4-phase protocol.
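The 4-phase operation (r+, g+, r-, g-) of a two-way MUTEX can be captured in a behavioral model. In this sketch requests are applied one call at a time, so the call order stands in for "who came first". That is a simplifying assumption: a real MUTEX facing truly concurrent requests may need an unbounded time to decide.

```python
class Mutex:
    """Behavioral two-way MUTEX following the 4-phase protocol (r+ g+ r- g-)."""

    def __init__(self):
        self.req = [False, False]
        self.grant = [False, False]

    def set_request(self, i: int, level: bool):
        self.req[i] = level
        # Return-to-zero phase: drop a grant whose request has been withdrawn.
        for k in (0, 1):
            if self.grant[k] and not self.req[k]:
                self.grant[k] = False
        # Hand the MUTEX to a waiting requester only if nobody holds it.
        if not any(self.grant):
            for k in (0, 1):
                if self.req[k]:
                    self.grant[k] = True
                    break
        return tuple(self.grant)


m = Mutex()
print(m.set_request(0, True))   # (True, False)   r1+ -> g1+
print(m.set_request(1, True))   # (True, False)   r2+ must wait
print(m.set_request(0, False))  # (False, True)   r1- -> g1-, then g2+
print(m.set_request(1, False))  # (False, False)  r2- -> g2-
```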

23 MUTEX vs. Synchronizer
            Synchronizer                          MUTEX
purpose:    synchronize asynchronous input        serialize concurrent requests
important:  fast resolution in both directions    never activate both grants
freedom:    final decision not important          infinite resolution time
circuit:    flip-flop (special design)            SR latch plus metastability filter

24 Arbiter: Principle. Purpose: manage access of clients to shared resource(s). Method: handle pairs of request_in/grant_out on the client side and on the resource side; client requests may arrive in any order; the arbiter must assign one resource to only one client at a time (respond to the first requester) => needs mutual exclusion (MUTEX).

25 Arbiter: Function. [Figure: arbiter between Client 1 (C1r, C1g), Client 2 (C2r, C2g) and Common Resource 1 (R1r, R1g)] Can have more than two clients (“multiway arbiter”); can have more than one resource.

26 Arbiter: Operation. [Figure: timing diagram of C1r, R1r, C2r, R1g, C1g, C2g]

27 Arbiter: Circuit. [Figure: MUTEX (r1, r2, g1, g2) between client 1 (C1r, C1g) and client 2 (C2r, C2g), two C-elements, and the common resource (R1r, R1g)] Annotations: allow one request at a time only; delay a request until the previous cycle has finished; merge requests; relay the grant to the requester; keep the grant alive until the resource disables it.
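The annotations of the arbiter circuit can be mirrored in an event-level model. This is a behavioral sketch, not the slide's gate-level circuit. It assumes a resource that answers R1r with R1g immediately, and it settles all internal signals after each client action. It shows why the next client must wait until the resource handshake has fully returned to zero: only then are C1g and C2g never high together.

```python
class Arbiter:
    """Behavioral 2-client / 1-resource arbiter with 4-phase handshakes on both sides."""

    def __init__(self):
        self.c_req = [False, False]    # C1r, C2r
        self.c_grant = [False, False]  # C1g, C2g
        self.r_req = False             # R1r (merged request)
        self.r_grant = False           # R1g
        self.owner = None              # client currently served (MUTEX winner)

    def set_request(self, client: int, level: bool):
        self.c_req[client] = level
        self._settle()
        return tuple(self.c_grant)

    def _settle(self):
        changed = True
        while changed:
            changed = False
            if self.r_grant != self.r_req:       # assumed resource: R1g follows R1r
                self.r_grant = self.r_req
                changed = True
            if self.owner is None and not self.r_req and not self.r_grant:
                for i in (0, 1):                 # previous cycle finished: admit one request
                    if self.c_req[i]:
                        self.owner = i
                        self.r_req = True        # merge: forward request to the resource
                        changed = True
                        break
            if self.owner is not None:
                i = self.owner
                if self.r_grant and self.c_req[i] and not self.c_grant[i]:
                    self.c_grant[i] = True       # relay grant to the requester
                    changed = True
                elif not self.c_req[i] and self.r_req:
                    self.r_req = False           # client done: release the resource
                    changed = True
                elif not self.c_req[i] and not self.r_grant and self.c_grant[i]:
                    self.c_grant[i] = False      # grant kept alive until R1g is withdrawn
                    self.owner = None
                    changed = True


arb = Arbiter()
for client, level in [(0, True), (1, True), (0, False), (1, False)]:
    print(f"C{client + 1}r={int(level)} -> (C1g, C2g) = {arb.set_request(client, level)}")
```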

28 Tree Arbiter. [Figure: two-client arbiters (C1r, C2r, C1g, C2g, R1r, R1g) arranged as a tree that serves Clients 1 to 4 and one common resource] Further tree levels can be added to handle more clients.

29 Data-Driven Clocking. Principle: as soon as new data arrive => start clocking; determine the number k of clock cycles required to process the new data; stop clocking after k cycles and wait for the next data. Properties: need to switch the clock on and off => beware of spurious clock pulses! No metastability problem: the data are stable as soon as the consumer clock starts. Potential for power saving. Useful for very specific applications only (not for pipelines!).
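The principle can be sketched as a generator of rising-edge times. Each data arrival starts a burst of k clock cycles, after which the clock stops again. This sketch assumes a fixed half period and no gate delays. It also assumes that data arriving while a burst is still running waits until that burst has finished. The slide does not specify that case.

```python
def data_driven_clock(arrivals, k, half_period):
    """Return rising-edge times: k clock pulses per data item, clock stopped otherwise."""
    edges = []
    t_free = 0.0   # time at which the previous burst of k cycles has completed
    for t in sorted(arrivals):
        start = max(t, t_free)               # clock starts when data are there (and the clock is idle)
        for n in range(k):
            edges.append(start + 2 * n * half_period)
        t_free = start + 2 * k * half_period  # clock stops again after k full cycles
    return edges


print(data_driven_clock([0, 100], k=3, half_period=5))   # [0, 10, 20, 100, 110, 120]
print(data_driven_clock([0, 25], k=3, half_period=5))    # [0, 10, 20, 30, 40, 50]
```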

30 Data-Driven Clock: Circuit. [Figure: data-driven clock circuit with output CLK out] The CLK half period is determined by the delay element.

31 Data-Driven Clock: Circuit. [Figure: C-element with REQ and ACK, delay element, CLK out] A transition on REQ is answered by a transition on CLK out; the minimum CLK half period is determined by the delay element. Metastability?

32 Pausable Clocking. Principle: the producer requests the consumer's clock to pause; the data are provided to the input register during the idle time; then the consumer's clock may resume, either free running ("pausable clock") or for one cycle only ("stoppable clock"). Properties: need to switch the clock on and off => beware of spurious clock pulses! => beware of clock tree delays! The producer controls the consumer's clock (blocking!). Applications must be able to cope with a paused clock.

33 Pausable Clock: Circuit. [Figure: C-element with REQ and ACK, delay element, inverter, CLK out] The inverter generates the next REQ from ACK => self-oscillation.

34 Pausable Clock: Circuit. [Figure: the self-oscillating loop with a Mutex inserted, external handshake REQ’/ACK’, CLK out] An external unit can safely stop CLK by activating REQ’… and gets ACK’ as a response. Metastability?
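The effect of the Mutex in the clock loop can be sketched at the event level. The simplifying assumptions are that gate delays are ignored, there is at most one pause request per clock cycle, and the clock side holds the Mutex during its high phase. Under these assumptions, a request REQ’ that arrives during the high phase is granted only at the falling edge, so no clock pulse is cut short. While REQ’ is held, the next rising edge is postponed. All parameters are hypothetical.

```python
def pausable_clock(period, t_end, pauses):
    """Event-level sketch of a pausable clock whose ring contains a MUTEX.

    pauses: list of (t_request, hold_time), sorted by time, at most one per cycle.
    Returns (rising_edge_times, grant_times), all as floats.
    """
    half = period / 2
    pending = [(float(r), float(h)) for r, h in pauses]
    edges, grants = [], []
    t = 0.0
    while t < t_end:
        edges.append(t)                     # rising edge: clock side owns the MUTEX
        t_fall = t + half                   # falling edge: clock side releases the MUTEX
        t_next = t + period                 # next rising edge if nobody intervenes
        if pending and pending[0][0] < t_next:
            t_req, hold = pending.pop(0)
            t_grant = max(t_req, t_fall)    # a running high phase is never cut short
            grants.append(t_grant)          # ACK' returned to the external unit
            t_next = max(t_next, t_grant + hold)  # clock stays low while REQ' is held
        t = t_next
    return edges, grants


edges, grants = pausable_clock(10, 50, [(12, 3), (27, 20)])
print(edges)   # [0.0, 10.0, 20.0, 47.0]
print(grants)  # [15.0, 27.0]
```

The first request arrives during a high phase and waits until its end at t = 15. The second arrives during a low phase, is granted at once, and stretches that low phase until t = 47.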

35 Pausable Clock: n Clients. [Figure: arbiter in front of the Mutex with client handshakes REQ1/ACK1 and REQ2/ACK2, C-element, CLK out] For more external sources, an arbiter can be added before the Mutex. The two inverters can be eliminated by using a Muller C-element with inverting output. See R. Mullins and S. Moore, “Demystifying Data-Driven and Pausible Clocking Schemes”, Proc. 13th Intl. Symp. on Advanced Research in Asynchronous Circuits and Systems (ASYNC), 2007, pp. 175–185.

36 Pausable Clock in GALS. [Figure: SRC and SNK connected by REQ/ACK; SNK clocked by a pausable clock] Sequence: stop the receiver clock; once the receiver clock is stopped, the data can be safely applied when stable; release the receiver clock.

37 Conventional TMR. Advantages: masks all single faults. Drawbacks: single clock source; no recovery.
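The masking property of conventional TMR rests on a 2-out-of-3 majority voter. Below is a minimal bitwise sketch for 8-bit module outputs, assuming each module delivers one word per cycle. It checks exhaustively that any single faulty module, with an arbitrary wrong output, is outvoted.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority voter."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1011_0010   # arbitrary reference value
masked = all(
    vote(*[bad if m == faulty else correct for m in range(3)]) == correct
    for faulty in range(3)      # which of the three modules is faulty
    for bad in range(256)       # every possible wrong 8-bit output
)
print("all single faults masked:", masked)  # all single faults masked: True
```

The voter masks the error only in its output. The faulty module keeps its wrong state, which corresponds to the "no recovery" drawback listed on the slide.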

38 GALS-TMR. From J. Lechner, "Designing Robust GALS Circuits with Triple Modular Redundancy", Proc. 9th European Dependable Computing Conference 2012, IEEE CS Press, pp. 122–131. Use independent clocks => avoid a single point of failure. Concurrent voting is not possible, since the replicas do not operate in sync; instead, vote over the FF state at predefined intervals.

39 GALS-TMR Details. Every nth clock cycle: stop the own clock; synchronize with the others; perform a recovery step.
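The periodic recovery step can be illustrated with three replicas of a hypothetical 8-bit counter. This sketch does not reproduce the synchronization mechanism of the cited paper. It only assumes that each replica counts its own cycles and pauses after every nth local cycle. The replicas therefore vote at the same logical step even if their clocks run at different frequencies, and the majority value is written back into all three registers.

```python
def vote(a, b, c):
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def gals_tmr(steps, n, fault=None):
    """Three replicas of a hypothetical 8-bit counter with periodic vote-and-restore.

    fault = (cycle, replica, bit) injects a single bit flip after that local cycle.
    """
    state = [0, 0, 0]
    for cycle in range(1, steps + 1):
        state = [(s + 1) & 0xFF for s in state]      # one local clock cycle per replica
        if fault is not None and cycle == fault[0]:
            state[fault[1]] ^= 1 << fault[2]         # single event upset
        if cycle % n == 0:                           # all clocks paused: recovery step
            v = vote(*state)
            state = [v, v, v]
    return state


print(gals_tmr(16, 8, fault=(3, 1, 4)))    # [16, 16, 16]: flip repaired at cycle 8
print(gals_tmr(16, 100, fault=(3, 1, 4)))  # [16, 32, 16]: no recovery step, error persists
```

Unlike the plain voter of conventional TMR, the write-back removes the error from the faulty replica. A second fault in another replica is then tolerated as long as it occurs after the recovery step.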

40 Pausable Clock vs. Crystal. Pros: cheap to implement internally; no extra pins; no mechanical issues (acceleration); stoppable. Cons: the arbiter is not a standard cell; the frequency is not as stable (PVT) and not as high; lacking tool support.

41 Summary (1) The generally used MTBU formula does not assume any knowledge about the input signal and its relation to the clock. In practice, such knowledge can often be exploited to optimize the synchronizer. Synchrony is not a binary property; there is a range of globally synchronous, mesochronous, plesiochronous and heterochronous systems. Asynchronous systems are tolerant against delays, while synchronous systems are not. The GALS approach therefore makes long-distance communication asynchronous, while retaining the efficient and well-proven synchronous paradigm for locally restricted islands.
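For reference, the MTBU formula in its commonly quoted form is MTBU = exp(t_res/tau) / (T_w · f_clk · f_data). Here tau is the flip-flop's resolution time constant, T_w its metastability window, and t_res the time allowed for resolution. The formula assumes data transitions uniformly distributed relative to the clock, which is exactly the "no knowledge" assumption mentioned above. The parameter values in this sketch are illustrative assumptions, not figures from the lecture.

```python
import math


def mtbu(t_res, tau, t_w, f_clk, f_data):
    """MTBU = exp(t_res / tau) / (T_w * f_clk * f_data), all in SI units."""
    return math.exp(t_res / tau) / (t_w * f_clk * f_data)


# Assumed example: tau = T_w = 20 ps, 1 GHz sampling clock, 100 MHz data change rate.
for t_res in (0.5e-9, 1.0e-9):
    print(f"t_res = {t_res * 1e9:.1f} ns -> MTBU = {mtbu(t_res, 20e-12, 20e-12, 1e9, 100e6):.2e} s")
```

Because t_res appears in the exponent, doubling the resolution time (e.g., one extra synchronizer stage) raises MTBU by a factor of exp(25) in this example.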

42 Summary (2) GALS allows choosing the most appropriate clock for each island. Communication in GALS can be either message-based or via shared memory. For the message-based solution, handshake signals must be employed and properly synchronized; one party may easily block the other. For the shared memory, either an arbitration scheme or a true dual-port memory is needed, and the busy flag may require synchronization. For the Muller C-element, if both inputs match, the output assumes the same value; otherwise it holds its previous value.

43 Summary (3) The purpose of a MUTEX element is to select one among two (or more) possibly concurrent client requests. It may remain undecided for an arbitrary time, but it never selects more than one client. The purpose of an arbiter is to grant access to one (or more) resource(s) shared between two (or more) clients; again, access must be granted to only one client at a time. A data-driven clock is activated on demand, only when data arrive to be processed. A pausable clock can be stopped on demand. This is useful in GALS when moving data from one domain to the other, as it confines the potential for metastability to the arbiter.

44 Summary (4) In GALS, communication can be safely based on pausable clocks. Even a fault-tolerant TMR solution can be implemented that avoids the clock source as a single point of failure.

