1
April, 2008 1 / 38 Network on Chip Advanced Topics in VLSI 1 - 049036 Asynchronous vs. Synchronous Design Techniques for NoCs Presented by: Alex Rekhelis – 308873140 Mentor: Ran Ginosar
2
Agenda
Introduction
Synchronous On-Chip Networks
  – Previous approaches survey
Asynchronous On-Chip Networks
  – Previous approaches survey
New approaches
Conclusion
  – Sync
  – Async
Bibliography
3
Introduction
Highlight the wide range of synchronous and asynchronous on-chip networks
Contrast different approaches
Present new approaches
4
Synchronous On-Chip Networks
5
Generic On-Chip Router
6
Synchronous Router Pipeline
A router pipeline with numerous stages
  – Raises communication latency
  – Can make packet buffers less effective
  – Incurs pipelining overheads
7
Speculative Router Architecture
VC and switch allocation may be performed concurrently
  – Speculate that waiting packets will be successful in acquiring a VC
  – Prioritize non-speculative requests over speculative ones
Li-Shiuan Peh and William J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers”, In Proceedings HPCA’01, 2001.
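The prioritisation rule above can be sketched in a few lines of Python. This is only an illustrative model, not the allocator described by Peh and Dally: the function name `allocate_switch`, the request tuple layout and the lowest-input-first tie-break are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A request is (input_port, output_port, speculative).
# This layout is hypothetical and chosen only for the sketch.
Request = Tuple[int, int, bool]


def allocate_switch(requests: List[Request]) -> Dict[int, int]:
    """Grant at most one input per output port and one output per input port.

    Non-speculative requests are considered first, so a speculative request
    can only win an output (and use an input) that is still free afterwards.
    Ties are broken by lowest input port, which is an assumption.
    Returns a mapping output_port -> granted input_port.
    """
    grants: Dict[int, int] = {}
    busy_inputs = set()
    # False sorts before True, so non-speculative requests come first.
    for inp, out, _spec in sorted(requests, key=lambda r: (r[2], r[0])):
        if out not in grants and inp not in busy_inputs:
            grants[out] = inp
            busy_inputs.add(inp)
    return grants
```

With the requests `[(0, 2, True), (1, 2, False), (0, 3, True)]`, input 1 wins output 2 because its request is non-speculative, and the speculative request from input 0 falls back to output 3.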
8
Single Cycle Speculative Router
R. D. Mullins, A. West and S. W. Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks”, In Proceedings ISCA’04.
9
Single Cycle Speculative Router
Single cycle router made possible by use of speculation
Clock period is almost unchanged (compared to pipelined design)
  – Approx. 30 FO4 (simple standard-cell design)
Presence of clock simplifies design
  – Arbitration
    – Fast combinational matrix arbiters
    – Can easily be extended to handle priority traffic etc.
  – Speculation
    – Aided by the clear notion of a clock “cycle”
    – Simple abort logic (abort detection and actual abort)
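For readers unfamiliar with matrix arbiters, the following Python sketch models the usual least-recently-served scheme behaviourally. It is not taken from the cited router design: the class name `MatrixArbiter`, its methods and the initial priority order are assumptions for illustration, and the sequential loop stands in for what would be parallel combinational logic in hardware.

```python
class MatrixArbiter:
    """Behavioural model of an n-way matrix arbiter (least recently served)."""

    def __init__(self, n):
        self.n = n
        # prio[i][j] is True when requester i currently beats requester j.
        # Initial order (lower index wins) is an arbitrary assumption.
        self.prio = [[i < j for j in range(n)] for i in range(n)]

    def arbitrate(self, requests):
        """Return the index of the granted requester, or None if none request."""
        for i in range(self.n):
            if not requests[i]:
                continue
            # i loses if any other active requester has priority over it.
            beaten = any(requests[j] and self.prio[j][i]
                         for j in range(self.n) if j != i)
            if not beaten:
                # The winner drops to the lowest priority for next time.
                for j in range(self.n):
                    if j != i:
                        self.prio[i][j] = False
                        self.prio[j][i] = True
                return i
        return None
```

When all requesters stay active, successive calls to `arbitrate` rotate through them, which is the fairness property that makes this structure attractive for switch and VC allocation.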
10
Synchronous architecture techniques (almost async)
Router clocks derived from a single source
Locally Generated Clocks (periodic & free-running)
Synchronous Routers with Asynchronous Links
Locally Clocked Routers / Asynchronous Interconnect (GALS style network)
  – Can support asynchronous interconnects
    – No longer exploiting periodic nature of router clocks
    – Correct operation is independent of the delay of the link
  – GALS interfaces with pausable clocks
    – If necessary the clock is stretched; data is always transferred reliably (value safe)
    – Need to construct local delay line
Local aperiodic clock generation
Data-Driven Local Clock
  – Similarities to stoppable GALS interface and asynchronous priority arbiters
11
GALS – Clock Pausing
Simple GALS interface (receiver)
Note: Req/Ack uses 2-phase handshaking protocol
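As a reminder of what 2-phase (transition signalling) handshaking means, here is a small behavioural sketch in Python. It models only the protocol, not the GALS interface circuit on the slide; the class `TwoPhaseChannel` and its method names are hypothetical.

```python
class TwoPhaseChannel:
    """Behavioural model of a 2-phase Req/Ack channel.

    Every transition on req (0->1 or 1->0) announces new data, and every
    transition on ack confirms it. The channel is idle when req == ack.
    """

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def can_send(self):
        return self.req == self.ack

    def send(self, value):
        if not self.can_send():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value
        self.req ^= 1  # one transition on req starts the transfer

    def has_data(self):
        return self.req != self.ack

    def receive(self):
        if not self.has_data():
            raise RuntimeError("no pending transfer")
        value = self.data
        self.ack ^= 1  # one transition on ack completes the transfer
        return value
```

Unlike a 4-phase protocol, there is no return-to-zero step: two transfers leave both wires back at 0 after only four transitions in total.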
12
Synchronous Routers - Summary
Can design high-performance single cycle routers
Design is simplified by presence of global synchrony
Distribution of global clock can be eased by
  – New clock generation / distribution techniques
  – Source synchronous communication
Network operating frequency
  – Relax global synchrony further
  – Data-driven clocking determines most appropriate router clock frequency automatically
13
Asynchronous On-Chip Networks
14
Why are asynchronous NoCs interesting?
No clock distribution, simple solution
Networked IP blocks run at different clock frequencies
  – No synchronization issues at interfaces
Ability to exploit data / path-dependent delays
  – Low-latency common or high-priority paths through router
Freedom to optimize network links
  – Not constrained by need to distribute/generate multiple clock frequencies. Can exploit high-frequency narrow links
  – Dynamic latency/throughput trade-offs (adaptive pipeline depth)
  – Exploit dynamic optimizations on links (e.g. DVS)
Easy to use interfaces, modularity, robust and simple implementation, reduced design time
Some arguments for reduced power
15
Asynchronous Circuit Basics
Control in asynchronous circuits often relies on simple handshaking protocols (req / ack event cycles)
Delay-insensitive event-driven system: every signal transition is acknowledged
The C-element is a fundamental building block of many asynchronous circuits
  – Can be thought of as an AND gate for events
Arbitration
  – Mutex
  – Tree arbiter element
  – Multiway arbiter
  – Static Priority Arbiters
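The "AND gate for events" description can be made concrete with a behavioural model of a two-input Muller C-element. The Python class below is a sketch for illustration only; the name `CElement` and the `update` method are assumptions, and real C-elements are of course built from transistors or gates with feedback.

```python
class CElement:
    """Two-input Muller C-element.

    The output changes only when both inputs agree; otherwise it holds
    its previous value. An output transition therefore signals that an
    event has arrived on both inputs.
    """

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out
```

Starting from 0, the output stays at 0 while only one input has risen, goes to 1 once both are 1, and then stays at 1 until both inputs have fallen again.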
16
Asynchronous on-chip networks
How do we build more complex on-chip routers?
  – Support for virtual-channels
  – QoS
Challenges
  – Multi-way & prioritised arbitration
  – Control overheads
    – Arbitration and Delay Insensitive circuits can be slow!
    – How can control overheads be hidden?
17
Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC
E. Beigné, F. Clermidy, S. Miermont, and P. Vivet
NOCS 2008
18
Outline of the paper
Dynamic and Static power consumption issues
NoC architecture for DVFS support
NoC Unit architecture
NoC Unit design
DVFS execution at system level
Power gain & physical implementation
19
Dynamic and Static power consumption issues
Dynamic power consumption reduction
  – Reduce: switching activity, capacitances, supply voltage, frequency
Static power consumption reduction
  – Reduce: supply voltage, dominant leakage currents: I_STH, I_GIDL, I_GATE
  – Multi-V_TH design
Low power techniques must exist at all design levels, from architecture to physical implementation
Proposed: a fully integrated solution to locally control dynamic and static power at unit level within a GALS NoC
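The reduction targets listed above follow from the usual first-order CMOS power model. The expressions below are the standard textbook approximation rather than formulas from the slides; the symbols α (switching activity), C (switched capacitance), V_DD (supply voltage), f (clock frequency) and I_leak (total leakage current) are introduced here only for illustration.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad
P_{\text{static}} \approx V_{DD} \, I_{\text{leak}},
\quad
I_{\text{leak}} \approx I_{STH} + I_{GIDL} + I_{GATE}
```

Because the supply voltage enters the dynamic term quadratically and also multiplies the leakage current, it is the one parameter that appears in both reduction lists.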
20
NoC architecture for DVFS (1) – NoC architecture
A fully asynchronous Network-on-Chip
IP units are synchronous islands using a programmable Local Clock Generator
Within the IP unit
  – Synchronization is done by means of a Pausable Clock
  – A Power Unit manages the internal V_core, generated from the external V_high and V_low
  – A Network Interface is in charge of NoC communications
Local Power Management
Main CPU in charge of global power management
Local fine grain power management can be executed during IP computation and communication, independently of the other units
21
NoC architecture for DVFS (2) – Main Principles
Each synchronous IP is an independent power and frequency domain
A local fine grain Dynamic Voltage Scaling:
  – Implementation of a local hardware controller to control transitions between V_high and V_low
  – Ensures smooth DVS transitions for IP safe computation
A local fine grain Dynamic Frequency Scaling:
  – Automatic frequency scaling
  – Use of clock generation re-programming to find the optimal V/F point of operation
Thanks to the pausable clock technique, the IP unit continues its operation during DVFS phases
GALS architecture and local clock generation are a natural enabler for easy local DVFS
22
NoC Unit architecture
Each IP core encapsulated with
  – Network Interface
  – Test Wrapper
  – Pausable Clock
  – Power Supply Unit
IP units have 5 supply modes
  – Init: reset at V_high (1.2 V)
  – High: V_high supply
  – Low: V_low supply (0.8 V)
  – Hopping: switch V_high / V_low for DVFS
  – Idle: retention state at V_low (no clock)
  – Off: stand-by mode
23
NoC Unit design (1) - Local Power Manager
Local Power Manager handles unit power modes
A set of registers, programmable through the NoC
Configuration of
  – Programmable delay line
  – Power Supply Unit
Pulse Width Modulator used to control the Hopping mode
24
NoC Unit design (2) – Power Supply Unit
Power Supply Unit manages V_core
Two power switches, T_high and T_low (LVT transistors)
A Hopping Unit
An Ultra Cut-Off Generator
Local Power Supply Unit offers a safe control of internal power supply depending on pre-defined power modes
25
NoC Unit design (3) – Hopping Unit
Energy per operation scales with V²
  – Decrease Voltage (and Frequency) to be energy efficient
«Triple state» power supply
  – Use of two PMOS power switches
  – V_high (1.2 V), V_low (0.7 V), or OFF (0 V)
Switch between V_high and V_low
  – Transitions take less than 100 ns
  – Mean speed / mean power of the IP is programmed by a PWM
Compatible with synchronous and asynchronous IPs
  – For GALS system: coordination done with local clock generator
Can easily be integrated in any CMOS circuit
  – No inductor contrary to traditional DC/DC converters
  – No capacitor contrary to charge pump implementation
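To illustrate how a PWM duty cycle sets the mean speed and mean power, the Python sketch below time-averages the two operating points using the first-order relation "dynamic power proportional to V² times f". It is a rough model under stated assumptions, not the paper's method: the function name `hopping_mean` is hypothetical, the clock frequencies are free parameters that the slides do not give, and transition costs and leakage are ignored.

```python
def hopping_mean(duty_high, v_high, v_low, f_high, f_low):
    """Return (mean_speed, mean_power) relative to staying in High mode.

    duty_high is the fraction of time spent at v_high (0.0 to 1.0).
    Dynamic power is modelled as proportional to V**2 * f, which is a
    first-order assumption; leakage and hopping transitions are ignored.
    """
    if not 0.0 <= duty_high <= 1.0:
        raise ValueError("duty_high must be in [0, 1]")
    p_high = v_high ** 2 * f_high
    p_low = v_low ** 2 * f_low
    mean_f = duty_high * f_high + (1.0 - duty_high) * f_low
    mean_p = duty_high * p_high + (1.0 - duty_high) * p_low
    return mean_f / f_high, mean_p / p_high
```

For example, `hopping_mean(0.5, 1.2, 0.7, 2.0, 1.0)` uses the voltages from this slide together with made-up frequencies in a 2:1 ratio; any intermediate duty cycle gives a speed and power point between the Low and High modes.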
26
NoC Unit design (4) – Ultra Cut-Off Generator
When the gate of the power switch is reverse biased, the leakage current goes through a minimum
The optimal bias point varies with temperature, supply voltage and process corner
The proposed UCO generator automatically biases the gate of the power switch at its point of minimum leakage
Compensates for temperature variation and mitigates process corner variation
Gate oxide reliability is addressed by introducing a passive stress reduction mechanism
27
NoC Unit design (5) – Pausable Clock Interface
Temporarily pauses the clock when a transfer (NoC) or a supply switch is required
Based on
  – Two GALS ports: Synchronous-to-Asynchronous and Asynchronous-to-Synchronous
  – A programmable delay line
  – A pausable clock generator
Pausable Clock Generator arbitrates pause requests
28
NoC Unit design (5) – Pausable Clock Interface
Programmable delay line
  – Precise, small and low power
  – Using standard cells
  – On the same unit power domain
Pausable Clock Interface allows efficient synchronization and safe dynamic voltage and frequency scaling with minimal latency cost
29
Power gain
Programmable delay line matches the unit logic on the same power domain
  – Compensates any mismatch through reprogramming
Power reduction
  – V_high = 1.2 V and V_low = 0.8 V
  – 35 % dynamic power reduction between High and Low modes
  – Hopping mode is used to save power without any latency cost
  – Leakage power is reduced by two decades thanks to the UCO
Power Supply Unit efficiency
  – Hopping Unit
    – Only resistive losses in the power transistors
    – About 1 mW dynamic power
  – => more than 95 % power efficiency
  – 90 % total efficiency (external DC-DC taken into account)
An adaptive and reliable Power Supply Unit giving high power reduction factor and high power efficiency
30
Physical Implementation
Power Switch
  – One single Power-Switch for the complete power domain
  – Sized to get a speed loss < 5 %
  – Area: less than about 5 % of the power domain
Hopping Unit
  – Area: 140 μm × 35 μm
  – Hopping Transition: < 100 ns
31
What do we see in recent work?
A fully integrated DVFS architecture within a GALS NoC
We are able to handle leakage problems due to technology scaling
  – Insertion of power switches
Dynamic power reduction is possible through voltage scaling:
  – Hopping
  – Management of multiple power domains in a complex SoC
Knowledge of GALS systems leads to automatic frequency scaling
  – Pausable clock interfaces
  – Asynchronous implementation
32
Synchronous or Asynchronous NoCs?
33
Comparing Approaches
Published work on asynchronous routers and networks is scarce and lacks detailed explanation; simulation is difficult, at times almost impossible
Single latency / throughput figures don’t tell the whole story
Detailed comparative studies with real traffic are required
Often difficult to isolate the impact of the choice of system timing style, because many things tend to be different:
  – Technology, circuit style, architecture
Problems in comparing synchronous and asynchronous designs
34
Questions about Asynchronous design?
Testing asynchronous circuits
  – An asynchronous circuit replaces the clock with a large number of distributed state holding elements
  – Large area overhead associated with test
  – Testing of non-deterministic elements (MUTEX)
Performance
  – “Asynchronous circuits avoid issues of timing closure, they are correct-by-construction”
  – But performance guarantees are still required. Slow synchronous circuits are easy to build!
  – Value safe versus time safe
  – Predicting performance is complex
Perhaps on-chip communication is an application where such characteristics can be tolerated?
35
Synchronous or Asynchronous?
A clockless on-chip network appears to be an elegant solution although some questions remain:
  – Test
  – Performance concerns
Shouldn’t asynchronous designs offer latency advantages?
  – Fast local control, path/data dependent delays, DI interconnects
Perhaps asynchronous routers mimic synchronous architectures too closely?
  – Exploit flexibility, novel architectures, different topologies
Overheads for data-driven clocking or GALS currently look small in comparison
Synchronous design has advantages too
  – Predictability and determinism can be exploited
    – Fast single cycle routers possible
  – Global snapshot of state is good for scheduling
Still lots of interesting research to be done
  – Need more data points
36
Conclusions
High cost associated with both global synchrony and delay-insensitive circuits
  – Can relax constraints in both directions
Which techniques achieve the best cost/benefit mix for on-chip networks?
  – Data-driven clocks look promising
[Figure labels: ?, SYNCHRONOUS, ASYNCHRONOUS]
37
Bibliography
R. Mullins, “Asynchronous versus synchronous design techniques for NoCs,” Invited lecture given at SoC 2005, part of a tutorial entitled “The status of the NoC revolution: design methods, architectures and silicon implementation,” Tampere, Finland, November 2005.
E. Beigné, F. Clermidy, S. Miermont, and P. Vivet, “Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC,” NOCS 2008, Apr. 2008.
E. Beigné and P. Vivet, “Design of On-chip and Off-chip Interfaces for a GALS NoC Architecture,” Proceedings of IEEE International Symposium on Advanced Research in Asynchronous Circuits and Systems, ASYNC'2006, Grenoble, France, pp. 172-181, March 2006.
38
Thank you