Asynchronous vs. Synchronous Network-on-Chip
Prepared by Sergey Rudko
Advanced Topics in VLSI 1 (NoC), 049036
Introduction
- Problem Definition
- Proposed Solution
- Related Approaches
NoC Implementation Alternatives
- Fully asynchronous
- Multi-synchronous (GALS)
- Synchronous
Proposed Solution
- Systematic Comparison between Different Strategies
  - Silicon Area
  - Network Saturation Threshold
  - Communication Throughput
  - Packet Latency
  - Power Consumption
  - Implementation Flexibility and Tools
Related Approaches
Reference: I. Miro-Panades, F. Clermidy, P. Vivet, A. Greiner, “Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture”, NoCs 2008
Synchronous Router
[Figure: router pipeline with VCA (virtual-channel allocation) and SA (switch allocation) stages, router data path, links, and speculative control signals]
- The router pipeline may include many stages
  - Increases communication latency
- The router pipeline may be optimized into a single-cycle router
  - Made possible by the use of speculation
  - Clock period is the same as that of the pipelined router
- The presence of a clock simplifies design
  - Standard libraries and tools
Reference: A. Kumar, P. Kundu, A. Singh, L. Peh and N. Jha, "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator", International Conference on Computer Design (ICCD), October 2007.
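The latency cost of a deep router pipeline can be illustrated with a minimal sketch. It assumes every pipeline stage takes one clock cycle and that link traversal is folded into the router stages; the 4-stage depth and the 1 GHz clock in the usage note are hypothetical values chosen for illustration, not figures from the cited paper.

```python
def sync_path_latency_ns(hops: int, stages: int, clock_mhz: float) -> float:
    """Latency of a path through `hops` synchronous routers.

    Assumption: each pipeline stage costs exactly one clock period and
    link traversal is included in the router stages.
    """
    period_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return hops * stages * period_ns


def single_cycle_speedup(stages: int) -> float:
    """Latency ratio of a multi-stage router to a single-cycle router
    running at the same clock period."""
    return float(stages)
```

With the hypothetical numbers above, a 5-hop path through 4-stage routers at 1 GHz takes `sync_path_latency_ns(5, 4, 1000.0)` = 20 ns, while single-cycle routers at the same clock period take 5 ns.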
Limitations of Fully-Synchronous Networks
Problem: Clock Distribution and Frequency Selection
- Difficult to distribute the clock
  - The network is spread over the die and may have an irregular layout
  - Minimising skew costs complexity and power
  - Solution: alternatives/extensions to PLL and H-tree
- Single network clock frequency
  - Communicating synchronous IP blocks run at different frequencies
  - What is the most appropriate network clock frequency?
Solution: Beyond a Single Global Clock
Synchronous Routers with Asynchronous Links (GALS)
Connect Frequency-Independent Routers
- Asynchronous FIFO
  - Synchronization is simple: traditional two-flip-flop (2-FF) synchronizers
  - Can support asynchronous interconnects
  - No longer exploits the periodic nature of the router clocks
  - Correct operation is independent of the link delay
- GALS interfaces with pausible clocks
  - The clock is stretched if necessary, so data is always transferred reliably
  - Requires constructing a local delay line
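A behavioural sketch of the two building blocks named above, the 2-FF synchronizer and the pointer crossing inside an asynchronous FIFO, may help. Gray-coded pointers are a common technique for such FIFOs but are an assumption here, not something the slides state; the class and function names are hypothetical, and the model ignores metastability itself (it only shows the two-cycle delay).

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to Gray code (one bit changes per increment)."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class TwoFlopSynchronizer:
    """Behavioural model of a 2-FF synchronizer in the receiving clock domain."""

    def __init__(self, init: int = 0) -> None:
        self.ff1 = init
        self.ff2 = init

    def clock(self, async_in: int) -> int:
        """Advance one receiver clock edge and return the synchronized value."""
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2
```

Because consecutive Gray codes differ in a single bit, a pointer sampled mid-transition resolves to either the old or the new value, never to an unrelated one.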
Asynchronous NoCs
- Simple and elegant solution when networked IP blocks run at different clock frequencies
- Data driven, no superfluous switching activity
- No synchronization or clock alignment issues at interfaces
  - Solves synchronization, clock domain crossing, timing, and long-interconnect problems
- No clock distribution issues
- Security and EMI advantages
  - A clock concentrates EM emissions
  - The presence of a clock can also aid fault-induction and side-channel analysis attacks
- Reduced design time
  - Easy-to-use interfaces, modularity
  - Robust and simple implementation
- Reduced power
- But network latency is significantly increased
Asynchronous NoC Approaches
- QNoC Group
  - R. Dobkin et al., “An Asynchronous Router for Multiple Service Levels Networks on Chip”, ASYNC’05
- MANGO Clockless Network-on-Chip
  - T. Bjerregaard and J. Sparsø, “A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip”, ASYNC’05
  - T. Bjerregaard and J. Sparsø, “A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip”, DATE’05
- R. Dobkin provides a synchronous versus asynchronous router study
Synchronous or Asynchronous NoCs?
“Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture”
I. Miro-Panades, F. Clermidy, P. Vivet and A. Greiner, NoCs 2008
Motivation
- Physically implement the DSPIN NoC in the FAUST application platform
- Compare the performance of ANOC and DSPIN on a real application and real traffic
  - Silicon Area
  - Throughput
  - Packet Latency
  - Power Consumption
FAUST Architecture with ANOC
- Asynchronous NoC (ANOC)
  - QDI 4-phase/4-rail asynchronous logic
  - 20 routers
  - 5-port router
  - Source routing
  - Wormhole packet switching
  - 32-bit payload
- GALS design
  - 24 independent clocks
  - FIFO-based interface
- Hard-macro approach for ANOC reuse
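The 4-rail data encoding mentioned above can be sketched as follows. This is a generic 1-of-4 model, assuming each 2-bit symbol is carried on four wires with exactly one wire asserted, and that the 4-phase protocol returns all wires to zero (a spacer) between data tokens; the function names and the least-significant-symbol-first ordering are hypothetical and not taken from the ANOC design.

```python
from typing import List


def encode_1of4(word: int, width_bits: int = 32) -> List[int]:
    """Encode `word` as one 4-wire one-hot group per 2-bit symbol (LSB first)."""
    if width_bits % 2:
        raise ValueError("width must be a multiple of 2 bits")
    groups = []
    for shift in range(0, width_bits, 2):
        symbol = (word >> shift) & 0b11
        groups.append(1 << symbol)  # exactly one of four wires is high
    return groups


def decode_1of4(groups: List[int]) -> int:
    """Decode valid 1-of-4 groups back into an integer."""
    word = 0
    for i, group in enumerate(groups):
        if group not in (1, 2, 4, 8):
            raise ValueError("group is a spacer or an invalid code")
        word |= (group.bit_length() - 1) << (2 * i)
    return word


def is_spacer(groups: List[int]) -> bool:
    """True when every wire is low, i.e. the return-to-zero phase."""
    return all(group == 0 for group in groups)
```

Under these assumptions a 32-bit payload needs 16 groups, that is 64 data wires, and a receiver can detect data validity from the wires themselves without a clock.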
DSPIN Architecture
- Packet-based
- Distributed router architecture
- Suited to the GALS approach
  - Mesochronous links between routers
  - Metastability resolved by “bi-synchronous” FIFOs
- Synthesizable with standard cells
DSPIN Clock Tree
Mesochronous link between neighboring routers
NoC Architecture Comparison
Both implementations use GALS principles
Network Comparison

Parameter                             ANOC           DSPIN
Implementation                        Hard-Macro     Soft-Macro
Area                                  0.281 mm²      0.187 mm²
Throughput (worst-case conditions)    ~160 Mflit/s   ≤289 Mflit/s
Throughput (nominal conditions)       ~220 Mflit/s   ≤408 Mflit/s
Power consumption (F = 150 MHz)       3.69 mW        5.89 mW
Power consumption (F = 250 MHz)                      10.39 mW

- DSPIN throughput is deterministic with respect to the clock frequency

DSPIN Power Issues
- Power consumption is dominated by the FIFO data registers
- DSPIN clock gating reduced the power consumption by 67%
- The DSPIN clock tree consumes as much power as the router itself
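The area and power trade-off can be made explicit as simple ratios. The sketch below uses only the area and 150 MHz power figures quoted on this slide; the dictionary keys and the helper name are hypothetical.

```python
# Figures quoted on the slide (area in mm^2, power in mW at F = 150 MHz).
ANOC = {"area_mm2": 0.281, "power_mw": 3.69}
DSPIN = {"area_mm2": 0.187, "power_mw": 5.89}


def dspin_to_anoc_ratio(metric: str) -> float:
    """Return the DSPIN value divided by the ANOC value for one metric."""
    return DSPIN[metric] / ANOC[metric]


if __name__ == "__main__":
    for metric in ("area_mm2", "power_mw"):
        print(f"{metric}: DSPIN/ANOC = {dspin_to_anoc_ratio(metric):.2f}")
```

On these figures DSPIN occupies roughly two thirds of the ANOC area while drawing roughly 1.6 times the power at 150 MHz.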
Network Comparison - Latency

                                  F = 150 MHz              F = 250 MHz
Flit path                         ANOC        DSPIN        ANOC        DSPIN
Intermediate router latency       6.80 ns     16.66 ns     6.80 ns     10.00 ns
First + last router latency       60.00 ns    56.66 ns     47.00 ns    34.00 ns
Latency for 5-hop path            80.00 ns    106.66 ns    68.00 ns    64.00 ns
Latency for 9-hop path                        173.30 ns    96.00 ns    104.00 ns

- DSPIN routers resynchronize the data packets
- DSPIN would have to be clocked at 367 MHz to match the 6.80 ns intermediate-router latency of ANOC
- The DSPIN router is IP data locality aware
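The DSPIN latency figures on this slide appear consistent with a simple additive model: the first-plus-last router latency, plus one intermediate-router latency for each remaining hop. The sketch below checks that reading. The model, the function names, and the 2.5 cycles per intermediate router (inferred from 16.66 ns at 150 MHz and 10.00 ns at 250 MHz) are assumptions drawn from the numbers, not statements from the paper.

```python
def path_latency_ns(hops: int, first_last_ns: float, intermediate_ns: float) -> float:
    """Additive model: first + last router, plus (hops - 2) intermediate routers."""
    return first_last_ns + (hops - 2) * intermediate_ns


def clock_to_match_mhz(cycles_per_router: float, target_ns: float) -> float:
    """Clock frequency at which `cycles_per_router` cycles take `target_ns`."""
    return cycles_per_router / target_ns * 1000.0


if __name__ == "__main__":
    # DSPIN at 250 MHz: 34.00 ns first + last, 10.00 ns per intermediate router.
    print(path_latency_ns(5, 34.0, 10.0))
    print(path_latency_ns(9, 34.0, 10.0))
    # Frequency at which 2.5 cycles equal the 6.80 ns ANOC router latency.
    print(clock_to_match_mhz(2.5, 6.80))
```

Under this model the 250 MHz DSPIN paths come out at 64 ns and 104 ns, as quoted, and the matching frequency is about 367.6 MHz, in line with the 367 MHz figure.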
Conclusion
- Little published work on asynchronous routers and networks
- Comparing synchronous and asynchronous designs is difficult
  - System timing style
  - Technology
  - Circuit style and architecture
- Difficult to reproduce and simulate asynchronous designs from published work
  - No notion of a cycle-accurate model
  - Detailed control and datapath delays are hidden
- Asynchronous performance guarantees
  - Performance guarantees are required
  - Less predictable, non-deterministic
  - Predicting performance is more complex
- Asynchronous EDA tool requirements
- Synchronous routers
  - Predictability and determinism can be exploited
  - Fast single-cycle routers are possible
- ANoC for low power and SNoC for small area
Thank You!!!