Asynchronous vs. Synchronous Design Techniques for NoCs Robert Mullins “The Status of the Network-on-Chip Revolution: Design Methods, Architectures and.

Slides:



Advertisements
Similar presentations
A Novel 3D Layer-Multiplexed On-Chip Network
Advertisements

REAL-TIME COMMUNICATION ANALYSIS FOR NOCS WITH WORMHOLE SWITCHING Presented by Sina Gholamian, 1 09/11/2011.
Data and Computer Communications Eighth Edition by William Stallings Lecture slides by Lawrie Brown Chapter 10 – Circuit Switching and Packet Switching.
Introduction to CMOS VLSI Design Lecture 19: Design for Skew David Harris Harvey Mudd College Spring 2004.
Introduction to CMOS VLSI Design Clock Skew-tolerant circuits.
Synchronous Digital Design Methodology and Guidelines
Clock Design Adopted from David Harris of Harvey Mudd College.
Asynchronous vs. Synchronous Network-on-Chip
The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan.
Montek Singh COMP Nov 10,  Design questions at various leves ◦ Network Adapter design ◦ Network level: topology and routing ◦ Link level:
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
1 Clockless Logic Montek Singh Thu, Jan 13, 2004.
ELEC 6200, Fall 07, Oct 24 Jiang: Async. Processor 1 Asynchronous Processor Design for ELEC 6200 by Wei Jiang.
Demystifying Data-Driven and Pausible Clocking Schemes Robert Mullins Tutorial presented at 18 th UK Asynchronous Forum Newcastle, September 2006.
COMP Clockless Logic and Silicon Compilers Lecture 3
Lecture 8: Clock Distribution, PLL & DLL
MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS Robert Mullins Computer Architecture Group Computer Laboratory University of Cambridge, UK.
April, / 38 Network on Chip Advanced Topics in VLSI Asynchronous vs. Synchronous Design Techniques for NoCs Presented by: Alex Rekhelis.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
Demystifying Data-Driven and Pausible Clocking Schemes Robert Mullins Computer Architecture Group Computer Laboratory, University of Cambridge ASYNC 2007,
EE 4272Spring, 2003 Chapter 9: Circuit Switching Switching Networks Circuit-Switching Networks Circuit-Switching Concept  Space-Division Switching  Time-Division.
Lab for Reliable Computing Generalized Latency-Insensitive Systems for Single-Clock and Multi-Clock Architectures Singh, M.; Theobald, M.; Design, Automation.
Network-on-Chip Examples System-on-Chip Group, CSE-IMM, DTU.
Communication-Centric Design Robert Mullins Computer Architecture Group Computer Laboratory, University of Cambridge (University of Twente, December 11.
Communication-Centric Design Robert Mullins Computer Architecture Group Computer Laboratory, University of Cambridge Workshop on On- and Off-Chip Interconnection.
Network-on-Chip Links and Implementation Issues System-on-Chip Group, CSE-IMM, DTU.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
Chapter #6: Sequential Logic Design 6.2 Timing Methodologies
Lecture 11 MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines.
1 Clockless Logic: Dynamic Logic Pipelines (contd.)  Drawbacks of Williams’ PS0 Pipelines  Lookahead Pipelines.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
On-Chip Networks and Testing
High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam.
A Distributed Scheduling Algorithm for Real-time (D-SAR) Industrial Wireless Sensor and Actuator Networks By Kiana Karimpour.
R OUTE P ACKETS, N OT W IRES : O N -C HIP I NTERCONNECTION N ETWORKS Veronica Eyo Sharvari Joshi.
Elastic-Buffer Flow-Control for On-Chip Networks
Networks-on-Chips (NoCs) Basics
Amitava Mitra Intel Corp., Bangalore, India William F. McLaughlin
MOUSETRAP Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh & Steven M. Nowick Department of Computer Science Columbia University,
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
QoS Support in High-Speed, Wormhole Routing Networks Mario Gerla, B. Kannan, Bruce Kwan, Prasasth Palanti,Simon Walton.
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Paper review: High Speed Dynamic Asynchronous Pipeline: Self Precharging Style Name : Chi-Chuan Chuang Date : 2013/03/20.
George Michelogiannakis William J. Dally Stanford University Router Designs for Elastic- Buffer On-Chip Networks.
George Michelogiannakis, Prof. William J. Dally Concurrent architecture & VLSI group Stanford University Elastic Buffer Flow Control for On-chip Networks.
William Stallings Data and Computer Communications 7 th Edition Chapter 1 Data Communications and Networks Overview.
Computer Networks with Internet Technology William Stallings
ISLIP Switch Scheduler Ali Mohammad Zareh Bidoki April 2002.
Accessing I/O Devices Processor Memory BUS I/O Device 1 I/O Device 2.
BZUPAGES.COM Presentation On SWITCHING TECHNIQUE Presented To; Sir Taimoor Presented By; Beenish Jahangir 07_04 Uzma Noreen 07_08 Tayyaba Jahangir 07_33.
Reading Assignment: Rabaey: Chapter 9
Lecture 16: Router Design
Virtual-Channel Flow Control William J. Dally
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
1 Recap: Lecture 4 Logic Implementation Styles:  Static CMOS logic  Dynamic logic, or “domino” logic  Transmission gates, or “pass-transistor” logic.
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Clockless Logic Montek Singh Thu, Mar 2, Review: Logic Gate Families  Static CMOS logic  Dynamic logic, or “domino” logic  Transmission gates,
Network-on-Chip Paradigm Erman Doğan. OUTLINE SoC Communication Basics  Bus Architecture  Pros, Cons and Alternatives NoC  Why NoC?  Components 
Interconnection Structures
Chapter #6: Sequential Logic Design
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Israel Cidon, Ran Ginosar and Avinoam Kolodny
332:578 Deep Submicron VLSI Design Lecture 14 Design for Clock Skew
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Clockless Logic: Asynchronous Pipelines
Presentation transcript:

Asynchronous vs. Synchronous Design Techniques for NoCs Robert Mullins “The Status of the Network-on-Chip Revolution: Design Methods, Architectures and Silicon Implementation”, (Tutorial) International Symposium on System-on-Chip, Tampere, Finland. November 14 th, 2005.

2/67 Aims of Tutorial  Highlight the wide range of system timing alternatives for NoCs  Discuss the impact of the choice of timing regime on the architecture of NoC routers  Contrast different approaches

3/67 Synchronous to Delay-Insensitive Approaches to System Timing Synchronous Delay Insensitive GlobalNone Timing Assumptions Local Relative Wire Delay Less Detection Sub-System Local Isochronic Forks Multiple clocks Pausible clocks and locally triggered clock pulses Bundled Data Quasi-Delay Insensitive Local Clocks/ Interaction with data (becoming aperiodic)

4/67 System Timing Approaches to system timing are distinguished by what delay assumptions they make A number of different approaches to system timing may also be combined: –Globally-Asynchronous Locally-Synchronous (GALS) e.g. Synchronous IP interconnected by an asynchronous network

Synchronous On-Chip Networks

6/67 Generic On-Chip Router

7/67 Synchronous Router Pipeline Router Pipeline may be many stages –Increases communication latency –Can make packet buffers less effective –Incurs pipelining overheads

8/67 Speculative Router Architecture VC and switch allocation may be performed concurrently: –Speculate that waiting packets will be successful in acquiring a VC –Prioritize non-speculative requests over speculative ones Li-Shiuan Peh and William J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers”, In Proceedings HPCA’01, 2001.

9/67 Single Cycle Speculative Router R. D. Mullins, A. West and S. W. Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks”, In Proceedings ISCA’04.

10/67 Single Cycle Speculative Router Single cycle router made possible by use of speculation Clock period is almost unchanged (compared to pipelined design) –Approx. 30 FO4 (simple standard-cell design) Presence of clock simplifies design –Arbitration Fast combinational matrix arbiters Can easily be extended to handle priority traffic etc. –Speculation Aided by the clear notion of a clock “cycle” Simple abort logic (abort detection and actual abort)

11/67 Single Cycle Speculative Router Lochside Chip (2004) 4x4 mesh network, 25mm 2 Single Cycle Routers (router + link = 1 clock) –Low common case latency 4 virtual-channels/input 80-bit links –64-bit data + 16-bit control 250MHz (worst-case PVT) 16Gb/s/channel, 0.18um. TILE Traffic Generator, Debug & Test R R. D. Mullins, A. West and S. W. Moore, “The design and implementation of a low- latency on-chip network”, In Proceedings ASP-DAC’06

Beyond a Single Global Clock

13/67 Limitations of Fully-Synchronous Networks 1. Difficult to distribute clock –Network spread over die & may have irregular layout –Minimising skew costs complexity and power Alternatives/extensions to PLL and H-tree: –Clock deskewing techniques –Distributed Clock Generator (DCG). –Distributed PLLs –Standing-wave oscillators and rotary clock schemes –Resonant global clocks, optical clock distribution etc.

14/67 Limitations of Fully-Synchronous Networks 2. Single Network Clock Frequency –Communicating synchronous IP blocks may operate at different and potentially adaptive clock frequencies –What is most appropriate network clock frequency? We don’t want to have to generate and distribute a very high frequency clock in order to emulate an asynchronous network

15/67 Frequency Distribution Clock skew may force the system to be partitioned into multiple clock domains Can exploit the fact that only the phase of each router’s clock differs, simple error-free clock-domain crossing possible (single clock source)

16/67 Router clocks derived from a single source Each router’s clock may be generated from the global network clock, either by: –Clock division or –Clock multiplication Clock domain crossing techniques can exploit known clock frequency relationships Chakraborty and M. Greenstreet, “Efficient Self-Timed Interfaces for Crossing Clock Domains”, In Proceedings ASYNC’03 L. F. G. Sarmenta, G. A. Pratt and S. A. Ward, “Rational Clocking”, ICCD’95

17/67 Locally Generated Clocks (periodic & free-running) Can exploit knowledge about clocks (when crossing clock domains) even if all we know is that they are periodic, examples: –predictive synchronizers [Dally][Frank/Ginosar] –asynchronous FIFOs [Chakraborty/Greenstreet]

18/67 Synchronous Routers with Asynchronous Links Synchronization: –Time Safe: e.g. Traditional 2 FF synchronizers –Value Safe: Clock Pausing/Data-driven clocks

19/67 Locally Clocked Routers/Asynchronous Interconnect (GALS style network) Can support asynchronous interconnects –No longer exploiting periodic nature of router clocks –Correct operation is independent of the delay of the link GALS interfaces with pausible clocks –If necessary clock is stretched, data is always transferred reliably (value safe) –Need to construct local delay line

20/67 GALS – Clock Pausing Simple GALS interface (receiver) Note: Req/Ack uses 2-phase handshaking protocol

21/67 GALS – Multiple Inputs Clock is free running (although it can be paused) It is the clock that really determines if asynchronous data is transferred into the synchronous clock domain on a particular cycle Impact on performance in on-chip network requiring multiple input data/control ports?

22/67 GALS – Stoppable Clock

23/67 Local aperiodic clock generation Discard free-running clock but retain a single delay assumption for router Options for clock pulse generation: 1.Use stoppable GALS interface and attempt to stop every cycle – overheads? 2.Wait for data/null-data from all neighbours before generating pulse (global synchrony!) 3.Data driven clock 4.Traditional asynchronous bundled-data approach (with a single delay assumption for whole router) Can still exploit synchronous router implementation

24/67 Data-Driven Local Clock Idea: –If data at any input, sample all inputs –Determine which inputs are to be admitted on next clock cycle (requires MUTEX) –Ensure data that is not admitted is ‘locked out’ for next clock cycle –After all MUTEXes have made a decision (and never faster than the delay line!) generate a clock pulse Similarities to stoppable GALS interface and asynchronous priority arbiters

25/67 Data-Driven Clock Waveform

26/67 Data-Driven Clock Waveform Imagine data from two packets arriving at a single router node at different rates An aperiodic clock may be generated to minimise latency and power Minimum clock period set by delay line Value safe synchronization (no chance data is ever lost)

27/67 Data-Driven Local Clock Updated: June 2006 May be generalized to n- input ports. Only the control interfaces are shown here (r1,a2 and r2,a2) grantn is simply used to control the latching of data at each input port (register enable)

28/67 Data-Driven Local Clock Simple implementation shown (work in progress) –Some small timing constraints –Performance tweaks possible Possible Extensions –Force synchronization on subset of inputs Some inputs must be present for clock to be generated –Generate additional clock pulses to handle pipelining Counter & clock driven lock signal –Select a different clock period (delay line) depending on which inputs have been granted Data-dependent clock period See also: M. Krstic and E. Grass, “New GALS Technique for Datapath Architectures”, PATMOS (and ASYNC’05 paper)

29/67 Clocking alternatives for Synchronous Routers

30/67 Synchronous Routers - Summary Can design high-performance single cycle routers Design is simplified by presence of global synchrony Distribution of global clock can be eased by: –New clock generation/distribution techniques –Source synchronous communication Network operating frequency –Relax global synchrony further –Data-driven clocking determines most appropriate router clock frequency automatically

Asynchronous On-Chip Networks

32/67 Why are asynchronous NoCs interesting? Simple/elegant solution when networked IP blocks run at different clock frequencies –Data driven, no superfluous switching activity –No synchronization/clock alignment issues at interfaces Ability to exploit data/path-dependent delays –Low-latency common or high-priority paths through router No clock distribution issues Security and EMI advantages –Clock focuses EM emissions –The presence of a clock can also aid fault-induction and side-channel analysis attacks

33/67 Why are asynchronous NoCs interesting? Freedom to optimize network links –Not constrained by need to distribute/generate multiple clock frequencies. Can exploit high-frequency narrow links. –Dynamic latency/throughput trade-offs (adaptive pipeline depth) –Exploit dynamic optimizations on links (e.g. DVS) Reduced design time –Easy to use interfaces, modularity. –Robust and simple implementation Some arguments for reduced power

34/67 Asynchronous Circuit Basics Control in asynchronous circuits often relies on simple handshaking protocols (req/ack event cycles) Delay-insensitive event- driven system - every signal transition is acknowledged The C-element is a fundamental building block of many asynchronous circuits –Can be thought of as a AND- gate for events

35/67 Simple Pipelines Event FIFO Micropipeline I. E. Sutherland, “Micropipelines”, Communications of the ACM, Vol. 32, Issue 6 (June 1989).

36/67 Arbitration

37/67 Tree Arbiter Element M. B. Josephs and J. T. Yantchev, “CMOS Design of the Tree Arbiter Element”, IEEE Trans. On VLSI Systems 4(4), pp , Dec J. Bainbridge, “Asynchronous System-on-Chip Interconnect”, Ph.D. Thesis, Dept. of Computer Science, University of Manchester.

38/67 Multiway Arbiters

39/67 Static Priority Arbiters “Priority Arbiters” Bystrov/Kinniment/Yakovlev (ASYNC’00) First stage samples/locks current request vector Static or dynamic priority Original design updated to tackle performance and QoS issues Felicijan/Bainbridge/Furber (ICM’03)

40/67 Delay-Insensitive Communication 4-phase dual-rail protocol D0=0 D1=1 1 1.REQ+ 2.ACK+ 1 ACK_in+ 3.REQ- D0=0 D1=0 1 4.ACK- 1 ACK_out+ 0 0 ACK_in- D0=0

41/67 Delay-Insensitive Switched Interconnect The basic DI latch can be extended to support steering, multiplexing and arbitration J. Bainbridge and S. Furber, “CHAIN: A Delay-Insensitive Chip Area Interconnect”, IEEE Micro, Vol. 22, No. 5, 2002

42/67 CHAIN Basic link is 6 wires –2-bits of data (1-of-4) + end of packet + ack any N-of-M code could be used –around 1Gbps (0.18um, 160Mbps per wire) –Links may be ganged together Route information tapped off and used to steer remainder of packet If arbitration is required, arbiter grant is retained for duration of packet (no fragmentation of packets)

43/67 Asynchronous on-chip networks How do we build more complex on-chip routers? –Support for virtual-channels –QoS Challenges –Multi-way & prioritised arbitration –Control overheads Arbitration and DI circuits can be slow! How can control overheads be hidden?

44/67 Overview of Some Published Asynchronous On-Chip Networks “Quality-of-Service (QoS) for Asynchronous On-Chip Networks” T. Felicijan (Ph.D. 2004, Manchester) “An Asynchronous Router for Multiple Service Levels Networks on Chip”, R. Dobkin et al, ASYNC’05. (QNoC Group) MANGO Clockless Network-on-Chip –“A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip”, T. Bjerregaard and J. Spars Ø, ASYNC’05. –“A router Architecture for Connection-Orientated Service Guarantees in the MANGO Clockless Network-on-Chip”, T. Bjerregaard and J. Spars Ø, DATE’05

45/67 Virtual Channels Best Effort Routers –Virtual-Channel allocation is performed at each router –any free VC (at the required output) may be assigned to a new packet Significant performance gains over simpler static schemes –Can also prioritize packets QoS Routers based on Static VC allocation –Packets retains the same VC throughout the network. –Each VC is assigned a static priority level Connection-Orientated Router –VCs are reserved at each router along a path to create a connection –Hard QoS guarantees possible

46/67 QoS Support All these asynchronous networks provide QoS support MANGO –Guaranteed Service (GS) connections –A connection is a reserved sequence of VCs through the network –Hard latency and bandwidth guarantees are provided

47/67 Static VC assignments [Felicijan][Dobkin] implement QoS through static VC assignments –i.e. packet is assigned VC and uses this VC at all routers –May need to contend with other packets assigned the same VC –Packets with same VC cannot be interleaved –VC is reserved for duration of packet (reserved rather than allocated from pool of free VCs)

48/67 Felicijan/Manchester

49/67 Felicijan/Manchester Implementation style: –QDI, 1-of-4 encoded data with RTZ signalling Simplest switching network of asynchronous designs (multiplexed crossbar) 8-bit data flits Performance Results (0.18um) –Maximum router frequency ~300MHz –Minimum router latency ~5ns? Two constraints on provision of QoS –First due to multiplexed crossbar –Second related to minimum buffer requirements

50/67 Dobkin/Technion

51/67 Dobkin/Technion 4 service levels (statically assigned VCs) Implementation style: –bundled data –Significant area reduction over QDI approach 8-bit data flits Synchronous versus Asynchronous router study –Throughput is reported to be similar –Minimum Latency (head flit) input to output (0.35um, typ. PVT) Synchronous 3.7ns Asynchronous 13.0ns (x3.5)

52/67 MANGO Clockless Network-on-chip

53/67 MANGO Clockless Network-on-chip Non-blocking switching network means link access arbitration is all that must be considered for hard QoS guarantees VCs are assigned statically (no contention) –Simple BE router used to program GS router (not shown) Basic Static Priority Arbiter (SPA) is preceded by admission control logic –Part of Asynchronous Latency Guarantee (ALG) scheduling algorithm (see ASYNC’05 paper) –Prevents lower priority flits being stalled more than once by each higher priority flit

54/67 MANGO Clockless Network-on-chip 515MHz port speed (WC, 0.13um) 32-bit data flits Implementation style: –Internally uses a bundled-data (RTZ) circuit style –Links use a DI two-phase encoding Router Latency ~5.2ns –Switch ~2.1ns, VC Buffers/Control ~1.2ns –VC merge ~1.6ns MANGO provides hard latency/throughput guarantees unlike other VC prioritization based schemes

Low-Latency Best-Effort Asynchronous Networks

56/67 Improving Network Latency Asynchronous router latency can be high –Fine-grain pipelining can provide good throughput figures but control overheads can extend latency Completion detection, RTZ phase, H/S Fast combinational matrix arbiters have also been replaced by cascaded MUTEXes or complex priority arbiters Overheads even greater in a BE router that must allocate VCs dynamically Approaches to reduce latency? –Speculation –Decoupled control and data networks

57/67 Low-Latency Asynchronous Routers Exploit speculation? –Use Priority arbiter organisation –Assume only a single grant will be present after lock is asserted –Use MUTEX grant outputs to steer data immediately Issues –Complex abort procedure? –Invalid data and DI encoding? –Careful not to make common-case slower

58/67 1.Control Network: Simple/fast and lightly loaded 2. Data Network: Supporting virtual channels, packets, wide datapath Decoupled Control and Data Networks Idea: Operate two independent networks:

59/67 Decoupled Control and Data Networks Control network runs ahead of data network, hiding latency of scheduling logic –In an asynchronous environment, each network will operate at its natural rate Control network latency will be much lower compared to data network –Narrower links and simpler datapath No virtual channels - little arbitration, less switching –Less traffic, single control flit per packet only –Could also exploit ‘fat’ wires and early requests to send packet Separate control and data networks can also be exploited in synchronous network [Peh/Dally] L. Peh and W. J. Dally, “Flit-Reservation Flow Control”, In Proceedings HPCA’00.

60/67 Decoupled Control and Data Networks Schedule is queued and steers incoming data flits (data flits contain no routing information) Scheduler could perform VC allocation or both VC and switch allocation in advance Control network could also control power- gating of data network, waking network/links as needed from sleep mode.

61/67 Decoupled Control and Data Networks Design Decisions –Design can be simplified by keeping input port VC requests in order –Has obvious implications for performance –Out-of-order VC allocation scheme also possible –Performing switch allocation ahead of time could be inefficient Order data actually arrives could be different Decoupled control and data networks may help hide scheduling overheads. More appropriate than speculation for asynchronous NoCs?

Synchronous or Asynchronous NoCs?

63/67 Comparing Approaches Little published work on asynchronous routers and networks –Single latency/throughput figures don’t tell whole story –Detailed comparative studies with real traffic are required Comparing synchronous and asynchronous designs has always been difficult –Often difficult to isolate impact of choice of system timing style, many things tend to be different: Technology, circuit style, architecture –Difficult to reproduce and simulate asynchronous designs from published work. No notion of cycle- accurate model. Published work often lacks detailed control and datapath delays.

64/67 Questions about Asynchronous design? Testing asynchronous circuits –An asynchronous circuit replaces the clock with a large number of distributed state holding elements –Large area overhead associated with test –Testing of non-deterministic elements (MUTEX) Performance Guarantees –““Asynchronous circuits avoid issues of timing closure, they are correct-by-construction” – But performance guarantees are still required. Slow synchronous circuits are easy to build! –Value safe versus time safe –Less predictable, non-deterministic –Predicting performance is more complex EDA Tool Requirements Perhaps on-chip communication is an application where such characteristics can be tolerated?

65/67 Synchronous or Asynchronous? A clockless on-chip network appears to be an elegant solution although some questions remain: –Test –Performance concerns Shouldn’t asynchronous designs offer latency advantages? –Fast local control, path/data dependent delays, DI interconnects Perhaps asynchronous routers mimic synchronous architectures too closely? –Exploit flexibility, novel architectures, different topologies Overheads for data-driven clocking or GALS currently look small in comparison Synchronous design has advantages too –Predictability and determinism can be exploited fast single cycle routers possible –Global snapshot of state is good for scheduling Still lots of interesting research to be done –Need more data points

66/67 Conclusions High cost associated with both global synchrony and delay- insensitive circuits –Can relax constraints in both directions Which techniques achieve the best cost/benefit mix for on-chip networks? –Data-driven clocks look promising ? SYNCHRONOUS ASYNCHRONOUS

67/67 Thank You Comments/Questions? Talk abstract, slides, notes and full bibliography at: