King Fahd University of Petroleum and Minerals, CCSE – COESOC 2006 – Tampere, 14-16 Nov. 2006. Abdelhafid Bouhraoua: A High Throughput Network-on-Chip Architecture.

A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect
Abdelhafid Bouhraoua and M.E.S El-Rabaa
Computer Engineering Department (COE), College of Computer Science and Engineering (CCSE)
King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Eastern Province, Saudi Arabia

Outline
Networks-on-Chip; State of the Art; Fat Tree Network Properties; Modified Fat Tree; Router Architecture; Performance Evaluation; Conclusion.

Semiconductor Industry Future
Technology evolution is faster than design evolution.

Chip Design Methodology
Chip complexity: 100M to 3B.
The ASIC methodology (RTL → synthesis → back end) is not suitable: it is impossible to handle a full RTL design for the whole project.
An IP-reuse-based methodology opens a wide range of possibilities: IP blocks are put together on one chip, giving a “System-on-Chip”.
From the ASIC era to the System-on-Chip era.

SoC Constraints
Very short time-to-market: compressed schedule.
Very short lifecycle.
Low development cost: small team.
High complexity: silicon resources are available to produce cost-effective, highly integrated SoCs.
Broad range of IP blocks: it is impossible to know them all.

SoC Methodology
Main task: integration of the IP blocks.
System-level integration: data formatting and conversion, protocol interfacing, control interfacing.
Interconnection-level integration: signal interfacing, data transfer interfacing.
Wire interconnect and back-end integration.

Interconnecting the IPs
Brute-force method: design an interface block for every pair of IPs in the SoC, with point-to-point communication between IPs.
Problems:
The design effort for each interface is similar to that of a new IP block; with 20 different blocks, that is around 400 new designs.
Point-to-point communication leads to a wiring mess: 50 blocks with 8-bit bidirectional links need on the order of 20,000 globally routed wires.
We should look for a better way.
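
As a rough check of the figures on this slide, here is a minimal sketch. The counting rules are assumptions, not the authors': one interface design per ordered pair of distinct blocks, and one dedicated 8-bit link in each direction for every pair of blocks, ignoring control and handshake wires.

```python
def interface_designs(num_blocks: int) -> int:
    """Ordered (source, destination) pairs of distinct blocks, each assumed to need its own interface."""
    return num_blocks * (num_blocks - 1)


def point_to_point_wires(num_blocks: int, bits_per_direction: int) -> int:
    """Wires for one dedicated link per direction between every unordered pair of blocks."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs * 2 * bits_per_direction


print(interface_designs(20))        # 380, i.e. "around 400"
print(point_to_point_wires(50, 8))  # 19600, i.e. about 20,000
```

Both counts grow quadratically with the number of blocks, which is the core of the argument for a shared on-chip network.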

Networks-on-Chip
“Route Packets, Not Wires” (William J. Dally).
Idea: build a complete on-chip network.
Unified communication model (similar to the OSI stack): no ad-hoc effort.
Standardized interfacing (may be provided by IP vendors).
Unified network elements (routers, link interfaces): no design required from the SoC teams.
Flexible interconnect and reduced global wiring.

NoC Requirements
Performance: How fast are packets moved across the network? How much traffic is carried at the same time, and for how long?
Overhead: How large is the network (in gates)?
Adaptivity: Does it adapt easily to new designs?
Complexity: How easy is it to interface to?

Previous Work
Most previous work is directly derived from other research (interconnection networks for parallel architectures):
It reproduces what has been learned in the area of inter-chip networks.
It focuses on the router architecture alone to achieve certain latency goals.
Asynchronous design of NoCs, mainly GALS (globally asynchronous, locally synchronous).
Circuit-switching techniques were introduced to provide latency guarantees.
Shortcomings of these approaches:
They did not fully take advantage of the fact that the network is on-chip, where the main gain is the absence of pin limitations.
Router architectures were directly derived from inter-chip architectures, where each router was implemented as a single chip, leading to substantial overhead.
The added complexity needed to guarantee latency is overkill in the on-chip context.

Which Network?
The most straightforward choice is the crossbar.
Good throughput (maxes out at 66%).
Not scalable: implementation complexity grows quadratically with the number of I/Os.

2-D Mesh
A very popular topology in NoCs, well suited to the 2-D nature of chip floorplanning (tiling).
Very high constraints on routing:
Routing algorithms that are deadlock-free by construction are inefficient.
Efficient routing algorithms are complex to implement.
Poor performance: saturation is reached at 30% load.
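
For illustration, a common example of a mesh routing algorithm that is deadlock-free by construction is dimension-order (XY) routing. The slides do not name a specific algorithm, so this choice is ours; the sketch below only computes the path a packet takes between two tiles given as (x, y) coordinates.

```python
def xy_route(src, dst):
    """Return the (x, y) tiles visited from src to dst under XY routing.

    The packet first moves along X until the column matches, then along Y.
    It never turns from Y back to X, so no cyclic channel dependency can form.
    """
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The price of this guarantee is that every packet between a given pair of tiles follows the same path, so the router cannot steer around congested links.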

Analysis
Low throughput means that latency cannot be guaranteed above the maximum throughput level.
Low throughput is caused by contention among several incoming packets for the same output port of a router.
Contention cannot be prevented from happening.
Contention makes router architectures more complex, because they need to integrate buffering and prioritization logic.
Routers that implement both packet and circuit switching make the architecture even more complex.

Methodology
Take advantage of the on-chip context: the design is frozen before tape-out, and there are no internal I/O limitations.
Aim for a high-throughput architecture: circuitry used at 30% of its maximum is NOT an optimal solution (clock frequency, power).
Reduce router size so that a large number of routers can be integrated.
Wormhole routing vs. store-and-forward: reduce the buffers required in routers.

Fat Tree
[Figure: three rows of four routers (R) above eight clients (C).]
Bidirectional multistage, or folded multistage, networks.
Bidirectional multistage networks come in two forms: the fat tree (FT) and the butterfly. The fat tree performs better than the butterfly (previous work).
Which topology resembles a crossbar? Banyans, or multistage interconnection networks.
n+1 stages (or rows).
Size: routers = (n+1) × 2^n; clients = 2^(n+1).
Diameter = 2 log k + 1, where n = log k.
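
The size formulas can be checked against the figure with a short sketch. It assumes, as the drawing suggests, that each of the n+1 stages holds 2^n routers, and that the diameter counts the routers traversed on the longest up-then-down path (climbing all n+1 stages, then descending n).

```python
def fat_tree_size(n: int) -> dict:
    """Size of a binary fat tree with n+1 router stages (interpretation of the slide's figure)."""
    stages = n + 1
    routers_per_stage = 2 ** n
    return {
        "stages": stages,
        "routers": stages * routers_per_stage,
        "clients": 2 ** (n + 1),
        # worst case: climb all n+1 stages, then descend n stages
        "diameter": 2 * n + 1,
    }


print(fat_tree_size(2))  # {'stages': 3, 'routers': 12, 'clients': 8, 'diameter': 5}
```

With n = 2 this reproduces the figure: three rows of four routers serving eight clients.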

Routing in Fat Tree
Routing is reduced to routing in a binary tree.
In a binary tree there are three routing directions: UP, RIGHT and LEFT.
[Figure: a router with UP, LEFT and RIGHT ports.]

Routing in Fat Tree
[Figure: router (r,c) forwards addresses in [0, l[ ∪ [u, 2^(n+1) - 1] UP, addresses in [l, l + 2^(r-1)[ LEFT, and addresses in [l + 2^(r-1), u[ RIGHT.]
Lower bound l: the smallest address reachable from router (r,c). It is obtained by clearing the lowest r bits of the column c: l = ⌊c / 2^r⌋ × 2^r.
Upper bound u: the first address beyond the range reachable from router (r,c), obtained by adding 2^r to the lower bound: u = l + 2^r.
Routers form a matrix of n rows × 2^(n-1) columns. For router (r,c), r is the row index (rows are indexed from 0 to n-1) and c is the column index (columns are indexed from 0 to 2^(n+1) - 1).
The size of the client address space reachable through the downside ports is 2^r. It is always a contiguous interval of addresses of the form [l, u[.
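
The interval formulas above translate directly into a few bit operations. In this sketch, intervals are half-open [start, end), and client addresses run from 0 to 2^(n+1) - 1. Because the LEFT/RIGHT split uses 2^(r-1), the sketch assumes r ≥ 1, meaning the bottom row reaches 2 clients. That row numbering is our interpretation.

```python
def routing_intervals(r: int, c: int, n: int) -> dict:
    """Half-open address intervals [start, end) served by each port of router (r, c).

    Follows the slide's formulas; assumes r >= 1 so that 2^(r-1) is an integer.
    """
    if r < 1:
        raise ValueError("this sketch assumes r >= 1")
    l = (c >> r) << r              # clear the lowest r bits of c: floor(c / 2^r) * 2^r
    mid = l + (1 << (r - 1))       # l + 2^(r-1): LEFT/RIGHT boundary
    u = l + (1 << r)               # u = l + 2^r, first address beyond the range
    end = 1 << (n + 1)             # one past the highest address 2^(n+1) - 1
    return {
        "LEFT": (l, mid),
        "RIGHT": (mid, u),
        "UP": [(0, l), (u, end)],  # [0, l[ U [u, 2^(n+1) - 1]
    }


print(routing_intervals(r=1, c=3, n=2))
# {'LEFT': (2, 3), 'RIGHT': (3, 4), 'UP': [(0, 2), (4, 8)]}
```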

Routing in Fat Tree
[Figure: fat tree of routers (R) and clients (C), showing the “summit” routers and alternate paths.]
Routing up: adaptive.
Routing down: deterministic.
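
A packet climbs adaptively until it reaches a router whose downward range contains the destination (a “summit” router), then descends deterministically. The sketch below finds how high the packet must climb. It assumes a router in row r covers an aligned block of 2^r client addresses, with the bottom row at r = 1; that numbering is our interpretation. Which of the equivalent up ports is used at each step does not affect the summit row, so the adaptive choice is not modelled.

```python
def summit_row(src: int, dst: int) -> int:
    """Lowest row whose routers cover both src and dst.

    src and dst fall in the same aligned block of 2**r addresses exactly when
    they agree on all bits from bit r upward, i.e. (src ^ dst) >> r == 0.
    """
    if src == dst:
        raise ValueError("source and destination must differ")
    return (src ^ dst).bit_length()


def routers_traversed(src: int, dst: int) -> int:
    """Climb rows 1..s (adaptive), then descend rows s-1..1 (deterministic)."""
    s = summit_row(src, dst)
    return s + (s - 1)


print(summit_row(0, 1), routers_traversed(0, 1))  # 1 1
print(summit_row(0, 7), routers_traversed(0, 7))  # 3 5
```

In an eight-client tree, the worst case of five routers matches a diameter of 2 log k + 1 with n = log k = 2.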

Contention in Fat Tree
Packets coming from the UP links are never routed up; only packets coming from the bottom links are routed up.
Since the number of UP links equals the number of bottom links, there can be no contention when routing up.
Contention occurs only on the way down: the bottom links are split into RIGHT and LEFT links, and the deterministic downward routing of packets leads to contention.
[Figure: router with UP, LEFT and RIGHT ports; many choices for going up, contention on the way down.]

Modified Fat Tree
Doubling the downward links eliminates contention.

Router Architecture
No crossbar.
No buffers (they are pushed to the clients).
Every downstream input is simultaneously connected to two outputs, which eliminates contention between the inputs going downstream.
The number of outputs is 2k+2 for k inputs (the case where the router is a summit).
Router models differ from each other in only two items: the number of input and output ports on the down link, and the routing function constants (r,c).

Routing Circuitry
All network elements are constant and frozen at design time.
All lower-bound and upper-bound values used to generate the routing functions are constants for each router, and are fed as inputs to the routing function.
The routing function is implemented using comparators. The constants it needs are l, L = l + 2^(r-1), and u.
[Figure: the address is compared (≥, <) against l, L and u to select the LEFT, RIGHT or UP output.]
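
The comparator network can be modelled in software as follows. This is a sketch, not the authors' RTL. It assumes the decoding implied by the slide: LEFT for l ≤ A < L, RIGHT for L ≤ A < u, and UP otherwise. The three constants are computed once per router, mirroring the design-time constants, and the sketch assumes r ≥ 1. `RouterConstants` and `route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConstants:
    """Per-router constants fixed at design time (hypothetical software model)."""
    l: int  # lower bound of the downward range
    L: int  # l + 2^(r-1): boundary between LEFT and RIGHT
    u: int  # l + 2^r: first address beyond the downward range

    @classmethod
    def for_router(cls, r: int, c: int) -> "RouterConstants":
        l = (c >> r) << r
        return cls(l, l + (1 << (r - 1)), l + (1 << r))


def route(addr: int, k: RouterConstants) -> str:
    """Select an output port using only >= / < comparisons, as a comparator bank would."""
    ge_l, lt_L = addr >= k.l, addr < k.L
    ge_L, lt_u = addr >= k.L, addr < k.u
    if ge_l and lt_L:
        return "LEFT"
    if ge_L and lt_u:
        return "RIGHT"
    return "UP"


k = RouterConstants.for_router(r=2, c=5)  # l=4, L=6, u=8
print([route(a, k) for a in range(8)])
# ['UP', 'UP', 'UP', 'UP', 'LEFT', 'LEFT', 'RIGHT', 'RIGHT']
```

Since every router differs only in its constants, the same comparator logic can be instantiated everywhere with different (r, c) values.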

Client Interface
Buffers are pushed to the client interfaces.
Each incoming link is terminated by a FIFO memory.
The FIFO memories are connected to the client through a single shared bus.
[Figure: client/IP block, FIFOs, down links (from router), up link.]
The bus can be wider than the links, so that data is transferred out faster than it is received in the FIFOs.
The FIFO size can be customized by the design team according to the specifications.
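
The client interface can be sketched as one FIFO per incoming down link, drained over a single shared bus. The round-robin arbitration between FIFOs, the one-word-per-cycle link, and the `bus_width` parameter are our assumptions; the slides do not say how the bus selects a FIFO. `ClientInterface` is a hypothetical name.

```python
from collections import deque


class ClientInterface:
    """Hypothetical model: one FIFO per incoming down link, drained over one shared bus.

    bus_width is how many words the bus moves per cycle; making it larger than
    the one word per cycle a link delivers lets the client empty the FIFOs
    faster than they fill.
    """

    def __init__(self, num_links: int, bus_width: int):
        self.fifos = [deque() for _ in range(num_links)]
        self.bus_width = bus_width
        self.next_fifo = 0  # round-robin pointer over the FIFOs

    def receive(self, link: int, word) -> None:
        """A word arriving on a down link is queued in that link's FIFO."""
        self.fifos[link].append(word)

    def bus_cycle(self) -> list:
        """Transfer up to bus_width words from the next non-empty FIFO (round-robin)."""
        n = len(self.fifos)
        for i in range(n):
            idx = (self.next_fifo + i) % n
            fifo = self.fifos[idx]
            if fifo:
                self.next_fifo = (idx + 1) % n
                count = min(self.bus_width, len(fifo))
                return [fifo.popleft() for _ in range(count)]
        return []


ci = ClientInterface(num_links=2, bus_width=4)
for w in range(6):
    ci.receive(0, w)
ci.receive(1, "x")
print(ci.bus_cycle())  # [0, 1, 2, 3]
print(ci.bus_cycle())  # ['x']
print(ci.bus_cycle())  # [4, 5]
```

The FIFO depth is left unbounded here. A real design would size each FIFO according to the specifications, as the slide notes.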

Simulation Conditions
Uniform traffic generation, with uniformly distributed destinations.
Traffic rate: a constant fraction of the maximum link bandwidth.
Variable packet size, within a predetermined range (e.g., 64 bytes ± 10%).
Simulation platform: a cycle-based simulator written in C, developed for this purpose.
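
A traffic source matching these conditions might look like the sketch below. Several details are our assumptions, since the slides do not describe the generator's internals: a link carries one byte per cycle, the source is excluded from the destinations, and each packet is followed by a deterministic idle gap so that link occupancy equals the target load. `generate_packets` is a hypothetical name.

```python
import random


def generate_packets(src, num_clients, load, count, mean_size=64, spread=0.10, seed=0):
    """Return a list of (start_cycle, src, dst, size_bytes) tuples for one client.

    Hypothetical generator: destinations are drawn uniformly from the other
    clients, sizes uniformly from mean_size +/- spread, and each packet is
    followed by an idle gap so that link occupancy equals `load`
    (assumes the link carries one byte per cycle).
    """
    if not 0 < load <= 1:
        raise ValueError("load must be in (0, 1]")
    rng = random.Random(seed)
    lo = round(mean_size * (1 - spread))  # 64 bytes - 10% -> 58
    hi = round(mean_size * (1 + spread))  # 64 bytes + 10% -> 70
    others = [c for c in range(num_clients) if c != src]
    packets = []
    cycle = 0
    for _ in range(count):
        size = rng.randint(lo, hi)
        dst = rng.choice(others)
        packets.append((cycle, src, dst, size))
        cycle += round(size / load)  # `size` busy cycles plus the idle gap
    return packets


pkts = generate_packets(src=0, num_clients=8, load=0.5, count=1000)
```

A deterministic gap is one way to hold the rate at a constant fraction of link bandwidth; a Bernoulli or exponential arrival process would be an equally valid reading of the slide.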

Throughput
More than 90% throughput is achieved; compare with the regular fat tree.

Latency

Area and Speed
The buffer-less architecture is less costly.

Client Buffer Utilization
Buffers are pushed to the client interfaces, so a considerable number of buffer lanes is necessary for every client interface.
Simulations show a linear progression of the maximum number of lanes used during operation.
The figures obtained are an order of magnitude lower than the number imposed by the architecture.
The number of buffer lanes in the client interface can therefore be tailored to the class of applications at hand, reducing buffering area.

Conclusion
A contention-free modified FT architecture is proposed.
The proposed architecture achieves the maximum theoretical throughput and has lower latency than conventional FTs.
Latency increases linearly with input load.
The achieved performance is the actual performance of a contention-free network.
The area of the network is kept small because there are no buffers in the router architecture.
The number of buffer lanes in the client interfaces can be tailored for a specific platform to suit the class of applications at hand while reducing buffering area.