Gaussian Interconnections for On-Chip Networks
Ramón Beivide and Enrique Vallejo
University of Cantabria, Spain



R. Beivide, E. Vallejo, Microgrid Workshop

Index
- Introduction: why a topology?
- Dense Gaussian networks and other topologies
- Different layouts
- Routing:
  - Ideas: adaptive routing, deadlock avoidance, fault tolerance
  - Unicast routing
  - Broadcast routing
- Perfect placement of resources
- Expansibility:
  - Increasing the number of nodes in a Gaussian network
  - Hierarchical Gaussian networks
- Some ideas about cache coherence

Introduction
- Future trend: many processing elements (PEs) on a chip.
- Possible interconnections: bus, multistage interconnection network (MIN), direct network.
- Bus-based interconnections do not scale: they do not provide sufficient bandwidth when there are many PEs.
- MINs are hard to implement on a chip.
- Direct networks with a given topology: the topology defines how the routers on the chip are connected.

Mesh Network
- Number of nodes: $N = b \times b = b^2$
- Diameter: $k = (b-1) + (b-1) = 2b - 2$

Diamond Network
- Number of nodes: $N = (b+1)^2 + b^2$
- Diameter: $k = b + b = 2b$
[Figure: diamond network whose nodes are labeled with Gaussian integers, from $-3i$ to $3i$; dimension label $2b+1$.]

Torus Network
- Number of nodes: $N = b \times b = b^2$
- Diameter: $k = b - 1$ (for odd $b$; in general $k = 2\lfloor b/2 \rfloor$)
[Figure: $b \times b$ torus.]

Dense Gaussian Network
- Number of nodes: $N = (b+1)^2 + b^2$
- Diameter: $k = b$
- Same number of links as a torus, with peripheral links.
- Lower mean distance and diameter.
[Figure: dense Gaussian network whose nodes are labeled with Gaussian integers, from $-3i$ to $3i$; dimension label $2b+1$.]

Comparison of Topological Properties
[Table: columns Topology, Nodes, Diameter, Approx. Diameter, Average Distance, Approx. Average Distance; rows 2-D Mesh, 2-D Torus, Dense Gaussian; cell values missing.]
- Lower average distance and diameter.
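The diameter and average distance of the three topologies can be cross-checked by brute force. The sketch below is only an illustration, not the authors' code: it builds each network as an adjacency map and measures it with breadth-first search. The function names are hypothetical, and the dense Gaussian construction assumes that the replicas of the diamond are centred on the multiples of $\pi = b + (b+1)i$. Note that `b` is the side length for the mesh and torus, and the diameter parameter for the dense Gaussian network.

```python
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def mesh(b):
    """b x b mesh: no wrap-around links."""
    return {(x, y): [(x + dx, y + dy) for dx, dy in STEPS
                     if 0 <= x + dx < b and 0 <= y + dy < b]
            for x in range(b) for y in range(b)}

def torus(b):
    """b x b torus: every row and column is closed into a ring."""
    return {(x, y): [((x + dx) % b, (y + dy) % b) for dx, dy in STEPS]
            for x in range(b) for y in range(b)}

def dense_gaussian(b):
    """Diamond |x| + |y| <= b with peripheral links.

    Assumption: an off-diamond neighbour is translated back by
    m*pi + n*(i*pi), with pi = b + (b+1)i and m, n in {-1, 0, 1}.
    """
    px, py = b, b + 1          # pi
    qx, qy = -py, px           # i * pi

    def wrap(x, y):
        for m in (-1, 0, 1):
            for n in (-1, 0, 1):
                u, v = x + m * px + n * qx, y + m * py + n * qy
                if abs(u) + abs(v) <= b:
                    return (u, v)
        raise ValueError("point is not adjacent to the diamond")

    return {(x, y): [wrap(x + dx, y + dy) for dx, dy in STEPS]
            for x in range(-b, b + 1) for y in range(-b, b + 1)
            if abs(x) + abs(y) <= b}

def distances_from(graph, source):
    """Breadth-first search: hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def diameter_and_mean(graph):
    """Return (diameter, mean distance over ordered pairs of distinct nodes)."""
    longest, total = 0, 0
    for source in graph:
        dist = distances_from(graph, source)
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
    n = len(graph)
    return longest, total / (n * (n - 1))

if __name__ == "__main__":
    # All three examples have 25 nodes, so the figures are directly comparable.
    for name, graph in (("2-D mesh 5x5", mesh(5)),
                        ("2-D torus 5x5", torus(5)),
                        ("dense Gaussian b=3", dense_gaussian(3))):
        d, mean = diameter_and_mean(graph)
        print(f"{name}: N={len(graph)} diameter={d} mean distance={mean:.3f}")
```

Choosing 25 nodes for all three networks makes the comparison fair: the mesh and torus use side 5, and the dense Gaussian network uses $b = 3$.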

Area Comparison

Different Layouts
- Different layouts for the same network:
  - Mesh-like layout
  - Without peripheral links, bounded link length
[Figure: alternative layout of the dense Gaussian network, nodes labeled with Gaussian integers from $-3i$ to $3i$.]

Routing Ideas
- Adaptive routing: in-flight packets can choose among their minimal paths using the information in the routing record (jumps in each direction), depending on congestion or other parameters.
- Deadlock avoidance: bubble routing is proposed as a cost-effective deadlock-avoidance mechanism (used in IBM's Blue Gene). Only 2 virtual channels are needed per link.
- Fault-tolerant routing: Immunet is proposed as a fast, efficient mechanism to detect link failures and restore network performance.

Unicast Routing
- Route from a to b: the routing record (RR) is generated from the difference destination - source, written as (x, y).
- Example 1: source $-1-i$, destination $i$. The difference is $i - (-1-i) = 1+2i$, so $x = 1$, $y = 2$: 1 jump to the right, 2 jumps up.
- Translating the source node to the origin (node 0) generates the routing record.
- Example 2: the translation makes the arrow fall outside the original network, so peripheral links are used. Translations from the surrounding replicas of the network are considered in order to obtain an optimal RR.
[Figure: dense Gaussian network with nodes labeled with Gaussian integers, showing the two example routes.]
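The routing-record computation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: nodes are stored as `(real, imag)` tuples, the function name `routing_record` is hypothetical, and the surrounding replicas are assumed to be centred on $m\pi + n(i\pi)$ with $\pi = b + (b+1)i$.

```python
def routing_record(source, dest, b):
    """Minimal routing record (x, y) from source to dest.

    Nodes are Gaussian integers stored as (real, imag) tuples inside the
    diamond |real| + |imag| <= b. Assumption: the surrounding replicas of
    the network are centred on m*pi + n*(i*pi), with pi = b + (b+1)i.
    """
    dx, dy = dest[0] - source[0], dest[1] - source[1]
    px, py = b, b + 1          # pi
    qx, qy = -py, px           # i * pi
    # The plain difference plus its images in the eight surrounding replicas.
    candidates = [(dx + m * px + n * qx, dy + m * py + n * qy)
                  for m in (-1, 0, 1) for n in (-1, 0, 1)]
    # Fewest total jumps wins; x > 0 means right, y > 0 means up.
    return min(candidates, key=lambda r: abs(r[0]) + abs(r[1]))

# Example 1 from the slide: source -1-i, destination i.
print(routing_record((-1, -1), (0, 1), b=3))   # (1, 2)
# Hypothetical pair whose plain difference 4+2i leaves the diamond,
# so a replica translation (peripheral links) gives the shorter record.
print(routing_record((-2, -1), (2, 1), b=3))   # (1, -2)
```

In the second call the plain difference would need 6 jumps, while the translated record needs only 3, which is why the replicas have to be considered.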

Broadcast Routing
- Triangle-based broadcast.
- Minimum number of steps.
- The same for any node (due to the peripheral links).
- Balanced use of resources.
- Simple routing based on labels (see abstract).
[Figure: dense Gaussian network with a node P and regions labeled NW, NE, SW and SE.]
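The claims "minimum number of steps" and "the same for any node" can be checked with a plain flood. The sketch below is not the triangle-based algorithm, whose details are not given in the slide; it only counts how many nodes a flood reaches at each step. The names `neighbours` and `broadcast_wave` are hypothetical, and the replicas are assumed to be centred on $m\pi + n(i\pi)$ with $\pi = b + (b+1)i$.

```python
def neighbours(node, b):
    """Four neighbours of a node of the dense Gaussian network.

    Assumption: replicas centred on m*pi + n*(i*pi), pi = b + (b+1)i.
    """
    px, py = b, b + 1          # pi
    qx, qy = -py, px           # i * pi
    result = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        x, y = node[0] + dx, node[1] + dy
        # Exactly one translation lands inside the diamond.
        for m in (-1, 0, 1):
            for n in (-1, 0, 1):
                u, v = x + m * px + n * qx, y + m * py + n * qy
                if abs(u) + abs(v) <= b:
                    result.append((u, v))
    return result

def broadcast_wave(source, b):
    """Number of nodes reached for the first time at each step of a flood."""
    reached = {source}
    frontier = [source]
    wave = []
    while frontier:
        new = []
        for node in frontier:
            for nxt in neighbours(node, b):
                if nxt not in reached:
                    reached.add(nxt)
                    new.append(nxt)
        if new:
            wave.append(len(new))
        frontier = new
    return wave

print(broadcast_wave((0, 0), 3))    # [4, 8, 12]
print(broadcast_wave((2, -1), 3))   # [4, 8, 12]
```

Both sources finish in 3 steps, the diameter for $b = 3$, with the same number of new nodes per step.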

Perfect Placement of Resources
- Resources are distributed over the network so that all nodes have a resource within a given distance (example: distance 1).
- Resource examples: I/O ports, processing elements, memory banks, ...
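A placement at distance 1 can be verified mechanically. The sketch below reads "perfect" as "every node has exactly one resource within distance 1", which is an interpretation and not stated in the slide. The placement used, the multiples of $2+i$ in the network with $b = 3$, is a hypothetical example chosen for illustration, and the replicas are assumed to be centred on $m\pi + n(i\pi)$ with $\pi = b + (b+1)i$.

```python
def neighbours(node, b):
    """Four neighbours of a node of the dense Gaussian network.

    Assumption: replicas centred on m*pi + n*(i*pi), pi = b + (b+1)i.
    """
    px, py = b, b + 1          # pi
    qx, qy = -py, px           # i * pi
    result = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        x, y = node[0] + dx, node[1] + dy
        for m in (-1, 0, 1):
            for n in (-1, 0, 1):
                u, v = x + m * px + n * qx, y + m * py + n * qy
                if abs(u) + abs(v) <= b:
                    result.append((u, v))
    return result

def is_perfect_placement(resources, b):
    """True if every node has exactly one resource at distance <= 1."""
    resources = set(resources)
    for x in range(-b, b + 1):
        for y in range(-b, b + 1):
            if abs(x) + abs(y) > b:
                continue
            ball = {(x, y), *neighbours((x, y), b)}
            if len(ball & resources) != 1:
                return False
    return True

# Hypothetical placement for b = 3 (25 nodes): 0 and the four unit multiples
# of 2+i, i.e. 2+i, -2-i, -1+2i and 1-2i.
placement = [(0, 0), (2, 1), (-2, -1), (-1, 2), (1, -2)]
print(is_perfect_placement(placement, 3))   # True
print(is_perfect_placement([(0, 0)], 3))    # False
```

Five resources serve 25 nodes here because each resource covers itself and its four neighbours.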

Expansibility: Increasing the Number of Nodes
- Growing a Gaussian network: the network can be expanded with the number of nodes needed to increase the diameter by 1 unit: $4k + 4$.
- Alternatively, hierarchical Gaussian networks have been proposed, joining several Gaussian networks with another Gaussian pattern. This is useful for CMP scenarios, for example, where latency, link length, etc. differ at each hierarchical level:
  - Lower level: interconnection between the cores inside a chip. Fast and reliable, with low latency.
  - Higher level: interconnection between different chips. Slower, with higher latency.
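The $4k + 4$ figure follows directly from the node count of a dense Gaussian network, $N = (b+1)^2 + b^2$, whose diameter is $k = b$. Writing $N(k)$ for the number of nodes at diameter $k$ (a notation introduced here only for this derivation):

```latex
\begin{align*}
N(k)            &= (k+1)^2 + k^2 \\
N(k+1) - N(k)   &= \bigl[(k+2)^2 + (k+1)^2\bigr] - \bigl[(k+1)^2 + k^2\bigr] \\
                &= (k+2)^2 - k^2 \\
                &= 4k + 4
\end{align*}
```

For example, going from $k = 3$ (25 nodes) to $k = 4$ (41 nodes) adds $4 \cdot 3 + 4 = 16$ nodes.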

Expansibility: Increasing the Number of Nodes
- Lower level (on-chip) with a dense Gaussian pattern.
- Higher level with the same pattern.
- Central routers have 8 links: 4 internal links and 4 external links.
- Route from one node to another:
  1) Route to the central router of the source node's Gaussian network.
  2) Route at the higher level to the destination Gaussian network.
  3) Route from the central router of the destination Gaussian network to the destination node.
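The three routing steps can be turned into a hop count. The sketch below is an illustration under several assumptions that are not in the slide: an address is a pair `(network, node)` of Gaussian integers stored as tuples, the central router of each lower-level network is node 0, traffic inside one network is routed locally without visiting the central router, and replicas are centred on $m\pi + n(i\pi)$ with $\pi = b + (b+1)i$. The function names are hypothetical.

```python
def min_jumps(source, dest, b):
    """Jumps between two nodes of a dense Gaussian network of diameter b.

    Assumption: replicas centred on m*pi + n*(i*pi), pi = b + (b+1)i.
    """
    dx, dy = dest[0] - source[0], dest[1] - source[1]
    px, py = b, b + 1          # pi
    qx, qy = -py, px           # i * pi
    return min(abs(dx + m * px + n * qx) + abs(dy + m * py + n * qy)
               for m in (-1, 0, 1) for n in (-1, 0, 1))

def hierarchical_hops(src, dst, b_low, b_high):
    """Hop count of the three-step hierarchical route.

    src and dst are (network, node) pairs; b_low and b_high are the
    diameters of the lower-level and higher-level Gaussian patterns.
    """
    (src_net, src_node), (dst_net, dst_node) = src, dst
    centre = (0, 0)            # assumed position of the central router
    if src_net == dst_net:
        # Assumption: local traffic does not go through the central router.
        return min_jumps(src_node, dst_node, b_low)
    return (min_jumps(src_node, centre, b_low)       # step 1
            + min_jumps(src_net, dst_net, b_high)    # step 2
            + min_jumps(centre, dst_node, b_low))    # step 3

# Node 1+i of network 0 to node -2i of network 1 (lower level b=3, higher b=1).
print(hierarchical_hops(((0, 0), (1, 1)), ((1, 0), (0, -2)), 3, 1))   # 5
```

The example needs 2 jumps to reach the central router, 1 jump between networks and 2 jumps to the destination node.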

Cache Coherence in Gaussian Networks
- Recent broadcast-based proposals, such as TokenB (M. Hill), can benefit from a Gaussian interconnection:
  - Block requests are broadcast (latency is optimized by the Gaussian interconnection).
  - Responses are unicast and carry grants (tokens) to use memory blocks, exchanged between the different nodes and main memory.
- There is no need to maintain a directory for coherence.
- The broadcast network can work as a bus with a snoopy-like protocol.


Additional Comments (not presented)
- Dense Gaussian networks are isomorphic to dense Midimew networks. However, the two topologies are not isomorphic in the general (non-dense) case.
- Since this work deals with dense Gaussian networks, it presents properties studied for both Gaussian and Midimew topologies. The references in the following slides therefore refer to both Midimew and Gaussian networks.

Commented References (I)
- Midimew networks were first introduced in: R. Beivide, E. Herrada, J.L. Balcázar and Agustín Arruabarrena, "Optimal Distance Networks of Low Degree for Parallel Computers". IEEE Transactions on Computers, Vol. 40, No. 10, Oct. 1991, pp. This paper introduces properties, analysis and some rectangular (mesh-like) layouts. Unicast routing is also proposed, but based on labeling nodes with integer labels (instead of Gaussian integers).
- Bounded link-length layouts were introduced in: E. Vallejo, R. Beivide and C. Martínez, "Practicable Layouts for Optimal Circulant Graphs". Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-based Processing, Lugano, Switzerland, Feb.
- A previous work on Midimew folding, which obtained a worse result (maximum link length 4), is: Francis C. M. Lau and Guihai Chen, "Optimal Layouts of Midimew Networks". IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp.

Commented References (II)
- Gaussian networks will be introduced in: C. Martínez, R. Beivide, J. Gutierrez and E. Gabidulin, "On the Perfect t-Dominating Set Problem in Circulant Graphs and Codes over Gaussian Integers". Accepted for presentation at ISIT'05, September, Australia. This paper also deals with perfect resource placement.
- Broadcasting in dense Gaussian networks will be introduced in: R. Beivide, C. Martínez, E. Vallejo, J. Gutierrez and C. Izu, "Gaussian Interconnection Networks". Accepted for presentation at the Spanish Parallelism Conferences, Sept. 05, Granada, Spain. This paper also introduces unicast routing in terms of Gaussian integers (instead of integer labels).
- Hierarchical Gaussian networks will be introduced in: Miquel Moretó, Carmen Martínez, Enrique Vallejo, Ramón Beivide and Mateo Valero, "Hierarchical Topologies for Large-scale Two-level Networks". Accepted for presentation at the Spanish Parallelism Conferences, Sept. 05, Granada, Spain.

Commented References (III)
- Bubble routing is described in: V. Puente, C. Izu, R. Beivide, J.A. Gregorio, F. Vallejo and J.M. Prellezo, "The Adaptive Bubble Router". Journal of Parallel and Distributed Computing, Vol. 61, No. 9, September 2001.
- Immunet was introduced in: V. Puente, J.A. Gregorio, F. Vallejo and R. Beivide, "Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism". 31st Annual International Symposium on Computer Architecture (ISCA-31), pp.
- Token Coherence was presented in: M. M. K. Martin, M. D. Hill and D. A. Wood, "Token Coherence: Decoupling Performance and Correctness". 30th Annual International Symposium on Computer Architecture (ISCA-30), pp., 2003.