Design Space Exploration for NoC Topologies. ECE757, 6th May 2009. By Amit Kumar, Kanchan Damle, Muhammad Shoaib Bin Altaf, Janaki K.M Jillella. Course Instructor: Mikko Lipasti


Design Space Exploration for NoC Topologies. ECE757, 6th May 2009. By Amit Kumar, Kanchan Damle, Muhammad Shoaib Bin Altaf, Janaki K.M Jillella. Course Instructor: Mikko Lipasti. (Interconnect Evaluation, 6th May 2009)

Outline
– Introduction
– Topologies Assessed
– Evaluation Methodology
– Results
– Constraints
– Conclusion


NoC Challenges
– Performance requirements: low latency, maximum concurrent communication
– Tight energy and area constraints
– Reliability requirements
– Low cost


Topologies Assessed
– Ring: bi-directional ring, hierarchical ring (2-level), hierarchical ring (3-level)
– Crossbar: fully connected
– Mesh: 2D, 3D
– Torus: 2D
– Tree: Clos, fat tree
– Butterfly: k x n array, Flatten-c1, Flatten-c2

Ring (diagram)

Crossbar (diagram)

Mesh (diagram)

Torus (diagram)

Folded Clos (diagram)

Fat Tree (diagram)

Butterfly (diagram)

Flattened Butterfly (diagram)


Evaluation Methodology
– Routing: minimal dimension-order routing
– Parameters: number of nodes (16, 32); flit size (128, 256)
– CMP protocols: MOESI_CMP_token, MSI_MOSI_CMP_, MOESI_CMP_directory
– Benchmarks: OLTP, Apache, JBB, Ocean
– Comparison: latency, bandwidth, power, area
– Tool flow: Simics with Ruby and Garnet for latency and throughput; Orion for area and power
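The slide names minimal dimension-order routing but not its implementation. As a minimal sketch, assuming a 2D mesh with integer (x, y) router coordinates and X-before-Y ordering (the common "XY" variant), the route below shows why dimension-order routing is minimal: it never moves away from the destination, so the hop count equals the Manhattan distance. The function name `xy_route` is hypothetical and is not taken from Garnet.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Hypothetical XY dimension-order route on a 2D mesh.

    The packet first travels along X until its column matches the
    destination, then along Y. Returns every router visited,
    including src and dst.
    """
    x, y = src
    path = [(x, y)]
    # Resolve the X dimension first.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # Then resolve the Y dimension.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route)
    print("hops:", len(route) - 1)
```

Because the order of dimensions is fixed, every packet between the same pair of routers takes the same path, which is simple and deadlock-free on a mesh but offers no path diversity.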


Results (1) Latency (chart: latency in cycles)
– 2D torus has lower latency than both 2D mesh and 3D mesh.
– The torus provides enough path diversity.
– The token protocol performs as well as the directory protocols on mesh, torus, and butterfly.
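One way to see why the 2D torus can beat the 2D mesh is to count hops. The sketch below assumes uniform all-to-all traffic and minimal routing on a 16-node network, and it ignores contention, router pipeline depth, and the coherence protocols in the study. It computes the diameter and average hop count for a 16-node bidirectional ring, a 4x4 mesh, and a 4x4 torus.

```python
from fractions import Fraction
from itertools import product


def ring_dist(a: int, b: int, k: int) -> int:
    """Minimal hops between positions a and b on a bidirectional ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def mesh_dist(a, b, k):
    """Manhattan distance on a k x k mesh (no wraparound links; k is unused)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_dist(a, b, k):
    """Sum of per-dimension ring distances on a k x k torus (wraparound links)."""
    return ring_dist(a[0], b[0], k) + ring_dist(a[1], b[1], k)


def hop_stats(nodes, dist, k):
    """Return (diameter, exact average hops) over all ordered pairs of distinct nodes."""
    hops = [dist(s, d, k) for s, d in product(nodes, repeat=2) if s != d]
    return max(hops), Fraction(sum(hops), len(hops))


k = 4  # 4 x 4 = 16 nodes, the smaller node count in the study
grid = list(product(range(k), repeat=2))
for name, nodes, dist, size in [
    ("ring", list(range(k * k)), ring_dist, k * k),
    ("2D mesh", grid, mesh_dist, k),
    ("2D torus", grid, torus_dist, k),
]:
    diameter, avg = hop_stats(nodes, dist, size)
    print(f"{name:8s} diameter={diameter} avg_hops={float(avg):.2f}")
```

For 16 nodes this prints diameters of 8, 6, and 4 hops and average hop counts of 4.27, 2.67, and 2.13. The torus's wraparound links shorten the average path, which is consistent with its lower latency on the slide, although measured latency also depends on contention and protocol traffic.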

Results (1) Latency (chart: latency in cycles)
– The hierarchical ring (HRing) is less scalable than the simple ring.
– For a small number of processors, HRings add to the number of hops.
– The token protocol does not work well on rings.
– Tokens cause congestion in the network.

Results (1) Latency (chart: latency in cycles)
– Butterfly is the next most competitive topology after mesh/torus.
– Flattened butterflies perform better than the k-ary butterfly.
– Increasing router concentration works very well.
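The slides do not define Flatten-c1 and Flatten-c2; reading them as flattened butterflies with different router concentrations is an assumption. The hypothetical sizing helper below illustrates the trade-off concentration controls: attaching c terminals to each router divides the router count by c, while every router keeps direct links to all other routers in its row and column, so any router-to-router path is at most two hops. The names `flattened_butterfly` and `FbflyConfig` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FbflyConfig:
    routers: int
    rows: int
    cols: int
    radix: int            # c terminal ports + (rows - 1) column links + (cols - 1) row links
    max_router_hops: int  # worst-case router-to-router hops


def flattened_butterfly(nodes: int, concentration: int, rows: int) -> FbflyConfig:
    """Hypothetical sizing of a 2D flattened butterfly.

    `concentration` terminals share each router; routers form a rows x cols
    grid and each router links directly to every other router in its row
    and in its column.
    """
    if nodes % concentration:
        raise ValueError("nodes must be divisible by concentration")
    routers = nodes // concentration
    if routers % rows:
        raise ValueError("routers must fill the grid evenly")
    cols = routers // rows
    radix = concentration + (rows - 1) + (cols - 1)
    # Two routers differ in at most one row and one column: at most 2 hops.
    max_hops = (1 if rows > 1 else 0) + (1 if cols > 1 else 0)
    return FbflyConfig(routers, rows, cols, radix, max_hops)


if __name__ == "__main__":
    for c, rows in [(1, 4), (2, 2), (4, 2)]:
        print(c, flattened_butterfly(16, c, rows))
```

For 16 terminals, concentration 1 on a 4x4 grid needs 16 radix-7 routers, while concentration 2 on a 2x4 grid needs only 8 radix-6 routers with the same two-hop worst case, which is one plausible reason higher concentration helped in the study.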

Results (1) Latency (chart: latency in cycles)
– The token protocol does not perform well on trees.
– Token overhead creates congestion in the network.

Results (5) Area Distribution (chart)

Results (4) Area (chart: area in mm²)
– 2D torus, Clos, and HRing scale well in terms of area.
– Ring has less area than mesh/torus.

Results (3) Power Distribution (chart)
– The crossbar consumes the majority of power.

Results (2) Power (chart: power in watts)
– The token protocol consumes more power due to high network traffic.
– Ring has lower power than mesh/torus.
– Router area is reduced in the ring.
– Butterfly has the lowest power consumption.
– Note: refer to Kanchan on what "binary" denotes here.

Results (6) Throughput (chart: throughput in flits/cycle)
– Tokens cause congestion, which leads to higher link utilization.
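The throughput slides report flits/cycle and discuss link utilization without giving a formula. A minimal sketch, assuming utilization means flit traversals divided by available channel-cycles (one flit per unidirectional channel per cycle) and counting only inter-router channels, looks like this. Both function names are hypothetical.

```python
def channel_count(topology: str, k: int) -> int:
    """Unidirectional inter-router channels (injection/ejection ports excluded).

    "ring" has k nodes; "mesh" and "torus" have k x k nodes.
    Assumes k >= 3 so ring and torus wraparound links are distinct.
    """
    if topology == "ring":
        return 2 * k            # k bidirectional links
    if topology == "mesh":
        return 4 * k * (k - 1)  # 2k(k-1) bidirectional links
    if topology == "torus":
        return 4 * k * k        # 2k^2 bidirectional links
    raise ValueError(f"unknown topology: {topology}")


def link_utilization(flit_hops: int, channels: int, cycles: int) -> float:
    """Fraction of channel-cycles that carried a flit."""
    return flit_hops / (channels * cycles)


if __name__ == "__main__":
    for topo, k in [("ring", 16), ("mesh", 4), ("torus", 4)]:
        print(topo, channel_count(topo, k))
```

For 16 nodes the ring has 32 channels, the 4x4 mesh 48, and the 4x4 torus 64, so the same traffic gives the ring the highest per-link utilization. Extra token messages, by contrast, add flit-hops to the numerator on every topology.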

Results (6) Throughput (chart: throughput in flits/cycle)
– HRing avoids congestion, so the token protocol has low link utilization on it.

Results (6) Throughput (chart: throughput in flits/cycle)
– MSI gives lower link utilization because it reduces coherence traffic.
– Increasing concentration leads to higher link utilization in the butterfly.

Results (6) Throughput (chart: throughput in flits/cycle)
– Note: refer to Kanchan on why binary gives high link utilization for MSI.

Results (7) Latency (chart: latency in cycles)
– Latency is higher than for OLTP because of a high number of cache-to-cache misses.
– The token protocol performs well.
– The token protocol removes the cache-to-cache indirection latency, which is the dominating factor.
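The indirection argument can be made concrete by counting hops on the critical path of a cache-to-cache miss. This is a simplified sketch, assuming a 2D mesh with minimal routing, a 3-hop directory transaction (requester to home, home to owner, owner to requester), and a token/broadcast request that reaches the owner directly. It ignores contention, the cost of delivering the broadcast to other nodes, and token-protocol retries. Node positions and function names are hypothetical.

```python
from typing import Tuple

Coord = Tuple[int, int]


def hops(a: Coord, b: Coord) -> int:
    """Minimal (Manhattan) hop count on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def directory_c2c_hops(req: Coord, home: Coord, owner: Coord) -> int:
    """Critical path of a 3-hop miss: requester -> home directory -> owner -> requester."""
    return hops(req, home) + hops(home, owner) + hops(owner, req)


def token_c2c_hops(req: Coord, owner: Coord) -> int:
    """Critical path of a broadcast miss: the owner sees the request directly and responds."""
    return hops(req, owner) + hops(owner, req)


if __name__ == "__main__":
    req, home, owner = (0, 0), (3, 3), (0, 1)
    print("directory:", directory_c2c_hops(req, home, owner))
    print("token:", token_c2c_hops(req, owner))
```

With the requester at (0, 0), the home directory at (3, 3), and the owner at (0, 1), the directory path is 12 hops and the broadcast path is 2. When many misses are served by another cache, this indirection dominates, which matches the explanation on the slide.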

Results (7) Throughput (chart: throughput in flits/cycle)
– The token protocol heavily utilizes links (for tokens).
– The MSI protocol generates less traffic.
– Ring/tree has higher utilization.

Results (8) Power (chart: power in watts)
– Butterfly and concentrated butterfly consume less power.
– The token protocol consumes more power on mesh/torus.

Results (10) Scalability (chart: latency in cycles)
– Note: refer to Janaki on why the 2D torus gives lower latency at 32 processors than at 16.
– 2D torus and butterfly scale well for all protocols.
– The token protocol does not scale as well as the directory protocols.
– MSI_MOSI has the best scalability.
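A rough traffic count suggests why a broadcast-based token protocol scales worse than a directory protocol. The sketch below assumes each token request is delivered as separate single-flit unicasts to every other node (no in-network multicast), that a directory request is one single-flit unicast, and that the 32-node system is laid out as a 4x8 mesh. All of these are assumptions, and responses and token counting are ignored.

```python
from fractions import Fraction
from itertools import product


def mesh_nodes(rows: int, cols: int):
    return list(product(range(rows), range(cols)))


def dist(a, b) -> int:
    """Minimal (Manhattan) hop count on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_pair_hops(rows: int, cols: int) -> int:
    """Sum of hop counts over all ordered pairs of distinct nodes."""
    nodes = mesh_nodes(rows, cols)
    return sum(dist(s, d) for s in nodes for d in nodes if s != d)


def avg_broadcast_flit_hops(rows: int, cols: int) -> Fraction:
    """Mean link traversals when one source unicasts a flit to every other node."""
    return Fraction(total_pair_hops(rows, cols), rows * cols)


def avg_unicast_flit_hops(rows: int, cols: int) -> Fraction:
    """Mean link traversals of one single-flit message to one other node."""
    n = rows * cols
    return Fraction(total_pair_hops(rows, cols), n * (n - 1))


if __name__ == "__main__":
    for rows, cols in [(4, 4), (4, 8)]:
        b = avg_broadcast_flit_hops(rows, cols)
        u = avg_unicast_flit_hops(rows, cols)
        print(f"{rows * cols} nodes: broadcast {float(b):.1f}, unicast {float(u):.2f}")
```

Under these assumptions a broadcast costs 40 flit-hops per miss on 16 nodes and 124 on 32, while a unicast goes from about 2.67 to 4.0. Doubling the node count roughly triples broadcast traffic but raises unicast traffic only 1.5x, consistent with the slide's observation that the token protocol scales worse.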

Results (11) 2D vs. 3D (chart: latency in cycles)
– 16% increase in performance.
– 17% increase in power.
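Taking the slide's figures at face value, and assuming both percentages describe the 3D configuration relative to the 2D one with "performance" meaning a 1.16x shorter execution time (the slide does not state the baseline), the net effect on efficiency is small:

$$
\frac{(\text{Perf}/P)_{3D}}{(\text{Perf}/P)_{2D}} = \frac{1.16}{1.17} \approx 0.991,
\qquad
\frac{E_{3D}}{E_{2D}} = \frac{P_{3D}\,t_{3D}}{P_{2D}\,t_{2D}} = 1.17 \times \frac{1}{1.16} \approx 1.009
$$

Performance per watt drops by slightly under 1% and energy per task rises by slightly under 1%, so under these assumptions the 3D speedup is paid for almost entirely in power.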

Conclusion
– Protocol
  – The token protocol can be an alternative to directory protocols (15% speedup).
  – The token protocol does not work well for OLTP.
  – Token protocols are power hungry (link power).
– Area
  – Ring/torus area scales well with an increasing number of nodes.
  – Butterfly is the next most competitive for area after ring/torus and scales very well.
– Power
  – Increasing concentration is always cost-efficient, as in the butterfly.
– Performance
  – Butterfly might be the better choice given a power/performance metric.
  – 2D torus might be a better choice than 2D mesh (16% speedup).
– Scalability
  – The token protocol does not scale as well as directory protocols.
  – MSI_MOSI has higher scalability.
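The slides do not spell out the power/performance metric behind the butterfly recommendation. As a sketch, assuming performance is taken as 1 / average latency, ranking by power x latency (lower is better) is equivalent to ranking by performance per watt. The helper name is hypothetical, and the inputs would have to come from the Garnet latency and Orion power results, which are not reproduced here.

```python
from typing import Dict, List, Tuple


def rank_by_power_delay(results: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank topologies by power x latency, best (lowest) first.

    `results` maps a topology name to (average latency in cycles, power in watts).
    Minimising power x latency maximises performance per watt when
    performance is modelled as 1 / latency.
    """
    scored = [(name, latency * watts) for name, (latency, watts) in results.items()]
    return sorted(scored, key=lambda item: item[1])
```

A topology with slightly higher latency can still win under this metric if its power is proportionally lower, which is the kind of trade-off the conclusion attributes to the butterfly.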

Thank You! Questions?