1 Keith D. Underwood, Eric Borch May 16, 2011 A Unified Algorithm for both Randomized Deterministic and Adaptive Routing in Torus Networks.

Presentation transcript:

1 Keith D. Underwood, Eric Borch May 16, 2011 A Unified Algorithm for both Randomized Deterministic and Adaptive Routing in Torus Networks

Copyright © 2009, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

3 Motivation
Torus topologies are popular because they are simple and easy to build
– Enable short links for power efficiency
– Easy to integrate (e.g. BlueGene)
Torus networks are particularly susceptible to routing congestion
– Some traffic patterns are extremely bad with deterministic routing
Existing adaptive routing approaches have some limitations
– Only work for virtual cut through flow control
– Do not provide ordering for the programming models that require it

4 Current Adaptive Routing Algorithms for Torus Networks
VCT adaptive routing
– Used on the T3E
– For each message class, use two deterministic channels plus one adaptive channel
– If a deterministic channel is busy, can enter the adaptive channel
– If the adaptive channel blocks, must re-enter the deterministic channel (and stay there)
– Only works for VCT flow control
Bubble adaptive routing
– Used in BlueGene/L
– Requires one packet worth of space to turn in the deterministic direction
– Requires two packets worth of space to turn in the adaptive direction

5 Related Adaptive Routing Algorithms
Turn model adaptive routing
– Prevents deadlocks by making a specific subset of turns illegal
– Example: the negative-first algorithm makes it illegal for a packet moving in any positive direction to turn into a negative direction
Dimension reversal
– Start with dimension-ordered routing
– Any time the preferred adaptive route would cross from a lower-numbered to a higher-numbered dimension (e.g. moving in Y to moving in X), increment the VC

6 Limitations of Adaptive Routing Algorithms
Two most recently deployed versions only work for virtual cut through flow control
Turn-model based routing does not handle the torus link well
Adaptive routing creates a specific challenge at the end-point: requests are not in order
Virtual cut through flow control creates a specific challenge at the end-point: messages are interleaved at a packet level granularity

7 New Algorithm Objectives
Allow wormhole flow control
– Minimize message interleaving to simplify the network end-point
– Simplify router buffer control
Provide a deterministic variant that increases throughput
– Eliminate the requirement to use adaptive routing to achieve high throughput
– Maintain compatibility with adaptive routing (e.g. of response messages)
Make the route computation algorithmic, even if the end implementation might not do it that way

8 New Algorithm Overview (1)
Based on turn model
– Use turn model rules, except when crossing the torus link
– Any of the turn model algorithms can be used
Leverage concept from dimension reversal
– Treat crossing the torus link as an illegal turn
– Increment the virtual channel when crossing the torus link
– Allows torus links to be treated as any other link in turn model
Virtual channel requirement: number of dimensions plus one
– Inject on VC0
– Can cross up to N torus links, where N is the number of dimensions
– Four VCs per message class for 3D torus compares reasonably to 3 VCs per message class for other schemes
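The virtual channel rule above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the presenters' router logic: it assumes a torus with `k` routers per dimension, coordinates numbered 0 to `k - 1`, and a wraparound ("torus") link between coordinates `k - 1` and 0. All function names are hypothetical.

```python
def crosses_torus_link(coord, sign, k):
    """True if a hop from coord in direction sign (+1 or -1) uses the wraparound link."""
    return (sign > 0 and coord == k - 1) or (sign < 0 and coord == 0)


def hop(position, vc, dim, sign, k):
    """Advance one hop in dimension dim; increment the VC when crossing the torus link."""
    new_vc = vc + 1 if crosses_torus_link(position[dim], sign, k) else vc
    new_position = list(position)
    new_position[dim] = (position[dim] + sign) % k
    return tuple(new_position), new_vc


def required_vcs(num_dims):
    """VCs per message class: one injection VC plus one per possible torus-link crossing."""
    return num_dims + 1
```

For example, with `k = 4`, a packet at `(3, 0, 0)` on VC0 that moves in +X arrives at `(0, 0, 0)` on VC1, while the same move from `(1, 0, 0)` stays on VC0. `required_vcs(3)` gives the four VCs per message class quoted for a 3D torus.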

9 New Algorithm Overview (2)
Both adaptive routing and deterministic routing use the same rules for which links are legal
– Adaptive routing chooses the next link based on load
– Deterministic routing chooses the next link based on a hash of the source, destination, and current router
All VCs can be used for traffic injection
– Leverage the T3E approach to VC spreading
– Spread over all VCs based on destination
– E.g. if a message only needs to cross 2 torus links, can inject on VC0 or VC1
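The two link-selection policies and the injection rule can be sketched as below. This is a hedged illustration only: the slides do not name a hash function, a load metric, or a tie-breaking rule, so SHA-256, the `load` dictionary, and all function names here are assumptions. Both policies receive the same list of legal links, as the slide describes.

```python
import hashlib


def choose_adaptive(legal_links, load):
    """Pick the legal link with the lowest load (load is a hypothetical per-link metric)."""
    return min(legal_links, key=lambda link: load[link])


def choose_deterministic(legal_links, src, dst, router):
    """Pick a legal link from a hash of (source, destination, current router).

    SHA-256 is an arbitrary stand-in; the slides do not specify the hash.
    """
    ordered = sorted(legal_links)  # fixed order so the choice is repeatable
    digest = hashlib.sha256(repr((src, dst, router)).encode()).digest()
    return ordered[int.from_bytes(digest[:4], "big") % len(ordered)]


def injection_vcs(num_dims, torus_crossings_needed):
    """VCs a message may be injected on, leaving one increment per torus-link crossing."""
    return list(range(num_dims - torus_crossings_needed + 1))
```

With a 3D torus (VC0 to VC3), `injection_vcs(3, 2)` returns `[0, 1]`, matching the slide's example. Because the deterministic choice depends only on the source, destination, and current router, every packet of a given source/destination pair follows the same path, which is what preserves ordering.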

10 Methodology
Created a router model in ASIM
Evaluated system size of 4K nodes/4K routers
Simulated traditional traffic patterns (random, bit reverse, bit complement, transpose, shuffle)
Traffic modeled as simple request/response pattern
– Single flit version: one flit request generates one flit response
– VCT version:
  – 16 flit write request / 2 flit write response
  – 4 flit read request / 16 flit read response
  – 50/50 mix of reads and writes
– Long message version: 4 flit read request / 64K flit read response
Simulated 500K cycles (results did not change from 250K cycle intermediate drop)
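The slides do not define the permutation traffic patterns. The sketch below uses the definitions commonly given in interconnection-network textbooks, applied to a node address of `bits` bits (12 bits for 4K nodes). Whether the presenters' simulator used exactly these mappings is an assumption, and the function names are illustrative.

```python
def bit_complement(src, bits):
    """Destination is the source address with every bit inverted."""
    return ~src & ((1 << bits) - 1)


def bit_reverse(src, bits):
    """Destination is the source address with its bit order reversed."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def transpose(src, bits):
    """Destination swaps the upper and lower halves of the address (bits must be even)."""
    half = bits // 2
    low_mask = (1 << half) - 1
    return ((src & low_mask) << half) | (src >> half)


def shuffle(src, bits):
    """Destination is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) & mask) | (src >> (bits - 1))
```

Each function maps a source node to a single fixed destination, which is why these patterns concentrate load on particular links and stress deterministic routing more than random traffic does.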

11 Router Architecture Modeled

12 Single Flit Throughput: Transpose (Bit Reverse is similar)

13 Single Flit Throughput: Random

14 Single Flit Throughput: Shuffle

15 Virtual Cut Through Throughput: Transpose (Bit Reverse is similar)

16 Virtual Cut Through Throughput: Random

17 Long Message Throughput: Transpose (Bit Reverse is similar)

18 Long Message Throughput: Random

19 Long Message Throughput: Bit Complement

20 Conclusions
Current adaptive routing algorithms have two basic shortcomings
– Only virtual cut through flow control is supported
– Improving throughput requires message reordering
Introduced an adaptive routing variant that:
– Allows wormhole flow control
– Includes a deterministic spreading variant
– Achieves a high fraction of adaptive routing advantages with deterministic routing
Future work: address limitations of deterministic spreading for some traffic patterns