Germán Rodríguez Cyriel Minkenberg Ramon Beivide Ronald P. Luijten Jesus Labarta Mateo Valero Oblivious Routing Schemes in Extended Generalized Fat Tree.

Presentation transcript:

Germán Rodríguez, Cyriel Minkenberg, Ramon Beivide, Ronald P. Luijten, Jesus Labarta, Mateo Valero
Oblivious Routing Schemes in Extended Generalized Fat Tree Networks
New Orleans, 2009
HPI-DC'09 (in conjunction with CLUSTER'09)

2 Summary
●We describe previously well-known regular modulo-based routing algorithms for k-ary n-trees.
●We extend and analyze these algorithms for a broader class of networks: XGFTs, including cost-effective variants of k-ary n-trees.
●We produce some combinatorial results showing that the two main variants of modulo-based algorithms perform equally well for a random distribution of traffic.
●We identify two intrinsic flaws of oblivious modulo-based algorithms and propose a variant that improves on both.

3 Outline
●XGFT topologies:
  ●k-ary n-trees and more cost-effective variants
●Routing (state of the art)
  ●Random
  ●Modulo-radix variants: Source-mod-k and Destination-mod-k
●Experimental environment
●Analysis of modulo-radix algorithms
●Proposal: random NCA up/down
●Evaluation
●Results
●Conclusion

4 Extended Generalized Fat Trees I
●XGFT(h; m_1, …, m_h; w_1, …, w_h)
●Superclass of multi-trees
●k-ary n-trees [Petrini97]
●Slimmed trees [Navaridas07]
●h = height
  ●number of levels - 1
  ●levels are numbered 0 through h
  ●level 0: compute nodes
  ●levels 1 … h: switch nodes
●m_i = number of children per node at level i, 0 < i ≤ h
●w_i = number of parents per node at level i-1, 0 < i ≤ h
●number of level 0 nodes = ∏_i m_i
●number of level h nodes = ∏_i w_i
[Figure: XGFT(3; 3,2,2; 2,2,3) with node labels at each level; 4-ary 2-tree; XGFT(3; 4,4,4; 1,4,1) slimmed tree]
The Nearest Common Ancestors (NCA), Least Common Ancestors (LCA) or “roots” of a pair (s, d) of nodes are the set of inner nodes at the lowermost level that are ancestors of both s and d.

5 Extended Generalized Fat Trees II
[Figure: XGFT(3; 3,2,2; 2,2,3) with node labels at each level; 4-ary 2-tree; XGFT(3; 4,4,4; 1,4,1) slimmed tree]
●Number of nodes at level i, 0 < i < h
●Each node can be labeled as an h-tuple with 0 ≤ M_i < m_i and 0 ≤ W_i < w_i, which in combination with the level number i uniquely determines a node in the whole network (first W's, then M's)
●Equivalent variations in the labeling schemes have been proposed [Lin04, Gomez07]
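
The node counts on slides 4 and 5 can be sketched in a few lines of Python. The slides give the counts for level 0 (product of all m_i) and level h (product of all w_i); the per-level formula used here, the product of w_1..w_i times the product of m_(i+1)..m_h, is an assumption that agrees with both end points and with the number of labels shown per level for XGFT(3; 3,2,2; 2,2,3). The function name is hypothetical.

```python
from math import prod


def xgft_level_sizes(h, m, w):
    """Return the number of nodes at levels 0..h of XGFT(h; m_1..m_h; w_1..w_h).

    m[i-1] holds m_i and w[i-1] holds w_i (Python lists are 0-indexed).
    Assumed formula: level i has (w_1 * ... * w_i) * (m_{i+1} * ... * m_h) nodes.
    """
    if len(m) != h or len(w) != h:
        raise ValueError("m and w must each have exactly h entries")
    return [prod(w[:i]) * prod(m[i:]) for i in range(h + 1)]


# Example tree from the slides: XGFT(3; 3,2,2; 2,2,3)
example = xgft_level_sizes(3, [3, 2, 2], [2, 2, 3])
print(example)

# A 4-ary 2-tree written as XGFT(2; 4,4; 1,4)
print(xgft_level_sizes(2, [4, 4], [1, 4]))
```

For the example tree this yields 12 compute nodes at level 0 and 12 switches at level 3, matching the two products stated on slide 4.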

6 XGFTs and Contention
●XGFTs provide multiple paths for every pair of nodes:
  ●Proportional to the “number of parents” (w_i) parameters up to the Least/Nearest Common Ancestors of source s and destination d.
  ●Increasing the number of parents increases the cost.
●k-ary n-trees provide full bisection bandwidth and set a well-known trade-off between cost and performance.
●Slimmed trees (with w_i ≤ k) become more important as the number of nodes increases.
●Our analysis and proposal work better for slimmed trees than previous algorithms.
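
To make the path-count statement concrete, here is a minimal sketch. It assumes leaves are numbered 0..N-1 with m_1 as the least significant radix, and that the number of NCAs (and hence of minimal up/down paths) for a pair whose NCA level is l equals w_1 * ... * w_l. The function names are hypothetical.

```python
from math import prod


def nca_level(s, d, m):
    """Lowest level at which leaves s and d share an ancestor (0 if s == d)."""
    level, span = 0, 1
    while s // span != d // span:
        span *= m[level]  # a level-(level+1) subtree covers `span` leaves
        level += 1
    return level


def num_ncas(s, d, m, w):
    """Number of nearest common ancestors of leaves s and d (assumed formula)."""
    return prod(w[:nca_level(s, d, m)])


# 4-ary 2-tree as XGFT(2; 4,4; 1,4) versus a slimmed XGFT(2; 4,4; 1,2)
print(num_ncas(0, 5, [4, 4], [1, 4]))  # full tree
print(num_ncas(0, 5, [4, 4], [1, 2]))  # slimmed tree: fewer alternative paths
```

Reducing w_2 from 4 to 2 halves the number of alternative paths between leaves on different level-1 switches, which is the cost/contention trade-off the slide refers to.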

7 Related Work: Routing schemes
●Main oblivious routing schemes for fat trees
  ●Random [Valiant81][Greenberg85] selection of upward paths
  ●Either source [Leiserson92][Ohrin95][Kariniemi06] modulo assignment of upward links
  ●or destination [Lin04][Gomez07][Johnson08] modulo assignment of upward links
●Pattern-aware (used in this work)
  ●Colored heuristic [Rodriguez09]
  ●We use it as a baseline for comparison

8 Random Routing I
●The assignment of links to reach an NCA is totally random
●Idea: a random distribution should equally distribute the probability of having contention
●At each step, choose a random parent until an NCA is reached
●Then follow the unique deterministic path down
[Figure: example route in the tree; labels: S, Node 1, Node 10]
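
The random up-phase described on this slide can be sketched as follows. The leaf numbering (m_1 as least significant radix) and the function name are assumptions made for illustration; only the upward port choices are produced, since the downward path is deterministic.

```python
import random


def random_up_ports(s, d, m, w, rng=random):
    """Pick a random parent port at each level until an NCA of s and d is reached.

    m[i-1] = m_i (children per node at level i), w[i-1] = w_i (parents per
    node at level i-1). Returns one port index per upward hop.
    """
    ports, span, level = [], 1, 0
    while s // span != d // span:  # s and d are not yet under a common ancestor
        ports.append(rng.randrange(w[level]))  # uniform choice among w_{level+1} parents
        span *= m[level]
        level += 1
    return ports


# 3-ary 3-tree written as XGFT(3; 3,3,3; 1,3,3), route from leaf 1 to leaf 10
rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(random_up_ports(1, 10, [3, 3, 3], [1, 3, 3], rng))
```

Each call yields a different upward path unless the seed is fixed, which is why results for random routing are reported over many seeds.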

9 Regular Routings (s mod k, d mod k)
●“Self-routing” approach
●At each step, choose the parent by applying a modulo-k operation to a node label
●Difference: whether the label of the source or of the destination is used; the label is used only on the way up the tree
[Figure: two panels, “source mod k” and “destination mod k”, showing the up ports selected by mod 3 operations in an example with Node 10 and Dest 26]
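
One common way to write the modulo rule is to consume one base-k digit of the label per upward hop. The sketch below assumes that formulation for a k-ary n-tree; the exact digit used at each level in the authors' implementation is not given on the slide, and the function names are hypothetical.

```python
def modulo_up_ports(label, k, hops):
    """Up port per hop: successive base-k digits of `label`, least significant first."""
    ports = []
    for _ in range(hops):
        ports.append(label % k)
        label //= k
    return ports


def s_mod_k(source, destination, k, hops):
    """Source-mod-k: the upward path depends only on the source label."""
    return modulo_up_ports(source, k, hops)


def d_mod_k(source, destination, k, hops):
    """Destination-mod-k: the upward path depends only on the destination label."""
    return modulo_up_ports(destination, k, hops)


# Radix 3, two switch-to-switch hops, source 10 and destination 26
print(s_mod_k(10, 26, 3, 2))
print(d_mod_k(10, 26, 3, 2))
```

Under d-mod-k every flow to destination 26 takes the same upward ports, so all traffic to that node converges on one NCA; under s-mod-k the same holds for all traffic leaving a given source.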

10 Combinatorial Analysis of Modulo-based algorithms
An interesting question arises: is either of the two variations (source or destination) of the modulo-based algorithms intrinsically better?
Number of permutations routed
●By s-mod-k, by d-mod-k
●The same; why?
●Idea: for every permutation P there exists Inverse(P) such that, if P has c conflicts with s-mod-k, Inverse(P) has c conflicts with d-mod-k (details in the paper)
Number of general patterns (not permutations) routed
●By s-mod-k, by d-mod-k
●The same; why?
●Idea: decompose the pattern into all possible permutations
●Compute the maximum c over all possible permutations for s-mod-k
●Invert the decomposed permutations and apply the previous result: the union of the inverted permutations has the same maximum c for d-mod-k
●See the paper for more details
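
The inverse-permutation argument can be checked numerically on a small case. This sketch assumes a k-ary 2-tree (k leaf switches, k roots, root chosen as label mod k) and uses the maximum number of flows sharing one switch-to-switch link as a stand-in for the conflict count c; both the model and the names are illustrative assumptions, not the paper's exact definitions.

```python
import random
from collections import Counter


def max_link_load(flows, k, use_source):
    """Maximum flows per up/down link in a k-ary 2-tree under s-mod-k or d-mod-k."""
    load = Counter()
    for s, d in flows:
        if s // k == d // k:
            continue  # same leaf switch: the flow never goes up to a root
        root = (s if use_source else d) % k
        load[("up", s // k, root)] += 1    # leaf switch -> root
        load[("down", root, d // k)] += 1  # root -> leaf switch
    return max(load.values(), default=0)


k = 4
rng = random.Random(1)
perm = list(range(k * k))
rng.shuffle(perm)

flows = [(s, perm[s]) for s in range(k * k)]  # permutation P
inverse_flows = [(d, s) for s, d in flows]    # Inverse(P)

c_s = max_link_load(flows, k, use_source=True)           # P under s-mod-k
c_d = max_link_load(inverse_flows, k, use_source=False)  # Inverse(P) under d-mod-k
print(c_s, c_d)
```

In this model the up links used by Inverse(P) under d-mod-k mirror the down links used by P under s-mod-k, and vice versa, so the two maxima coincide for any permutation.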

11 Experimental Setup
●Collection of application traces and pattern extraction
●Co-simulation approach [Minkenberg09]:
  ●Dimemas replays the MPI activity of the trace of an application
  ●Venus simulates the transmission of the messages with a detailed model of the network
[Figure: co-simulation toolchain. Labels: Venus Simulator (routes, mapping, topology; config file: adapter and switch parameters, BW, link delay; statistics); map2ned; Myrinet’s route files; Myrinet’s map files; routereader; Traffic Generator; traces; Dimemas Simulator (config file: links, bandwidth, #buses, latency, eager/rendez-vous, etc.); Execution of an Application; Visualization, Analysis, Validation (Paraver); ServerMod, ClientMod; Detailed level of simulation; Applications/MPI]

12 Applications
●WRF
  ●256 processors
  ●Each process issues 2 outstanding sends to destinations +/- 16 nodes away (except the first and the last 16 processes)
●CG
  ●128 processors

13 Results: WRF Progressive tree slimming
●Removing a single switch degrades performance by a factor of 2
●Removing 7 more middle switches has no impact for 3 routing schemes
●Regular modulo routings work very well (as well as the baseline), while Random does not.

14 Modulo-based Algorithms look good
●A word about contention:
  ●Two main types: endpoint contention and network fabric contention
  ●Endpoint contention arises because a node is performing multiple outstanding sends or receives and has fewer adapters than it needs.
  ●Network fabric contention arises because there are not enough network resources or the routing algorithm is not using them adequately.
●Modulo-based routing algorithms work by using node labels to go up the tree, concentrating endpoint contention for every particular node at a specific NCA
  ●S-mod-k uses the source label: endpoint contention at the source is concentrated
  ●D-mod-k uses the destination label: endpoint contention at the destination is concentrated
However, modulo-based algorithms do not always work well...

15 Results: CG
●Oblivious routings cannot achieve the best performance
●It is a pathological case for modulo-based oblivious algorithms
●Random routing does not achieve good performance
●The oblivious strategies do not match the baseline

16 Results: CG Communication Pattern
●Colored
  ●All phases take the same time
●Destination Mod K
  ●Non-local phase takes 8 times longer?

17 Results: CG Communication Pattern congruent with the modulo algorithm
●Why do oblivious algorithms work badly with CG?
●Only one phase in CG is non-local in our experiment:
  ●Each source sends to:
  ●destination = (source/2) * 16 + (source mod 2)
●Modulo-based routing algorithms in radix-16 networks
  ●OutputPort(destination) = ((source/2) * 16 + (source mod 2)) mod 16 == 0 or 1
  ●Map the 16 outgoing communications to either port 0 or 1
  ●8 to each: 8 contending communications
  ●14 unused ports in the switch…
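
The arithmetic on this slide can be reproduced directly. The sketch below takes the destination formula from the slide, reads source/2 as integer division, and considers the 16 sources attached to one radix-16 switch (sources 0 to 15, an assumption made for illustration); the function names are hypothetical.

```python
from collections import Counter

RADIX = 16


def cg_destination(source):
    """Non-local CG phase as given on the slide (integer division assumed)."""
    return (source // 2) * 16 + (source % 2)


def d_mod_k_port(destination, k=RADIX):
    """Output port chosen by destination-mod-k at the first switch."""
    return destination % k


# Ports selected by the 16 sources that share one switch
ports = Counter(d_mod_k_port(cg_destination(s)) for s in range(RADIX))
unused = RADIX - len(ports)
print(dict(ports), "unused ports:", unused)
```

Because the multiple of 16 vanishes under mod 16, the port depends only on source mod 2, so the 16 flows collapse onto ports 0 and 1 with 8 flows each.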

18 Proposal: Random NCA up/down
Oblivious algorithms: what does d-mod-k or s-mod-k do?
●Makes certain “roots” responsible for routing a collection of sources or destinations. The distribution of roots is even (for a k-ary n-tree, but not for slimmed trees).
●Tries to concentrate endpoint contention either on the path up to the root (source mod k) or down from the root (destination mod k)
We can relabel the nodes and apply modulo-based algorithms to the new source or destination labels, defining two families of algorithms:
●Random NCA up (using source labels)
●Random NCA down (using destination labels)
Idea: each root is responsible for concentrating the endpoint contention of a number of leaf nodes. An even distribution of leaf nodes to roots should lead to good performance.
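
A minimal sketch of the relabeling idea, assuming the relabeling is a uniformly random permutation of the leaf labels and a 2-level tree where the root is chosen as (new label) mod k. The slide does not specify how the authors build the relabeling, so this construction and the function names are assumptions.

```python
import random
from collections import Counter


def random_relabel(num_leaves, seed):
    """Return a random permutation: relabel[node] is the node's new label."""
    labels = list(range(num_leaves))
    random.Random(seed).shuffle(labels)
    return labels


def random_nca_up_port(source, relabel, k):
    """Random NCA up: modulo rule applied to the relabeled source."""
    return relabel[source] % k


def random_nca_down_port(destination, relabel, k):
    """Random NCA down: modulo rule applied to the relabeled destination."""
    return relabel[destination] % k


k = 4
relabel = random_relabel(k * k, seed=7)

# Each root still serves the same number of leaves, but which leaves share a
# root no longer follows the regular structure of the original labels.
roots = Counter(random_nca_down_port(n, relabel, k) for n in range(k * k))
print(dict(roots))
```

Because the relabeling is a permutation, every root remains responsible for exactly k leaves here, which keeps the endpoint-contention concentration of the modulo schemes while breaking the alignment between regular application patterns and the modulo rule.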

19 A word on the results plots
●In each of the graphs there is a data point for:
  ●Source-mod-k (triangle up, centered)
  ●Destination-mod-k (triangle down, centered)
●And three boxes (minimum, 1st quartile, median, 3rd quartile and maximum) for:
  ●Random
  ●Random NCA up
  ●Random NCA down
●Note that although the results for the random algorithms are based on 20 to 60 experiments with different seeds, the variance in performance might not be noticeable, in which case the whole “box” appears as a single horizontal line

20 Results: WRF
Random-NCA-up and Random-NCA-down are almost as good as S-mod-k and D-mod-k

21 Results: CG
Random-NCA-up and Random-NCA-down are mid-way between S-mod-k and D-mod-k and the baseline.

22 Routes per NCA
●Distribution of routes per NCA for several routing schemes
●X axis is the NCA number
●Left: non-slimmed
  ●Small variance of routes per NCA per routing and across ports
●Right: slimmed topology
  ●Source and destination modulo-based algorithms show a huge difference in routes assigned per NCA
  ●Random and the proposed family of random assignment of NCAs exhibit less variance across NCAs

23 Conclusions
●There are no fundamental differences in performance for typical communication patterns between source and destination modulo-based algorithms
●Modulo-based algorithms present an intrinsic flaw for slimmed trees
  ●Non-balanced distribution of routes per NCA can lead to increased network contention
●A hybrid approach (randomly selecting NCAs that become “endpoint-contention” concentrators) helps and could be used as a better oblivious approach for both non-slimmed and slimmed networks.

24 THANKS HPIDC’09

25 Q & A

28 Routing in XGFTs
●Selecting a link upward further limits the choice of links at the upper levels.
●In pink: the switches that can be visited after selecting the first leftmost parent of level 1 and the second leftmost link up of level

29 XGFTs I
●Superclass of fat tree topologies:
●XGFT(h; m_1, ..., m_h; w_1, ..., w_h)
●h is the height of the tree.
●m_i is the number of children per node at level i.
●w_i is the number of parents per node at level i-1.
[Figure: XGFT(1;4,1), XGFT(1;4,2), XGFT(1;4,3), XGFT(1;4,4); 4-ary tree; 4-ary 1-tree]

30 Random Routing I
●The assignment of links to reach an NCA is totally random
●Idea: a random distribution should equally distribute the probability of having contention
●Drawback I: suboptimal link assignment given a pattern

31 Random Routing II
●Drawback II
  ●Even a single conflict halves performance
[Figure: two link assignments for 3 pairs of nodes; panel captions: “Links, 2 conflicts for 3 pairs of nodes” and “6 Links, No conflicts”]

32 Coupled effects
[Figure: diagram with labels Topology, Routing, Communication Pattern, Mapping, Performance, Contention, Results]