Department of Computer Science at Florida State. LFTI: A Performance Metric for Assessing Interconnect Topology and Routing Design

Presentation transcript:

Department of Computer Science at Florida State
LFTI: A Performance Metric for Assessing Interconnect Topology and Routing Design
Background
‒ Innovation in interconnect topology and routing design is essential for future-generation ultra-scale supercomputers.
‒ Current methods for evaluating topology and routing design are not ideal.

Current methods for evaluating interconnect topology and routing design
‒ Topology and routing are evaluated separately.
‒ Topology
  o Diameter, bisection bandwidth, nodal degree, etc.
  o Not directly related to application-level performance
‒ Routing with topology
  o Simulation to obtain throughput and packet latency
  o Limited network sizes and numbers of scenarios
  o Simulation sees the tree, but not the forest.
‒ Two kinds of metrics: simple metrics that do not directly relate to performance, and detailed metrics that are too expensive to obtain.
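To make the "simple metrics" concrete, here is a minimal Python sketch that computes diameter and nodal degree for a small 2D torus by breadth-first search. The torus and the helper names (`torus_2d`, `hop_counts`, `diameter`, `max_degree`) are illustrative assumptions, not part of the presentation.

```python
from collections import deque


def torus_2d(rows, cols):
    """Adjacency lists for a rows x cols 2D torus (illustrative topology)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            neighbors = {
                ((r + 1) % rows, c), ((r - 1) % rows, c),
                (r, (c + 1) % cols), (r, (c - 1) % cols),
            }
            neighbors.discard((r, c))  # drop self-loops on size-1 dimensions
            adj[(r, c)] = sorted(neighbors)
    return adj


def hop_counts(adj, source):
    """Breadth-first search: hop distance from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(adj):
    """Largest shortest-path hop count over all node pairs."""
    return max(max(hop_counts(adj, s).values()) for s in adj)


def max_degree(adj):
    """Nodal degree: the largest number of neighbors of any node."""
    return max(len(neighbors) for neighbors in adj.values())


if __name__ == "__main__":
    net = torus_2d(4, 4)
    print("diameter:", diameter(net), "degree:", max_degree(net))
```

Both numbers come from the graph alone, with no traffic pattern or routing involved, which is why such metrics are cheap but say little about application-level performance.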

Impact of evaluation methods
‒ Evaluation methods set the design optimization objective.
‒ Recent proposals (dragonfly, jellyfish) all have large bisection bandwidth and support certain traffic patterns effectively.
  o Think of how the designs are justified!
  o They are excellent designs by traditional metrics.
  o Are these designs good for typical HPC workloads?
‒ There is no metric that can be used to compare different topology and routing designs for HPC workloads.

What kind of metric are we looking for?
‒ Desirable properties:
  o Reflects overall network performance
  o Simple enough to be computed quickly; we do not want to run simulations.
‒ A related attempt, effective bisection bandwidth, summarizes network performance by the average performance over all bisection communication patterns.
  o Is this metric reflective?

LFTI: LANL-FSU throughput indices
A metric for throughput performance
High-level ideas
‒ Use modeling to obtain the average throughput for one communication pattern.
‒ Find the set of representative communication patterns to be used in the metric.
‒ Summarize the overall network performance using the average throughput for a large number of communication patterns common to HPC applications.

LFTI: LANL-FSU throughput indices
High-level ideas
‒ Once the patterns to be included are determined, LFTI can be derived from most topology and routing specifications without detailed simulation.
‒ If an interconnect achieves high overall performance for many common HPC patterns, it is likely to provide high performance for HPC workloads.
‒ Unlike some other metrics, LFTI is much harder to cheat.

LFTI: LANL-FSU throughput index
‒ LFTI summarizes the throughput of an interconnect over a large number of communication patterns common in HPC applications.
‒ For each communication pattern, the interconnect's performance is quantified by a metric (sustained throughput) that is closely related to application-level performance for that pattern.
‒ For a class of patterns (e.g. 2DNN patterns), the expected sustained throughput is used to quantify the performance.
‒ LFTI is the aggregate of the performance over many classes of patterns.

Computing the sustained throughput for a pattern (single-path routing)
‒ Compute the link load (the number of flows going through each link).
‒ The sustained throughput of each flow is its share of the bandwidth on its bottleneck link, or its rate under max-min fairness.
‒ The sustained throughput of the pattern is the aggregate throughput of all flows in the pattern.
  o Normalized as the per-flow throughput divided by the input link bandwidth.
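The steps on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' model: every link has the same capacity, each flow is given as the list of link identifiers on its single path, and the pattern's normalized throughput is taken to be the mean per-flow rate divided by the link bandwidth. All function names are hypothetical.

```python
from collections import Counter


def link_loads(paths):
    """Number of flows crossing each link (each path is a list of link ids)."""
    loads = Counter()
    for path in paths:
        loads.update(path)
    return loads


def bottleneck_share_rates(paths, capacity=1.0):
    """Each flow gets an equal share of its most heavily loaded link."""
    loads = link_loads(paths)
    return [capacity / max(loads[link] for link in path) for path in paths]


def max_min_rates(paths, capacity=1.0):
    """Max-min fair rates via progressive filling (assumes uniform capacity)."""
    remaining = {link: capacity for path in paths for link in path}
    rates = [0.0] * len(paths)
    active = set(range(len(paths)))
    while active:
        counts = Counter(link for i in active for link in paths[i])
        # Raise all active flows equally until the tightest link saturates.
        step = min(remaining[link] / n for link, n in counts.items())
        for i in active:
            rates[i] += step
        for link, n in counts.items():
            remaining[link] -= step * n
        saturated = {link for link in counts if remaining[link] <= 1e-12}
        # Flows crossing a saturated link are frozen at their current rate.
        active = {i for i in active if not saturated & set(paths[i])}
    return rates


def normalized_throughput(rates, capacity=1.0):
    """Assumed normalization: mean per-flow rate over link bandwidth."""
    return sum(rates) / (len(rates) * capacity)


if __name__ == "__main__":
    # Four flows over two links "a" and "b"; flow 0 crosses both.
    paths = [["a", "b"], ["a"], ["b"], ["b"]]
    print(dict(link_loads(paths)))
    print(bottleneck_share_rates(paths))
    print(max_min_rates(paths))
```

In this small example the two rules differ only for the flow that uses link "a" alone: its bottleneck share is 1/2, while max-min fairness lets it pick up the bandwidth that the flow crossing both links cannot use, giving it 2/3.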

Computing the throughput index for a class of patterns
‒ The throughput index for a class of patterns (e.g. 2DNN patterns) is the expected sustained throughput across all patterns of that class.
‒ The index can be obtained by randomly sampling a large number of patterns (e.g patterns).
‒ A statistical method may be applied to obtain the index with a given confidence without sampling a large number of patterns.
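A minimal Python sketch of the sampling idea, using a toy setting chosen only for illustration: a unidirectional ring with clockwise single-path routing, the class of shift patterns with a uniformly random shift distance, and the bottleneck-share rule for per-flow throughput. The 95% interval uses a normal approximation, which is one possible "statistical method" and an assumption here; all names are hypothetical.

```python
import math
import random
from collections import Counter


def ring_shift_pattern(n, k):
    """Shift pattern: node i sends to node (i + k) mod n."""
    return [(i, (i + k) % n) for i in range(n)]


def clockwise_path(n, src, dst):
    """Links on the clockwise route; link j is the hop leaving node j."""
    links, node = [], src
    while node != dst:
        links.append(node)
        node = (node + 1) % n
    return links


def pattern_throughput(n, pattern):
    """Mean per-flow bottleneck share, with unit link bandwidth."""
    paths = [clockwise_path(n, s, d) for s, d in pattern]
    loads = Counter(link for path in paths for link in path)
    rates = [1.0 / max(loads[link] for link in path) for path in paths]
    return sum(rates) / len(rates)


def estimate_index(sample_throughput, n_samples, rng):
    """Sample mean and 95% half-width (normal approximation)."""
    samples = [sample_throughput(rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((x - mean) ** 2 for x in samples) / (n_samples - 1)
    return mean, 1.96 * math.sqrt(var / n_samples)


if __name__ == "__main__":
    n = 5
    draw = lambda rng: pattern_throughput(
        n, ring_shift_pattern(n, rng.randint(1, n - 1)))
    mean, half_width = estimate_index(draw, 2000, random.Random(0))
    print(f"index ~ {mean:.3f} +/- {half_width:.3f}")
```

On this toy ring a shift by k loads every link with k flows, so each pattern's throughput is exactly 1/k, and the estimate can be checked against the closed-form average. Sampling can stop once the half-width is small enough for the comparison being made.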

Communication patterns in LFTI indices
‒ Patterns with history
  o All-to-all
  o Bisect (effective bisection bandwidth)
‒ Low-dimensional stencil patterns
  o 2DNN, 2DNN_DIAG, 3DNN, 3DNN_DIAG
‒ Random patterns, for applications with unstructured meshes or adaptive mesh refinement methods
  o RANDOM 50, RANDOM N50
‒ Commonly used sub-communication patterns
  o Permutation, shift
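As an illustration of how such patterns can be expressed as lists of (source, destination) rank pairs, here is a small Python sketch. The exact definitions used in LFTI are not given on the slide, so these are assumptions: the 2D nearest-neighbor pattern uses a periodic process grid with row-major rank numbering, and the diagonal variant simply adds the four diagonal neighbors.

```python
import random


def nn2d(px, py, diagonals=False):
    """2D nearest-neighbor flows on a periodic px x py grid (assumed form)."""
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if diagonals:
        offsets += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    flows = set()
    for x in range(px):
        for y in range(py):
            src = x * py + y  # row-major rank numbering
            for dx, dy in offsets:
                dst = ((x + dx) % px) * py + (y + dy) % py
                if dst != src:
                    flows.add((src, dst))
    return sorted(flows)


def shift(n, k):
    """Shift pattern: rank i sends to rank (i + k) mod n."""
    return [(i, (i + k) % n) for i in range(n)]


def random_permutation(n, rng):
    """Each rank sends to one distinct rank; fixed points are dropped."""
    dests = list(range(n))
    rng.shuffle(dests)
    return [(s, d) for s, d in enumerate(dests) if s != d]


if __name__ == "__main__":
    print(len(nn2d(4, 4)), len(nn2d(4, 4, diagonals=True)))
    print(shift(4, 1))
    print(random_permutation(6, random.Random(0)))
```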

LFTI categories
Trying to reflect how the machine is used:
‒ Whole system direct map LFTI
‒ Whole system random map LFTI
‒ Job allocation trace-based LFTI
  o Largest job based on some job traces
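One plausible reading of the direct and random map categories is that they differ only in how application ranks are placed on network nodes before throughput is computed. The Python sketch below illustrates that reading; it is an assumption, and the function names are hypothetical.

```python
import random


def direct_map(num_ranks):
    """Rank i runs on node i."""
    return list(range(num_ranks))


def random_map(num_ranks, rng):
    """Ranks are placed on nodes in a random order."""
    nodes = list(range(num_ranks))
    rng.shuffle(nodes)
    return nodes


def place(pattern, mapping):
    """Translate a rank-level pattern into node-level (src, dst) flows."""
    return [(mapping[s], mapping[d]) for s, d in pattern]


if __name__ == "__main__":
    pattern = [(0, 1), (1, 2), (2, 0)]
    print(place(pattern, direct_map(3)))
    print(place(pattern, random_map(3, random.Random(1))))
```

The node-level flows returned by `place` would then be routed and scored like any other pattern, so the same pattern classes can yield different indices under the two mappings.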

Evaluating interconnects using LFTI
‒ Fat-tree (ftree), dragonfly (dfly), hypercube (hcube), 6D torus (6D), 3D torus (3D), and jellyfish (jfish)
‒ Networks of 25K-35K nodes, the size of the next-generation supercomputer

Throughput index and communication time

Whole system direct map LFTI

Whole system random map LFTI

Job allocation based

LFTI summary

Conclusion
‒ Traditional performance metrics such as bisection bandwidth (BB) and effective bisection bandwidth (EBB) are not indicative of an interconnect's performance.
‒ Optimizing for BB and EBB may not lead to high-performance interconnects.
‒ LFTI is indicative of application-level performance, yet can be derived rapidly without detailed simulation.
  o It is a much better metric than the current metrics.

LFTI weakness
‒ Communication patterns and weights
  o Heavily concentrated on simulation-type applications
  o Not much for data-intensive applications
  o Calls for performance characterization work to find the truly “representative” workload to be included in the index.

LFTI weakness
‒ LFTI relies on fast modeling of throughput performance for each communication pattern.
  o Depending on the routing algorithm, the modeling can be problematic. Indirect adaptive routing is an example: there is no effective modeling method other than simulation.
  o New models need to be developed for all existing and future routing schemes, and for whatever else can affect the “sustained throughput”.