EE382C Final Project Crouching Tiger, Hidden Dragonfly


EE382C Final Project: Crouching Tiger, Hidden Dragonfly
Alexander Neckar, Camilo Moreno, Matthew Murray, Ziyad Abdel Khaleq

Outline
  - Topology, considerations, and layout
  - Routing solution
  - Mirroring and simulation
  - Results and conclusion

Dragonfly Topology
  - Fully connected local groups
  - Low hop count
  - Fast access to global links

Dragonfly Topology: Load Balance
  - Endpoints per router >= global links per router: nearly all traffic is bound for other groups, so the bandwidth should fit.
  - Local links per router >= endpoints + global links: nearly all traffic needs to traverse a local link before and after the global link.
  - Adaptive routing helps deal with adversarial traffic, as long as overall bandwidth is sufficient and we have good backpressure.
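The two load-balance rules above can be read as provisioning ratios. The sketch below is only an illustration, not the authors' tooling: it uses the p/a/h notation from the results slides (endpoints per router, routers per group, global links per router) and assumes each router has a - 1 local links because the group is fully connected. The function name is hypothetical. The four configurations in the loop are the ones explored later in the deck.

```python
def provisioning_ratios(p, a, h):
    """Return (global_ratio, local_ratio) for a p/a/h dragonfly router.

    global_ratio: global links per endpoint on the same router (h / p).
    local_ratio:  local links per endpoint-or-global port, (a - 1) / (p + h).
    Assumption: a fully connected group gives every router a - 1 local links.
    """
    global_ratio = h / p
    local_ratio = (a - 1) / (p + h)
    return global_ratio, local_ratio


if __name__ == "__main__":
    for cfg in [(13, 26, 13), (10, 32, 10), (10, 34, 9), (10, 45, 5)]:
        g, l = provisioning_ratios(*cfg)
        print(cfg, f"global/endpoint = {g:.2f}", f"local/(endpoint+global) = {l:.2f}")
```

A global ratio below 1 means fewer global links than endpoints per router; a local ratio above 1 means the in-group links are overprovisioned relative to the rule on the slide.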

Considerations: Costs
  - Optical links drive cost: minimize their number and keep utilization good
  - Local links are much cheaper: overprovisioning them helps feed the global links
  - Physical layout: fully connected group size limit (5 m cables)

Considerations
  - Power: links dominate power
  - Traffic: mostly limited in throughput by the send window (RPC); some very large packets (RDMA); hotspots
  - So... what?

Layout Considerations: maybe as many as 60 racks per group!

Layout Considerations: realistically, 34ish

Layout Considerations
  - Maximize racks per group? Routers in the bottom slots, wired diagonally
  - Actually not a constraint
  - Balance / cost issues with very large groups
  - 100 m optical cables: ~70 m square, 147 x 50 racks, >200K rack slots

Chips
  - Channels: 5 GB/s = 4 differential pairs @ 10 Gb/s
  - 1 optical cable; 4 electrical cable pairs in each direction
  - Chip size is perimeter-driven: buffers + crossbar are only a few mm^2
  - High radix requires a large perimeter for I/O
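The channel arithmetic on this slide can be checked in a few lines. This is a rough sketch with hypothetical function names: it ignores line-coding overhead, and the pair count per router assumes every port leaves the chip as 4 electrical differential pairs in each direction, which the slide does not state explicitly.

```python
def channel_bandwidth_gbytes(pairs=4, gbits_per_pair=10.0):
    """Raw one-direction channel bandwidth in GB/s (8 bits per byte, no coding overhead)."""
    return pairs * gbits_per_pair / 8.0


def differential_pairs_per_router(radix, pairs_per_direction=4):
    """Differential pairs at the chip boundary, assuming every port is
    electrical and bidirectional (an assumption for illustration)."""
    return radix * pairs_per_direction * 2


if __name__ == "__main__":
    print(channel_bandwidth_gbytes())         # 4 pairs at 10 Gb/s each
    print(differential_pairs_per_router(51))  # radix taken from the basic topology slide
```

The pair count grows linearly with radix, which is one way to see why I/O perimeter, rather than buffer or crossbar area, sets the chip size.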

Exploring options: lots of guesstimation!

Basic
  - >114k nodes
  - Balanced for uniform random

  Topology        13x26x13
  Cost            6.16M
  Power           68 kW
  Router radix    51
  Opt. links      57291
  Elect. links    110175
  Groups          339
  Endpts/group    338

Cheaper, better?
  - Fewer optical cables
  - Overprovisioned in-group links
  - 4% higher power

  Topology        10x32x10
  Cost            5.64M
  Power           70.7 kW
  Router radix    51
  Opt. links      51360
  Elect. links    159216
  Groups          321
  Endpts/group    320

A little more savings
  - 90% of normal global links
  - Overprovisioned in-group links
  - Even cheaper
  - Any good?

  Topology        10x34x9
  Cost            5.22M
  Power           70.5 kW
  Router radix    52
  Opt. links      46971
  Elect. links    172227
  Groups          307
  Endpts/group    340

What if...?
  - Half the “necessary” global links
  - Very overprovisioned in-group links
  - Otherwise not 100K
  - Almost half the price!

  Topology        10x45x5
  Cost            3.11M
  Power           65.9 kW
  Router radix    59
  Opt. links      25425
  Elect. links    223740
  Groups          226
  Endpts/group    450
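The structural rows on the four topology slides (radix, link counts, groups, endpoints per group) follow from p/a/h alone. The sketch below is an illustration rather than the authors' spreadsheet: it assumes a maximum-size dragonfly with one global link between every pair of groups, reads "Opt. links" as global links and "Elect. links" as in-group router-to-router links, and uses a hypothetical class name. Under those assumptions it reproduces the slide figures; cost and power are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    p: int  # endpoints per router
    a: int  # routers per group
    h: int  # global links per router

    @property
    def groups(self):
        # Each group has a*h global links, one to every other group.
        return self.a * self.h + 1

    @property
    def endpoints_per_group(self):
        return self.p * self.a

    @property
    def radix(self):
        # Endpoint ports + local ports (fully connected group) + global ports.
        return self.p + (self.a - 1) + self.h

    @property
    def optical_links(self):
        # Every global link is shared by two groups.
        return self.groups * self.a * self.h // 2

    @property
    def electrical_links(self):
        # Router-to-router links inside each fully connected group.
        return self.groups * self.a * (self.a - 1) // 2

    @property
    def nodes(self):
        return self.groups * self.endpoints_per_group


if __name__ == "__main__":
    for cfg in [(13, 26, 13), (10, 32, 10), (10, 34, 9), (10, 45, 5)]:
        d = Dragonfly(*cfg)
        print(cfg, d.nodes, d.radix, d.optical_links, d.electrical_links)
```

Because the optical link count scales with a*h, trading global links for a larger group (as in 10x45x5) cuts the expensive links sharply while raising the router radix.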

Improving Global Adaptive Routing: I feel the need… the need for speed.

Challenges
  - Quick congestion detection
  - Quick and accurate return to minimal
  - Tricks with credits, etc., can provide stiff backpressure
  - How do we avoid incorrectly taking the non-minimal route?

Solution idea
  - Use the rate of change of the queue to provide quick congestion detection and a quick return to minimal
  - Potential advantages: more accurate representation of network performance; rapid detection
  - Potential problems: sensitivity to burstiness

Our Work
  - ROC = 0.99*prev_ROC + 0.01*cur_ROC
  - Developed two new routing algorithms:
      ROC:   min_queue_rate < 2*nonmin_queue_rate || min_queue_rate < 0
      Combo: old algorithm || min_queue_rate < 0
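A minimal sketch of the two rules above, under several assumptions the slide does not spell out: that cur_ROC is the change in queue length since the previous sample, that a true condition means "take the minimal route", and that the "old algorithm" is a UGAL-style queue-length comparison. The `old_rule` function is only a stand-in, and all names here are hypothetical rather than Booksim identifiers.

```python
class QueueRateTracker:
    """Smoothed rate of change (ROC) of one output queue."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha   # weight of the newest sample (0.01 on the slide)
        self.prev_len = 0
        self.roc = 0.0

    def update(self, queue_len):
        # Assumption: cur_ROC is the queue-length delta since the last sample.
        cur_roc = queue_len - self.prev_len
        self.prev_len = queue_len
        self.roc = (1.0 - self.alpha) * self.roc + self.alpha * cur_roc
        return self.roc


def route_minimal_roc(min_rate, nonmin_rate):
    """ROC rule: go minimal if its queue grows slowly enough or is draining."""
    return min_rate < 2 * nonmin_rate or min_rate < 0


def old_rule(min_queue, nonmin_queue):
    """Stand-in for the 'old algorithm' (assumed UGAL-style length comparison)."""
    return min_queue <= 2 * nonmin_queue


def route_minimal_combo(min_queue, nonmin_queue, min_rate):
    """Combo rule: old decision, overridden toward minimal when the queue drains."""
    return old_rule(min_queue, nonmin_queue) or min_rate < 0


if __name__ == "__main__":
    tracker = QueueRateTracker()
    for length in [0, 4, 9, 9, 7, 3]:
        print(length, round(tracker.update(length), 4))
```

The `min_rate < 0` term is what gives the quick return to minimal: once the minimal queue starts draining, the router stops diverting traffic even if the queue is still long.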

Results
  - 1024 nodes, 2*p = 2*h = a = 8, injection
  - Uniform: 2% increase in average latency, 5% increase in max for both ROC and combo
  - Bad_dragon: ROC = 69% average latency, 82% max; Combo = 72% average, 90% max

Bad Dragon Results

Simulation Challenge
  - Booksim's cycle-accurate nature is at odds with simulating our very large system
  - std::bad_alloc...

Solution: Slicing
  - Do a fraction of the work and get all of the results!
  - How do we not include components in our simulation and still effectively simulate the entire network?

Slicing idea 1: Scaledown (A = 8, H = 2)

Idea: Relationships

Forget about hotspots for a minute...

Slicing Idea 2: Mirroring

Routing

Mirroring with Hotspots

Results for Different Topologies (p/a/h)
  - p: endpoints per switch
  - a: switches per group
  - h: global links per switch
  - 100,000 nodes with “Project Traffic”
  - Best from 10/32/10 @ 3.0277 million cycles

Simulation Results For 13 / 26 / 13

Simulation Results For 10 / 32 / 10

Simulation Results For 10 / 32 / 10 WITH 10 Hotspots

Other Simulation Results
  - 16 / 28 / 8: runtime 4,130,224; average latency 519.74 (too big)
  - 10 / 45 / 5 (half global links): runtime 4,190,192; average latency 528.51

Conclusion
  - ROC always wins in average latency and runtime cycles.
  - At a small cost of additional power (4%) over the basic 13 / 26 / 13, we can get higher performance cheaper with the 10 / 32 / 10 topology.
  - The simulated hotspot scenario is pessimistic; the numbers are fine.
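The cost and power trade-off in the conclusion can be checked against the figures on the topology slides. The snippet below is a small illustrative calculation with a hypothetical helper name; the cost values are taken as printed ("M", with no currency stated) and power is in kW.

```python
def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Figures copied from the 13x26x13 and 10x32x10 topology slides.
basic = {"cost_m": 6.16, "power_kw": 68.0}
candidate = {"cost_m": 5.64, "power_kw": 70.7}

if __name__ == "__main__":
    print(f"power: {relative_change(basic['power_kw'], candidate['power_kw']):+.1f}%")
    print(f"cost:  {relative_change(basic['cost_m'], candidate['cost_m']):+.1f}%")
```

The same helper applied to the 10x45x5 cost (3.11M against 6.16M) gives roughly a 50% reduction, which matches the "almost half the price" remark on that slide.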

Questions