Feedback Request Strawman: Using.1Qau for Load Monitoring in DC Networks Mitch Gusat IBM ZRL, Aug. 28, 2008.

Slides:



Advertisements
Similar presentations
CSE 413: Computer Networks
Advertisements

Ch. 12 Routing in Switched Networks
Quality of Service CCDA Quick Reference.
Congestion Control and Fairness Models Nick Feamster CS 4251 Computer Networking II Spring 2008.
Advanced satellite infrastructures in future global Grid computing: network solutions to compensate delivery delay Blasco Bonito, Alberto Gotta and Raffaello.
Finishing Flows Quickly with Preemptive Scheduling
Ch. 12 Routing in Switched Networks Routing in Packet Switched Networks Routing Algorithm Requirements –Correctness –Simplicity –Robustness--the.
Data and Computer Communications
Advantage Century Telecommunication Corp. AIL: Actively Intelligent Link-Layer Handoff Guo-Yuan Mikko Wang
Spring 2000CS 4611 Introduction Outline Statistical Multiplexing Inter-Process Communication Network Architecture Performance Metrics.
EE 4272Spring, 2003 Chapter 12 Congestion in Data Networks Effect of Congestion Control  Ideal Performance  Practical Performance Congestion Control.
TELE202 Lecture 8 Congestion control 1 Lecturer Dr Z. Huang Overview ¥Last Lecture »X.25 »Source: chapter 10 ¥This Lecture »Congestion control »Source:
William Stallings Data and Computer Communications 7 th Edition Chapter 13 Congestion in Data Networks.
Fundamentals of Computer Networks ECE 478/578
Traffic Shaping Why traffic shaping? Isochronous shaping
Improving TCP Performance over Mobile Ad Hoc Networks by Exploiting Cross- Layer Information Awareness Xin Yu Department Of Computer Science New York University,
1 Updates on Backward Congestion Notification Davide Bergamasco Cisco Systems, Inc. IEEE 802 Plenary Meeting San Francisco, USA July.
Scalable Flow-Based Networking with DIFANE 1 Minlan Yu Princeton University Joint work with Mike Freedman, Jennifer Rexford and Jia Wang.
Routing Strategies Fixed Routing
Internet Control Message Protocol (ICMP)
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
Internet Traffic Patterns Learning outcomes –Be aware of how information is transmitted on the Internet –Understand the concept of Internet traffic –Identify.
Scaling Internet Routers Using Optics Producing a 100TB/s Router Ashley Green and Brad Rosen February 16, 2004.
Teknik Routing Pertemuan 20 Matakuliah: H0484/Jaringan Komputer Tahun: 2007.
Energy-Efficient Design Some design issues in each protocol layer Design options for each layer in the protocol stack.
5/12/05CS118/Spring051 A Day in the Life of an HTTP Query 1.HTTP Brower application Socket interface 3.TCP 4.IP 5.Ethernet 2.DNS query 6.IP router 7.Running.
An Active Reliable Multicast Framework for the Grids M. Maimour & C. Pham ICCS 2002, Amsterdam Network Support and Services for Computational Grids Sunday,
DISTRIBUTED PROCESS IMPLEMENTAION BHAVIN KANSARA.
A Scalable, Commodity Data Center Network Architecture.
Performance Management (Best Practices) REF: Document ID
OpenFlow Switch Limitations. Background: Current Applications Traffic Engineering application (performance) – Fine grained rules and short time scales.
Distributed Process Implementation Hima Mandava. OUTLINE Logical Model Of Local And Remote Processes Application scenarios Remote Service Remote Execution.
Guide to TCP/IP, Third Edition
Data Transfer Case Study: TCP  Go-back N ARQ  32-bit sequence # indicates byte number in stream  transfers a byte stream, not fixed size user blocks.
Itrat Rasool Quadri ST ID COE-543 Wireless and Mobile Networks
Exaquantum/RDS Copyright © Yokogawa Electric Corporation 8 December, 2008 Remote Data Synchronization.
Performance Management (Best Practices) REF: Document ID
Curbing Delays in Datacenters: Need Time to Save Time? Mohammad Alizadeh Sachin Katti, Balaji Prabhakar Insieme Networks Stanford University 1.
Mobile Routing protocols MANET
POSTECH DP&NM Lab. Internet Traffic Monitoring and Analysis: Methods and Applications (1) 4. Active Monitoring Techniques.
Existing PBX Existing Phone Handsets Numbering Plan to digit Internal extensions 9 for an outside line 3 digits.
The Network of Information: Architecture and Applications SAIL – Scalable and Adaptable Internet Solutions Bengt Ahlgren et. al Presented by wshin.
Data and Computer Communications Circuit Switching and Packet Switching.
IBM Research GmbH, Zürich Research Laboratory R 3 C 2 : Reactive Route & Rate Control for CEE Mitch Gusat, Daniel Crisan, Cyriel Minkenberg, and Casimer.
COP 4930 Computer Network Projects Summer C 2004 Prof. Roy B. Levow Lecture 3.
Computing Infrastructure for Large Ecommerce Systems -- based on material written by Jacob Lindeman.
CSCI 465 D ata Communications and Networks Lecture 15 Martin van Bommel CSCI 465 Data Communications & Networks 1.
Packet Drop in HP Switches Guoming. Cause: packet based hashing in F10 LAG + HP switch buffer Assumption: link utilization 50% In the hashing, several.
1 IEEE Meeting July 19, 2006 Raj Jain Modeling of BCN V2.0 Jinjing Jiang and Raj Jain Washington University in Saint Louis Saint Louis, MO
Impact of memory size on ECM and E 2 CM Single-Hop High Degree Hotspot Cyriel Minkenberg & Mitch Gusat IBM Research GmbH, Zurich May 10, 2007.
Mitigating Routing Misbehavior in Mobile Ad Hoc Networks Sergio Marti, T.J. Giuli, Kevin.
Muhammad Niswar Graduate School of Information Science
1 IEX8175 RF Electronics Avo Ots telekommunikatsiooni õppetool, TTÜ raadio- ja sidetehnika inst.
Data Transfer Case Study: TCP  Go-back N ARQ  32-bit sequence # indicates byte number in stream  transfers a byte stream, not fixed size user blocks.
LECTURE 12 NET301 11/19/2015Lect NETWORK PERFORMANCE measures of service quality of a telecommunications product as seen by the customer Can.
Jiaxin Cao, Rui Xia, Pengkun Yang, Chuanxiong Guo,
CCNA3 Module 4 Brierley Module 4. CCNA3 Module 4 Brierley Topics LAN congestion and its effect on network performance Advantages of LAN segmentation in.
Networks, Part 2 March 7, Networks End to End Layer  Build upon unreliable Network Layer  As needed, compensate for latency, ordering, data.
Improving Fault Tolerance in AODV Matthew J. Miller Jungmin So.
Week #8 OBJECTIVES Chapter #5. CHAPTER 5 Making Networks Work Two Networking Models –OSI OPEN SYSTEMS INTERCONNECTION PROPOSED BY ISO –INTERNATIONAL STANDARDS.
The Network Layer Congestion Control Algorithms & Quality-of-Service Chapter 5.
Corelite Architecture: Achieving Rated Weight Fairness
Connecting Network Components
Revisiting Ethernet: Plug-and-play made scalable and efficient
Multipath QUIC: Design and Evaluation
An NP-Based Router for the Open Network Lab Overview by JST
Dynamic Management for End-to-end IP QoS
Centralized Arbitration for Data Centers
Muhammad Niswar Graduate School of Information Science
Presentation transcript:

Feedback Request Strawman: Using.1Qau for Load Monitoring in DC Networks Mitch Gusat IBM ZRL, Aug. 28, 2008

2 What is Fb_Rq? On demand status info Problem: Monitoring, app-level (L4+) performance profiling, runtime load balancing, adaptive routing... Solution: Build on the investment in.1Qau-compliant switches => Deliver the full/available feedback to sources ! (before congestion arises) Benefits 1.Speed: L2 feedback 2.Accuracy: Q info is already known to CP for QCN Fb. Ship it to RP! 3.Communicate Fb up the L3-7 stack. Use Flow-/RP-ID (?).

3 Fb_Rq Basics Monitoring options 1.Proactive: RP-initiated => RP autonomosly issues Fb_Rq 2.[Reactive: CP-initiated => RP begins to ping after QCN CNM] 3.Single CP (reflect) vs. path (reflect reply & fwd request ) Fb 4.Stateless (anon.) vs. statefull CP (pings counted per FlowID)

4 Fb_Rq Strawman’s Steps 1.RP: injects Fb_Rq pkt w/ L2 flag and Seq./Flow/RP-ID 2.CP: receives Fb_Rq 1.sets Psample=1 (or disregards if busy or in “silent” mode) 2.dumps queue status info (see next) 3.sends Fb_Rp (CNM-like) back to originating SRC 4.optionally also forwards Fb_Rq if DST != local CP => path profiling (multi-pathing issues)

5 Extended Queue Status (EQS) 1.Prio, Qsize, Qeq, Qoff, Qdelta + options 2.PingCnt: # of pings (from any FlowID) RX-ed since the last change of q’ sign  marks one monotonic episode (of Q qrowth or drain)  If aggregate per CP, it can provide HSD/no-sharers (if RP maintains its own PingCnt) 3.TXCnt: # of pkts forwarded since the last change of q’ sign  as proxy hint for avg. service rate [additional info, e.g. pointer to a complete CP “brain dump”]

IBM Research GmbH, Zurich 6 Reactive Fb_Rq Operation (animated) Reactive probing is triggered by QCN frames  hence only rate-limited flows are probed  Insert one Fb_Rq ping every n KB of data sent per flow, e.g. n = 750 KB  Single CP probing: CPID of probes = destination MAC Pro-active probing needs no CNM, but n should be based on actual load and delay Switch 2 Switch 1 Switch 3 CNM Fb_Rq 1.Q eq exceeded 2.Send QCN Fb to RP src dst 1.Fb_Rq CP 2.Insert EQS data 3.Return Fb_Rq to source 1.Initial CNM RP 2.Install RL 3.Inject Fb_RQ 1.Fb_Rq reply RP 2.EQS feedback is extracted and sent up to L3

IBM Research GmbH, Zurich 7 Conclusion Q: What is being enabled? A: Anticipate overload “see it coming” Potential for *early* custom response to congestion thru application specific logic:  e.g. app-driven adaptive routing,  task migration,  collectives (MPI mcast, combining ops, locks),  LB-ing engines,  scheduling hints: "optimize for latency" or "optimize for throughput“,  control of new session admittance (postpone a bkup after the trading rush),  redundant data placement, ...

8 BKUP

IBM Research GmbH, Zurich 9 Why bother about Q’? Delay: queuing delay-dominated RTT destabilizes CM in large DC’s  additionally the RP delay further reduces RTT budget (see 21 st Aug. call) Oscillations  with quick On/Off congestion episodes false recoveries are frequent (presented in 2007)  when RTT > 0.5-1ms Qoff is (much) less significant than Qdelta Q’ provides additional info Luckily Q’ (aka Qdelta) is CP Q’ potential usage 1.Q’ marks monotonical periods: queue backlogging / draining 2.can provide HSD (N sharers) Fb 3.extends the state space {q,q’} for tighter RL RP 1.enables RTT compensation

10 Pkt. Format TBD...  Enhanced CNM: add EQS to CNM’s Fb (QCN)