G22.3250-001 Robert Grimm New York University Receiver Livelock.



Altogether Now: The Three Questions
 What is the problem?
 What is new or different?
 What are the contributions and limitations?

Motivation
 Interrupts work well when I/O events are rare
  Think disk I/O
 In comparison, polling is expensive
  The CPU does no useful work while polling
  To achieve the same latency as interrupts, one needs to poll tens of thousands of times per second
 But the world has changed: it’s all about networking
  Multimedia, host-based routing, network monitoring, NFS, multicast, and broadcast all lead to higher interrupt rates
  Once the interrupt rate is too high, the system becomes overloaded and eventually makes no progress
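The "tens of thousands of times per second" claim follows from simple arithmetic: under pure polling, the worst-case wait for an event is one full polling interval. A back-of-the-envelope sketch, where the 50 µs latency target is an illustrative assumption rather than a number from the slides:

```python
def polls_per_second(target_latency_s):
    # Under pure polling, the worst-case event wait is one polling
    # interval, so meeting a latency target requires 1/target polls/s.
    return 1.0 / target_latency_s

# A hypothetical interrupt-like latency target of 50 microseconds
# already demands on the order of 20,000 polls per second.
rate = polls_per_second(50e-6)
```

Each of those polls burns CPU cycles whether or not a packet arrived, which is why polling alone is unattractive at low load.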

Avoiding Receive Livelock
 Hybrid design
  Poll when triggered by an interrupt
  Interrupt only when polling is suspended
 Result
  Low latency under low load
  High throughput under high load
 Additional techniques
  Drop packets early (those with the least investment)
  Connect with the scheduler (give resources to user tasks)
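The hybrid design can be sketched as a small state machine. This is a simplified illustration, not the paper's kernel code: the interrupt handler only schedules the polling thread and masks itself, and the polling thread re-enables interrupts once it has drained all pending work.

```python
class HybridReceiver:
    """Toy model of the interrupt-triggered polling scheme."""

    def __init__(self):
        self.interrupts_enabled = True
        self.poll_scheduled = False
        self.pending = 0        # packets waiting in the device
        self.delivered = 0

    def packet_arrives(self):
        self.pending += 1
        if self.interrupts_enabled:
            # Interrupt handler: just trigger polling, then mask itself,
            # so further arrivals cost no interrupt overhead.
            self.interrupts_enabled = False
            self.poll_scheduled = True

    def poll(self, quota):
        # Polling thread: process up to `quota` packets per invocation.
        n = min(quota, self.pending)
        self.pending -= n
        self.delivered += n
        if self.pending == 0:
            # Idle again: suspend polling, fall back to interrupts
            # to keep latency low at light load.
            self.poll_scheduled = False
            self.interrupts_enabled = True
```

Under light load this behaves exactly like an interrupt-driven system; under heavy load it degenerates into pure polling, which is precisely the duality the later slides discuss.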

Requirements for Scheduling Network Tasks
 Acceptable throughput
  Keep up with the Maximum Loss Free Receive Rate (MLFRR)
  Keep transmitting as you keep receiving
 Reasonable latency, low jitter
  Avoid long queues
 Fair allocation of resources
  Continue to service management and control tasks
 Overall system stability
  Do not impact other systems on the network
  Livelock may look like a link failure and lead to more control traffic

Interrupt-Driven Scheduling: Packet Arrival
 Packet arrival is signaled through an interrupt
  Associated with a fixed Interrupt Priority Level (IPL)
  Handled by the device driver
  Packet placed into a queue, dropped if the queue is full
 Protocol processing is initiated by a software interrupt
  Associated with a lower IPL
  Packet processing may be batched: the driver processes many packets before returning
 This gives absolute priority to incoming packets
  But modern systems have larger network buffers and DMA
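The classic two-level BSD receive path can be sketched as follows. The names `ipintrq` and `IFQ_MAXLEN` are borrowed from BSD (where the default queue length is larger; 5 keeps the example small), but the code itself is a simplified model, not kernel code:

```python
from collections import deque

IFQ_MAXLEN = 5  # illustrative; real BSD defaults are larger

def rx_interrupt(ipintrq, pkt):
    """Hardware-interrupt level: enqueue the packet, or drop on overflow."""
    if len(ipintrq) >= IFQ_MAXLEN:
        return False          # queue full: packet silently dropped
    ipintrq.append(pkt)
    return True

def ip_softintr(ipintrq):
    """Lower-IPL software interrupt: drain the whole queue (batched)."""
    processed = []
    while ipintrq:
        processed.append(ipintrq.popleft())
    return processed
```

The key property this models is that the drop decision happens only at the queue boundary, after the hardware interrupt has already been paid for, which is exactly where the livelock problem originates.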

Interrupt-Driven Scheduling: Receive Livelock
 If packets arrive too fast, the system spends most of its time processing receiver interrupts
  After all, they have absolute priority
  No resources are left to deliver packets to applications
 After reaching the MLFRR, throughput begins to fall again
  Eventually it reaches 0 (!)
 But doesn’t batching help?
  It can increase the MLFRR
  But it cannot, by itself, avoid livelock
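The collapse to zero can be reproduced in a toy model: give receiver interrupts absolute priority over delivery and watch delivered throughput as offered load grows. The per-packet costs (40 µs per interrupt, 60 µs per delivery) are invented for illustration, not measurements from the paper:

```python
def delivered_per_sec(arrival_rate, intr_us=40, deliver_us=60):
    # One second = 1,000,000 us of CPU time. Interrupt handling is
    # paid first (absolute priority); delivery to the application
    # gets only whatever time is left over.
    intr_time = arrival_rate * intr_us
    if intr_time >= 1_000_000:
        return 0              # livelock: interrupts consume every cycle
    return min(arrival_rate, (1_000_000 - intr_time) // deliver_us)

# In this model throughput climbs with load up to an MLFRR of
# 10,000 pkt/s, then falls, and hits zero at 25,000 pkt/s offered.
```

Batching corresponds to lowering `intr_us`, which raises the MLFRR and pushes the cliff to the right, but as long as interrupt work scales with arrivals and preempts delivery, the cliff is still there.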

Interrupt-Driven Scheduling: Impact of Overload
 Packet delivery latency increases
  Packets arriving in bursts are processed in bursts
   Link-level processing (followed by a queue)
   Packet dispatch (followed by a queue)
   Scheduling of the user process
 Transmits may starve
  Transmission is performed at a lower IPL than reception
   Why do we need interrupts for transmission? Don’t we just write the data to the interface and say “transmit”?
  But the system is busy servicing packet arrivals

Better Scheduling
 Limit the interrupt arrival rate to prevent saturation
  If the internal queue is full, disable receive interrupts
   For the entire system?
  Re-enable interrupts once buffer space becomes available or after a timeout
 Track time spent in the interrupt handler
  If it exceeds a specified fraction of total time, disable interrupts
  Alternatively, sample the CPU state on clock interrupts
   When to use this alternative? Why does it work?
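The queue-feedback idea can be sketched as follows. This is an illustrative model (the queue length of 4 and re-enable threshold of 2 are arbitrary choices, not values from the paper): the interrupt handler masks receive interrupts when its queue fills, and the consumer unmasks them once enough space is free again.

```python
class RateLimiter:
    """Toy model: mask receive interrupts on queue overflow."""

    def __init__(self, maxlen=4, reenable_at=2):
        self.q = []
        self.maxlen = maxlen
        self.reenable_at = reenable_at
        self.rx_enabled = True

    def on_interrupt(self, pkt):
        if not self.rx_enabled:
            return                  # NIC buffers or drops; no CPU spent
        self.q.append(pkt)
        if len(self.q) >= self.maxlen:
            self.rx_enabled = False  # saturated: stop taking interrupts

    def consume(self):
        pkt = self.q.pop(0) if self.q else None
        if not self.rx_enabled and len(self.q) <= self.reenable_at:
            self.rx_enabled = True   # buffer space available again
        return pkt
```

Masking interrupts shifts the drop point from software (after the interrupt cost is paid) to the interface hardware (before any CPU is spent), which is what makes overload survivable.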

Better Scheduling (cont.)
 Use polling to provide fairness
  Query all sources of packet events round-robin
 Integrate with interrupts
  Reflects the duality of the two approaches
  Polling works well for predictable behavior: overload
  Interrupts work well for unpredictable behavior: regular load
 Avoid preemption to ensure progress
  Option 1: do most work at high IPL
  Option 2: do hardly any work at high IPL
   Integrates better with the rest of the kernel
   Sets a “service needed” flag and schedules the polling thread
   Gets rid of what?
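Round-robin polling with per-source quotas can be sketched in a few lines. The interface names here are made up for illustration; the point is that each event source gets at most `quota` packets per pass, so one busy interface cannot starve the others:

```python
def poll_round_robin(sources, quota):
    """Visit every event source once, serving at most `quota` packets each.

    `sources` maps a source name to its pending-packet queue (a list).
    Returns what was served from each source this pass."""
    served = {}
    for name, queue in sources.items():
        n = min(quota, len(queue))
        served[name] = [queue.pop(0) for _ in range(n)]
    return served
```

Leftover packets simply wait for the next pass, which bounds both per-source burst service and the time any one source must wait.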

Livelock in BSD-Based Routers: Experimental Setup
 IP packet router built on Digital UNIX (DEC OSF/1)
  Bridges between two Ethernets
  Runs on a DECstation 3000/300, the slowest available Alpha host
 Load generator sends 10,000 UDP packets
  4 bytes of data per packet

Livelock in BSD-Based Routers: Unmodified Kernel
 With screend, throughput peaks at 2,000 packets/s, with livelock at 6,000 packets/s
 Without screend, it peaks at 4,700 packets/s, with livelock at 14,880 packets/s

Livelock in BSD-Based Routers: Unmodified Kernel in Detail
 Packets are only discarded after considerable processing

Livelock in BSD-Based Routers: The Modified Path
 Where are packets dropped, and how?
 Why retain the transmit queue?

Forwarding Performance Without screend
 Why do we need quotas?

Forwarding Performance With screend
 Why is polling not enough?
 What additional change is made?

Effect of Packet-Count Quotas (without and with screend)
 What causes the difference?

What About Other User-Space Tasks?
 So far, they don’t get any cycles. Why?
 Solution: track cycles spent in the polling thread
  Disable input handling if over a threshold
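The cycle-accounting fix can be sketched like this. The 75% threshold is an arbitrary illustrative choice: track what fraction of recent CPU time the polling thread has consumed, and stop handling input once it exceeds the threshold, leaving the remainder for user-space tasks.

```python
class CycleGovernor:
    """Toy model of CPU-fraction accounting for the polling thread."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold
        self.poll_time = 0.0
        self.total_time = 0.0

    def account(self, elapsed, was_polling):
        # Called as time passes; charges `elapsed` seconds to polling
        # or to other work (user tasks, idle, etc.).
        self.total_time += elapsed
        if was_polling:
            self.poll_time += elapsed

    def input_allowed(self):
        if self.total_time == 0:
            return True
        return self.poll_time / self.total_time <= self.threshold
```

While input is disabled, arriving packets are dropped cheaply at the interface, so the guaranteed user-task share costs little extra CPU even under overload.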

Diggin’ Real Deep: Kernel Traces for a 3-Packet Burst
 What’s wrong with this picture?
 How did they fix the problem?

Another Application: Network Monitoring
 What is different from the previous application?
 Where are the MLFRR and the saturation point?

What Do You Think?