Scalable Kernel Performance for Internet Servers under Realistic Loads
Gaurav Banga et al.
Western Research Lab Research Report 1998/06 (Proceedings of the 1998 USENIX Annual Technical Conference)
Computer Architecture Lab., CS Dept., KAIST, 2000/11/
Kim, Sung-Wan

Contents
– Introduction
– Problems of select() and ufalloc() in event-driven servers
– Scalable select() and ufalloc()
– Experimental evaluation
– Performance of a live system
– Conclusions

Introduction
– Event-driven servers
  – A single thread manages all connections
  – Lower context-switching and synchronization overhead, so they are faster than thread-per-connection or pre-forked servers
  – But they perform poorly under realistic conditions, because of select() and ufalloc()
– select(): I/O readiness notification (multiplexing) that drives non-blocking I/O
– ufalloc(): allocation of a new file descriptor for a process
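
A minimal sketch of the single-threaded, select()-driven model described above, assuming POSIX select() and descriptors below FD_SETSIZE. The struct conn slot, the handler type, event_loop_once(), and count_and_drain() are hypothetical names for illustration, not code from Squid, NetCache, or the paper.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical handler type: called when a connection's fd is readable. */
typedef void (*handler_fn)(int fd, void *arg);

/* Hypothetical per-connection slot for a single-threaded event loop. */
struct conn {
    int fd;              /* -1 marks an unused slot */
    handler_fn on_read;  /* called when fd becomes readable */
    void *arg;           /* passed through to on_read */
};

/* One pass of the loop: wait up to *timeout for readable connections,
   then run each ready connection's handler in this same thread.
   Returns the number of handlers run, or -1 if select() fails. */
int event_loop_once(struct conn *conns, size_t n, struct timeval *timeout)
{
    fd_set readfds;
    int maxfd = -1, dispatched = 0;

    FD_ZERO(&readfds);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].fd >= 0 && conns[i].on_read != NULL) {
            FD_SET(conns[i].fd, &readfds);
            if (conns[i].fd > maxfd)
                maxfd = conns[i].fd;
        }
    }
    /* First argument is the highest descriptor plus one, not a count. */
    if (select(maxfd + 1, &readfds, NULL, NULL, timeout) < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (conns[i].fd >= 0 && conns[i].on_read != NULL &&
            FD_ISSET(conns[i].fd, &readfds)) {
            conns[i].on_read(conns[i].fd, conns[i].arg);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example handler: drain pending bytes and count calls in *(int *)arg. */
void count_and_drain(int fd, void *arg)
{
    char buf[64];
    ssize_t got = read(fd, buf, sizeof buf);
    (void)got;
    ++*(int *)arg;
}
```

Each pass rebuilds the descriptor set from scratch and hands every registered descriptor to the kernel, which is the per-call work that grows with the number of open connections.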

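Problems in select() and ufalloc()
– WAN environments
  – Longer round-trip times and more packet loss than LAN environments
  – So many connections stay open at the same time
– select()
  – select() -> do_scan() -> selscan() -> soo_select()
  – select_wakeup() -> do_scan() -> selscan() -> soo_select()
  – soo_select() checks whether the requested condition is true for one socket
  – Result: a linear scan over every open socket in the caller's descriptor sets, on every call
– ufalloc()
  – A single bitmap, searched for the lowest-numbered free descriptor
  – The search is linear in the number of open descriptors, which is too costly when many connections are open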
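
The two linear costs above can be modeled in a few lines of C. This is a user-space model, not Digital UNIX kernel code: naive_select_scan() stands in for the do_scan()/selscan() path, doing one readiness check (a byte array here, standing in for soo_select()) per descriptor in the caller's set, and ufalloc_linear() stands in for the single-bitmap lowest-free search. All names, NFDS, and the 64-bit word size are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NFDS 1024                    /* assumed descriptor table size */
#define WORD_BITS 64
#define NWORDS (NFDS / WORD_BITS)

/* Model of unmodified select(): every descriptor in the caller's set is
   checked once (the soo_select() step), however few are ready.
   ready[] stands in for the real socket state. Returns the ready count
   and stores the number of checks performed in *checks. */
size_t naive_select_scan(const uint8_t *interested, const uint8_t *ready,
                         int maxfd, uint8_t *out, size_t *checks)
{
    size_t nready = 0;

    *checks = 0;
    for (int fd = 0; fd < maxfd; fd++) {
        out[fd] = 0;
        if (!interested[fd])
            continue;
        ++*checks;
        if (ready[fd]) {
            out[fd] = 1;
            nready++;
        }
    }
    return nready;
}

/* Model of ufalloc() with a single bitmap: walk words upward from
   descriptor 0 until one has a clear bit, then claim its lowest clear
   bit. Returns the new descriptor, or -1 if all NFDS are in use. */
int ufalloc_linear(uint64_t map[NWORDS], size_t *words_scanned)
{
    *words_scanned = 0;
    for (int w = 0; w < NWORDS; w++) {
        ++*words_scanned;
        if (map[w] != UINT64_MAX) {
            int bit = 0;
            while (map[w] & ((uint64_t)1 << bit))
                bit++;
            map[w] |= (uint64_t)1 << bit;
            return w * WORD_BITS + bit;
        }
    }
    return -1;
}
```

Here checks counts every interested descriptor even when only one is ready, and words_scanned grows with the number of full words below the first free descriptor; with hundreds of mostly idle WAN connections, both costs are paid on every call.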

Environment
– Server
  – AlphaStation 500 (400 MHz), 192 MB of main memory
  – Digital UNIX 4.0B
  – Squid, NetCache 3.1.2c-OSF
– Client
  – AlphaStation 500 (333 MHz)
  – Digital UNIX 3.2C
  – S-Client
– Network
  – 100 Mbps FDDI
– Profiling
  – DCPI

CPU times in unmodified kernel

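Scalable select() and ufalloc()
– select()
  – Each thread keeps READY, INTERESTED and HINTS sets
  – sowakeup() records a hint in the HINTS set of every thread, in each process referencing the socket, whose INTERESTED set contains that socket
  – On each call, let SELECTING be the set passed by the caller and C(S) the subset of S found ready when each descriptor in S is actually checked:

$$\mathrm{INTERESTED}_{\mathrm{new}} = \mathrm{SELECTING} \cup \mathrm{INTERESTED}_{\mathrm{old}}$$
$$\mathrm{READY}_{\mathrm{new}} = C\left(\mathrm{INTERESTED}_{\mathrm{new}} \cap \left(\overline{\mathrm{INTERESTED}_{\mathrm{old}}} \cup \mathrm{READY}_{\mathrm{old}} \cup \mathrm{HINTS}\right)\right)$$
$$\mathrm{READY}_{\mathrm{to\_user}} = \mathrm{SELECTING} \cap \mathrm{READY}_{\mathrm{new}}$$

  – Only newly selected, previously ready, or hinted descriptors are rechecked; a descriptor that was already of interest, was not ready, and has received no hint is assumed to be still not ready
– ufalloc()
  – 2-level bitmap: a level 0 map and a level 1 map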
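
A user-space sketch of the two scalable structures, under stated assumptions: each set is a single 64-bit word (so at most 64 descriptors), check_fn stands in for soo_select() and plays the role of the C(·) operator, HINTS is cleared once consumed, and the level 1 map marks which level 0 words are completely full. All names (select_state, record_hint, hinted_select, check_mask, fdmap, ufalloc_2level, fdfree_2level) are hypothetical; this is not the paper's kernel code.

```c
#include <stdint.h>

/* Per-thread state, one bit per descriptor (64 fds for brevity). */
struct select_state {
    uint64_t interested;  /* INTERESTED */
    uint64_t ready;       /* READY as of the last call */
    uint64_t hints;       /* HINTS recorded since the last call */
};

/* Stand-in for soo_select(): nonzero if fd is ready right now. */
typedef int (*check_fn)(int fd, void *ctx);

/* Wakeup path (sowakeup() on the slide): hint only if interested.
   Assumes 0 <= fd < 64. */
void record_hint(struct select_state *s, int fd)
{
    if (s->interested & ((uint64_t)1 << fd))
        s->hints |= (uint64_t)1 << fd;
}

/* One select() call over `selecting`, following the slide's set
   equations. *checks counts the readiness checks actually made. */
uint64_t hinted_select(struct select_state *s, uint64_t selecting,
                       check_fn check, void *ctx, int *checks)
{
    uint64_t interested_new = selecting | s->interested;
    uint64_t candidates = interested_new &
                          (~s->interested | s->ready | s->hints);
    uint64_t ready_new = 0;

    *checks = 0;
    for (int fd = 0; fd < 64; fd++) {
        if (candidates & ((uint64_t)1 << fd)) {
            ++*checks;
            if (check(fd, ctx))              /* the C(...) operator */
                ready_new |= (uint64_t)1 << fd;
        }
    }
    s->interested = interested_new;
    s->ready = ready_new;
    s->hints = 0;                            /* assumption: hints consumed */
    return selecting & ready_new;            /* READY_to_user */
}

/* Example check for testing: ctx points at a mask of ready fds. */
int check_mask(int fd, void *ctx)
{
    return (int)((*(uint64_t *)ctx >> fd) & 1);
}

/* Two-level map: level0 has one bit per descriptor (64 words = 4096
   fds); bit w of level1 is set when level0[w] is completely full. */
struct fdmap {
    uint64_t level1;
    uint64_t level0[64];
};

/* Index of the lowest clear bit; x must not be all ones. */
static int lowest_clear(uint64_t x)
{
    int bit = 0;
    while (x & ((uint64_t)1 << bit))
        bit++;
    return bit;
}

/* Allocate the lowest free descriptor, or return -1 if none is free. */
int ufalloc_2level(struct fdmap *m)
{
    if (m->level1 == UINT64_MAX)
        return -1;
    int w = lowest_clear(m->level1);         /* first non-full word */
    int b = lowest_clear(m->level0[w]);
    m->level0[w] |= (uint64_t)1 << b;
    if (m->level0[w] == UINT64_MAX)
        m->level1 |= (uint64_t)1 << w;       /* word just became full */
    return w * 64 + b;
}

void fdfree_2level(struct fdmap *m, int fd)
{
    m->level0[fd / 64] &= ~((uint64_t)1 << (fd % 64));
    m->level1 &= ~((uint64_t)1 << (fd / 64)); /* word has a free bit */
}
```

Testing 64 candidate bits is cheap next to a soo_select() call; the saving shows up in checks, which after the first call covers only previously ready and hinted descriptors. A real implementation would likely also walk set bits directly. ufalloc_2level() finds the first non-full word from the level 1 map in one step, so its cost no longer grows with the number of full words below the lowest free descriptor.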

Experimental Evaluation - Scalability with respect to connection rate
* 750 infinitely slow connections

Experimental Evaluation - Scalability with respect to connection count

Performance of a live system
– Server
  – A Web proxy system at DEC
  – AlphaStation 500 (500 MHz), 512 MB of RAM
  – The system ran for an entire day
  – Proxies: Squid, NetCache

Performance of a live system - NetCache with caching disabled

Performance of a live system - NetCache with caching enabled

Performance of a live system - Squid with caching disabled

Conclusions
– WAN delays keep many connections open at once
– Linear scaling in select() and ufalloc() then leads to excessive kernel CPU time
– The scalable versions improve the performance of Web servers and proxies

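Squid's user-level loop that builds the select() descriptor sets (simplified)

    #include <sys/select.h>
    #include <time.h>
    /* Sketch: the function name and these declarations are hypothetical stand-ins, not Squid's own. */
    typedef void PF(int fd, void *data);
    struct fde { PF *read_handler; PF *write_handler; time_t stall_until; };
    extern struct fde fd_table[];
    extern time_t squid_curtime;

    int build_sets_and_select(int maxfd, fd_set *exceptfds, struct timeval *timeout)
    {
        fd_set readfds, writefds;
        int i, nfds = 0;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        for (i = 0; i < maxfd; i++) {
            /* Check each open socket for a handler. */
            if (fd_table[i].read_handler && fd_table[i].stall_until <= squid_curtime) {
                nfds++;
                FD_SET(i, &readfds);
            }
            if (fd_table[i].write_handler) {
                nfds++;
                FD_SET(i, &writefds);
            }
        }
        /* "…, …" on the slide: select()'s exception-set and timeout arguments. */
        return select(maxfd, &readfds, &writefds, exceptfds, timeout);
    }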