1 Web Server Performance in a WAN Environment
Vincent W. Freeh, Computer Science, North Carolina State University
Vsevolod V. Panteleenko, Computer Science & Engineering, University of Notre Dame

2 Large web site
• Complex design and interaction
• Multiple tiers
  • Appliance
  • Web, app, & DB servers
• Study performance of web server
  • Cached pages
• Most testing
  • Simulated load
  • LAN environment
• Our evaluation adds
  • Simulated WAN environment
  • Small MTU, BW limits, latency
  • Shows some optimizations aren't …

3 Evaluating a web server
• Three parts
  • Measuring the server
  • Loading the server
  • Supporting the server
[Figure: diagram labeled Net, Server load, Server demand, Tiers 2 & 3]

4 Two ways to load a server
• Synthetic load
  • Controlled
  • Reproducible
  • Flexible
  • Only as good as its assumptions and mechanisms
  • Hard to replicate the real world
• Real-world load
  • Uncontrolled
  • Not reproducible (can use traces)
  • Accurate model of the system
  • Hard to produce extreme or rare conditions
• Discussion
  • Need both
  • Validate simulations with real-world tests

5 Loading the server
• Our tests use synthetic load
• Three load-generating tools
• Micro-benchmarking tool
  • Requests a single object at a constant rate
  • Tests delivery of static, cached documents
  • Establishes a baseline
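The micro-benchmarking tool is described only as requesting one object at a constant rate. A minimal Python sketch of such an open-loop schedule follows; the function names and the injected `send` callback are hypothetical, and a real tool would issue an HTTP GET for the cached object wherever `send` is called.

```python
import time
from typing import Callable


def constant_rate_schedule(rate_per_s: float, duration_s: float) -> list[float]:
    """Send times (seconds from start) for requests issued at a fixed rate."""
    if rate_per_s <= 0 or duration_s <= 0:
        return []
    interval = 1.0 / rate_per_s
    count = int(duration_s * rate_per_s)
    return [i * interval for i in range(count)]


def run_constant_rate(send: Callable[[], None], rate_per_s: float, duration_s: float,
                      clock=time.monotonic, sleep=time.sleep) -> int:
    """Open loop: each request goes out on schedule, never waiting for replies."""
    start = clock()
    sent = 0
    for offset in constant_rate_schedule(rate_per_s, duration_s):
        wait = start + offset - clock()
        if wait > 0:
            sleep(wait)
        send()  # hypothetical hook: issue one request for the same object
        sent += 1
    return sent
```

Because the next send time does not depend on when responses arrive, the offered load presumably stays fixed even as the server slows down, which is what a baseline measurement needs.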

6 Modified SURGE
• SURGE
  • Scalable URL Reference Generator
  • Barford & Crovella, Boston University
  • Emulates statistical distributions of
    • Object and request size
    • Object popularity
    • Embedded object references
    • Temporal locality
    • User idle periods
• Modifications
  • Converted from process-based to event-based, to increase the number of clients
  • Server-throttling problem eliminated
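SURGE draws object popularity from a heavy-tailed distribution, and a Zipf-like law is the usual model for that. The sketch below samples object ranks under that assumption; the exponent `alpha` and the function names are illustrative only, not SURGE's code or its actual parameters.

```python
import bisect
import random
from itertools import accumulate


def zipf_cdf(n_objects: int, alpha: float = 1.0) -> list[float]:
    """Cumulative popularity of objects ranked 1..n with weight 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]
    total = sum(weights)
    return [c / total for c in accumulate(weights)]


def sample_object(cdf: list[float], rng: random.Random) -> int:
    """Return a 0-based object index; low indices (popular objects) dominate."""
    i = bisect.bisect_left(cdf, rng.random())
    return min(i, len(cdf) - 1)  # guard against float rounding in the last CDF entry
```

Usage: with `cdf = zipf_cdf(1000)` and `rng = random.Random(42)`, repeated calls to `sample_object(cdf, rng)` return a request stream in which a few objects receive most of the requests.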

7 Delays and limits
• Emulate WAN parameters in a LAN
  • Network delays
  • Bandwidth limits
• Modified kernel and protocol stack
  • Separate delay queue per TCP connection
  • Necessary for accurate emulation
  • More accurate than Dummynet and NIST Net (per interface)
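The key design point on this slide is a separate delay queue per TCP connection. The Python model below is a hypothetical illustration of that idea, not the authors' kernel modification: each connection serializes its own packets at the emulated bandwidth and then adds a fixed one-way delay, so a packet on one connection never waits behind another connection's traffic, unlike a single shared per-interface queue.

```python
from collections import defaultdict


class PerConnectionLink:
    """Hypothetical WAN emulator with one bandwidth-limited, fixed-delay queue per connection."""

    def __init__(self, bandwidth_bps: float, delay_s: float):
        self.bandwidth_bps = bandwidth_bps
        self.delay_s = delay_s
        # Time at which each connection's emulated link finishes its current packet.
        self._link_free_at = defaultdict(float)

    def delivery_time(self, conn_id, send_time_s: float, size_bytes: int) -> float:
        """Time the packet reaches the far end: queueing + serialization + delay."""
        start = max(send_time_s, self._link_free_at[conn_id])
        finish = start + size_bytes * 8 / self.bandwidth_bps
        self._link_free_at[conn_id] = finish
        return finish + self.delay_s
```

With 56 Kbps and 200 ms, two back-to-back 536-byte packets on connection "a" arrive one serialization time (about 77 ms) apart, while a packet on connection "b" sent at the same instant is unaffected by "a".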

8 Measuring a web server
[Figure: layered diagram labeled Network, drivers, TCP/IP, OS, Apache/TUX, HTTP request/reply]

9 Measuring a web server
[Figure: layered diagram labeled OS, Network, HTTP request/reply]
• Measure utilization using HW performance counters

10 Test environment
• OS: Linux
• Nodes (server and clients)
  • Pentium III, 650 MHz
  • 512 MB main memory
• NIC
  • 3Com 3C590
  • 100 Mbps Ethernet
  • Direct connection
• Software
  • Client: micro-benchmarking, SURGE, delays/limits
  • Server: Apache, TUX
• Warmed client
  • No cache misses
[Figure: client and server connected directly through their NICs]

11 Cost breakdown vs. file size, Apache
• Majority of time is spent in interrupt handling (receiving)
• But most data is sent
• MTU = 536 bytes, delay = 200 ms, BW = 56 Kbps, data send rate = 3 MB/s

12 Cost breakdown vs. file size, TUX
• Twice the data send rate of Apache
• Essentially all cost is in interrupts
• MTU = 536 bytes, delay = 200 ms, BW = 56 Kbps, data send rate = 6 MB/s

13 Apache versus TUX

                          Apache      TUX
Server send rate          3.0 MB/s    6.0 MB/s
Packets received/s        5,738       11,991
Packets sent/s            6,156       11,878
Interrupts/s              7,482       13,974
Concurrent connections
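A few ratios derived from this table help connect it to the per-packet cost argument. The short Python calculation below assumes 1 MB = 10^6 bytes and counts received and sent packets together per interrupt; both choices are assumptions, since the slide does not define them.

```python
# Per-second figures from the Apache vs. TUX table.
measurements = {
    "Apache": {"send_Bps": 3.0e6, "pkts_recv": 5738, "pkts_sent": 6156, "interrupts": 7482},
    "TUX": {"send_Bps": 6.0e6, "pkts_recv": 11991, "pkts_sent": 11878, "interrupts": 13974},
}


def derived(m: dict) -> dict:
    """Average bytes per sent packet and packets handled per interrupt."""
    total_pkts = m["pkts_recv"] + m["pkts_sent"]
    return {
        "bytes_per_sent_pkt": m["send_Bps"] / m["pkts_sent"],
        "pkts_per_interrupt": total_pkts / m["interrupts"],
    }


for server, m in measurements.items():
    d = derived(m)
    print(f"{server}: {d['bytes_per_sent_pkt']:.0f} B/packet, "
          f"{d['pkts_per_interrupt']:.2f} packets/interrupt")
```

Both averages (about 487 and 505 bytes) sit just below the 536-byte MTU, and each interrupt handles only about 1.6 to 1.7 packets, which is consistent with the slides' finding that per-packet interrupt processing dominates the cost.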

14 Cost breakdown vs. MTU
• SURGE parameters: size = 10 KB, delay = 200 ms, BW = 56 Kbps, data send rate = 6 MB/s
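One way to see why a small MTU raises processing cost is to count the data segments needed for the 10 KB object used in these runs. The sketch assumes 40 bytes of IPv4 plus TCP headers with no options, takes 10 KB as 10,240 bytes, and ignores HTTP headers and ACKs.

```python
import math

HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def data_segments(object_bytes: int, mtu: int, header_bytes: int = HEADER_BYTES) -> int:
    """Number of full-MTU TCP segments needed to carry object_bytes of payload."""
    mss = mtu - header_bytes
    if mss <= 0:
        raise ValueError("MTU too small to carry the headers")
    return math.ceil(object_bytes / mss)


for mtu in (536, 1500):
    print(mtu, data_segments(10 * 1024, mtu))
```

Under these assumptions the 10 KB object needs 21 segments at a 536-byte MTU versus 8 at Ethernet's 1,500 bytes, so costs that are paid per packet (interrupts, protocol processing) grow roughly 2.6-fold.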

15 Effects of network delay
• SURGE parameters: MTU = 536 bytes, size = 10 KB, BW = 56 Kbps, data send rate = 6 MB/s

16 Effects of bandwidth limits
• SURGE parameters: MTU = 536 bytes, size = 10 KB, delay = 200 ms, data send rate = 6 MB/s
• 20% decrease in overhead going from 28 Kbps to unlimited bandwidth
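The slides conclude that end-user latency depends primarily on bandwidth and secondarily on network delay. A back-of-envelope Python model makes the comparison concrete. It assumes that "delay = 200 ms" is one-way, that a request costs two round trips (TCP handshake plus request/response), and it ignores slow start, loss, and headers, so it is only a rough illustration.

```python
def response_time_s(object_bytes: int, bandwidth_bps: float, one_way_delay_s: float,
                    round_trips: int = 2) -> float:
    """Rough latency: fixed round trips plus the time to push the object through the link."""
    rtt = 2 * one_way_delay_s
    serialization = object_bytes * 8 / bandwidth_bps
    return round_trips * rtt + serialization


for bw in (28_000, 56_000):
    print(bw, round(response_time_s(10 * 1024, bw, 0.2), 2))
```

At 56 Kbps the 10 KB transfer alone takes about 1.46 s, compared with 0.8 s of round-trip delay under these assumptions, which illustrates why bandwidth dominates.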

17 Persistent connections
• SURGE parameters: MTU = 536 bytes, size = 10 KB, delay = 200 ms, data send rate = 6 MB/s
• 10% decrease going from 1 to 16 requests per connection
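Persistent connections help by spreading connection setup and teardown over several requests. The sketch below shows only that amortization in packet terms; the figure of 7 setup/teardown packets (3-way handshake plus a 4-packet close) is an assumption, and the model does not capture CPU cost, so it is not meant to reproduce the measured 10% figure.

```python
def packets_per_request(requests_per_conn: int, data_pkts_per_request: float,
                        setup_teardown_pkts: int = 7) -> float:
    """Average packets per request when setup/teardown is shared by all requests on a connection."""
    if requests_per_conn < 1:
        raise ValueError("need at least one request per connection")
    return data_pkts_per_request + setup_teardown_pkts / requests_per_conn
```

The saving is bounded by the setup/teardown share, which shrinks as the per-request data grows, so larger objects benefit proportionally less from persistence.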

18 Copy and checksumming
• SURGE parameters: MTU = 536 bytes, size = 10 KB, delay = 200 ms, data send rate = 6 MB/s
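Checksum avoidance saves the per-byte work of the TCP/IP checksum. For reference, the Python function below computes the 16-bit ones' complement Internet checksum as described in RFC 1071. It only illustrates the computation that avoidance skips; it is not taken from TUX or the Linux stack.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum (RFC 1071) over data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF
```

Every payload byte passes through this loop, which is why checksumming, like copying, scales with the amount of data sent.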

19 Re-assessing the value of some optimizations
• Copy and checksum avoidance
  • LAN: % copy, or 21-33% copy and 10-15% checksum
  • WAN: 10% combined
• Select optimization
  • LAN: 28%
  • WAN: < 10%
• Connection open/close avoidance (HTTP 1.1)
  • LAN: “greatly”, “significantly”
  • WAN: < 10%

20 Conclusion
• Most processing is in the protocol stack and drivers
• Small MTU size increases processing cost
• Little effect from
  • Network delay
  • Bandwidth limitations
  • Persistent connections
• End-user request latency depends
  • Primarily on connection bandwidth
  • Secondarily on network delay
• Future work
  • Dynamic and uncached pages
  • Add packet loss
Work supported by IBM UPP & NSF CCR

21 End

22 Persistent connections - packets/s

23 Number of Packets vs. MTU

24 Web (HTTP) servers
• Apache
  • Largest install base
  • User space
  • Process-based model
• TUX
  • Niche server
  • Kernel space
  • Event-based model
  • Aggressive optimizations
    • Copy/checksum avoidance
    • Object and name caching
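The slides contrast Apache's process-based model with TUX's event-based model. As a hypothetical user-space illustration of the event-based style only (TUX itself runs in the kernel and is not shown here), the Python sketch below uses one thread and the standard `selectors` module to multiplex connections and answer every request from a single in-memory cached object. Parsing is deliberately minimal: it waits for the blank line that ends the request headers and ignores the request line.

```python
import selectors
import socket

CACHED_BODY = b"hello from the cache\n"  # hypothetical cached object
RESPONSE = (b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(CACHED_BODY)) + CACHED_BODY


def serve(listener: socket.socket, max_requests: int) -> None:
    """Single-threaded event loop: one selector watches the listener and all clients."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)
    served = 0
    while served < max_requests:
        for key, _ in sel.select():
            if key.data is None:  # readiness on the listener means a new connection
                conn, _ = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data=bytearray())
                continue
            conn, buf = key.fileobj, key.data
            chunk = conn.recv(4096)
            buf += chunk  # bytearray extends in place, so key.data accumulates the request
            if not chunk or b"\r\n\r\n" in buf:
                sel.unregister(conn)
                if chunk:
                    # Sketch only: assumes the small reply fits in the socket send buffer.
                    conn.sendall(RESPONSE)
                    served += 1
                conn.close()
    sel.close()
```

A process-based server would instead dedicate a process to each connection, paying per-client memory and context-switch costs that the single event loop avoids.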

25 Measuring a web server
[Figure: layered diagram labeled OS, Network, HTTP request/reply]

26 Interrupt coalescing
• Decreases interrupt scheduling overhead
• Interrupt every 2 ms
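The slide gives a 2 ms coalescing interval. The Python model below assumes, hypothetically, that the first packet after an idle period arms a 2 ms timer and that every packet arriving before the timer fires is handled by the same interrupt; real NICs implement interrupt moderation in various ways, so this shows the idea rather than any particular driver.

```python
def coalesced_interrupts(arrival_times_ms, window_ms: float = 2.0) -> int:
    """Count interrupts when packets arriving within window_ms of the first share one interrupt."""
    interrupts = 0
    deadline = None
    for t in sorted(arrival_times_ms):
        if deadline is None or t >= deadline:
            interrupts += 1  # this packet arms a new coalescing timer
            deadline = t + window_ms
    return interrupts
```

For arrivals at 0, 0.5, 1.9, 2.1, and 5.0 ms, this model raises 3 interrupts instead of 5; fewer interrupts means less scheduling overhead, at the cost of up to 2 ms of added delay per packet.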