Design and Performance of a Web Server Accelerator. Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias. INFOCOM '99.

Presentation transcript:

1 Design and Performance of a Web Server Accelerator. Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias. INFOCOM '99

2 Outline
- Introduction
- System Architecture
- Performance Measurements
- Analysis of SPECweb96
- Conclusions

3 Introduction (1/5)
Web server performance is limited by:
1. Data being copied several times across layers of software
   - between the file system and the application
   - during transmission to the O.S. kernel
   - at the device driver level
2. O.S. scheduler and interrupt processing
-> These overheads add further inefficiencies.

4 Introduction (2/5)
One technique for improving the performance of a Web site is to cache frequently requested data at the site.
- Such caches are known as [Httpd Accelerators] or [Web Server Accelerators].

5 Introduction (3/5)
- Httpd accelerators differ from proxy caches:
-> Proxy cache: speeds up access to remote Web sites
-> Httpd accelerator: speeds up access to the local Web site
- A cache can function as both a proxy cache and an httpd accelerator.

6 Introduction (4/5)
The authors' accelerator:
- Runs under an [embedded O.S.]
-> Serves up to 5000 pages/second from its cache on a 200 MHz PowerPC 604
-> High performance
- Has a highly optimized communications stack

7 Introduction (5/5)
- Provides an API
-> Allows application programs to explicitly add, delete, and update cached data.
-> Allows dynamic Web pages to be cached.

8 System Architecture (1/5)

9 System Architecture (2/5)
The cache operates in one or a combination of two modes:
- Automatic mode
-> Objects are cached automatically after cache misses.
-> The Webmaster sets cache policy parameters.
- Dynamic mode
-> Cache contents are explicitly controlled by application programs. [ API functions ]
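The two modes can be illustrated with a small sketch. This is not the accelerator's actual API, which the slides do not show: the class name AcceleratorCache, its method names, and the fetch_from_server callback are all hypothetical, chosen only to mirror the add, delete, and update operations mentioned in the introduction.

```python
from typing import Callable, Dict


class AcceleratorCache:
    """Toy cache illustrating automatic and dynamic modes (all names hypothetical)."""

    def __init__(self, fetch_from_server: Callable[[str], bytes], automatic: bool = True):
        self._fetch = fetch_from_server   # stands in for the back-end Web server
        self._automatic = automatic       # True: cache objects after a miss
        self._store: Dict[str, bytes] = {}

    # Dynamic mode: application programs manage cached objects explicitly.
    def add(self, url: str, body: bytes) -> None:
        self._store[url] = body

    def update(self, url: str, body: bytes) -> None:
        self._store[url] = body

    def delete(self, url: str) -> None:
        self._store.pop(url, None)

    def get(self, url: str) -> bytes:
        if url in self._store:            # cache hit: the server is not contacted
            return self._store[url]
        body = self._fetch(url)           # cache miss: ask the server
        if self._automatic:               # automatic mode stores the result
            self._store[url] = body
        return body
```

With automatic=True, a second get() for the same URL is served from the cache. With automatic=False, nothing is cached until an application calls add() or update().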

10 System Architecture (3/5)
-> An API for explicitly invalidating cached objects often makes it feasible to cache dynamic Web pages.
- Uses the Least Recently Used (LRU) algorithm for cache replacement.
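As a minimal sketch of LRU replacement combined with explicit invalidation, the following uses Python's collections.OrderedDict. The class LRUCache and its capacity measured in number of objects are assumptions made for illustration. The slides do not describe how the accelerator implements LRU or how it sizes its cache.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU cache; capacity is counted in objects (an assumption)."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        if key not in self._items:
            return None                    # cache miss
        self._items.move_to_end(key)       # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, key) -> bool:
        """Explicitly drop a cached object, e.g. a stale dynamic page."""
        if key in self._items:
            del self._items[key]
            return True
        return False
```

An application that regenerates a dynamic page would call invalidate() for that page's key, so the next request misses and fetches the fresh version.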

11 System Architecture (4/5)
Key software elements:
1. A packet queue on the system card memory
2. No task scheduling, task switching, or interrupts are performed
3. No data copying takes place
4. No buffer linking is necessary, which saves the overhead of buffer linking.

12 System Architecture (5/5) TCP Stack was Modified

13 Web Server Accelerator Performance (1/7)

14 Web Server Accelerator Performance (2/7)

15 Web Server Accelerator Performance (3/7)
For a 200 MHz PowerPC 604 processor and an 8 Kbyte page, the theoretical capability would be 200,000,000 / 32,823 ≈ 6093 requests per second.
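The arithmetic on this slide can be checked with a few lines of Python. The slide gives 32,823 as a bare number. Reading it as the per-request processing cost for an 8 Kbyte page, in the same units as the 200 MHz clock rate, is an assumption, and the constant and function names below are hypothetical.

```python
CLOCK_HZ = 200_000_000     # 200 MHz PowerPC 604, from the slide
COST_PER_REQUEST = 32_823  # from the slide; assumed to be the per-request cost for an 8 Kbyte page


def theoretical_requests_per_second(clock_hz: int, cost_per_request: int) -> int:
    """Upper bound on throughput if the processor did nothing but serve requests."""
    return clock_hz // cost_per_request


bound = theoretical_requests_per_second(CLOCK_HZ, COST_PER_REQUEST)
print(bound)  # 6093
```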

16 Web Server Accelerator Performance (4/7) WebStone clients / node a. Web Server b. Client Ran up to 100 WebStone clients

17 Web Server Accelerator Performance (5/7)

18 Web Server Accelerator Performance (6/7)

19 Web Server Accelerator Performance (7/7)

20 Accelerator Performance on SPECweb96 (1/4)

21 Accelerator Performance on SPECweb96 (2/4)

22 Accelerator Performance on SPECweb96 (3/4)

23 Accelerator Performance on SPECweb96 (4/4)

24 Conclusions
- The authors presented the design, key implementation issues, and performance of a Web accelerator.
- The accelerator can provide high hit ratios and excellent performance for workloads similar to the SPECweb96 benchmark.