1. Increasing Web Server Throughput with Network Interface Data Caching
October 9, 2002
Hyong-youb Kim, Vijay S. Pai, and Scott Rixner
Rice Computer Architecture Group
http://www.cs.rice.edu/CS/Architecture/
2. Anatomy of a Web Request
A static content web server: for every request, the request, response headers, and file data cross the local interconnect between the CPU, main memory, and the network interface.
[Diagram: request/header/file transfers among CPU, main memory, and network interface; the interconnect runs at 95% utilization]
3. Problem
Inefficient use of the local interconnect
– Repeated transfers
– Every bit of data sent to the network crosses the interconnect
The local interconnect becomes the bottleneck
Transfer overhead exacerbates the inefficiency
– Overhead reduces available bandwidth
– E.g., the Peripheral Component Interconnect (PCI) incurs 30% transfer overhead
4. Solution
Network interface data caching
– Cache data in the network interface
– Reduces interconnect traffic
– Software-controlled cache
– Minimal changes to the operating system
Prototype web server
– Up to 57% reduction in PCI traffic
– Up to 31% increase in server performance
– Peak of 1571 Mb/s of content throughput
– Breaks the PCI bottleneck
5. Outline
Background
Network Interface Data Caching Implementation
Experimental Prototype / Results
Summary
6. Network Interface Data Cache
A software-controlled cache in the network interface.
[Diagram: with the cache, repeated file transfers across the interconnect are eliminated; only requests and headers still cross between CPU, main memory, and the network]
7. Web Traces
Five web traces with realistic working sets and file distributions:
– Berkeley computer science department
– IBM
– NASA Kennedy Space Center
– Rice computer science department
– 1998 World Cup
8. Content Locality
Block cache with 4 KB block size
Caches of 8-16 MB capture the locality
9. Outline
Background
Network Interface Data Caching Implementation
– OS modification / NIC API
Experimental Prototype / Results
Summary
10. Unmodified Operating System
Transmit data flow (network stack and device driver):
1. Identify the pages holding the file
2. Protocol processing: break the data into packets
3. Inform the network interface of each packet
11. Modified Operating System
The OS completely controls the network interface data cache, with minimal changes:
1. Identify the pages holding the file (unmodified)
2. Annotate the pages with cache information (new step)
3. Protocol processing: break the data into packets (unmodified)
4. Query the cache directory for each packet (new step)
5. Inform the network interface (unmodified)
12. Operating System Modification
Device driver
– Completely controls the cache
– Makes allocation, use, and replacement decisions
Cache directory (in the device driver)
– An entry is a tuple of: file identifier, offset within the file, file revision number, flags
– Sufficient to maintain cache coherence
13. Network Interface API
– Initialize
– Insert data into the cache
– Append data to a packet
– Append cached data to a packet
[Diagram: the NIC assembles packets in its TX buffer either from main memory over the interconnect (append) or from its on-board cache (append cached data)]
14. Outline
Background
Network Interface Data Caching Implementation
Experimental Prototype / Results
Summary
15. Prototype Server
Athlon 2200+ processor, 2 GB RAM
64-bit, 33 MHz PCI bus (2 Gb/s)
Two Gigabit Ethernet NICs (4 Gb/s)
– Based on the programmable Tigon 2 controller
– Firmware implements the new API
FreeBSD 4.6
– 850 lines of new code / 150 lines of kernel changes
thttpd web server
– High-performance, lightweight web server
– Supports zero-copy sendfile
16. Results: PCI Traffic
Without caching, the PCI bus saturates:
– ~60% of the traffic is content; 30% is transfer overhead
– ~1260 Mb/s of content is the practical limit
– The server reaches 1198 Mb/s of HTTP content
17. Results: PCI Traffic Reduction
36-57% reduction with four of the traces
[Chart annotations: low temporal reuse / low PCI utilization; good temporal reuse / CPU bottleneck]
18. Results: World Cup
– Temporal reuse: 84%; PCI utilization: 69%
– 57% traffic reduction
– 7% throughput increase: 794 Mb/s without caching, 849 Mb/s with caching
– CPU bottleneck
19. Results: Rice
– Temporal reuse: 40%; PCI utilization: 91%
– 40% traffic reduction
– 17% throughput increase: 1126 Mb/s without caching, 1322 Mb/s with caching
– Breaks the PCI bottleneck
20. Results: NASA
– Temporal reuse: 71%; PCI utilization: 95%
– 54% traffic reduction
– 31% throughput increase: 1198 Mb/s without caching, 1571 Mb/s with caching
– Breaks the PCI bottleneck
21. Summary
Network interface data caching
– Exploits web request locality
– Network protocol independent
– Interconnect architecture independent
– Minimal changes to the OS
36-57% reduction in PCI traffic
7-31% increase in server performance
Peak of 1571 Mb/s of content throughput
– Surpasses the PCI bottleneck