1
Performance Issues in WWW Servers
Erich Nahum, Tsipora Barzilai, Dilip D. Kandlur. IEEE/ACM Transactions on Networking, Vol. 10, No. 1, Feb. 2002. Hi, everyone. The paper I am going to present today is "Performance Issues in WWW Servers." Presented by Kunhao Zhou.
2
Motivation Problems of the WWW: overloaded servers, congestion in the network, poorly behaved clients. Why we need improvement: customer satisfaction, saving money $$$ … I am pretty sure everyone is very familiar with the WWW, whether you use it every day or actually know how a WWW server runs. There are many problems with existing WWW servers. Because of the dramatic increase in users, many servers are overloaded; for example, more than 75,000,000 searches are performed on Google every day. Because the Internet is so crowded, congestion happens quite often, and poorly behaved clients make the problem even worse. What can we gain by improving the performance of a WWW server? First, customer satisfaction: people wait less time to get a response. Second, saving money: the same hardware can handle more requests, eliminating the need to upgrade the hardware!
3
Outline Overview WWW transactions WWW optimizations Tests Conclusion
This is the outline of the presentation.
4
Overview New socket functions: acceptex() and transmitfile() in NT, send_file() in HP-UX. Per-byte optimizations: zero-copy (eliminating copies), eliminating checksums. Per-connection optimizations: piggybacking FINs, delaying ACKs. In this paper, the authors evaluate these three approaches to improving WWW server performance. The new functions provide the semantic support necessary to eliminate copies and checksums. Per-connection optimizations are techniques to reduce the number of packets exchanged on each connection.
5
Outline Overview WWW transactions WWW optimizations Tests Conclusion
Now I will talk about the socket operations in a typical WWW transaction.
6
WWW Transactions Each transaction in a WWW server includes:
1. accept() the new connection
2. getsockname() to get the peer name
3. read() the HTTP request
4. setsockopt() to disable the Nagle algorithm
5. gettimeofday() to determine the time
6. Parse the request
7. stat() to obtain the file status
The Nagle algorithm holds back a packet when the segment available to send is smaller than a full MTU, in order to reduce the transmission of small packets and thereby improve network utilization. However, researchers have presented evidence that the Nagle algorithm should be disabled to reduce latency on persistent HTTP connections.
7
WWW Transactions (cont.)
8. open() the requested file
9. read() the file descriptor
10. write() the socket to send the HTTP header
11. write() the socket to send the file
12. close() the file
13. close() the socket
14. write() the log file
8
Outline Overview WWW transactions WWW optimizations Tests Conclusion
9
WWW Optimizations The new socket function acceptex() combines steps 1-3. Because a considerable percentage of users access the same files on a website, we can cache those files and their related information, saving the computation in steps 6-8. The other new socket function, send_file(), combines steps 9-11.
10
Outline Overview WWW transactions WWW optimizations Tests: test-bed, proposed socket functions, per-byte optimizations, per-connection optimizations Conclusion
11
Experiment Test-bed Hardware: IBM 43P RS/6000 with 128 MB RAM, a 200-MHz PowerPC 604e processor, and four 100-Mb Ethernet network cards; 1 server, 3 clients. OS: AIX 4.3.1. Web server: Flash-Poll. Here is the test-bed the authors used. When choosing the web server, the authors wanted the best-performing WWW server as a baseline, to show that their techniques could still yield improvement. They compared the performance of Apache, the most popular WWW server in use; Zeus, a commercial WWW server well known for its performance; and Flash-Poll. They found that Flash-Poll performed best; it is also open source, available from Rice University. Every test conducted here is compared against Flash-Poll.
12
Proposed Socket Functions
acceptex() First the authors test the new socket function acceptex(). It does not improve performance much, because acceptex() has a more complex kernel implementation. This doesn't help much. Minor improvement!
13
Proposed Socket Functions
send_file() (single-copy) The other new socket function. Performance actually degraded by up to 18%. Because the network subsystem and the file system buffers are in separate address spaces, copying between them is very expensive, and this single-copy send_file() still entails copying the file from the file system into the network subsystem. Performance degraded!
14
Per-Byte Optimizations
Zero-copy send_file() Cache mbufs; if a file is in the cache, send the cached copy. Separate buffer management from the VM system. Eliminate checksums (an AIX feature). So the authors changed the implementation of send_file() to include a caching mechanism within the kernel that is separate from the VM system; they call it the zero-copy version of send_file(). The cache is managed using a least-recently-used (LRU) policy. Also, each byte of data normally has to be read into the CPU to calculate the checksum before it is sent; if we eliminate the checksum computation, performance may improve even more, and AIX provides such a feature.
15
Per-Byte Optimizations
Zero-copy Here is the result. As we can see, the improvement is up to 80%. The bigger the file, the more improvement we get: the bigger the file, the more we save on copying. Improved a lot!
16
Per-Byte Optimizations
Eliminate checksums There is additional improvement when we eliminate the checksum. As with zero-copy, the bigger the file, the bigger the improvement. Improved!
17
Per-Connection Optimizations
Now I will talk about per-connection optimizations. This is a packet-exchange sequence taken from tcpdump, a tool used to inspect TCP packet contents. As we can see, packet 6 here carries just a FIN bit, telling the client that the server is done sending data. If we can piggyback this FIN on packet 5, as in the right-hand sequence, we can save one packet transfer. FIN can be sent earlier
18
Per-Connection Optimizations
Combine the FIN with data. send_file() offers a close option. Add an option, PRU_SEND_DISCONNECT, to pru_usrreq(), allowing the TCP layer to send and close in one function call. Save one packet transfer. How can we do that? The send_file() function offers a close option, and the authors add the PRU_SEND_DISCONNECT option to the lower-level socket routine pru_usrreq(), which allows the TCP layer to send and close in one function call. Then we can piggyback the FIN bit.
19
Per-Connection Optimizations
Combine the FIN with data So here is the result: small files benefit the most, and there is no change for large files. The reason is that for larger files, the data queued in the send buffer is still waiting on the congestion-control window; during that time the server closes the connection, so the FIN gets piggybacked anyway. Throughput HTTP ops/sec Small transfers improved!
20
Per-Connection Optimizations
Let's see the example again. Notice that packet 4 here is redundant, because the ACK bit is already carried in packet 3. If we can delay this ACK, we can save one more packet transfer. ACK can be delayed
21
Per-Connection Optimizations
TCP cumulative ACK Change tcp_input() How do we do that? We can make use of TCP's cumulative ACK mechanism. We also need to change another low-level TCP function, tcp_input().
22
Per-Connection Optimizations
Delay ACK Here is the performance. As with piggybacking the FIN bit, small transfers benefit. Small transfers improved!
23
Total Performance Increase
Considerable improvement!
24
Outline Overview WWW transactions WWW optimizations Experiments Conclusion
25
Conclusion New socket functions: little increase. Per-byte optimizations: big improvement. Per-connection optimizations: improvement for small transfers. These features were implemented in AIX 4.3.2. Now the conclusion. In this paper, the authors study three approaches to improving WWW server performance, and the per-byte and per-connection optimizations improve performance a lot. As a result, these features were implemented in AIX, because the authors worked with IBM.
26
Thank You! That is it. Thank you everyone!