
1 A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum & Minerals Dhahran Saudi Arabia April 14, 2003

2 Motivation
CPU–memory speed gap
 CPU speed doubles in about 18 months (Moore's Law)
 Memory access time improves by only one-third in 10 years
Hierarchical memory architecture was introduced to alleviate the CPU–memory speed gap
 It relies on locality of reference of data:
temporal locality
spatial locality
Network bandwidth has improved significantly
 Gigabit-per-second speeds are already deployed on LANs: NICs operate at up to 10 Gbps, and Ethernet switches are also available in that range
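
The locality argument above can be illustrated with a minimal sketch: a hypothetical direct-mapped cache model. The cache size, line size, and access patterns below are illustrative assumptions, not measurements from this work (real caches are set-associative).

```python
# Minimal direct-mapped cache model illustrating temporal vs spatial locality.
# All parameters are illustrative assumptions.

LINE_SIZE = 64          # bytes per cache line
NUM_LINES = 128         # 8 KB cache, sized like a Pentium 4 L1 data cache

def hit_rate(addresses):
    """Simulate a direct-mapped cache and return the hit rate."""
    tags = [None] * NUM_LINES
    hits = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        index = line % NUM_LINES
        if tags[index] == line:
            hits += 1           # temporal or spatial hit
        else:
            tags[index] = line  # miss: fill the line
    return hits / len(addresses)

# Sequential 4-byte accesses: good spatial locality -> 15/16 hit rate
sequential = [i * 4 for i in range(4096)]
# Same small buffer scanned twice: temporal locality helps the second pass
reused = sequential[:1024] * 2
# Cache-line-sized strides with no reuse: every access misses
strided = [i * LINE_SIZE for i in range(4096)]

print(f"sequential: {hit_rate(sequential):.2f}")  # 0.94
print(f"reused:     {hit_rate(reused):.2f}")      # 0.97
print(f"strided:    {hit_rate(strided):.2f}")     # 0.00
```

The strided pattern is exactly the behavior of continuous, never-reused data that the next slide raises.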

3 Motivation
Do all applications benefit from the memory hierarchy?
 some data have poor temporal locality (continuous data)
 the working set might be too large to fit into cache, even if the data has good spatial locality
 some data are never reused
For applications that use these types of data, the hierarchical memory architecture becomes ineffective
SO WHERE EXACTLY IS THE BOTTLENECK?

4 Memory Access: Streaming Media Servers
Streaming media content is continuous data
 the working set is normally large and cannot fit into cache
 it has very poor temporal locality (data reuse is poor)
A typical streaming media transaction uses three protocols between client and server: RTSP (session control), RTP (media data), and RTCP (feedback)

5 The transaction has:
 stringent timing requirements
 high bandwidth requirements
 high CPU demand
 high memory requirements
Typical data flow in streaming using RTP
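
The RTP data path mentioned above carries a fixed 12-byte header on every media packet (RFC 3550). A minimal sketch of packing and parsing that header, with illustrative field values:

```python
import struct

def build_rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Pack the 12-byte RTP fixed header (RFC 3550):
    version 2, no padding, no extension, zero CSRC entries."""
    byte0 = 2 << 6                          # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type    # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

def parse_rtp_header(data):
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }

hdr = build_rtp_header(seq=1000, timestamp=160000, ssrc=0xDEADBEEF)
assert len(hdr) == 12
print(parse_rtp_header(hdr))
```

Every one of these small packets must be assembled and copied out at the stream's encoding rate, which is where the CPU and memory demands listed above come from.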

6 Memory Access: Web Servers
Web content is normally a set of small files that make up a web document
 the working set is normally composed of small files (average aggregate size is 10 KB)
 poor temporal locality
 little or no data reuse
A web transaction (HTTP) takes place directly between client and server

7 Typical data flow in an HTTP transaction
The transaction has:
 relaxed timing requirements
 but still high bandwidth requirements
 a high connection rate (connections are established and torn down within a short time – HTTP/1.0)
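
The connection-rate effect above can be sketched with a back-of-the-envelope model: under HTTP/1.0, every transaction pays a fixed connection setup/teardown cost plus transfer time. The setup cost and link speed below are assumptions for illustration, not measurements from this work.

```python
def transactions_per_sec(file_kb, setup_ms=1.0, bandwidth_mbps=100):
    """Illustrative HTTP/1.0 cost model: per-transaction time is a
    fixed connection setup/teardown cost plus the transfer time.
    setup_ms and bandwidth_mbps are assumed values."""
    transfer_ms = file_kb * 8 / (bandwidth_mbps * 1000) * 1000
    return 1000 / (setup_ms + transfer_ms)

for kb in (1, 10, 100, 1000):
    tps = transactions_per_sec(kb)
    print(f"{kb:5d} KB: {tps:7.1f} trans/s, {tps * kb / 1024:.2f} MB/s")
```

Small files yield a high transaction rate but low aggregate throughput (the setup cost dominates), while large files give high throughput at a low transaction rate, matching the measurements on slide 14.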

8 Memory Access: IP Forwarding
IP packets are generally small (the maximum is 65,535 bytes). Due to datagram fragmentation by routers, packets are typically smaller than 1.5 KB (the Ethernet MTU).
 packets are just forwarded; no data associated with any packet is reused
 apart from the need for high speed, no strict timing has to be maintained
 at high throughput, a lot of memory copying is involved: many IP headers are moved into cache for processing
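
The fragmentation mentioned above works as a sketch like the following: a router splits a datagram's payload so each fragment fits the outgoing MTU, with every non-final fragment's payload a multiple of 8 bytes and offsets recorded in 8-byte units (RFC 791). Sizes below are illustrative.

```python
def fragment(total_payload, mtu=1500, header_len=20):
    """Split an IP datagram payload into fragments for a given MTU.
    Non-final fragment payloads must be multiples of 8 bytes;
    offsets are stored in 8-byte units, per RFC 791."""
    max_frag = (mtu - header_len) // 8 * 8   # usable payload per fragment
    frags, offset = [], 0
    while total_payload > 0:
        size = min(max_frag, total_payload)
        more = total_payload > size          # MF bit: more fragments follow
        frags.append({"offset_units": offset // 8,
                      "payload": size,
                      "mf": more})
        offset += size
        total_payload -= size
    return frags

# A 4000-byte payload over a 1500-byte MTU splits into 1480 + 1480 + 1040
for f in fragment(4000):
    print(f)
```

Each resulting packet must be copied through the forwarder's memory system independently, which is why small MTUs multiply the per-byte forwarding work.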

9 Typical data flow in IP forwarding

10 Server Platform
Pentium 4 processor (2.0 GHz):
 L1 cache: 8 KB
 L2 cache: 512 KB
Peripherals:
 1 Gbps NIC
 40 GB EIDE hard drive (Western Digital WD400)
 Main memory: 256 MB
Operating systems:
 Red Hat Linux 7.2 (kernel 2.4.7-10)
 Windows 2000 Server
Network (LAN):
 1 Gbps layer-2 switch

11 Memory Transfer Test: ECT (Extended Copy Transfer)
Characterizes memory performance to observe the impact of the OS on memory performance
Locality of reference:
 temporal locality – varying working-set size (block size)
 spatial locality – varying access pattern (strides)
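
In the spirit of the ECT test above, a stride/working-set sweep can be sketched as follows. This is not the ECT benchmark itself; it only reproduces the idea (walk working sets of increasing size at a fixed stride and time the accesses), and in Python the interpreter overhead dominates, so only the shape of the curve is meaningful.

```python
import time
from array import array

def sweep(sizes_kb, stride_bytes=64, reps=5):
    """ECT-style sketch: time repeated strided walks over working
    sets of increasing size. Timings are illustrative only."""
    results = {}
    for kb in sizes_kb:
        n = kb * 1024 // 8                  # 8-byte (double) elements
        buf = array("d", range(n))
        step = stride_bytes // 8            # stride in elements
        t0 = time.perf_counter()
        s = 0.0
        for _ in range(reps):
            for i in range(0, n, step):     # strided walk over the buffer
                s += buf[i]
        elapsed = time.perf_counter() - t0
        touched = reps * len(range(0, n, step))
        results[kb] = elapsed / touched     # seconds per access
    return results

for kb, t in sweep([4, 64, 1024]).items():
    print(f"{kb:5d} KB: {t * 1e9:.1f} ns/access")
```

On real hardware with a compiled version of this loop, the per-access time steps up as the working set crosses each cache capacity, which is exactly what ECT uses to expose the memory hierarchy.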

12 Performance of Streaming Media Servers: L1 Cache Performance
L1 cache misses (56 kbps) and L1 cache misses (300 kbps)
L1 cache misses are mostly influenced by the number of streams
Worst-case performance occurs when the number of streams is high: a 300 kbps encoding rate with multiple media contents requested by clients

13 Performance of Streaming Media Servers: Memory Performance and Throughput
Page fault rate (300 kbps); throughput (300 kbps)
Requests for a single media object do not incur many page faults, since the object can easily be served from memory
Requests for multiple objects lead to a high page fault rate, since many data blocks have to be fetched from disk
A high page fault rate leads to client timeouts due to long delays
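
The single-object vs multiple-object contrast above can be sketched with a toy LRU paging model. The frame count, object sizes, and access interleaving below are assumptions chosen to make the effect visible, not parameters of the measured server.

```python
from collections import OrderedDict

def page_faults(access_seq, num_frames):
    """Count page faults under LRU replacement (illustrative model)."""
    frames = OrderedDict()
    faults = 0
    for page in access_seq:
        if page in frames:
            frames.move_to_end(page)         # refresh LRU position
        else:
            faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)   # evict least recently used
            frames[page] = None
    return faults

FRAMES = 100                                 # memory holds 100 pages
# One 80-page object streamed 50 times: fits in memory, only cold faults
one_object = list(range(80)) * 50
# Four 80-page objects (320 distinct pages) interleaved 50 times:
# the combined working set exceeds memory, so every access faults
many = [obj * 80 + p for _ in range(50) for obj in range(4) for p in range(80)]

print("single object faults:", page_faults(one_object, FRAMES))  # 80
print("four objects faults: ", page_faults(many, FRAMES))        # 16000
```

Once the combined working set exceeds physical memory, the fault count jumps from the cold-start minimum to one fault per access, mirroring the measured page-fault blowup when clients request multiple media objects.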

14 Performance of Web Servers: Transactions and Throughput
Number of transactions per second; throughput in Mbytes/sec
Smaller files are transferred within a short time, hence more connections are established and released at a high rate
For larger files, throughput is high even though transactions/sec is low (fewer connections are made)

15 Performance of Web Servers: Cache Performance
L1 cache misses; L2 cache misses
L1 and L2 cache performance is poor when the document size is small. WHY?

16 Performance of Web Servers: Page Faults and Latency
Page fault rate; latency
Unlike a small file, a large file has to be continuously fetched from disk, leading to more page faults
Large files significantly increase the average latency of the server; clients that wait too long may time out

17 Performance of IP Forwarding: Experimental Setup
Routing (creating and updating the routing table) is done by the 'routed' daemon
IP forwarding is done in Linux kernel space

18 Performance of IP Forwarding: Routing Configurations
 1-1 communication, simplex and duplex (configurations 1 and 2)
 Double 1-1 communication, simplex and duplex (configurations 3 and 4)
 1-4 communication, simplex and duplex (configurations 5 and 6)
 Ring communication, simplex and duplex (configurations 7 and 8)

19 Performance of IP forwarding Bandwidth

20 Performance of IP Forwarding
Maximum bandwidth: 449 Mbps
 at configuration 2 – only two NICs involved in the router
 CPU utilization (system) – a mere 19.04%
 context switches – 1312 (only two NICs switched)
 active pages – 1006.48 (highest observed)
A very small packet size (64 bytes) degrades performance and accounts for the highest context switching
Fairly uniformly distributed active-page figures indicate that memory activity is not very intensive
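
The small-packet effect above is easy to quantify: at a fixed bit rate, the per-packet work (interrupts, context switches, header processing) scales inversely with packet size. A back-of-the-envelope sketch using the 449 Mbps figure from this slide:

```python
def packets_per_second(bandwidth_mbps, packet_bytes):
    """Packets the forwarder must handle per second at a given bit rate."""
    return bandwidth_mbps * 1_000_000 / (packet_bytes * 8)

# At the measured 449 Mbps peak, 64-byte packets mean roughly 14x more
# per-packet work than full-size 1500-byte Ethernet frames
for size in (64, 512, 1500):
    pps = packets_per_second(449, size)
    print(f"{size:5d}-byte packets: {pps:>10,.0f} packets/s")
```

At 64 bytes the router must process nearly 900,000 packets per second to sustain 449 Mbps, which is consistent with small packets driving up the context-switch counts reported here.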

21 Performance of IP forwarding Other metrics

22 Conclusion
Streaming servers: performance is highly degraded by cache misses and page faults; they use continuous data with a large working set and poor temporal locality (no data reuse)
Web servers: a small working set does not help much, as frequent connection setup and teardown degrades performance significantly
When the document is large, the server delay becomes unacceptably high, leading to client timeouts
A large document size also leads to a high page fault rate

23 Conclusion
IP forwarding: memory performance is not the main factor in the overall performance of IP forwarding in the Linux kernel
Context-switching overhead is highly significant and a key factor in performance degradation
The more interfaces are involved in forwarding packets, the greater the contention for resources (bus contention)
All CPU activity (kernel space only) stays below 100%; if bus contention were resolved, more throughput could be obtained


