
1 Memory Management for Scalable Web Data Servers
S. Venkataraman, M. Livny, J. F. Naughton

2 Motivation
Popular Web sites have heavy traffic, with two main bottlenecks:
CPU overhead of servicing TCP/IP connections
I/O of fetching the many pages served
CPU solution: a cluster of servers connected to one File Server. If each node held only its own subset of the web site, the cluster would be susceptible to skew.

3 Motivation (continued)
Paper’s Goal: Develop buffer management techniques that best utilize the aggregate memory of the machines in the cluster, thereby reducing the I/O bottleneck.

4 Outline
Web Server Architecture
3 Memory Management Techniques:
Client-Server
Duplicate Elimination
Hybrid
Performance Evaluation
Discussion

5 Web Server Architecture
Cluster of cooperating servers connected by a fast network
Each node has its own disks and memory
Each node runs the same copy of the server
A round-robin router distributes requests

6 Web Server Architecture: Part Deux
Primary server: the server where the client request is serviced.
Owner: the server that manages a persistent data item.
Owners maintain a directory of the copies of their pages held in global memory.
The paper considers algorithms for read-only workloads.

7 Memory Management
Memory hierarchy:
1. Primary server memory
2. Owner memory
3. Memory at other servers
4. Disk
Each request is broken into page-sized units. If the primary has the page in memory, the request is done. Otherwise, it asks the owner for it.

8 More on Memory Management
The owner gets another node to forward the page if possible. Otherwise, it fetches the page from disk and keeps a copy in memory; that copy is labeled as hated.
Eviction policy: hated pages are evicted first.
When the primary server receives a page, it must choose another page to evict. Three algorithms (no hated pages):
Client-Server
Duplicate Elimination
Hybrid
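A minimal Python sketch of the lookup path from slide 7 plus the owner-side hated-page policy above. All names here (Node, get_page, the directory and disk dicts) are illustrative rather than from the paper, and directory maintenance is elided (slide 11 covers it):

```python
from collections import OrderedDict

class Node:
    """One server's page cache; 'hated' pages are evicted first."""
    def __init__(self, capacity):
        self.cache = OrderedDict()   # page_id -> page, in LRU order
        self.hated = set()           # copies kept only after a disk read
        self.capacity = capacity

    def insert(self, page_id, page, hated=False):
        if len(self.cache) >= self.capacity:
            # Eviction policy from the slide: hated pages go first,
            # otherwise the least recently used page.
            victim = next(iter(self.hated), None) or next(iter(self.cache))
            self.hated.discard(victim)
            self.cache.pop(victim, None)
        self.cache[page_id] = page
        if hated:
            self.hated.add(page_id)

def get_page(primary, owner, directory, disk, page_id):
    """Walk the four-level hierarchy: primary, owner, remote, disk."""
    if page_id in primary.cache:                 # 1. primary memory
        primary.cache.move_to_end(page_id)
        return primary.cache[page_id]
    if page_id in owner.cache:                   # 2. owner memory
        page = owner.cache[page_id]
    elif directory.get(page_id):                 # 3. forwarded by another
        holder = next(iter(directory[page_id]))  #    server holding a copy
        page = holder.cache[page_id]
    else:                                        # 4. disk; the owner keeps
        page = disk[page_id]                     #    a copy, marked hated
        owner.insert(page_id, page, hated=True)
    primary.insert(page_id, page)                # cache at the primary
    return page
```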

9 Client-Server
An LRU list of pages is kept. Very simple.
Increases local hits.
Lots of duplication is possible.
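A sketch of the Client-Server policy under these assumptions: a plain LRU cache whose misses go through a caller-supplied fetch function, a hypothetical stand-in for the owner protocol above.

```python
from collections import OrderedDict

class ClientServerCache:
    """Client-Server: a single LRU list, oblivious to duplicates."""
    def __init__(self, capacity):
        self.pages = OrderedDict()        # page_id -> page, LRU order
        self.capacity = capacity

    def access(self, page_id, fetch):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)          # local hit
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)       # evict LRU head
            self.pages[page_id] = fetch(page_id)     # miss: fetch the page
        return self.pages[page_id]
```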

10 Duplicate Elimination
Considers the cost difference between evicting a single-copy page (singlet) and a duplicated page.
Duplicate pages are eliminated first, since they are cheap to re-fetch.
Two LRU lists: singlets and duplicates.
Increases the percentage of the database held in memory.
Main drawback: a hot duplicate page may be replaced before a cold singlet.
But how do we keep track of duplicates?
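A sketch of the eviction side of Duplicate Elimination, assuming the singlet/duplicate status of each page is already known (the next slide covers how it is tracked); the class and list names are illustrative:

```python
from collections import OrderedDict

class DuplicateEliminationCache:
    """Two LRU lists; duplicates are always evicted before singlets."""
    def __init__(self, capacity):
        self.singlets = OrderedDict()     # sole in-memory copies
        self.duplicates = OrderedDict()   # pages with remote copies too
        self.capacity = capacity

    def evict(self):
        # Duplicates are cheap to re-fetch from remote memory, so that
        # list is drained first, even when its head is hotter than a
        # cold singlet (the drawback noted above).
        target = self.duplicates if self.duplicates else self.singlets
        return target.popitem(last=False)
```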

11 Keeping Track of Duplicates
When a page goes from singlet to duplicate: this happens during a forward, so it is trivial (no additional messages).
When a page goes from duplicate to singlet: the owner receives a message that a copy was evicted. If only one copy remains, the owner sends a message to the server holding it. That message can be piggybacked (no additional messages).
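One way the owner-side bookkeeping for these two transitions could look; the send callback and the message format are made up for illustration:

```python
from collections import defaultdict

class OwnerDirectory:
    """Owner-side directory driving singlet/duplicate transitions."""
    def __init__(self, send):
        self.copies = defaultdict(set)   # page_id -> servers holding it
        self.send = send                 # callback for (piggybacked) messages

    def on_forward(self, page_id, to_server):
        # Singlet -> duplicate: the forward itself informs both servers,
        # so no additional message is needed.
        self.copies[page_id].add(to_server)

    def on_eviction_notice(self, page_id, from_server):
        # Duplicate -> singlet: an eviction notice (piggybacked on other
        # traffic) may leave a single copy, whose holder must be told.
        self.copies[page_id].discard(from_server)
        if len(self.copies[page_id]) == 1:
            (last_holder,) = self.copies[page_id]
            self.send(last_holder, ("now_singlet", page_id))
```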

12 Hybrids are Our Friends
Estimate the performance impact of eviction on the next page reference, considering both the likelihood of reference and the cost of re-access.
Latency of fetching page p back into memory: C(p), i.e. the cost of going to disk vs. the cost of going to remote memory.
Likelihood that page p will be accessed next: W(p) = 1 / (elapsed time since last reference)
Expected cost: E(p) = W(p) * C(p)

13 More on Hybrid Algorithm
Two LRU lists are maintained, just as in Duplicate Elimination.
At eviction, the heads of the two lists are compared.
The page with the lower expected cost is replaced.
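Putting slides 12 and 13 together, a sketch of the Hybrid eviction decision. The cost constants are placeholders, not the paper's simulation parameters:

```python
from collections import OrderedDict

# Illustrative re-fetch latencies; the paper's simulator uses its own values.
DISK_COST = 10.0      # C(p) for a singlet: must come back from disk
REMOTE_COST = 1.0     # C(p) for a duplicate: another server can forward it

class HybridCache:
    """Evict the LRU-list head with the lower expected cost E(p)."""
    def __init__(self, capacity):
        self.singlets = OrderedDict()     # page_id -> last access time
        self.duplicates = OrderedDict()
        self.capacity = capacity

    @staticmethod
    def expected_cost(lru_list, cost, now):
        # E(p) = W(p) * C(p), with W(p) = 1 / (time since last reference).
        _, last_access = next(iter(lru_list.items()))
        return cost / max(now - last_access, 1e-9)

    def evict(self, now):
        if not self.duplicates:
            return self.singlets.popitem(last=False)
        if not self.singlets:
            return self.duplicates.popitem(last=False)
        if (self.expected_cost(self.singlets, DISK_COST, now)
                < self.expected_cost(self.duplicates, REMOTE_COST, now)):
            return self.singlets.popitem(last=False)
        return self.duplicates.popitem(last=False)
```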

14 Simulation Model and Workload
8 nodes of 64 MB each.
Message cost: 1 ms/page.
Link bandwidth: 15 MB/sec.
All files are the same size.
Access frequency: Zipfian distribution. The Zipf parameter controls a wide range of skews; a Zipf parameter of zero means access frequencies are uniform.
Pages of files are declustered among all servers.
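A small sketch of how such a Zipfian workload can be generated; the function and parameter names are illustrative, not the paper's:

```python
import random

def zipf_weights(n_pages, theta):
    # Page i is accessed with frequency proportional to 1 / i**theta:
    # theta = 0 gives uniform access, larger theta means heavier skew.
    return [1.0 / (i ** theta) for i in range(1, n_pages + 1)]

def generate_requests(n_pages, theta, n_requests, seed=0):
    rng = random.Random(seed)
    return rng.choices(range(n_pages),
                       weights=zipf_weights(n_pages, theta),
                       k=n_requests)
```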

15 And the survey says . . .
Duplication is good at high skews, but bad at low skews.
At low skews (uniform access frequencies): Duplicate Elimination has good global memory utilization.
At high skews (Zipf parameter over 1): Client-Server keeps hot pages in memory at all nodes, so duplication is good.
Hybrid nears the performance of the better choice in both scenarios.
Varying the database size, using diverse file sizes, and adding more nodes gave similar results.

16 Discussion
Web sites have predictable hit rates; can that be used somehow?
Can we recycle evicted pages?

