1
Locality-Aware Request Distribution in Cluster-based Network Servers
Presented by: Kevin Boos
Authors: Vivek S. Pai, Mohit Aron, et al., Rice University
ASPLOS 1998
*** Figures adapted from the original presentation ***
2
Time Warp to 1998
Rapid Internet growth
Bandwidth limitations
"Cheap" PCs and "fast" LANs
Need for increased throughput
3
Clustered Servers
[Figure: clients connecting to a cluster of back-end nodes]
4
Weighted Round Robin (WRR)
5
Pure Locality-Based Distribution
6
Motivation for Change
Weighted Round Robin:
- Disregards content on back-end nodes
- Many cache misses
- Limited by disk performance
Pure Locality-Based Distribution:
- Disregards current load on back-end nodes
- Uneven load distribution
- Inefficient use of resources
7
LARD Concepts
LARD: Locality-Aware Request Distribution
Goal: improve performance
- Higher throughput
- Higher cache hit rates
- Reduced disk access
Even load distribution + content-based distribution: the best of both algorithms
8
Outline
Basic LARD Algorithm
Improvements to LARD
TCP Handoff Protocol
Simulation and Results
Prototype Implementation and Testing
9
Outline
Basic LARD Algorithm
Improvements to LARD
TCP Handoff Protocol
Simulation and Results
Prototype Implementation and Testing
10
Basic LARD Algorithm
Front-end maps target content to back-end nodes (1-to-1 mapping)
First request for each target is assigned to the least-loaded back-end node
Subsequent requests are distributed to the same back-end node based on the target content mapping
Unless that node is overloaded, in which case the target content is re-assigned to a new back-end node
11
Flow of Basic LARD
[Figure: a client's request flowing through the front-end to the back-end node mapped for its target]
12
Determining Load in Basic LARD
Ask the server? That introduces unnecessary communication
Current load = number of open connections, tracked in the front-end node
Use thresholds to determine when to re-balance: T_low, T_high, and T_limit
Re-balance when (load > T_limit) or (load > T_high and there is a "free" node with load < T_low)
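A minimal Python sketch of the dispatch loop the last two slides describe; the concrete threshold values, the `least_loaded` helper, and the connection-count bookkeeping are illustrative assumptions rather than the paper's exact pseudocode.

```python
# Sketch of the basic LARD front-end dispatch loop.
# Threshold values and helper names are illustrative assumptions.

T_LOW, T_HIGH, T_LIMIT = 25, 65, 130   # example connection-count thresholds

class Frontend:
    def __init__(self, nodes):
        self.load = {n: 0 for n in nodes}   # open connections per back-end node
        self.server = {}                    # target content -> back-end node

    def least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        node = self.server.get(target)
        if node is None:
            # First request for this target: assign it to the least-loaded node.
            node = self.server[target] = self.least_loaded()
        elif self.load[node] > T_LIMIT or (
            self.load[node] > T_HIGH and min(self.load.values()) < T_LOW
        ):
            # Mapped node is overloaded (or a lightly loaded node exists):
            # re-assign the target to the current least-loaded node.
            node = self.server[target] = self.least_loaded()
        self.load[node] += 1                # connection opened
        return node

    def finished(self, node):
        self.load[node] -= 1                # connection closed
```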
13
Outline Basic LARD Algorithm Improvements to LARD TCP Handoff Protocol Simulation and Results Prototype Implementation and Testing 13
14
LARD Needs Improvement
Only one back-end node per target content
Working set is a single node
Front-end must limit total connections
Still need to increase throughput
One node per content type is unrealistic
…add more back-end nodes?
15
LARD/R: LARD with Replication
Maps target content to a set of back-end nodes
Working set is several nodes with similar cache content
Sends new requests to the least-loaded node in the set
Moves nodes to/from sets based on load imbalance
Idle nodes in a low-load set are moved to a higher-load set
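A sketch of how LARD/R extends the previous loop to maintain a server set per target; the set-growth condition, the K-second shrink timeout, and the bookkeeping names are assumptions made for illustration, not the paper's exact algorithm.

```python
import time

# Sketch of LARD with Replication (LARD/R).
# T_LOW/T_HIGH and the K-second shrink timeout are illustrative assumptions.

T_LOW, T_HIGH, K_SECONDS = 25, 65, 20

class FrontendR:
    def __init__(self, nodes):
        self.load = {n: 0 for n in nodes}    # open connections per node
        self.server_set = {}                 # target -> set of back-end nodes
        self.last_change = {}                # target -> time the set last grew

    def least_loaded(self, nodes=None):
        candidates = list(nodes) if nodes is not None else list(self.load)
        return min(candidates, key=self.load.get)

    def dispatch(self, target):
        nodes = self.server_set.setdefault(target, set())
        if not nodes:
            nodes.add(self.least_loaded())
            self.last_change[target] = time.time()
        node = self.least_loaded(nodes)
        if self.load[node] > T_HIGH and min(self.load.values()) < T_LOW:
            # Grow the set: recruit the cluster's least-loaded node.
            node = self.least_loaded()
            nodes.add(node)
            self.last_change[target] = time.time()
        elif len(nodes) > 1 and time.time() - self.last_change[target] > K_SECONDS:
            # Shrink the set: drop its most-loaded member so that node's
            # cache can be reused for other targets.
            busiest = max(nodes, key=self.load.get)
            if busiest is not node:
                nodes.discard(busiest)
        self.load[node] += 1                 # connection opened
        return node

    def finished(self, node):
        self.load[node] -= 1                 # connection closed
```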
16
Flow of LARD/R
[Figure: a client's request flowing through the front-end to the least-loaded node in the target's server set]
17
LARD Outline
Basic LARD Algorithm
Improvements to LARD
TCP Handoff Protocol
Simulation and Results
Prototype Implementation and Testing
18
Determining Content Type
How do we determine the requested content in the front-end?
The front-end must see the request traffic
Standard TCP assumptions: requests are small and light, responses are big and heavy
How do we forward requests?
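The front-end must read enough of the request to learn the target before it can apply the LARD mapping; a tiny sketch of that inspection step is below. The HTTP/1.x request-line parsing shown is an illustrative assumption, not the paper's handoff code.

```python
# Sketch: recovering the requested target from an HTTP/1.x request so the
# front-end can feed it to dispatch(). Parsing details are assumptions.

def extract_target(request_bytes: bytes) -> str:
    """Return the requested path from the first HTTP request line."""
    request_line = request_bytes.split(b"\r\n", 1)[0]
    method, path, _version = request_line.decode("ascii").split(" ", 2)
    return path

# Example: the front-end would then call dispatch(extract_target(data)).
assert extract_target(b"GET /index.html HTTP/1.0\r\nHost: example.com\r\n\r\n") == "/index.html"
```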
19
Potential TCP Solutions
Simple TCP Proxy:
- Everything must flow through the front-end node
- Can inspect all incoming content
- Cannot respond directly from the back-end to the client
- But the front-end can also inspect all outgoing content
- Better for persistent connections
20
TCP Connection Handoff
Front-end accepts the client connection and inspects the request content
Forwards the request (and the connection) to a back-end node
Response is returned directly to the client from the back-end node
21
LARD Outline
Basic LARD Algorithm
Improvements to LARD
TCP Handoff Protocol
Simulation and Results
Prototype Implementation and Testing
22
Evaluation Goals
Throughput: requests/second served by the entire cluster
Hit rate: (requests that hit the memory cache) / (total requests)
Underutilization time: time that a node's load is ≤ 40% of T_low
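For concreteness, the three metrics could be computed from simulation counters roughly as sketched below; the function and parameter names are assumptions made for illustration.

```python
# Sketch: computing the three evaluation metrics from simulation counters.
# Names and the load-sampling scheme are illustrative assumptions.

def throughput(total_requests: int, elapsed_seconds: float) -> float:
    return total_requests / elapsed_seconds        # requests/second for the cluster

def hit_rate(cache_hits: int, total_requests: int) -> float:
    return cache_hits / total_requests             # fraction served from the memory cache

def underutilization_time(load_samples, t_low: float, interval: float) -> float:
    """Total time (seconds) a node's sampled load was <= 40% of T_low."""
    return sum(interval for load in load_samples if load <= 0.4 * t_low)
```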
23
Simulation Model
300 MHz Pentium II
32 MB memory (cache)
100 Mbps Ethernet
Traces from web servers at Rice University and IBM
24
Simulation Results – Prior Work
Weighted Round Robin:
- Lowest throughput
- Highest cache miss ratio
- But lowest idle time
Pure Locality-Based:
- Increasing the number of nodes decreases the cache miss ratio
- But idle time increases (unbalanced load)
- Only a minor improvement over WRR
25
Simulation Results – LARD & LARD/R
Throughput ~4x better (at 8 nodes)
WRR would need nodes with a 10x larger cache to match
CPU bound after 8 nodes
Cache miss rate decreases
Only 1% idle time on average
26
Simulation Results – Throughput
27
Simulation Results – Cache Misses
28
Simulation Results – Idle Time
29
What Affects Performance?
WRR is disk-bound; LARD/R is CPU-bound
Increasing CPU speed improves LARD/R, not WRR
Adding more disks improves WRR, not LARD/R
LARD/R shows no improvement when a node has more than 2 disks
WRR is not scalable
30
LARD Outline
Basic LARD Algorithm
Improvements to LARD
TCP Handoff Protocol
Simulation and Results
Prototype Implementation and Testing
31
Prototype Implementation
One front-end PC: 300 MHz Pentium II, 128 MB RAM
6 back-end PCs
7 client PCs: 166 MHz Pentium Pro, 64 MB RAM
100 Mb Ethernet, 24-port switch
32
Prototype Testing Results
33
Evaluation Shortcomings
What influences the results more?
- The LARD/R distribution policy?
- The TCP handoff protocol?
34
Conclusion
LARD and LARD/R are significantly better than WRR:
- Higher throughput
- Better CPU utilization
- More frequent cache hits
- Reduced disk access
Combines the benefits of locality-based and load-balanced distribution
Scalable at low cost