2
Locality-Aware Request Distribution in Cluster-based Network Servers
1. Introduction and Motivation --- Why have this idea?
2. Strategies --- How to implement?
3. Simulation and Results --- How it works?
4. Conclusions
3
Introduction and Motivation
Cluster-based network server system:
  The front-end (dispatcher) -- responsible for request distribution.
  The back-end nodes -- responsible for request processing.
The front-end makes the distributed nature of the server transparent to the clients.
Two major approaches for request distribution:
1) Aiming at load balance (for example, WRR): All back-end nodes are considered equally capable of serving a given request, and the front-end distributes requests by considering only the current load of the back-end nodes.
Advantage: good load balancing among the back-ends.
Disadvantage: when the working set exceeds the size of main memory, cache misses are frequent, so this approach does not scale well to larger working sets.
Q: What is the working set?
4
Introduction and Motivation
2) Aiming at locality (for example, LARD): The front-end considers both the service/content requested and the current load on the back-end nodes when deciding which back-end node should serve a given request.
Advantages:
(1) increased performance due to improved hit rates in the back-ends' main memory caches (adding nodes increases the effective cache size);
(2) increased secondary storage scalability due to the ability to partition the server's database over the different back-end nodes;
(3) the ability to employ back-end nodes that are specialized for certain types of requests (e.g., audio and video).
Disadvantage: The load on different back-ends might become unbalanced, resulting in worse performance.
Objective: The goal in building a LARD cluster is therefore to design a practical and efficient strategy that achieves both load balancing and high cache hit rates on the back-ends.
A TCP handoff protocol allows the front-end to hand off an established client connection to a back-end node, in a manner that is transparent to clients and efficient enough not to render the front-end a bottleneck.
5
Strategies
System
a. The front-end keeps track of open and closed connections, and it can use this information in making load balancing decisions. The outgoing data is sent directly from the back-ends to the clients.
b. The front-end limits the number of outstanding requests at the back-ends. This approach gives the front-end more flexibility in responding to changing load on the back-ends.
c. Any back-end node is capable of serving any target.
d. The front-end uses a TCP handoff protocol:
(1) a client connects to the front-end;
(2) the dispatcher at the front-end accepts the connection and hands it off to a back-end using the handoff protocol (the dispatcher is a software module that implements the distribution policy, e.g. LARD);
(3) the back-end takes over the established connection received through the handoff protocol;
(4) the server at the back-end accepts the created connection;
(5) the server at the back-end sends replies directly to the client.
6
Strategies
Basic LARD
Always assigns a single back-end node to serve a given target, thus making the idealized assumption that a single target cannot by itself exceed the capacity of one node.
1) Algorithm:
while (true)
    fetch next request r;
    if server[r.target] = null then
        s, server[r.target] ← {least loaded node};
    else
        s ← server[r.target];
        if (s.load > T_high && exists node with load < T_low) || s.load >= 2*T_high then
            s, server[r.target] ← {least loaded node};
    send r to s;
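A minimal Python sketch of the Basic LARD dispatch step may make the control flow easier to follow. It assumes the reassignment test is "(load > T_high and some node has load < T_low) or load >= 2*T_high". The names Node and BasicLard are hypothetical, load is modelled as a plain counter of active connections, and connection teardown is left out.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    load: int = 0  # number of active connections


class BasicLard:
    """Illustrative front-end dispatcher; not the authors' implementation."""

    def __init__(self, nodes, t_low, t_high):
        self.nodes = nodes
        self.t_low = t_low
        self.t_high = t_high
        self.server = {}  # target -> Node currently assigned to it

    def _least_loaded(self):
        return min(self.nodes, key=lambda n: n.load)

    def _should_move(self, node):
        # Move the target if its node is busy while another node is idle,
        # or if its node is heavily overloaded regardless of the others.
        idle_exists = any(n.load < self.t_low for n in self.nodes)
        return (node.load > self.t_high and idle_exists) or node.load >= 2 * self.t_high

    def dispatch(self, target):
        node = self.server.get(target)
        if node is None or self._should_move(node):
            node = self._least_loaded()
            self.server[target] = node
        node.load += 1  # the connection is handed off to this node
        return node
```

With two nodes, T_low = 2 and T_high = 4 (illustrative values only), the first five requests for one target stay on the same node; the sixth finds that node above T_high while the other is below T_low, so the target moves.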
7
Strategies
Basic LARD
2) Considerations:
(1) Load: the number of active connections.
T_low: the load below which a back-end is likely to have idle resources.
T_high: the load above which a node is likely to cause substantial delay in serving requests.
(2) Load imbalance: we do not want greatly diverging load values on different back-ends, but we also do not want to re-assign targets because of minor or temporary imbalance.
(3) The front-end limits the total number of connections handed to all back-end nodes to the value S = (n-1)*T_high + T_low - 1, where n is the number of back-end nodes. Setting S to this value ensures that at most n-2 nodes can have a load >= T_high while no node has load < T_low. (how)
(4) The load difference between old and new targets is at least T_high - T_low. The maximum load imbalance that can arise is 2*T_high - T_low. (why)
(5) The setting for T_low depends on the speed of the back-end nodes. Choosing T_high involves a tradeoff: T_high - T_low should be low enough to limit the delay variance among the back-ends to acceptable levels, but high enough to tolerate limited load imbalance without destroying locality. (how)
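One way to see the bound in point (3) is a short counting argument, sketched here in LaTeX. The symbols n, T_low, T_high and S are those of the slide; \ell_i (the load on node i) is notation introduced only for this sketch, and the numbers in the last line are illustrative values, not settings taken from this presentation.

```latex
% Suppose, for contradiction, that n-1 nodes have load >= T_high
% while the remaining node has load >= T_low (no node is below T_low).
\[
  \sum_{i=1}^{n} \ell_i \;\ge\; (n-1)\,T_{high} + T_{low} \;=\; S + 1 \;>\; S .
\]
% The front-end never admits more than S connections in total, so this
% cannot happen: if no node is below T_low, at most n-2 nodes can be at
% or above T_high.
%
% Illustrative numbers: n = 4, T_low = 25, T_high = 65 gives
\[
  S = (4-1)\cdot 65 + 25 - 1 = 219 .
\]
```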
8
Strategies
LARD with Replication
If a single target causes a back-end to go into an overload situation, we should assign several back-end nodes to serve that target and distribute requests for it among the serving nodes.
Algorithm:
while (true)
    fetch next request r;
    if serverSet[r.target] = empty then
        s, serverSet[r.target] ← {least loaded node};
    else
        s ← {least loaded node in serverSet[r.target]};
        m ← {most loaded node in serverSet[r.target]};
        if (s.load > T_high && exists node with load < T_low) || s.load >= 2*T_high then
            p ← {least loaded node};
            add p to serverSet[r.target];
            s ← p;
        if |serverSet[r.target]| > 1 && time() - serverSet[r.target].lastMod > K then
            remove m from serverSet[r.target];    // reduce the degree of replication
    send r to s;
    if serverSet[r.target] changed in this iteration then
        serverSet[r.target].lastMod ← time();
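The replication variant can be sketched in Python along the same lines. This is an illustration under assumptions, not the authors' code: LardR, ServerSet and Node are hypothetical names, the overload test is assumed to be "(load > T_high and some node has load < T_low) or load >= 2*T_high", the clock is injectable so the time-based shrink step can be exercised, and ties between the least and most loaded node are broken so that the two are never the same node when the set has more than one member.

```python
import time
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity
class Node:
    name: str
    load: int = 0  # number of active connections


@dataclass
class ServerSet:
    nodes: list = field(default_factory=list)
    last_mod: float = 0.0  # time of the last membership change


class LardR:
    def __init__(self, nodes, t_low, t_high, k, clock=time.monotonic):
        self.nodes, self.t_low, self.t_high, self.k = nodes, t_low, t_high, k
        self.clock = clock
        self.server_sets = {}  # target -> ServerSet

    def _overloaded(self, node):
        idle_exists = any(n.load < self.t_low for n in self.nodes)
        return (node.load > self.t_high and idle_exists) or node.load >= 2 * self.t_high

    def dispatch(self, target):
        sset = self.server_sets.setdefault(target, ServerSet())
        before = list(sset.nodes)
        if not sset.nodes:
            node = min(self.nodes, key=lambda n: n.load)
            sset.nodes.append(node)
        else:
            node = min(sset.nodes, key=lambda n: n.load)
            # reversed() makes ties resolve to a different node than `node`
            most = max(reversed(sset.nodes), key=lambda n: n.load)
            if self._overloaded(node):
                node = min(self.nodes, key=lambda n: n.load)
                if node not in sset.nodes:
                    sset.nodes.append(node)  # grow the replica set
            if len(sset.nodes) > 1 and self.clock() - sset.last_mod > self.k:
                sset.nodes.remove(most)  # shrink the replica set after K seconds
        node.load += 1  # the connection is handed off to this node
        if sset.nodes != before:
            sset.last_mod = self.clock()
        return node
```

Passing a fake clock (for example `clock=lambda: now[0]`) lets the shrink step be triggered deterministically in a test instead of waiting K seconds.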
9
Simulation and Results
Simulation model
Each back-end node consists of a CPU and locally-attached disk(s).
Each node maintains its own main memory cache of configurable size and replacement policy.
Caching is performed on a whole-file basis.
Processing a request requires the following steps: connection establishment --> disk reads (if needed) --> target data transmission --> connection teardown.
The input to the simulator is a stream of tokenized target requests, where each token represents a unique target being served.
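As a rough illustration of whole-file caching at a back-end node, here is a small Python sketch. The slide only says that the replacement policy is configurable; LRU is chosen here purely as an assumption for the example, and WholeFileCache is a hypothetical name.

```python
from collections import OrderedDict


class WholeFileCache:
    """Whole-file cache with LRU replacement (LRU is an assumption)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # target -> size, least recently used first

    def access(self, target, size):
        """Return True on a cache hit; on a miss, cache the whole file."""
        if target in self.files:
            self.files.move_to_end(target)  # mark as most recently used
            return True
        if size <= self.capacity:  # files larger than the cache are never cached
            while self.used + size > self.capacity:
                _, evicted_size = self.files.popitem(last=False)
                self.used -= evicted_size
            self.files[target] = size
            self.used += size
        return False  # a miss implies a disk read in the simulated node
```

A miss corresponds to the "disk reads (if needed)" step of request processing; a hit skips it.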
10
Simulation and Results
Simulation Output
Throughput: The number of requests that were served per second by the entire cluster, calculated as the number of requests divided by the simulated time it took to finish serving all the requests.
Cache hit ratio: The number of requests that hit in a back-end node's main memory cache divided by the number of requests.
Node underutilization: The time that a node's load is less than 40% of T_low.
The overall throughput is the best summary metric, since it is affected by all factors.
The cache hit rate gives an indication of how well locality is being maintained.
The node underutilization times indicate how well load balancing is maintained.
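The three output metrics can be written down directly. The Python sketch below follows the definitions on this slide; the function names and the (duration, load) sample format are assumptions made for the example.

```python
def throughput(num_requests, simulated_seconds):
    """Requests served per second by the whole cluster."""
    return num_requests / simulated_seconds


def cache_hit_ratio(num_hits, num_requests):
    """Fraction of requests served from a back-end's main memory cache."""
    return num_hits / num_requests


def underutilized_time(load_samples, t_low, fraction=0.4):
    """Total time a node's load was below 40% of T_low.

    load_samples is a list of (duration_seconds, load) intervals for one node.
    """
    threshold = fraction * t_low
    return sum(duration for duration, load in load_samples if load < threshold)
```

For example, with T_low = 25 (an illustrative value) the threshold is a load of 10, so intervals with load 5 and 0 count as underutilized while an interval with load 12 does not.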
11
Simulation and Results
Simulation results

Strategy                          Throughput   Cache miss ratio   Idle time
Weighted round-robin (WRR)        lowest       highest            lowest
Locality-based (LB)               lower        higher             highest
Basic LARD (LARD)                 higher       lowest             lower
LARD with replication (LARD/R)    highest      lowest             lower
(why)

WRR cannot benefit from added CPU at all, since it is disk bound on this trace. LARD and LARD/R, on the other hand, can make use of the added CPU power, because their cache aggregation makes the system increasingly CPU bound as nodes are added to the system.
With LARD/R, additional disks do not achieve any further benefit. This is to be expected, as the increased cache locality of LARD/R reduces its dependence on disk speed. WRR, on the other hand, greatly benefits from multiple disks, as its throughput is mainly bound by the performance of the disk subsystem. (how)
12
Conclusions
The LARD strategy can achieve high cache hit rates and good load balancing in a cluster server.
The performance of LARD is better than that of WRR.
To prevent the front-end from becoming a bottleneck, a TCP handoff protocol is implemented on the front-end and the back-ends.
Q: What are the limits of the LARD system? How can it be improved? The next paper discusses this in detail.
Let's go to the next one.