1
Cluster Load Balancing for Fine-grain Network Services
Kai Shen, Tao Yang, and Lingkun Chu
Department of Computer Science, University of California at Santa Barbara
IPDPS 2002
2
Cluster-based Network Services
Emerging deployment of large-scale, complex clustered services:
Google: 150M searches per day; an index of more than 2B pages; thousands of Linux servers.
Teoma search (powering Ask Jeeves search): a Sun/Solaris cluster of hundreds of processors.
Web portals: Yahoo!, MSN, AOL, etc.
Key requirements: availability and scalability.
3
Architecture of a Clustered Service: Search Engine
[Diagram: firewall/Web switch, Web servers/query handlers, local-area network, index servers (partitions 1 and 2), and doc servers.]
4
“Neptune” Project http://www.cs.ucsb.edu/projects/neptune
A scalable cluster-based software infrastructure that shields clustering complexities from service authors:
Scalable clustering architecture with load-balancing support.
Integrated resource management.
Service replication: replica consistency and performance scalability.
Deployment: in production at the Internet search engine Teoma for more than a year; has served Ask Jeeves search since December 2001 (6-7M searches per day as of January 2002).
5
Neptune Clustering Architecture – Inside a Node
[Diagram: inside each node, service consumers reach services through a service access point; a load-balancing subsystem, a service availability directory, and a publishing subsystem sit between the service runtime and the network to the rest of the cluster.]
6
Cluster Load Balancing
Design goals:
Scalability: scalable performance with non-scaling overhead.
Availability: no centralized node or component.
For fine-grain services (already in widespread use), there are additional challenges:
Severe system-state fluctuation makes the system more sensitive to load-information delay.
More frequent service requests demand low per-request load-balancing overhead.
7
Evaluation Traces
Traces of two service cluster components from the Internet search engine Teoma, collected during one week in July 2001; the peak-time portion is used.

Trace         Requests (total)  Requests (peak)  Arrival interval (mean / std-dev)  Service time (mean / std-dev)
MediumGrain   1,055,359         126,518          341.5 ms / 321.1 ms                208.9 ms / 62.9 ms
FineGrain     1,171,838         98,933           436.7 ms / 349.4 ms                22.2 ms / 10.0 ms
8
Broadcast Policy
An agent at each node collects the local load index and broadcasts it at certain intervals.
Another agent listens to broadcasts from other nodes and maintains a directory locally.
Each service request is directed to the node with the lightest load index in the local directory.
Load index: the number of active service requests.
Advantages: requires no centralized component; very low per-request overhead.
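A minimal sketch of the broadcast policy's per-node bookkeeping, assuming a hypothetical LoadUpdate message and an in-memory directory; the type and field names are illustrative assumptions, not the actual Neptune implementation.

```go
// Sketch of the broadcast policy's per-node bookkeeping (illustrative
// only; type and field names are assumptions, not Neptune's code).
package main

import (
	"fmt"
	"sync"
)

// LoadUpdate is one broadcast announcement: a node reports its load
// index, i.e. the number of service requests currently active on it.
type LoadUpdate struct {
	Node      string
	ActiveReq int
}

// Directory is this node's view of cluster load, refreshed by a
// listener agent that hears broadcasts from the other nodes.
type Directory struct {
	mu      sync.Mutex
	entries map[string]LoadUpdate
}

func NewDirectory() *Directory {
	return &Directory{entries: make(map[string]LoadUpdate)}
}

// Observe records a broadcast heard from some node.
func (d *Directory) Observe(u LoadUpdate) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.entries[u.Node] = u
}

// PickLightest returns the node with the lightest known load index.
// The weaknesses discussed on the next slide show up here: entries go
// stale between broadcasts, and every consumer sees the same
// "lightest" node, which invites flocking.
func (d *Directory) PickLightest() (node string, ok bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	for n, u := range d.entries {
		if !ok || u.ActiveReq < d.entries[node].ActiveReq {
			node, ok = n, true
		}
	}
	return node, ok
}

func main() {
	dir := NewDirectory()
	dir.Observe(LoadUpdate{Node: "node-a", ActiveReq: 4})
	dir.Observe(LoadUpdate{Node: "node-b", ActiveReq: 1})
	fmt.Println(dir.PickLightest()) // node-b true
}
```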
9
Broadcast Policy with Varying Broadcast Frequency (16-node)
[Plots: mean response time, normalized to an idealized centralized policy, vs. mean broadcast interval (31.25 ms to 1000 ms) for the MediumGrain and FineGrain traces; panel A with servers 50% busy, panel B with servers 90% busy.]
The broadcast policy depends too heavily on frequent broadcasts for fine-grain services at high load. Reasons: load-index staleness and the flocking effect.
10
Random Polling Policy
For each service request, a polling agent on the service-consumer node randomly polls a certain number (the poll size) of service nodes for load information, then picks the node responding with the lightest load.
Random polling with a small poll size:
Requires no centralized components.
Per-request overhead is bounded by the poll size.
Small load-information delay due to just-in-time polling.
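A minimal sketch of the random-polling selection step, assuming a caller-supplied poll function; the function and variable names are hypothetical, and the real system polls over UDP as described later.

```go
// Sketch of random polling (illustrative; names are assumptions).
package main

import (
	"fmt"
	"math/rand"
)

// pollFn asks one service node for its current load index (number of
// active requests) and reports whether the poll succeeded.
type pollFn func(node string) (load int, ok bool)

// pickByRandomPolling polls pollSize randomly chosen service nodes and
// returns the one reporting the lightest load. The poll size, not the
// cluster size, bounds the per-request overhead.
func pickByRandomPolling(nodes []string, pollSize int, poll pollFn) (string, bool) {
	perm := rand.Perm(len(nodes)) // random order over all nodes
	if pollSize > len(nodes) {
		pollSize = len(nodes)
	}
	best, bestLoad, found := "", 0, false
	for _, i := range perm[:pollSize] {
		if load, ok := poll(nodes[i]); ok && (!found || load < bestLoad) {
			best, bestLoad, found = nodes[i], load, true
		}
	}
	return best, found
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4"}
	fakeLoad := map[string]int{"n1": 7, "n2": 2, "n3": 5, "n4": 9}
	poll := func(n string) (int, bool) { return fakeLoad[n], true }
	// Picks the lightest-loaded node among 3 randomly polled nodes.
	fmt.Println(pickByRandomPolling(nodes, 3, poll))
}
```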
11
Is a Small Poll Size Enough?
Service nodes are kept 90% busy on average.
[Plots: mean response time (in ms) vs. number of service nodes (up to 100) for the MediumGrain trace (panel A) and the FineGrain trace (panel B), comparing Random, Polling 2, Polling 3, Polling 4, and Centralized.]
In principle, this matches the analytical results on the supermarket model [Mitzenmacher96].
12
System Implementation of Random Polling Policies
Configuration: 30 dual-processor Linux servers connected by a fast Ethernet switch.
Implementation:
Service availability announcements are made through IP multicast.
Application-level services are loaded into the Neptune runtime module as DLLs and run as threads.
For each service request, polls are made concurrently over UDP.
13
Experimental Evaluation of Random Polling Policy (16-node)
[Plots: mean response time (in ms) vs. server load level (50% to 90%) for the MediumGrain trace (panel A) and the FineGrain trace (panel B), comparing Random, Polling 2, Polling 3, Polling 4, Polling 8, and Centralized.]
For the FineGrain trace, a large poll size performs even worse, due to excessive polling overhead and long polling delay.
14
Discarding Slow-responding Polls
Polling delay with a poll size of 3:
290 µs when service nodes are idle.
In a typical run with service nodes 90% busy: mean polling delay of 3 ms; 8.1% of polls are not returned within 10 ms.
This delay is significant for fine-grain services (service times in the tens of ms).
Discarding slow-responding polls shortens the polling delay, yielding an 8.3% reduction in mean response time.
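A minimal sketch of the discarding idea, assuming concurrent polls and a cutoff of 10 ms chosen for illustration; the helper names and timeout value are assumptions, not Neptune's actual parameters or code.

```go
// Sketch of discarding slow-responding polls (illustrative; the
// timeout value and helper names are assumptions, not Neptune's).
package main

import (
	"fmt"
	"time"
)

type pollReply struct {
	node string
	load int
}

// pickWithDeadline sends all polls concurrently and only considers
// replies that arrive within deadline; slow or lost polls are simply
// ignored rather than waited for, which shortens the polling delay.
func pickWithDeadline(nodes []string, deadline time.Duration,
	poll func(node string) (int, error)) (string, bool) {

	replies := make(chan pollReply, len(nodes)) // buffered: late senders never block
	for _, n := range nodes {
		go func(n string) {
			if load, err := poll(n); err == nil {
				replies <- pollReply{n, load}
			}
		}(n)
	}

	timer := time.NewTimer(deadline)
	defer timer.Stop()
	best, bestLoad, found := "", 0, false
	for i := 0; i < len(nodes); i++ {
		select {
		case r := <-replies:
			if !found || r.load < bestLoad {
				best, bestLoad, found = r.node, r.load, true
			}
		case <-timer.C:
			return best, found // deadline hit: use whatever has arrived
		}
	}
	return best, found
}

func main() {
	// Fake polls: n2 answers quickly with a light load; n3 is slow and
	// gets discarded by the 10 ms cutoff.
	poll := func(n string) (int, error) {
		if n == "n3" {
			time.Sleep(50 * time.Millisecond)
		}
		return map[string]int{"n1": 6, "n2": 1, "n3": 0}[n], nil
	}
	fmt.Println(pickWithDeadline([]string{"n1", "n2", "n3"}, 10*time.Millisecond, poll))
}
```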
15
Related Work
Clustering middleware and distributed systems: Neptune, WebLogic/Tuxedo, COM/DCOM, MOSIX, TACC, MultiSpace.
HTTP switching: Alteon, ArrowPoint, Foundry, Network Dispatcher.
Load balancing for distributed systems: [Mitzenmacher96], [Goswami93], [Kunz91], MOSIX, [Zhou88], [Eager86], [Ferrari85].
Low-latency network architecture: VIA, InfiniBand.
16
Conclusions http://www.cs.ucsb.edu/projects/neptune
Random-polling-based load-balancing policies are well suited for fine-grain network services.
A small poll size provides sufficient information for load balancing, while an excessively large poll size may even degrade performance.
Discarding slow-responding polls can further improve system performance.