2/23/2004 Load Balancing February 23, 2004
Assignments
Work on Registrar Assignment
Pieces of the Puzzle
Load Balancing
Servers
Data Store
Load Balancing
Distribute load across multiple servers – why?
– Service more requests per second
– Reduce response time
– Tolerate failures
Simple Algorithm
Round-robin
– N servers
– Request 1 goes to server 1
– Request 2 goes to server 2
– …
– Request N goes to server N
– Request N+1 goes to server 1
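The round-robin rotation can be sketched in a few lines. This is a minimal illustration, not a production balancer; the server names and the `make_round_robin` helper are hypothetical.

```python
from itertools import cycle


def make_round_robin(servers):
    """Return a function that hands out servers in a fixed rotation."""
    rotation = cycle(servers)  # repeats the list forever, in order

    def next_server():
        return next(rotation)

    return next_server


# Hypothetical server names, for illustration only.
pick = make_round_robin(["server1", "server2", "server3"])

# With N = 3 servers, request N+1 wraps around to server1 again.
assignments = [pick() for _ in range(4)]
print(assignments)
```

Note that the rotation ignores how busy each server is and whether it is still running.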
Problems
What are some problems with the round-robin strategy?
Goals/Challenges
Determining the least-loaded server
Determining the closest server
Detecting failed servers and deflecting requests away from them
Transparency
What about user state?
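Two of these goals, picking the least-loaded server and steering around failed ones, can be combined in one selection step. The sketch below assumes the balancer already has a load figure and a health flag for each server; how those are measured is left open, and the field names and values are hypothetical.

```python
def pick_server(servers):
    """Return the name of the least-loaded server that is still healthy.

    `servers` is a list of dicts with hypothetical keys:
    'name', 'load' (any comparable number), and 'healthy' (bool).
    """
    healthy = [s for s in servers if s["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy servers available")
    return min(healthy, key=lambda s: s["load"])["name"]


# Made-up measurements, for illustration only.
servers = [
    {"name": "server1", "load": 0.2, "healthy": False},  # failed: skipped
    {"name": "server2", "load": 0.7, "healthy": True},
    {"name": "server3", "load": 0.4, "healthy": True},
]
choice = pick_server(servers)
print(choice)
```

Here server1 has the lowest load but is marked as failed, so the request goes to server3.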
Strategies
DNS-based
Dispatcher-based
Server-based
DNS: Domain Name System
How do I identify a computer?
Hostname
– ID for a computer
– Examples?
IP Address
– Why is an IP needed?
– Examples?
Mapping from hostname to IP address?
DNS: Domain Name System
Name servers
– Store hostname-to-IP mappings
– Organized in a hierarchy
– Act as a distributed database
Application-layer protocol
– Defines communication between hosts and name servers
Usage Scenario
Which apps use DNS?
HTTP
– Browser extracts hostname from the URL
– Sends hostname to DNS
– DNS does lookup and returns IP address
– Browser sends HTTP GET to IP address
Simple DNS Example
Host surf.eurecom.fr wants IP address of gaia.cs.umass.edu
1. Contacts its local DNS server, dns.eurecom.fr
2. dns.eurecom.fr contacts root name server, if necessary
3. Root name server contacts authoritative name server, dns.umass.edu, if necessary
[Figure: requesting host surf.eurecom.fr, local name server dns.eurecom.fr, root name server, authoritative name server dns.umass.edu, target host gaia.cs.umass.edu]
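The three steps above can be mimicked with a toy, in-memory simulation. It makes no network calls and is not how a real resolver is implemented. The address 192.0.2.10 is a made-up value from the 192.0.2.0/24 documentation range, not the real address of gaia.cs.umass.edu, and the class and table names are hypothetical. The "if necessary" in steps 2 and 3 is modeled as a cache at the local name server.

```python
# Hypothetical records held by the authoritative name server.
AUTHORITATIVE = {"dns.umass.edu": {"gaia.cs.umass.edu": "192.0.2.10"}}
# The root server knows which authoritative server handles each domain.
ROOT_DELEGATIONS = {"umass.edu": "dns.umass.edu"}


def root_resolve(hostname, trace):
    """Step 3: the root server asks the authoritative server for the domain."""
    for suffix, auth_server in ROOT_DELEGATIONS.items():
        if hostname.endswith("." + suffix):
            trace.append(f"root -> {auth_server}")
            return AUTHORITATIVE[auth_server][hostname]
    raise KeyError(hostname)


class LocalNameServer:
    """Stands in for dns.eurecom.fr in the example."""

    def __init__(self):
        self.cache = {}
        self.trace = []  # records which servers were contacted

    def resolve(self, hostname):
        # Step 1: the host asks its local name server.
        if hostname in self.cache:
            self.trace.append("local cache hit")
            return self.cache[hostname]
        # Step 2: a cache miss makes a query to the root necessary.
        self.trace.append("local -> root")
        ip = root_resolve(hostname, self.trace)
        self.cache[hostname] = ip
        return ip


local = LocalNameServer()
first = local.resolve("gaia.cs.umass.edu")
second = local.resolve("gaia.cs.umass.edu")  # answered from the cache
print(first, local.trace)
```

The second lookup never leaves the local name server, which is the caching behavior that matters for DNS-based load balancing.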
Strategy 1: DNS-based
[Figure: DNS server, web servers]
Strategy 1: DNS-based
Pros
– Easy
Cons
– Simple algorithm may not distribute load well
– Slow to deal with failures
– Not transparent to client
Caching may disrupt the algorithm
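The caching problem can be seen in a small simulation. It assumes the DNS server rotates through the web server addresses and that a client-side resolver caches each answer for a fixed time to live; the class names, the TTL of 5 ticks, and the 192.0.2.x addresses (documentation range) are all hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDNS:
    """Hands out web server addresses in rotation, one per lookup."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def lookup(self):
        return next(self._rotation)


class CachingResolver:
    """Client-side resolver that reuses an answer until its TTL expires."""

    def __init__(self, dns, ttl):
        self.dns = dns
        self.ttl = ttl
        self._cached = None
        self._expires = 0

    def lookup(self, now):
        if self._cached is None or now >= self._expires:
            self._cached = self.dns.lookup()
            self._expires = now + self.ttl
        return self._cached


dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
resolver = CachingResolver(dns, ttl=5)

# Six requests at times 0..5: only two of them reach the DNS server.
hits = Counter(resolver.lookup(now=t) for t in range(6))
print(hits)
```

Five of the six requests land on the first server, even though the DNS server itself rotates evenly. The same cache also explains the slow reaction to failures: a cached address keeps being used until it expires.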
Strategy 2: Dispatcher-based
[Figure: load balancing switch, web servers]
Strategy 2: Dispatcher-based
Pros
– Transparent (NAT)
– Reacts more quickly to failure
– Apply more advanced scheduling algorithms
  Number of active connections
  Based on URI (/images, etc.)
Cons
– Bottleneck/single point of failure
– New piece of hardware
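Both scheduling ideas, routing by URI and choosing by number of active connections, can be sketched together. This models only the decision logic, not NAT or packet forwarding; the `Dispatcher` class, pool layout, and server names are assumptions made for illustration.

```python
class Dispatcher:
    """Picks a server pool by URI prefix, then the least-busy server in it."""

    def __init__(self, pools, default_pool):
        # pools maps a URI prefix (e.g. "/images") to a list of server names.
        self.pools = pools
        self.default_pool = default_pool
        # Active connection count per server, across all pools.
        self.active = {
            server: 0
            for pool in [*pools.values(), default_pool]
            for server in pool
        }

    def _pool_for(self, uri):
        for prefix, servers in self.pools.items():
            if uri.startswith(prefix):
                return servers
        return self.default_pool

    def open_connection(self, uri):
        pool = self._pool_for(uri)
        # Ties go to the server listed first in the pool.
        server = min(pool, key=lambda s: self.active[s])
        self.active[server] += 1
        return server

    def close_connection(self, server):
        self.active[server] -= 1


d = Dispatcher({"/images": ["img1", "img2"]}, default_pool=["web1", "web2"])
a = d.open_connection("/images/logo.png")
b = d.open_connection("/images/banner.png")
d.close_connection(a)                      # img1 becomes idle again
c = d.open_connection("/images/icon.png")
e = d.open_connection("/index.html")
print(a, b, c, e)
```

Because every request passes through the one `Dispatcher` object, it always sees current connection counts, and for the same reason it is the bottleneck and single point of failure.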
Strategy 3: Server-based
[Figure: DNS server, web servers]
Strategy 3: Server-based
Pros
– Reacts more quickly to failure
– Apply more advanced scheduling algorithms
Cons
– Not transparent
– Increased delay
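The slides do not spell out the mechanism, so the sketch below assumes one common reading of the server-based strategy: the first server reached either handles the request or redirects the client to a less loaded peer. The `WebServer` class, the capacity numbers, and the way peers share load information are all hypothetical.

```python
class WebServer:
    """A server that serves requests itself or redirects to a peer."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # max requests it wants to hold at once
        self.load = 0
        self.peers = []

    def handle(self, path):
        # If this server is full, try to hand the client to the least-loaded peer.
        if self.load >= self.capacity and self.peers:
            target = min(self.peers, key=lambda p: p.load)
            if target.load < target.capacity:
                return ("redirect", target.name)
        # Otherwise (or if every peer is full too) serve the request here.
        self.load += 1
        return ("served", self.name)


a = WebServer("a", capacity=1)
b = WebServer("b", capacity=2)
a.peers, b.peers = [b], [a]

r1 = a.handle("/index.html")  # a has room
r2 = a.handle("/index.html")  # a is full, so the client is sent to b
r3 = b.handle("/index.html")  # the client's second attempt, now at b
print(r1, r2, r3)
```

The redirected request needs two round trips (r2, then r3), and the client sees the redirect. That matches the two cons listed: increased delay and lack of transparency.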
Data Replication/Partitioning
Replication
– All servers can serve all data
Partitioning
– Each server keeps track of a specific set of data
In reality, both
Example
[Figure: replication, with each server holding A-Z; partitioning, with servers holding A-I, J-R, and S-Z]
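The A-I / J-R / S-Z split can be expressed as a lookup on the first letter of a key. This is a sketch under assumptions: the server names are hypothetical, and keys are assumed to start with an ASCII letter.

```python
# Partitioning: each server owns one range of first letters.
PARTITIONS = [
    ("A", "I", "server1"),
    ("J", "R", "server2"),
    ("S", "Z", "server3"),
]
ALL_SERVERS = [server for _, _, server in PARTITIONS]


def partitioned_server(key):
    """Return the single server that owns this key under partitioning."""
    first = key[0].upper()
    for low, high, server in PARTITIONS:
        if low <= first <= high:
            return server
    raise ValueError(f"no partition for key {key!r}")


def replicated_servers(key):
    """Under full replication, every server holds A-Z and can serve any key."""
    return list(ALL_SERVERS)


print(partitioned_server("Knuth"))   # falls in J-R
print(replicated_servers("Knuth"))
```

With partitioning, a request for a given key can only go to one server, so the load balancer has no choice to make; with replication it can pick any of them.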
Consistency
All copies of the same data should be identical
– A change to one means a change to all
Performance/consistency tradeoff
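One simple way to keep copies identical is to apply every write to every replica before returning. The sketch below uses in-memory dictionaries as stand-ins for servers; the `ReplicatedStore` class is hypothetical, and real systems also have to deal with failures partway through a write, which this ignores.

```python
class ReplicatedStore:
    """Keeps N copies of the data and updates all of them on every write."""

    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]

    def write(self, key, value):
        # A change to one copy means a change to all copies.
        for replica in self.replicas:
            replica[key] = value
        return len(self.replicas)  # number of updates this write cost

    def read(self, key, replica=0):
        # Any replica can answer, because all copies are identical.
        return self.replicas[replica][key]


store = ReplicatedStore(n_replicas=3)
cost = store.write("course", "load balancing")
values = [store.read("course", replica=i) for i in range(3)]
print(cost, values)
```

Reads can be spread across all three replicas, but each write costs three updates. That is the performance/consistency tradeoff in miniature.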