CERN DNS Load Balancing Vladimír Bahyl IT-FIO. 26 November 2007WLCG Service Reliability Workshop2 Outline  Problem description and possible solutions.

Slides:



Advertisements
Similar presentations
Windows® Deployment Services
Advertisements

Scheduling in Web Server Clusters CS 260 LECTURE 3 From: IBM Technical Report.
Database Architectures and the Web
Sergei Komarov. DNS  Mechanism for IP hostname resolution  Globally distributed database  Hierarchical structure  Comprised of three components.
MCTS Guide to Microsoft Windows Server 2008 Network Infrastructure Configuration Chapter 6 Managing and Administering DNS in Windows Server 2008.
© 2005 Global Knowledge Network, Inc. All rights reserved. Section 2: High Availability Clustering Network Load Balancing Geographical Clustering Remote.
ITIS 3110 Jason Watson. Replication methods o Primary/Backup o Master/Slave o Multi-master Load-balancing methods o DNS Round-Robin o Reverse Proxy.
Chapter 5: Server Hardware and Availability. Hardware Reliability and LAN The more reliable a component, the more expensive it is. Server hardware is.
1 Content Delivery Networks iBAND2 May 24, 1999 Dave Farber CTO Sandpiper Networks, Inc.
1 Routing and Scheduling in Web Server Clusters. 2 Reference The State of the Art in Locally Distributed Web-server Systems Valeria Cardellini, Emiliano.
Highly Available Central Services An Intelligent Router Approach Thomas Finnern Thorsten Witt DESY/IT.
Distributed components
Dr. Zahid Anwar. Simplified Architecture of Linux Cluster Simplified Architecture of a Single Computer Simplified architecture of an enterprise cluster.
Module 8: Concepts of a Network Load Balancing Cluster
1 Internet Networking Spring 2004 Tutorial 13 LSNAT - Load Sharing NAT (RFC 2391)
1 Introduction to Load Balancing: l Definition of Distributed systems. Collection of independent loosely coupled computing resources. l Load Balancing.
Lesson 20 – OTHER WINDOWS 2000 SERVER SERVICES. DHCP server DNS RAS and RRAS Internet Information Server Cluster services Windows terminal services OVERVIEW.
MCTS Guide to Microsoft Windows Server 2008 Network Infrastructure Configuration Chapter 5 Introduction to DNS in Windows Server 2008.
Lesson 1: Configuring Network Load Balancing
70-293: MCSE Guide to Planning a Microsoft Windows Server 2003 Network, Enhanced Chapter 7: Planning a DNS Strategy.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #12 LSNAT - Load Sharing NAT (RFC 2391)
Load Sharing and Balancing - Saravanan Mathialagan Masters in Computer Science Georgia State University.
MCTS Guide to Microsoft Windows Server 2008 Applications Infrastructure Configuration (Exam # ) Chapter Ten Configuring Windows Server 2008 for High.
Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.
Understanding Active Directory
Hands-On Microsoft Windows Server 2008 Chapter 8 Managing Windows Server 2008 Network Services.
FIREWALL TECHNOLOGIES Tahani al jehani. Firewall benefits  A firewall functions as a choke point – all traffic in and out must pass through this single.
Event Viewer Was of getting to event viewer Go to –Start –Control Panel, –Administrative Tools –Event Viewer Go to –Start.
1 © Talend 2014 Service Locator Talend ESB Training 2014 Jan Bernhardt Zsolt Beothy-Elo
Building a massively scalable serverless VPN using Any Source Multicast Athanasios Douitsis Dimitrios Kalogeras National Technical University of Athens.
10/02/2004ELFms meeting1 Linux Virtual Server Miroslav Siket FIO-FS.
Server Load Balancing. Introduction Why is load balancing of servers needed? If there is only one web server responding to all the incoming HTTP requests.
Chapter 10 : Designing a SQL Server 2005 Solution for High Availability MCITP Administrator: Microsoft SQL Server 2005 Database Server Infrastructure Design.
Firewalls Paper By: Vandana Bhardwaj. What this paper covers? Why you need a firewall? What is firewall? How does a network firewall interact with OSI.
70-291: MCSE Guide to Managing a Microsoft Windows Server 2003 Network Chapter 7: Domain Name System.
1 Distributed Systems : Server Load Balancing Dr. Sunny Jeong. Mr. Colin Zhang With Thanks to Prof. G. Coulouris,
Submitted by: Shailendra Kumar Sharma 06EYTCS049.
Presented by Xiaoyu Qin Virtualized Access Control & Firewall Virtualization.
1 Chapter 12: VPN Connectivity in Remote Access Designs Designs That Include VPN Remote Access Essential VPN Remote Access Design Concepts Data Protection.
Module 5: Designing a Terminal Services Infrastructure.
Module 11: Implementing ISA Server 2004 Enterprise Edition.
FailSafe SGI’s High Availability Solution Mayank Vasa MTS, Linux FailSafe Gatekeeper
Network Security. 2 SECURITY REQUIREMENTS Privacy (Confidentiality) Data only be accessible by authorized parties Authenticity A host or service be able.
1 The new Fabric Management Tools in Production at CERN Thorsten Kleinwort for CERN IT/FIO HEPiX Autumn 2003 Triumf Vancouver Monday, October 20, 2003.
Windows Server 2003 La migrazione da Windows NT 4.0 a Windows Server 2003 Relatore: MCSE - MCT.
Windows Azure Virtual Machines Anton Boyko. A Continuous Offering From Private to Public Cloud.
Chapter 3 - VLANs. VLANs Logical grouping of devices or users Configuration done at switch via software Not standardized – proprietary software from vendor.
CERN DNS Load Balancing VladimírBahylIT-FIO NicholasGarfieldIT-CS.
CERN IT Department CH-1211 Genève 23 Switzerland PES 1 Ermis service for DNS Load Balancer configuration HEPiX Fall 2014 Aris Angelogiannopoulos,
OpenDNSSEC Deployment Tianyi Xing. Roadmap By mid-term – Establish a DNSSEC server within the mobicloud system (Hopfully be done by next week) Successfully.
Network Management CCNA 4 Chapter 7. Monitoring the Network Connection monitoring takes place every day when users log on Ping only shows that the connection.
High Availability Technologies for Tier2 Services June 16 th 2006 Tim Bell CERN IT/FIO/TSI.
Introduction to Active Directory
Computer Network Architecture Lecture 6: OSI Model Layers Examples 1 20/12/2012.
VIRTUAL SERVERS Chapter 7. 2 OVERVIEW Exchange Server 2003 virtual servers Virtual servers in a clustering environment Creating additional virtual servers.
Firewalls. Overview of Firewalls As the name implies, a firewall acts to provide secured access between two networks A firewall may be implemented as.
Jean-Philippe Baud, IT-GD, CERN November 2007
CompTIA Security+ Study Guide (SY0-401)
NAT、DHCP、Firewall、FTP、Proxy
REPLICATION & LOAD BALANCING
Services DFS, DHCP, and WINS are cluster-aware.
Introduction to Load Balancing:
High Availability Linux (HA Linux)
F5 BIGIP V 9 Training.
Service Challenge 3 CERN
Network Load Balancing
VIRTUAL SERVERS Presented By: Ravi Joshi IV Year (IT)
CompTIA Security+ Study Guide (SY0-401)
APACHE WEB SERVER.
Deploying Production GRID Servers & Services
Presentation transcript:

CERN DNS Load Balancing Vladimír Bahyl IT-FIO

26 November 2007WLCG Service Reliability Workshop2 Outline  Problem description and possible solutions  DNS – an ideal medium  Round robin vs. Load Balancing  Dynamic DNS setup at CERN  Application Load Balancing system  Server process  Client configuration  Production examples  Conclusion

26 November 2007WLCG Service Reliability Workshop3 Problem description  User expectations of IT services:  100% availability  Response time converging to zero  Several approaches:  Bigger and better hardware (= increasing MTBF)  Redundant architecture  Load balancing + Failover  Situation at CERN:  Has to provide uninterrupted services  Transparently migrate nodes in and out of production  Caused either by scheduled intervention or a high load  Very large and complex network infrastructure

26 November 2007WLCG Service Reliability Workshop4 Possible solutions  Network Load Balancing  A device/driver monitors network traffic flow and makes packet forwarding decisions  Example: Microsoft Windows 2003 Server NLB  Disadvantages:  Not applications aware  Simple network topology only  Proprietary  OSI Layer 4 (the Transport Layer – TCP/UDP) switching  Cluster is hidden by a switch behind a single virtual IP address  Switch role also includes:  Monitoring of all nodes in the cluster  Keep track of the network flow  Forwarding of packets according to policies  Example: Linux Virtual Server, Cisco Catalyst switches  Disadvantages:  Simplistic tests; All cluster nodes should be on the same subnet  Expensive for large subnets with many services  Switch becomes single point of failure

26 November 2007WLCG Service Reliability Workshop5 Domain Name System – ideal medium Ubiquitous, standardized and globally accessible database Connections to any service have to contact DNS first Provides a way for rapid updates Offers round robin load distribution (see later)  Unaware of the applications  Need for an arbitration process to select best nodes  Decision process is not going to be affected by the load on the service  Application load balancing and failover

26 November 2007WLCG Service Reliability Workshop6 DNS Round Robin lxplus001 ~ > host lxplus lxplus.cern.ch has address (1) lxplus.cern.ch has address (2) lxplus.cern.ch has address (3) lxplus.cern.ch has address (4) lxplus.cern.ch has address (5) lxplus001 ~ > host lxplus lxplus.cern.ch has address (2) lxplus.cern.ch has address (3) lxplus.cern.ch has address (4) lxplus.cern.ch has address (5) lxplus.cern.ch has address (1)  Allows basic load distribution  No withdrawal of overloaded or failed nodes

26 November 2007WLCG Service Reliability Workshop7 lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address lxplus.cern.ch has address DNS Load Balancing and Failover  Requires an additional server = arbiter  Monitors the cluster members  Adds and withdraw nodes as required  Updates are transactional  Client never sees an empty list lxplus001 ~ > host lxplus

26 November 2007WLCG Service Reliability Workshop8 DNS 1 (e.g. internal view) DNS 2 (e.g. external view) Dynamic DNS at CERN Master DNS Server Slave 2A AXFR= full zone transfer IXFR= incremental zone transfer zone= [view, domain] pair (example: [internal, cern.ch]) Slave 2B Slave 1A Slave 1B AXFR or IXFR Network Database Server Arbiter Generated static data files Dynamic DNS

26 November 2007WLCG Service Reliability Workshop9 Application Load Balancing System

26 November 2007WLCG Service Reliability Workshop10 Load Balancing Arbiter – internals 1/2  Collects metric values  Polls the data over SNMP  Sequentially scans all cluster members  Selects the best candidates  Lowest positive value = best value  Other options possible as well  Round robin of alive nodes  Updates the master DNS  Uses Dynamic DNS  With transactional signature keys (TSIG) authentication  At most once per minute per cluster

26 November 2007WLCG Service Reliability Workshop11 Load Balancing Arbiter – exceptions 2/2 Exceptional state Description Solution applied by the system R < B Number of nodes that replied is smaller than the number of best candidates that should be resolved by the DNS alias. B = R R = 0 There were no usable replies from any of the nodes in the application cluster. System returns random B nodes from the group of N. Nnumber of all nodes in the application cluster Bconfigurable number of best candidates that will resolve behind the DNS alias Rnumber of nodes that returned positive non-zero metric value within time-out interval

26 November 2007WLCG Service Reliability Workshop12 Load Balancing Arbiter – architecture 3/2  Active and Standby setup  Simple failover mechanism  Heartbeat file periodically fetched over HTTP  Daemon is:  Written in Perl  Packaged in RPM  Configured by a Quattor NCM component  but can live without it = simple configuration file  Monitoring (by LEMON)  Daemon dead  Update failed  Collecting of metrics stuck

26 November 2007WLCG Service Reliability Workshop13 Application Cluster nodes  SNMP daemon  Expects to receive a specific MIB OID  Passes control to an external program  Load Balancing Metric  /usr/local/bin/lbclient  Examines the conditions of the running system  Computes a metric value  Written in C  Available as RPM  Configured by a Quattor NCM component

26 November 2007WLCG Service Reliability Workshop14 Load Balancing Metric  System checks – return Boolean value  Are daemons running (FTP, HTTP, SSH) ?  Is the node opened for users ?  Is there some space left on /tmp ?  System state indicators  Return a (positive) number  Compose the metric formula  System CPU load  Number of unique users logged in  Swapping activity  Number of running X sessions  Integration with monitoring  Decouple checking and reporting  Replace internal formula by a monitoring metric  Disadvantage – introduction of a delay  Easily extensible or replaceable by another site specific binary

26 November 2007WLCG Service Reliability Workshop15 Production examples  LXPLUS – interactive login cluster  SSH protocol  Users log on to a server and interact with it  WWW, CVS, FTP servers  CASTORNS – CASTOR name server cluster  Specific application on a specific port  LFC, FTS, BDII, SRM servers  … 40 clusters in total  Could be any application – client metric concept is sufficiently universal !

26 November 2007WLCG Service Reliability Workshop16 State Less vs. State Aware  System is not aware of the state of connections  State Less Application  For any connection, any server will do  Our system only keeps the list of available hosts up-to-date  Example: WWW server serving static content  State Aware Application  Initial connection to a server; subsequent connection to the same server  Our load balancing system can not help here  Solution: after the initial connection the application must indicate to the client where to connect  Effective bypass of the load balancing  Example: ServerName directive in Apache daemon

26 November 2007WLCG Service Reliability Workshop17 Conclusion  Dynamic DNS switching offers possibility to implement automated and intelligent load-balancing and failover system  Scalable  From two node cluster to complex application clusters  Decoupled from complexity of the network topology  Need for an Arbiter  Monitor the cluster members  Select the best candidates  Update the published DNS records  Built around OpenSource tools  Easy to adopt anywhere  Loosely coupled with CERN fabric management tools

26 November 2007WLCG Service Reliability Workshop18 Thank you.  (accessible from inside CERN network only) Vladimír Bahyl