Improving resilience of T0 grid services, Manuel Guijarro
Platform & Engineering Services (PES), CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

Slide 1: Improving resilience of T0 grid services
Manuel Guijarro, IT/PES
Steve Traylen, IT/PES
EGI Community Forum 2012

Slide 2: Outline
– Introduction
– One server, one application
– Virtualisation
– Service Consolidation
– DNS Load balancing
– Grid WMS Example
– Conclusion

Slide 3: Introduction
Platform Support Section in IT-PES:
– Interactive Login Services and Batch
– Grid (mainly Computing) Services: CEs, WMS, LB, VOMS, BDII, CVMFS, FTS and LFC
– Infrastructure Services:
  – Messaging Service
  – DNS Load Balancing Service
  – Service Consolidation Service
  – Internal Cloud Infrastructure

Slide 4: Introduction II
Grid services are not all highly available (HA) by design, so their availability needs to be increased. We use in-house infrastructure services:
– Service Consolidation Service (virtualisation)
– DNS Load Balancing Service
These are cheap solutions. They do not provide real high availability, but they greatly reduce the downtime of Grid services.

Slide 5: "One server, one application"
– Low infrastructure utilization: typically one application runs per server, to avoid the risk of vulnerabilities in one application affecting the availability of another application on the same server.
– Increasing physical infrastructure costs: power consumption, cooling and facilities costs do not vary with utilization levels.
– Increasing IT management costs: staff spend a disproportionate amount of time and resources on manual server maintenance tasks, so more personnel are required to complete them.
– Insufficient failover and disaster protection: the threat of security attacks and natural disasters has elevated the importance of business continuity.
[Diagram: three physical servers, each running its own operating system and a single application]

Slide 6: Virtualization
Virtualization is the ability to run multiple independent virtual operating systems on a single physical computer.

Slide 7: Server consolidation
Grid VOMS servers usage
[Figure: CPU utilization, Grid VOMS cluster, March 2012]

Slide 8: Server consolidation
Main advantages:
– Multiple services in the same server
– Hardware agnostic
– No resource underutilization
[Diagram: dedicated servers in Computer Center (513), each with one operating system and one application, replaced by hypervisor servers in the same Computer Center, each hosting many OS/application VMs]
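To make the consolidation argument concrete, here is a minimal Python sketch (not the PES tooling) of packing lightly loaded service VMs onto as few hypervisors as possible with the first-fit-decreasing heuristic. The single CPU-cores demand per VM, the uniform host capacity and the names hv01, svc00 and so on are assumptions for illustration; a real placement engine such as SCVMM weighs more dimensions (memory, disk I/O, failure domains).

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    name: str
    capacity: float                                 # usable cores per host (assumed uniform)
    vms: list = field(default_factory=list)         # (vm_name, cores) pairs

    def used(self) -> float:
        return sum(cores for _, cores in self.vms)

    def fits(self, cores: float) -> bool:
        return self.used() + cores <= self.capacity


def first_fit_decreasing(vms: dict[str, float], capacity: float) -> list[Hypervisor]:
    """Place VMs (name -> peak cores) on as few hypervisors as the heuristic finds."""
    hosts: list[Hypervisor] = []
    # Largest VMs first: they are the hardest to fit into leftover space.
    for name, cores in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        if cores > capacity:
            raise ValueError(f"{name} needs {cores} cores, more than one host offers")
        # First existing host with enough spare capacity, else open a new one.
        target = next((h for h in hosts if h.fits(cores)), None)
        if target is None:
            target = Hypervisor(f"hv{len(hosts) + 1:02d}", capacity)
            hosts.append(target)
        target.vms.append((name, cores))
    return hosts


if __name__ == "__main__":
    # Twelve hypothetical services, each peaking at 1.5 cores, on 8-core hosts.
    services = {f"svc{i:02d}": 1.5 for i in range(12)}
    for host in first_fit_decreasing(services, capacity=8):
        print(host.name, len(host.vms), host.used())
    # hv01 5 7.5
    # hv02 5 7.5
    # hv03 2 3.0
```

With these made-up numbers, twelve services that would each occupy a dedicated server under "one server, one application" fit on three hypervisors. First-fit decreasing is a standard heuristic rather than an optimal packing, but it is simple and enough to show the utilization gain.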

Slide 9: Hardware interventions
Main advantages:
– Transparent to users
– No service degradation
[Diagram: VMs running applications A1, A2 and Ax spread over hypervisor servers in Computer Center (513), shown before and after a hardware intervention, compared with Application 1 running on dedicated physical servers in Computer Center (513)]
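Transparent hardware interventions depend on moving VMs off the hypervisor that needs work. The Python sketch below plans such an evacuation under stated assumptions: VMs can be migrated between hypervisors (for instance by live migration), each VM has a hypothetical core demand and application label, and targets that do not already host a VM of the same application are preferred, so replicas such as the A1 instances stay spread out. It is an illustration only, not how SCVMM or the KVM tooling actually places VMs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    app: str          # application label, e.g. "A1" (hypothetical)
    cores: float      # CPU demand in cores (hypothetical)


def plan_evacuation(hosts: dict[str, list[VM]], capacity: float, drain: str) -> dict[str, str]:
    """Return a {vm_name: target_host} plan that empties hypervisor `drain`."""
    load = {h: sum(vm.cores for vm in vms) for h, vms in hosts.items()}
    apps = {h: {vm.app for vm in vms} for h, vms in hosts.items()}
    plan: dict[str, str] = {}
    # Move the biggest VMs first, while the most spare capacity is available.
    for vm in sorted(hosts[drain], key=lambda v: v.cores, reverse=True):
        candidates = [h for h in hosts
                      if h != drain and load[h] + vm.cores <= capacity]
        if not candidates:
            raise RuntimeError(f"no capacity left for {vm.name}")
        # Prefer hosts without a replica of the same application, then the least loaded.
        target = min(candidates, key=lambda h: (vm.app in apps[h], load[h]))
        plan[vm.name] = target
        load[target] += vm.cores
        apps[target].add(vm.app)
    return plan


if __name__ == "__main__":
    example = {
        "hv01": [VM("a1-1", "A1", 2), VM("a2-1", "A2", 2)],
        "hv02": [VM("a1-2", "A1", 2)],
        "hv03": [VM("a2-2", "A2", 2), VM("ax-1", "Ax", 1)],
    }
    print(plan_evacuation(example, capacity=8, drain="hv01"))
    # {'a1-1': 'hv03', 'a2-1': 'hv02'}
```

Here a1-1 goes to hv03 rather than the less loaded hv02 because hv02 already runs an A1 replica; the anti-affinity rule is a design choice of the sketch, traded against perfectly even load.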

Slide 10: Virtualization tools
There are different virtualization technologies:
– Xen
– KVM
– Microsoft Hyper-V
– VMware ESXi
PES-PS tested Xen and currently uses KVM and Microsoft Hyper-V.

Slide 11: Cloud orchestration tools
There are several cloud orchestration tools for building private clouds: OpenStack, OpenNebula, Platform ISF, Eucalyptus, Nimbus, Microsoft SCVMM, VMware vSphere,...
PES-PS has tested (or is testing) Platform ISF, OpenNebula, Microsoft SCVMM and OpenStack. For Service Consolidation we currently use Microsoft SCVMM.

Slide 12: Is this the silver bullet?
90% of PES Grid services run on VMs; some still run on physical hardware until that hardware expires.
Performance overhead accepted in exchange for the savings:
– 5-10% loss in CPU performance
– 20% loss in disk I/O
– Overall performance is still OK for most services
Still exposed to (partial) interruptions:
– OS or Grid application upgrades
– …

Slide 13: DNS Load Balancing

Slide 14: WMS Example: setup
– Three load-balancing DNS aliases for different configuration classes ("subclusters"):
  – SAM monitoring (wmssam.cern.ch), CMS (wmscms.cern.ch), other VOs (wmsshared.cern.ch)
  – Identical configuration for all nodes in the same subcluster (using central configuration management)
– Node load is taken into account to select a set of "best nodes" to be exposed in each DNS alias:
  – Using metrics specific to the WMS
  – Highly loaded nodes stop receiving new jobs
– Well supported by client software (gLite UI):
  – Users specify a single server name in their configuration: the DNS alias
  – The DNS server returns a list of IP addresses for the alias
  – The client software randomly tries IP addresses from the list
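The client behaviour in the last bullet group (resolve the alias to all of its addresses, try them in random order, stop at the first one that answers) can be sketched in a few lines of Python with the standard socket module. This illustrates the idea and is not the gLite UI code: the function names are hypothetical, and the resolver and connector are injectable so the retry logic can be exercised without a network.

```python
import random
import socket
from typing import Callable, Optional


def resolve_all(alias: str, port: int) -> list[str]:
    """All distinct IP addresses the DNS server returns for `alias`."""
    infos = socket.getaddrinfo(alias, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def tcp_connect(ip: str, port: int, timeout: float = 5.0) -> socket.socket:
    return socket.create_connection((ip, port), timeout=timeout)


def connect_via_alias(alias: str, port: int,
                      resolve: Callable[[str, int], list[str]] = resolve_all,
                      connect: Callable[[str, int], object] = tcp_connect,
                      rng: Optional[random.Random] = None) -> tuple[str, object]:
    """Return (ip, connection) for the first reachable address behind `alias`."""
    addresses = resolve(alias, port)
    # Random order spreads clients over the nodes currently published in the alias.
    (rng or random).shuffle(addresses)
    errors = []
    for ip in addresses:
        try:
            return ip, connect(ip, port)
        except OSError as exc:      # node down or refusing: try the next address
            errors.append(f"{ip}: {exc}")
    raise ConnectionError(f"no address behind {alias} is reachable: {errors}")
```

A caller would pass one of the aliases above, for example connect_via_alias("wmsshared.cern.ch", port) with the service's actual port, and close the returned socket when done. Because the retry happens on the client, a node that dies between DNS updates costs only a failed connection attempt rather than a failed submission.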

Slide 15: Benefits & limits
Benefits:
– Flexibility: nodes can be added to or removed from a DNS alias without users changing their configuration
– Resource optimization: even load distribution over the WMS nodes
– Availability: highly loaded or sick nodes are automatically removed from the DNS alias
– Transparent maintenance: nodes undergoing maintenance are not exposed to users
But it does not replace a full HA solution:
– Each job remains tied to a specific node (we co-host WMS and LB)
– If a WMS node is unavailable, its jobs get no status updates
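The "best nodes" selection, together with the automatic removal of sick, overloaded or in-maintenance nodes, can be illustrated with a small Python sketch. The single load score per node, the health and maintenance flags, the threshold and the wms1xx hostnames are all hypothetical; the real service uses WMS-specific metrics that the slides do not detail.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    hostname: str
    load: float                  # hypothetical WMS-specific load score, lower is better
    healthy: bool = True
    in_maintenance: bool = False


def best_nodes(nodes: list[NodeStatus], max_exposed: int, max_load: float) -> list[str]:
    """Hostnames to publish in the alias: healthy, not in maintenance,
    below the load threshold, least loaded first."""
    eligible = [n for n in nodes
                if n.healthy and not n.in_maintenance and n.load < max_load]
    eligible.sort(key=lambda n: (n.load, n.hostname))
    return [n.hostname for n in eligible[:max_exposed]]


if __name__ == "__main__":
    nodes = [
        NodeStatus("wms101", 0.7),
        NodeStatus("wms102", 0.2),
        NodeStatus("wms103", 0.95),                       # overloaded: stops receiving jobs
        NodeStatus("wms104", 0.1, healthy=False),         # sick: removed from the alias
        NodeStatus("wms105", 0.4, in_maintenance=True),   # drained for maintenance
        NodeStatus("wms106", 0.3),
    ]
    print(best_nodes(nodes, max_exposed=2, max_load=0.9))
    # ['wms102', 'wms106']
```

What to publish when no node qualifies (an empty alias, or the least-bad nodes) is a policy choice this sketch leaves open.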

Slide 16: Conclusion
– Service consolidation via virtualisation should become common practice
– DNS load balancing is cheap and helps improve availability
– The real challenge is ahead of us:
  – Running services in an (internal) cloud
  – The number of nodes varies constantly
  – Dynamic configuration becomes a must
– This will require redesigning most services as we know them

