Castor services at the Tier-0


1 Castor services at the Tier-0
Jan van Eldik CERN Castor Operations team

2 Outline
Castor in numbers
24 x 7 operations
  - Teams, workflows
Building blocks for Castor services
  - DNS loadbalancing, redundant hardware
Case studies
  - Castor service deployment
  - SRM service deployment

3 Castor service challenge
Experiment  Disk cache size  Servers  Disk pools  Files on disk  Data on tape
Alice       261 TB           47       3           1,311,901      742 TB
Atlas       505 TB           77       6           1,794,670      720 TB
CMS         472 TB           85       5           299,396        1,220 TB
LHCb        197 TB           36                   700,097        453 TB
Total       1,435 TB         245      19          4.1 M          3,115 TB
Numbers from Sept 2007
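The per-experiment figures on this slide can be cross-checked against the Total row. The sketch below is only an arithmetic check; the dictionary keys are names chosen here for illustration, and the disk pool column is left out because the LHCb value is missing from the transcript. Disk cache size, server count and file count reproduce the stated totals, while the tape column sums to 3,135 TB against the 3,115 TB given on the slide; the transcript does not say which figure is off.

```python
# Per-experiment figures as given on the slide (Sept 2007).
# Key names are illustrative, not taken from any Castor tool.
rows = {
    "Alice": {"disk_tb": 261, "servers": 47, "files": 1_311_901, "tape_tb": 742},
    "Atlas": {"disk_tb": 505, "servers": 77, "files": 1_794_670, "tape_tb": 720},
    "CMS":   {"disk_tb": 472, "servers": 85, "files": 299_396,   "tape_tb": 1220},
    "LHCb":  {"disk_tb": 197, "servers": 36, "files": 700_097,   "tape_tb": 453},
}

# Totals as stated on the slide (the file count is given only as "4.1 M").
slide_totals = {"disk_tb": 1435, "servers": 245, "tape_tb": 3115}


def column_sums(table):
    """Sum every column over all experiments."""
    sums = {}
    for figures in table.values():
        for column, value in figures.items():
            sums[column] = sums.get(column, 0) + value
    return sums


sums = column_sums(rows)

if __name__ == "__main__":
    for column, stated in slide_totals.items():
        status = "matches" if sums[column] == stated else "differs"
        print(f"{column}: computed {sums[column]}, slide says {stated} ({status})")
    print(f"files: computed {sums['files']:,}")
```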

4 Castor service challenge - II
[Chart: capacity in PB per year, 2007 to 2012, for Alice, Atlas, CMS and LHCb]
2007: 2 PB, 400 servers
2012: 21 PB, ~1000 servers
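To put the two end points in perspective, the sketch below computes the average yearly growth they imply. It assumes smooth compound growth between 2007 and 2012, which is an assumption made here: the intermediate values of the chart are not recoverable from the transcript. The function name is illustrative.

```python
def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` steps."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# End points from the slide; 2007 -> 2012 is five yearly steps.
capacity_growth = annual_growth(2.0, 21.0, 5)     # PB
server_growth = annual_growth(400.0, 1000.0, 5)   # number of servers

if __name__ == "__main__":
    print(f"capacity: {capacity_growth:.1%} per year")
    print(f"servers:  {server_growth:.1%} per year")
```

Under that assumption capacity grows by roughly 60% per year while the server count grows by roughly 20% per year, so the capacity per server has to rise considerably over the period.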

5 Support Infrastructure
[Diagram: support chain linking HelpDesk/GGUS, Operator, Service Manager On Duty, SysAdmin, CASTOR Service Expert and CASTOR Developer]

6 System-level alarms
Operator
  - 1st level alarm handling
  - 24 x 7 coverage on site
  - Driven by procedures
SysAdmin
  - 2nd level alarm handling
  - 24 x 7 coverage, on-call out of working hours
  - Problem determination
  - Manage hardware repairs
CASTOR Service Expert
  - Service responsible
  - Applies software upgrades, configuration changes and provides procedures
  - Manages disruptive interventions
  - Handles problematic situations

7 User support
HelpDesk, GGUS
  - 1st level user support
  - Handle common user questions
  - Triage
Service Manager On Duty
  - 2nd level user support
  - Handle common problems
  - Procedure driven
CASTOR Service Expert
  - Handle uncommon and complex problems
  - Provide procedures

8 Number of calls per week
[Diagram: call flow between HelpDesk/GGUS, Operator, Service Manager On Duty, SysAdmin, Castor Service Expert and Castor Developer, with per-week edge labels 127, 18, 6, 5 and 0.5]
  - Ops -> SysAdmin: 4187 tickets
  - SAO -> us: 173 (94 direct, 79 via SMoD)
  - HelpDesk -> SMoD: 605
  - SMoD -> Castor service manager: 200
  - CSM -> Dev: 19
Plus ~10 calls via support lists, direct e-mails, phone calls, …

9 Technologies used
Use of DNS aliases
  - To provide user-friendly entry points to services
  - Allows the deployment layout to be changed transparently (failover to standby servers)
Loadbalancing where possible
  - Multiple (cheap) servers, split over network switches, power bars, …
  - Scale service by adding servers
  - Allows 'cyclic upgrades'
  - Pre-requisite: stateless daemons
Hardware with redundancy features
  - Hot-swappable disks and power supplies, hardware RAID
  - 'Mid-range' servers for core components
  - NAS diskservers: RAID-5 + SPARE configurations
  - SPOFs: motherboards, RAID controllers, …
Oracle databases on RAC
  - Only component on 'critical power'
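The link between loadbalanced aliases, stateless daemons and 'cyclic upgrades' can be sketched as follows. This is a minimal illustration, not the Castor team's tooling: `resolve_alias` only uses the standard resolver to list the addresses behind a name, and `cyclic_upgrade` simulates alias membership in memory. The node names and the `upgrade` callback are hypothetical.

```python
import socket
from typing import Callable


def resolve_alias(alias: str) -> list[str]:
    """Return the distinct addresses currently published behind a DNS alias."""
    infos = socket.getaddrinfo(alias, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def cyclic_upgrade(
    servers: list[str],
    upgrade: Callable[[str], object],
    min_in_service: int = 1,
) -> list[tuple[str, tuple[str, ...]]]:
    """Upgrade servers one at a time.

    Returns, for each upgraded server, the alias membership while it was out.
    """
    if len(servers) - 1 < min_in_service:
        raise ValueError("not enough servers to keep the service up during an upgrade")
    in_alias = list(servers)
    history = []
    for server in servers:
        in_alias.remove(server)      # take the server out of the alias
        history.append((server, tuple(in_alias)))
        upgrade(server)              # safe only if the daemon holds no state
        in_alias.append(server)      # put it back before touching the next one
    return history


# Hypothetical three-node service; the "upgrade" just records the node name.
upgraded: list[str] = []
history = cyclic_upgrade(["node-a", "node-b", "node-c"], upgraded.append)

if __name__ == "__main__":
    for server, remaining in history:
        print(f"upgrading {server}, alias still serves {remaining}")
```

Because clients only know the alias, each node can be withdrawn, upgraded and returned without a visible outage, provided the remaining nodes can serve any request. That is why stateless daemons are listed as a prerequisite.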

10 Castor central services

11 Castor diskcaches
Five independent diskcaches
  - Alice, Atlas, CMS, LHCb, Public
Server cluster:
  - Stager, request handler, scheduler, rtcpclientd, …
  - Most of them stateless
  - Current deployment: midrange servers, with DNS aliases for all of them
  - Near future: loadbalanced aliases (except scheduler)
Diskserver pools:
  - Configured and sized according to experiment needs
  - Rely on hardware redundancy
  - And on copies on tape
-> Fully automated box management

12 Castor SRM
SRM v1
  - DNS aliases: srm.cern.ch, shared by all VOs
  - Loadbalanced over 10 CPU servers
  - Deployment bug: on single switch
  - SPOF: shared request spool
SRM v2
  - Separate endpoints per VO
  - Fixed bug in loadbalancing
-> Any SPOFs left?
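The SRM v1 deployment bug (all loadbalanced servers behind a single switch) is the kind of problem that a simple placement check can catch. The sketch below is a hypothetical illustration, not an existing CERN tool: it assumes a mapping from each server to the failure domains it depends on, and reports every domain that all servers share. Host names, switch names and domain kinds are invented for the example.

```python
from collections import defaultdict


def single_points_of_failure(placement: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return {kind: identifier} for each failure domain shared by all servers.

    Assumes every server lists the same set of domain kinds.
    """
    by_kind: dict[str, set[str]] = defaultdict(set)
    for domains in placement.values():
        for kind, identifier in domains.items():
            by_kind[kind].add(identifier)
    # A kind with exactly one identifier means every server depends on it.
    return {kind: next(iter(ids)) for kind, ids in by_kind.items() if len(ids) == 1}


# Hypothetical v1-style layout: ten servers, one switch, one shared request spool.
v1_layout = {
    f"srm{i:02d}": {"switch": "switch-1", "request_spool": "shared-spool"}
    for i in range(10)
}

# Hypothetical layout with servers split over two switches and private spools.
split_layout = {
    f"srm{i:02d}": {"switch": f"switch-{i % 2 + 1}", "request_spool": f"spool-{i:02d}"}
    for i in range(10)
}

v1_spofs = single_points_of_failure(v1_layout)
split_spofs = single_points_of_failure(split_layout)

if __name__ == "__main__":
    print("v1-style layout:", v1_spofs)
    print("split layout:   ", split_spofs)
```

A check of this form only covers the domains that are recorded in the mapping, so an empty result does not answer the slide's closing question by itself.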

13 SRM v2.2 production endpoints

14 Conclusion & Outlook
Castor service = H/W + S/W + operations
Workflows are in place for 24 x 7 operations
  - Teams, alarms, procedures
We are actively hunting down SPOFs
  - DNS aliases and loadbalancing
  - Redundant hardware
Castor software is rapidly maturing
  - "Thanks!" to developers
Ready to add 700 diskservers in 2008
  - And to operate them!

15 [Diagram labels: Castor nameserver, disk cache, tape backend, reconstruction farms, online farms, analysis facility, WAN data exports]

