FCT Follow-up Meeting, 31 March 2017. Fernando Meireles (fernando.meireles@cern.ch)
Outline: Contextualization; Infrastructure Services; Batch Services; Automation Workflows; Other Assignments; Conclusions
Contextualization: Before the IT reorganization, supervised by Ulrich Schwickerath (Batch team, IT-PES-PS section). After the IT reorganization, supervised by Ben Jones (IT-CM-IS section).
Infrastructure Services: created to bring all the front-of-house compute-related and support services together in one team. HPC: MPI applications (on Linux); engineering and physics simulations. Volunteer Computing: opportunistic resources; LHC@Home; LHC event simulations. Batch: high-throughput computing; physics event reconstruction; data analysis; physics simulations.
Batch Service at CERN: 500k jobs/day; 120k CPU cores over 2 instances (HTCondor and IBM LSF); local and WLCG (Worldwide LHC Computing Grid) job submission. A job is a program submitted to the Batch service to be processed by a worker node without further user interaction. Service pattern: wait for users to submit jobs; the jobs wait in queues; execute the jobs on the platform; return the job results to the users.
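The service pattern above (submit, queue, execute, return) can be illustrated with a small in-memory sketch. It is a toy model, not HTCondor or LSF: the class `ToyBatchService`, its methods, and the plain first-in-first-out queue are all hypothetical simplifications (the real schedulers apply priorities and fair-share policies).

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Job:
    """A program submitted to the batch service; it runs with no further user interaction."""
    job_id: int
    owner: str
    payload: Callable[[], object]  # hypothetical stand-in for the user's executable


class ToyBatchService:
    """Hypothetical illustration of the submit -> queue -> execute -> return pattern."""

    def __init__(self) -> None:
        self.queue: Deque[Job] = deque()
        self.results: Dict[int, object] = {}
        self._next_id = 0

    def submit(self, owner: str, payload: Callable[[], object]) -> int:
        # 1. Accept a job from a user and hand back its id.
        job = Job(self._next_id, owner, payload)
        self._next_id += 1
        # 2. The job waits in the queue until a worker slot is free.
        self.queue.append(job)
        return job.job_id

    def run_pending(self, free_slots: int) -> List[int]:
        # 3. Execute queued jobs in FIFO order (a simplification), up to the free slots.
        finished: List[int] = []
        while self.queue and free_slots > 0:
            job = self.queue.popleft()
            self.results[job.job_id] = job.payload()
            finished.append(job.job_id)
            free_slots -= 1
        return finished

    def fetch(self, job_id: int) -> object:
        # 4. Return the job result to the user.
        return self.results[job_id]
```

For example, after submitting two jobs and calling `run_pending(free_slots=1)`, only the first job runs and the second stays queued until more slots free up.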
Batch Service Architecture: the Batch cluster (HTCondor and LSF) runs on top of the CERN OpenStack Cloud.
Automation Tools. What are they? A group of workflows, integrated with the existing system tools, that automate operations tasks. For what? Creation of the Batch resources; configuration of the Batch worker nodes; monitoring of the Batch cluster. Why? To minimize operations cost and make our life easier. (mmm… automation)
Automation Tools: Creation of Worker Nodes. Check for available resources in the CERN OpenStack Cloud; check the instance requirements; create VMs for the spare groups (Spare A, Spare B, Spare C) of the Batch cluster. The new VMs get a general configuration.
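The creation workflow above can be sketched as a capacity-aware top-up of the spare groups. This is a minimal sketch under stated assumptions: `SpareGroup`, its `target` size, the core-based quota check, the filling order, and the VM naming scheme are all hypothetical, and appending a name stands in for the actual OpenStack call that would create the VM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpareGroup:
    name: str
    target: int                                   # hypothetical desired group size
    vms: List[str] = field(default_factory=list)


def create_worker_vms(groups: List[SpareGroup], free_cores: int, cores_per_vm: int) -> Dict[str, int]:
    """Top up each spare group, in list order, without exceeding the free cloud capacity.

    Returns the number of VMs created per group.
    """
    created: Dict[str, int] = {}
    for group in groups:
        # Check instance requirements: how many VMs is this group missing?
        missing = max(group.target - len(group.vms), 0)
        # Check for available resources: how many such VMs still fit?
        fits = free_cores // cores_per_vm
        count = min(missing, fits)
        prefix = group.name.lower().replace(" ", "-")
        for _ in range(count):
            # Stand-in for the cloud API call; the new VM only gets the general config.
            group.vms.append(f"{prefix}-{len(group.vms) + 1:03d}")
        free_cores -= count * cores_per_vm
        created[group.name] = count
    return created
```

With 40 free cores and 8-core VMs, for instance, groups earlier in the list are filled first and later groups receive only what capacity remains; a real workflow would probably use a fairer split.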
Automation Tools: Configuration of Worker Nodes. Check the status of the VMs in the spare groups (Spare A, Spare B, Spare C), since some VMs can be broken. VMs that are OK are moved into a hostgroup (HG) of the Batch cluster: HG a and HG b for HTCondor, HG c and HG d for LSF; VMs that are not OK wait in their spare group. VMs in a hostgroup get a specific configuration and are used as worker nodes (WNs) in HTCondor and LSF.
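The promotion step (OK VMs move to a hostgroup, NOK VMs wait) can be sketched as a single partition pass. The function `promote_healthy_vms` and the `is_healthy` callback are hypothetical placeholders for whatever status check the real workflow performs; the sketch only shows the move/wait logic.

```python
from typing import Callable, List, Tuple


def promote_healthy_vms(
    spare: List[str],
    is_healthy: Callable[[str], bool],
    hostgroup: List[str],
) -> Tuple[List[str], List[str]]:
    """Move VMs that pass the (hypothetical) health check from a spare group into a hostgroup.

    Returns (promoted, waiting). Broken VMs stay in the spare group for a later check.
    """
    promoted: List[str] = []
    waiting: List[str] = []
    for vm in spare:
        # Call the health check once per VM and sort it into OK or NOK.
        (promoted if is_healthy(vm) else waiting).append(vm)
    hostgroup.extend(promoted)  # in the hostgroup the VM would get its WN-specific config
    spare[:] = waiting          # NOK VMs remain in the spare group
    return promoted, waiting
```

Running it periodically would let a VM that was broken on one pass be promoted on a later one, once its check succeeds.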
Other Assignments: migration to Puppet 4 (syntax changes, testing and debugging); dual-stack configuration of services (enabling IPv6, configuring the firewall); rotational support of all services.
Conclusions: Batch/cloud systems integration and monitoring; operations procedures; general problem-solving skills (service support). Trainings: Agile Infrastructure & Puppet for Service Managers; Developing Secure Software.