Published by Cassandra Hill. Modified over 6 years ago.
2
FCT Follow-up Meeting, 31 March 2017, Fernando Meireles
3
Outline
Contextualization
Infrastructure Services
Batch Services
Automation Workflows
Other Assignments
Conclusions
4
Contextualization
Before IT reorganization: supervised by Ulrich Schwickerath, Batch team, IT-PES-PS section
After IT reorganization: supervised by Ben Jones, IT-CM-IS section
5
Infrastructure Services
Created to bring all the front-of-house compute-related and support services together in one team.
HPC: MPI applications (on Linux); engineering and physics simulations
Volunteer Computing: opportunistic resources; LHC event simulations
Batch: high throughput; physics event reconstruction; data analysis; physics simulations
6
Batch Service at CERN
A job is a program submitted to the Batch service to be processed by a worker node without further user interaction.
Service pattern:
Waits for users to submit jobs
The jobs wait in queues
Executes the jobs on the platform
Returns the job results to the users
500k jobs/day
120k CPU cores over 2 instances
HTCondor and IBM LSF
Local and WLCG (Worldwide LHC Computing Grid) job submission
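The service pattern on this slide (submit, queue, execute, return results) can be illustrated with a small in-memory sketch. It is not how HTCondor or LSF are implemented; the class `ToyBatchService` and all of its method names are hypothetical and exist only to show the four steps in order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class Job:
    """A program submitted for non-interactive execution (hypothetical model)."""
    job_id: int
    owner: str
    program: Callable[[], object]


class ToyBatchService:
    """Illustrative stand-in for a batch system; not a real scheduler."""

    def __init__(self) -> None:
        self.queue: Deque[Job] = deque()      # jobs waiting to run
        self.results: Dict[int, object] = {}  # finished jobs, keyed by job id
        self._next_id = 1

    def submit(self, owner: str, program: Callable[[], object]) -> int:
        """Step 1 and 2: accept a job from a user and place it in the queue."""
        job = Job(self._next_id, owner, program)
        self._next_id += 1
        self.queue.append(job)
        return job.job_id

    def run_pending(self) -> None:
        """Step 3: execute queued jobs in FIFO order, with no user interaction."""
        while self.queue:
            job = self.queue.popleft()
            self.results[job.job_id] = job.program()

    def fetch_result(self, job_id: int) -> object:
        """Step 4: hand the result back to the user."""
        return self.results.pop(job_id)


svc = ToyBatchService()
first = svc.submit("alice", lambda: 2 + 2)
second = svc.submit("bob", lambda: "done")
svc.run_pending()
print(svc.fetch_result(first), svc.fetch_result(second))
```

A real batch system adds scheduling policy, fair share, and many worker nodes; the FIFO queue here is only the simplest possible choice.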
7
Batch Service Architecture
[Diagram labels: Batch Cluster (HTCondor, LSF); CERN OpenStack Cloud]
8
Automation Tools
What are they?
Group of workflows
Integrated with the existing system tools
Automate operations tasks
For what?
Creation of the Batch resources
Configuration of the Batch worker nodes
Monitoring of the Batch cluster
Why?
Minimize operations cost
Make our life easier
mmm… automation
9
Automation Tools: Creation of Worker Nodes
[Diagram labels: Batch Cluster with spare groups Spare A, Spare B, Spare C; VMs in the CERN OpenStack Cloud]
Check for available resources
Check instance requirements
Create VMs for Spare groups
VMs will get a general config.
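The creation steps on this slide can be sketched as a single function. This is a minimal illustration under stated assumptions: it does not call the OpenStack API, and the names `VM`, `create_spare_vms`, `free_cores`, and `cores_per_vm` are hypothetical. "Available resources" is modelled as a count of free CPU cores and "instance requirements" as the cores needed per VM.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    """Hypothetical record for a virtual machine in a spare group."""
    name: str
    group: str
    config: str = "general"  # new VMs only get a general configuration


def create_spare_vms(free_cores: int, cores_per_vm: int,
                     wanted: Dict[str, int]) -> List[VM]:
    """Create as many of the wanted spare VMs as the free cores allow."""
    if cores_per_vm <= 0:
        raise ValueError("cores_per_vm must be positive")
    # Check for available resources against the instance requirements.
    budget = free_cores // cores_per_vm
    created: List[VM] = []
    # Create VMs for each spare group until the budget is used up.
    for group, count in wanted.items():
        n = min(count, budget)
        for i in range(n):
            created.append(VM(name=f"{group}-{i:03d}", group=group))
        budget -= n
    return created


vms = create_spare_vms(free_cores=40, cores_per_vm=8,
                       wanted={"spare_a": 3, "spare_b": 3, "spare_c": 3})
for vm in vms:
    print(vm.name, vm.group, vm.config)
```

Filling the groups in order is an arbitrary choice made for brevity; the slide does not say how VMs are distributed across spare groups.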
10
Automation Tools: Configuration of Worker Nodes
[Diagram labels: Batch Cluster with HTCondor (HG a, HG b) and LSF (HG c, HG d), each holding WNs; spare groups Spare A, Spare B, Spare C holding VMs]
Check VM status in Spare
Some VMs can be broken
OK VMs are moved
NOK VMs wait
VMs in HG get specific config.
Used as WNs in HTCondor and LSF
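The configuration step on this slide (check each spare VM, move the OK ones, leave the NOK ones waiting) can be sketched as follows. This is an illustration only: the health check is passed in as a plain function, and `VM`, `promote_spares`, and the `specific:` config label are hypothetical names, not part of any CERN tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VM:
    """Hypothetical record for a virtual machine and its current group."""
    name: str
    group: str
    config: str = "general"


def promote_spares(spares: List[VM], is_healthy: Callable[[VM], bool],
                   target_hg: str) -> Tuple[List[VM], List[VM]]:
    """Move healthy spare VMs into target_hg; broken ones stay in their spare group."""
    moved: List[VM] = []
    waiting: List[VM] = []
    for vm in spares:
        if is_healthy(vm):
            # OK: the VM joins the HG and receives that HG's specific configuration.
            vm.group = target_hg
            vm.config = f"specific:{target_hg}"
            moved.append(vm)
        else:
            # NOK: the VM waits in its spare group, unchanged.
            waiting.append(vm)
    return moved, waiting


spares = [VM("vm1", "spare_a"), VM("vm2", "spare_a"), VM("vm3", "spare_a")]
moved, waiting = promote_spares(spares, lambda vm: vm.name != "vm2", "hg_a")
print([vm.name for vm in moved], [vm.name for vm in waiting])
```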
11
Other Assignments
Migration to Puppet 4: syntax changes; testing and debugging
Dual-stack configuration of services: enabling IPv6; configuring the firewall
Rotational support of all services
12
Conclusions
Batch/cloud systems integration and monitoring
Operations procedures
General problem-solving skills (services support)
Trainings: Agile Infrastructure & Puppet for Service Managers; Developing secure software
13
Questions?