FCT Follow-up Meeting 31 March, 2017 Fernando Meireles


2 FCT Follow-up Meeting 31 March, 2017 Fernando Meireles

3 Outline
Contextualization
Infrastructure Services
Batch Services
Automation Workflows
Other Assignments
Conclusions

4 Contextualization
Before IT reorganization: supervised by Ulrich Schwickerath, Batch team, IT-PES-PS section
After IT reorganization: supervised by Ben Jones, IT-CM-IS section

5 Infrastructure Services
Created to bring all the front-of-house compute-related and support services together in one team.
HPC
  MPI applications (on Linux)
  Engineering and physics simulations
Volunteer Computing
  Opportunistic resources
  LHC event simulations
Batch
  High throughput
  Physics event reconstruction
  Data analysis
  Physics simulations

6 Batch Service at CERN
A job is a program submitted to the Batch service to be processed by a worker node without further user interaction.
Service pattern:
  Waits for users to submit jobs
  The jobs wait in queues
  Executes the jobs on the platform
  Returns the job results to the users
500k jobs/day
120k CPU cores over 2 instances: HTCondor and IBM LSF
Local and WLCG (Worldwide LHC Computing Grid) job submission
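
The service pattern on this slide (submit, queue, execute, return results) can be illustrated with a toy model. This is only a sketch under stated assumptions: the names `Job` and `ToyBatchService` are hypothetical, jobs are plain Python callables, and the queue is strictly first-in, first-out. It is not how HTCondor or LSF schedule work.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class Job:
    """A program submitted for non-interactive execution (hypothetical model)."""
    job_id: int
    owner: str
    program: Callable[[], object]


class ToyBatchService:
    """Toy model of the pattern: submit -> wait in queue -> execute -> return result."""

    def __init__(self) -> None:
        self._queue: Deque[Job] = deque()
        self._results: Dict[int, object] = {}
        self._next_id = 1

    def submit(self, owner: str, program: Callable[[], object]) -> int:
        """Accept a job from a user and put it at the back of the queue."""
        job = Job(self._next_id, owner, program)
        self._next_id += 1
        self._queue.append(job)
        return job.job_id

    def run_pending(self) -> int:
        """Execute queued jobs in FIFO order (a simplification); return how many ran."""
        ran = 0
        while self._queue:
            job = self._queue.popleft()
            # No user interaction happens here: the job simply runs to completion.
            self._results[job.job_id] = job.program()
            ran += 1
        return ran

    def result(self, job_id: int) -> object:
        """Return the stored result of a finished job to the user."""
        return self._results[job_id]
```

A user would call `submit(...)`, keep the returned job id, and later call `result(job_id)` once `run_pending()` has processed the queue.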

7 Batch Service Architecture
[Diagram labels: Batch Cluster; HTCondor; LSF; CERN OpenStack Cloud]

8 Automation Tools
What are they?
  Group of workflows
  Integrated with the existing system tools
  Automate operations tasks
For what?
  Creation of the Batch resources
  Configuration of the Batch worker nodes
  Monitoring of the Batch cluster
Why?
  Minimize operations cost
  Make our life easier
mmm… automation

9 Automation Tools: Creation of Worker Nodes
Check for available resources
Check instance requirements
Create VMs for Spare groups
VMs will get a general config.
[Diagram labels: Batch Cluster; Spare A, Spare B, Spare C; VMs; CERN OpenStack Cloud]
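
The creation steps on this slide (check available resources, check the instance requirements, create VMs for the Spare groups) can be sketched as follows. Everything here is an illustrative assumption: `Quota`, `Flavor`, `create_spare_vms` and the VM naming scheme are hypothetical, and no real cloud API is called. Applying the general configuration is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Flavor:
    """Instance requirements for one VM (hypothetical)."""
    cores: int
    ram_gb: int


@dataclass
class Quota:
    """Resources still available to the project (hypothetical)."""
    cores: int
    ram_gb: int

    def fits(self, flavor: Flavor) -> bool:
        return self.cores >= flavor.cores and self.ram_gb >= flavor.ram_gb

    def consume(self, flavor: Flavor) -> None:
        self.cores -= flavor.cores
        self.ram_gb -= flavor.ram_gb


def create_spare_vms(quota: Quota, flavor: Flavor,
                     wanted: Dict[str, int]) -> Dict[str, List[str]]:
    """Create up to wanted[group] VMs per spare group, stopping when quota runs out."""
    created: Dict[str, List[str]] = {group: [] for group in wanted}
    for group, count in wanted.items():
        for index in range(count):
            # Check available resources against the instance requirements.
            if not quota.fits(flavor):
                return created
            quota.consume(flavor)
            # A real workflow would ask the cloud to boot a VM at this point.
            created[group].append(f"{group}-vm{index:03d}")
    return created
```

Stopping at the first VM that does not fit keeps the sketch short; a real workflow might instead skip to a smaller flavor or retry later.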

10 Automation Tools: Configuration of Worker Nodes
Check VM status in Spare
  OK VMs are moved
  NOK VMs wait
Some VMs can be broken
VMs in HG get specific config.
Used as WNs in HTCondor and LSF
[Diagram labels: Batch Cluster; HTCondor: HG a, HG b; LSF: HG c, HG d; WN; Spare A, Spare B, Spare C; VMs]
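
The configuration step on this slide (check the VM status in Spare, move the OK VMs, leave the NOK VMs waiting) can be sketched in the same spirit. The `VM` class, the `promote_healthy` function and the `is_healthy` callback are hypothetical names introduced for illustration; how the real workflow decides that a VM is OK is not described in the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VM:
    """A virtual machine tracked by the workflow (hypothetical model)."""
    name: str
    group: str               # current group, e.g. a spare group or an HG
    config: str = "general"  # spare VMs only carry the general configuration


def promote_healthy(spares: List[VM], target_hg: str,
                    is_healthy: Callable[[VM], bool]) -> Tuple[List[VM], List[VM]]:
    """Move OK VMs from a spare group into target_hg; NOK VMs stay where they are."""
    moved: List[VM] = []
    waiting: List[VM] = []
    for vm in spares:
        if is_healthy(vm):
            vm.group = target_hg
            # Once in the HG, the VM receives that group's specific configuration.
            vm.config = f"specific:{target_hg}"
            moved.append(vm)
        else:
            # Possibly broken: leave it in Spare so it can be checked again later.
            waiting.append(vm)
    return moved, waiting
```

Passing the health check in as a callback keeps the sketch independent of whatever monitoring source would supply the status in practice.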

11 Other assignments
Migration to Puppet 4
  Syntax changes
  Testing and debugging
Dual-stack configuration of services
  Enabling IPv6
  Configuring the firewall
Rotational support of all services

12 Conclusions
Batch/cloud systems integration and monitoring
Operations procedures
General problem-solving skills (services support)
Trainings
  Agile Infrastructure & Puppet for Service Managers
  Developing secure software

13 Questions?
