FCT Follow-up Meeting 31 March, 2017 Fernando Meireles

Presentation transcript:

FCT Follow-up Meeting 31 March, 2017 Fernando Meireles fernando.meireles@cern.ch

Outline
- Contextualization
- Infrastructure Services
- Batch Services
- Automation Workflows
- Other Assignments
- Conclusions

Contextualization
- Before IT reorganization: supervised by Ulrich Schwickerath, Batch team, IT-PES-PS section
- After IT reorganization: supervised by Ben Jones, IT-CM-IS section

Infrastructure Services
Created to bring all the front-of-house compute-related and support services together in one team.
- HPC: MPI applications (on Linux); engineering and physics simulations
- Volunteer computing: opportunistic resources; LHC@Home; LHC event simulations
- Batch: high throughput; physics event reconstruction; data analysis; physics simulations

Batch Service at CERN
- 500k jobs/day
- 120k CPU cores over 2 instances
- HTCondor and IBM LSF
- Local and WLCG (Worldwide LHC Computing Grid) job submission

A job is a program submitted to the Batch service to be processed by a worker node without further user interaction.

Service pattern:
1. Wait for users to submit jobs
2. Hold the jobs in queues
3. Execute the jobs on the platform
4. Return the job results to the users
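The four-step service pattern above can be sketched in a few lines of Python. This is a toy, single-process illustration only: the names `Job`, `ToyBatchService`, `submit`, `run_pending` and `fetch_result` are hypothetical and do not correspond to HTCondor or LSF interfaces.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional


@dataclass
class Job:
    """A program that runs to completion without further user interaction."""
    job_id: int
    owner: str
    program: Callable[[], str]


class ToyBatchService:
    """Hypothetical in-memory stand-in for a batch service (not a real API)."""

    def __init__(self) -> None:
        self.queue: Deque[Job] = deque()   # jobs waiting to run
        self.results: Dict[int, str] = {}  # finished job id -> output
        self._next_id = 1

    def submit(self, owner: str, program: Callable[[], str]) -> int:
        """Steps 1 and 2: accept a job from a user and put it in the queue."""
        job = Job(self._next_id, owner, program)
        self._next_id += 1
        self.queue.append(job)
        return job.job_id

    def run_pending(self) -> None:
        """Step 3: execute queued jobs in submission order."""
        while self.queue:
            job = self.queue.popleft()
            self.results[job.job_id] = job.program()

    def fetch_result(self, job_id: int) -> Optional[str]:
        """Step 4: return the result, or None if the job has not run yet."""
        return self.results.get(job_id)
```

A user would call `submit`, then later poll `fetch_result`; a real service additionally schedules jobs across many worker nodes and applies priorities, which this sketch leaves out.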

Batch Service Architecture
[Diagram labels: Batch Cluster, HTCondor, LSF, CERN OpenStack Cloud]

Automation Tools
What are they?
- Group of workflows
- Integrated with the existing system tools
- Automate operations tasks
For what?
- Creation of the Batch resources
- Configuration of the Batch worker nodes
- Monitoring of the Batch cluster
Why?
- Minimize operations cost
- Make our life easier
mmm… automation

Automation Tools: Creation of Worker Nodes
- Check for available resources
- Check instance requirements
- Create VMs for Spare groups
- VMs will get a general config.
[Diagram labels: Batch Cluster; Spare A, Spare B, Spare C; VMs; CERN OpenStack Cloud]
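As a rough illustration of the creation workflow, the sketch below checks a resource budget, checks the per-instance requirement, and creates VMs for the spare groups with a general configuration. Everything here is assumed for the example: `VM`, `create_spare_vms`, the core-based budget and the naming scheme are hypothetical and are not the actual CERN tooling or the OpenStack API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    name: str
    group: str
    config: str = "general"  # new spares only get the general configuration


def create_spare_vms(free_cores: int, cores_per_vm: int,
                     wanted: Dict[str, int]) -> List[VM]:
    """Create as many requested spare VMs as the free cores allow.

    free_cores   -- cores currently available (assumed to come from a quota check)
    cores_per_vm -- instance requirement for one worker-node VM
    wanted       -- spare group name -> number of VMs requested
    """
    if cores_per_vm <= 0:
        raise ValueError("cores_per_vm must be positive")

    budget = free_cores // cores_per_vm  # how many VMs still fit
    created: List[VM] = []
    for group, count in wanted.items():
        for index in range(count):
            if budget == 0:
                return created  # out of resources: stop creating
            created.append(VM(name=f"{group}-{index}", group=group))
            budget -= 1
    return created
```

For example, with 20 free cores and 8 cores per VM only two VMs fit, so a request for one VM in `spare_a` and three in `spare_b` yields `spare_a-0` and `spare_b-0`.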

Automation Tools: Configuration of Worker Nodes
- Check VM status in Spare
- OK VMs are moved
- NOK VMs wait (some VMs can be broken)
- VMs in HG get specific config.
- Used as WNs in HTCondor and LSF
[Diagram labels: Batch Cluster; HTCondor with HG a, HG b; LSF with HG c, HG d; WNs; Spare A, Spare B, Spare C; VMs]
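The status check and move can be sketched as a single pass over the spare VMs. This is only an assumed shape for the logic: `promote_spares`, the `is_healthy` callback and the group names are hypothetical, and the real health checks and configuration management are not shown.

```python
from typing import Callable, Dict, List, Tuple


def promote_spares(spares: List[str],
                   is_healthy: Callable[[str], bool],
                   target_hg: str) -> Tuple[Dict[str, str], List[str]]:
    """Split spare VMs into those moved to a target group and those left waiting.

    Returns (moved, waiting):
      moved   -- VM name -> group it was moved to (status OK)
      waiting -- VM names that stay in the spare group (status NOK)
    """
    moved: Dict[str, str] = {}
    waiting: List[str] = []
    for vm in spares:
        if is_healthy(vm):
            moved[vm] = target_hg  # would now receive the group-specific config
        else:
            waiting.append(vm)     # possibly broken: keep it out of the batch pool
    return moved, waiting
```

Keeping unhealthy VMs in the spare group means a broken machine never receives jobs; it can be rechecked on the next run or replaced.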

Other assignments
- Migration to Puppet 4: syntax changes; testing and debugging
- Dual-stack configuration of services: enabling IPv6; configuring the firewall
- Rotational support of all services

Conclusions
- Batch/cloud systems integration and monitoring
- Operations procedures
- General problem-solving skills (services support)
- Trainings: Agile Infrastructure & Puppet for Service Managers; Developing secure software

Questions?