Machine/Job Features Update. Stefan Roiser.

Presentation transcript:

Machine/Job Features Update Stefan Roiser

Machine/Job Features Recap
A means to provide per worker node / job slot information from resource providers to users.
[Diagram, labels: Resource User, Resource Provider; Batch: Deploy pilot, Worker Node, pilot; Cloud: Deploy VM, Cloud Node, Virtual Machine, pilot; features, FeatureStore; Features info flow]
9 Sep '15 - GDB StR - MJF Status

Features and their Usage
Machine Features:
- WN power
- Shutdown time
- # job slots
- # physical/logical cores
Job Features:
- Limits on CPU/wall time, scratch space and memory
- # cores allocated
- Job start time
What to use it for:
- Discover specific limits on this WN
- Calculate time left in queue
- Announce shutdown of WN to users
- …
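The "calculate time left in queue" use case can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the features are published as one small text file per key in local directories (passed in as job_dir and machine_dir, typically taken from environment variables such as JOBFEATURES and MACHINEFEATURES), and the key names jobstart_secs, wall_limit_secs and shutdowntime are assumptions that should be checked against the taskforce documentation.

```python
import time
from pathlib import Path
from typing import Optional


def read_feature(directory: Optional[str], key: str) -> Optional[float]:
    """Return the numeric value stored in <directory>/<key>, or None if unavailable."""
    if not directory:
        return None
    try:
        return float(Path(directory, key).read_text().strip())
    except (OSError, ValueError):
        # Missing file, unreadable file, or non-numeric content.
        return None


def seconds_left(job_dir: Optional[str], machine_dir: Optional[str],
                 now: Optional[float] = None) -> Optional[float]:
    """Estimate remaining wall-clock seconds for this job slot.

    Uses the earlier of (job start + wall limit) and an announced
    machine shutdown time. Returns None if neither is published.
    """
    now = time.time() if now is None else now
    deadlines = []

    start = read_feature(job_dir, "jobstart_secs")        # assumed key name
    limit = read_feature(job_dir, "wall_limit_secs")      # assumed key name
    if start is not None and limit is not None:
        deadlines.append(start + limit)

    shutdown = read_feature(machine_dir, "shutdowntime")  # assumed key name
    if shutdown is not None:
        deadlines.append(shutdown)

    if not deadlines:
        return None
    return max(0.0, min(deadlines) - now)
```

A pilot could call seconds_left(os.environ.get("JOBFEATURES"), os.environ.get("MACHINEFEATURES")) before fetching a new payload and skip payloads that would not fit in the remaining time.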

Reference Implementations
Technology    Convener
Apache        Andrew McNab
HTCondor      Marian Zvada
LSF           Ulrich Schwickerath
SGE           Manfred Alef
Slurm         Ulf Tigerstedt
Torque/PBS    Jan Just Keijser

MJF Taskforce Scope
- Check the completeness of the proposal for machine/job features
- Coordinate implementations used in WLCG and an interface for its usage to the VOs
- Provide means to monitor the correctness of the provided information
- Plan and execute the deployment of those implementations at all WLCG resources
Status marks on the slide: ✓ ✓ ✓
Note: Correctness checking of the provided feature values is NOT in the scope of this TF

MJF SAM Probe
- Tests the existence of MJF on WLCG sites
- Running in LHCb preprod for some months
- Status: 4 LHCb supporting sites have MJF deployed, out of a total of >60 LHCb supporting sites; in contact with more UK and Swiss sites
- Sites shown on the slide: CERN, GRIDKA, Imperial College, LPNHE
- Note: the probe reports WARNING because of an extra README file, otherwise OK
- Note: several cloud sites also have MJF deployed
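An existence check of this kind could look roughly like the sketch below. It is not the LHCb SAM probe itself: the function name, the set of expected keys passed in by the caller, and the choice to treat missing keys as CRITICAL are all assumptions made for illustration. The numeric status codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL), and unexpected files such as a README yield WARNING, mirroring the note above.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Nagios-style plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_feature_dir(directory: Optional[str],
                      expected: Iterable[str]) -> Tuple[int, str]:
    """Check that a features directory exists and holds the expected key files.

    `expected` is a caller-supplied (hypothetical) set of key names.
    """
    if not directory:
        return CRITICAL, "variable not set"
    path = Path(directory)
    if not path.is_dir():
        return CRITICAL, f"{directory} is not a directory"

    present = {entry.name for entry in path.iterdir()}
    expected = set(expected)

    missing = expected - present
    if missing:
        return CRITICAL, "missing keys: " + ", ".join(sorted(missing))

    extra = present - expected
    if extra:
        # Everything expected is there, but so is something unknown.
        return WARNING, "unexpected files: " + ", ".join(sorted(extra))

    return OK, "all expected keys present"
```

A wrapper script would call check_feature_dir once for the machine features directory and once for the job features directory, print the messages, and exit with the worse of the two status codes.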

How to move on …
- LHCb asks for deployment of MJF at supporting sites by the end of this year
  - Similar to what has been done for CVMFS
- Correctness of the provided features shall be checked against data collected by experiments
  - See also Philippe's talk
  - If differences are spotted that are not obvious bugs, the TF can provide a platform for discussion/clarification
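Checking published features against experiment-collected data could be as simple as a relative-difference comparison. The sketch below is purely illustrative: the function name, the dictionary layout, the key names used in the example and the 20% default tolerance are all assumptions, not something defined by the taskforce.

```python
from typing import Dict, List


def flag_discrepancies(published: Dict[str, float],
                       measured: Dict[str, float],
                       tolerance: float = 0.2) -> List[str]:
    """Return the keys whose published value differs from the measured
    value by more than `tolerance`, as a relative difference."""
    flagged = []
    for key, pub in published.items():
        obs = measured.get(key)
        if obs is None or obs == 0:
            continue  # nothing meaningful to compare against
        if abs(pub - obs) / abs(obs) > tolerance:
            flagged.append(key)
    return sorted(flagged)
```

For example, with a hypothetical published power of 100 against a measured 70, and a published wall limit of 3600 s against an observed 3500 s, only the power value would be flagged for discussion.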

Links / Further Info
- Taskforce Twiki: ineJobFeatures
- Git Repo:
- LHCb SAM preprod instance: sam-lhcb-dev.cern.ch/templates/ember/
- Egroup: wlcg-ops-coord-tf-

BACKUP

Example MJF SAM probe result: