EU DataGrid WP4. Large-Scale Cluster Computing Workshop, FNAL, May 24 2001. Olof Bärring, CERN. http://cern.ch/hep-proj-grid-fabric

Presentation transcript:

Slide 1: EU DataGrid WP4. Large-Scale Cluster Computing Workshop, FNAL, May 24 2001. Olof Bärring, CERN.

Slide 2: Outline
Background
Architecture
Short term prototypes (September 2001)
GRID issues
Conclusions

Slide 3: Background
3-year EU-funded project led by Fabrizio Gagliardi, CERN
Started 1/1/
Principal contractors: CERN, CNRS, ESA, INFN, FOM, PPARC
15 assistant contractors

Slide 4: Workpackages
WP1: Workload Management
WP2: Grid Data Management
WP3: Grid Monitoring Services
WP4: Fabric Management
WP5: Mass Storage Management
WP6: Integration Testbed – Production quality International Infrastructure
WP7: Network Services
WP8: High-Energy Physics Applications
WP9: Earth Observation Science Applications
WP10: Biology Science Applications
WP11: Information Dissemination and Exploitation
WP12: Project Management

Slide 5: WP4: Fabric Management
“To deliver a computing fabric comprised of all the necessary tools to manage a center providing grid services on clusters of thousands of nodes.”

Slide 6: WP4: Fabric Management
~14 FTEs (6 funded by the EU) for 3 years, split over 6 partners: CERN, FOM/NIKHEF, ZIB, Heidelberg Univ., PPARC, INFN
The work is divided into 6 subtasks
– Configuration management
– Automatic software installation & maintenance
– Monitoring
– Fault tolerance
– Resource management
– “Gridification”

Slide 7: Dependencies
[Diagram. Box labels: Grid Scheduler; Grid Monitoring & Information Service; Gridification; Configuration Management; Resource Management; Fault Tolerance; Installation Management; Monitoring; Cluster. Region labels: Fabric, GRID.]

Slide 8: Configuration management
[Diagram. Labels: GUI; CLI; CDB; Compilation (one-way); Client machine; MLD; Translation; HLD; LLD; Cached LLD; Manipulations (read/write); Fetching only.]
HLD = High Level Description
LLD = Low Level Description
MLD = Machine Level Description
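The flow named on this slide (a one-way compilation from HLD to LLD, with client machines only fetching and caching their LLD) might be sketched roughly as follows. This is an illustration only: the data layout, the `compile_hld` function, and the `NodeClient` class are hypothetical names invented here, the MLD translation step is left out, and nothing below reflects the actual WP4 design.

```python
from dataclasses import dataclass, field

# Hypothetical HLD: per-service defaults plus optional per-node overrides.
HLD = {
    "services": {"batch": {"glibc": "2.2", "kernel": "2.4"}},
    "nodes": {
        "node01": {"service": "batch"},
        "node02": {"service": "batch", "overrides": {"kernel": "2.2"}},
    },
}


def compile_hld(hld):
    """One-way compilation: expand the HLD into one flat LLD per node."""
    lld = {}
    for name, spec in hld["nodes"].items():
        profile = dict(hld["services"][spec["service"]])
        profile.update(spec.get("overrides", {}))
        lld[name] = profile
    return lld


@dataclass
class NodeClient:
    """Client machine: fetches its own LLD (read-only) and keeps a cached copy."""
    name: str
    cache: dict = field(default_factory=dict)

    def fetch(self, compiled_lld):
        # Fetching only: the client copies its profile and never writes back.
        self.cache = dict(compiled_lld[self.name])
        return self.cache

    def query(self, key):
        # Low-level (node) query answered from the cached LLD.
        return self.cache[key]
```

In this sketch, edits happen only on the HLD side; a client that has lost contact with the configuration database can still answer queries from its cache.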

Slide 9: Installation management
[Diagram. Labels: Software Maintainers; Configuration Management; SRS; Local Node; BSS; NMS; Resource Management; Monitoring; Fault Tolerance.]
SRS = Software Repository
NMS = Node Management
BSS = Bootstrap Service

Slide 10: Scheduling of Actions
Node autonomy approach (chaotic)
– High level configuration change propagated to all affected nodes
– Monitoring senses a change of configuration
– Fault tolerance fires an actuator to bring the node to its configured state (could be “re-install”)
What happens to running jobs?
Who tells scheduler that node is in maintenance?
How are dependent actions handled (e.g. server intervention)?
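A minimal sketch of the node autonomy loop, assuming the configured and actual node state can each be represented as a flat dictionary. The function names (`sense_drift`, `reinstall`, `reconcile`) are hypothetical and chosen for illustration; they are not WP4 interfaces.

```python
def sense_drift(configured, actual):
    """Monitoring: return the configured entries that the node does not match."""
    return {k: v for k, v in configured.items() if actual.get(k) != v}


def reinstall(node_state, drift):
    """Hypothetical actuator: force the drifted entries to their configured values."""
    node_state.update(drift)


def reconcile(configured, node_state, actuator=reinstall):
    """Fault tolerance: fire the actuator whenever monitoring reports drift."""
    drift = sense_drift(configured, node_state)
    if drift:
        actuator(node_state, drift)
    return drift
```

Note that `reconcile` acts on the node alone: it never consults the scheduler or checks for running jobs, which is the gap the questions on this slide point at.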

Slide 11: Scheduling of Actions
Decompose complex actions into simple “atomic” actions that can be serialized centrally
– Each configuration change would generate a simple action on the affected nodes
– Scripts bundle the actions together and execute them in a sensible order
Use APIs to the different sub-components

Slide 12: Change glibc on service A
1. Get the list of nodes L belonging to service A
2. For all nodes (L1…Ln)
– Disable Li in scheduler queue A
3. Wait for completion of 2
4. For all nodes (L1…Ln)
– Submit admin job to node Li
5. Wait for completion of 4
6. For all nodes (L1…Ln)
– Re-enable node Li in scheduler queue A
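The six steps could be scripted along the following lines. This is a sketch under stated assumptions: `FakeScheduler` is a hypothetical stand-in that only records calls, `SERVICE_NODES` is invented sample data, and the "wait for completion" steps are modelled as a barrier after each phase rather than as real job draining.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical membership table: service name -> node names (step 1 reads this).
SERVICE_NODES = {"A": ["L1", "L2", "L3"]}


class FakeScheduler:
    """Hypothetical scheduler API; records calls instead of driving a batch system."""

    def __init__(self):
        self.log = []
        self.disabled = set()

    def disable(self, node, queue):
        self.disabled.add(node)
        self.log.append(("disable", node))

    def submit_admin_job(self, node, job):
        self.log.append(("admin", node))
        job(node)

    def enable(self, node, queue):
        self.disabled.discard(node)
        self.log.append(("enable", node))


def for_all(nodes, action):
    """Run one atomic action on every node and block until all have completed."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(action, nodes))  # consuming the iterator acts as the barrier


def change_on_service(queue, scheduler, job):
    nodes = SERVICE_NODES[queue]                                   # step 1
    for_all(nodes, lambda n: scheduler.disable(n, queue))          # steps 2 and 3
    for_all(nodes, lambda n: scheduler.submit_admin_job(n, job))   # steps 4 and 5
    for_all(nodes, lambda n: scheduler.enable(n, queue))           # step 6
```

Within a phase the nodes may be handled in any order, but no node is touched by the next phase until every node has finished the current one, which is the serialization the slide asks for.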

Slide 13: For September 2001
First prototype of the configuration management system
– Low level (node) query interface
– Caching
“Interim” installation system
– LCFG for upgrades and maintenance
– SystemImager for initial system install and VACM console control for system preparation

Slide 14: GRID issues
“Gridification”
Protect the fabric against GRID jobs
– Local farms will still be used by local users
– Firewalls (channeling of job I/O, interactive jobs, MPI over WAN, …)
– Local authorization of grid users
– Job information

Slide 15: Conclusions
DataGrid WP4 is not so much about the G-word. It is really about automating cluster management
In the process of defining the global architecture. How do we best put the bits and pieces together?
Ambitious delivery plans already for September