EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Experiences with the GLUE information schema.

Slides:



Advertisements
Similar presentations
Andrew McNab - Manchester HEP - 17 September 2002 Putting Existing Farms on the Testbed Manchester DZero/Atlas and BaBar farms are available via the Testbed.
Advertisements

A conceptual model of grid resources and services Authors: Sergio Andreozzi Massimo Sgaravatto Cristina Vistoli Presenter: Sergio Andreozzi INFN-CNAF Bologna.
EDG Resource Broker for the Glue Schema Sergio Andreozzi INFN-CNAF Bologna (Italy)
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Status of Interoperability Markus Schulz.
QCDgrid Technology James Perry, George Beckett, Lorna Smith EPCC, The University Of Edinburgh.
Grid Information Systems. Two grid information problems Two problems  Monitoring  Discovery We can use similar techniques for both.
© 2006 Open Grid Forum Information Modeling of Grid Resources: the OGF GLUE WG Approach DMTF Symposium - Portland, 17 July 2007 Sergio Andreozzi INFN-CNAF,
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
Nicholas LoulloudesMarch 3 rd, 2009 g-Eclipse Testing and Benchmarking Grid Infrastructures using the g-Eclipse Framework Nicholas Loulloudes On behalf.
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
Managing Computational Activities on the Grid - from Specifications to Implementation: The GLUE 2 information model OGF25, 2nd March 2009, Catania Balázs.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
February 20, AgentCities - Agents and Grids Prof Mark Baker ACET, University of Reading Tel:
1 BIG FARMS AND THE GRID Job Submission and Monitoring issues ATF Meeting, 20/06/03 Sergio Andreozzi.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Felix Ehm CERN IT-GD EGEE 2008 GLUE 2.0.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Information System on gLite middleware Vincent.
Grid Workload Management Massimo Sgaravatto INFN Padova.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Migration to the GLUE 2.0 information schema in the LCG/EGEE/EGI.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks State of Interoperability Laurence Field.
© 2006 Open Grid Forum Enabling Pervasive Grids The OGF GIN Effort Erwin Laure GIN-CG co-chair, EGEE Technical Director
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Provenance Challenge gLite Job Provenance.
CERN Using the SAM framework for the CMS specific tests Andrea Sciabà System Analysis WG Meeting 15 November, 2007.
Towards a Global Service Registry for the World-Wide LHC Computing Grid Maria ALANDES, Laurence FIELD, Alessandro DI GIROLAMO CERN IT Department CHEP 2013.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks State of Interoperability Laurence Field.
GLite – An Outsider’s View Stephen Burke RAL. January 31 st 2005gLite overview Introduction A personal view of the current situation –Asked to be provocative!
EGEE is a project funded by the European Union under contract INFSO-RI Copyright (c) Members of the EGEE Collaboration GLUE Schema Sergio.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Report from GGUS BoF Session at the WLCG.
INFSO-RI Enabling Grids for E-sciencE OSG-LCG Interoperability Activity Author: Laurence Field (CERN)
Report on Installed Resource Capacity Flavia Donno CERN/IT-GS WLCG GDB, CERN 10 December 2008.
1 User Analysis Workgroup Discussion  Understand and document analysis models  Best in a way that allows to compare them easily.
INFSO-RI Enabling Grids for E-sciencE Enabling Grids for E-sciencE Pre-GDB Storage Classes summary of discussions Flavia Donno Pre-GDB.
US LHC OSG Technology Roadmap May 4-5th, 2005 Welcome. Thank you to Deirdre for the arrangements.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
Towards a WBEM-based Implementation of the OGF GLUE Information Model Sergio Andreozzi, INFN-CNAF, Bologna (Italy) Third EGEE User Forum 13 Feb 2008, Clermont-Ferrand,
EGEE-III INFSO-RI Enabling Grids for E-sciencE Ricardo Rocha CERN (IT/GS) EGEE’08, September 2008, Istanbul, TURKEY Experiment.
INFSO-RI Enabling Grids for E-sciencE Ganga 4 – The Ganga Evolution Andrew Maier.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Using GStat 2.0 for Information Validation.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Progress on first user scenarios Stephen.
Easy Access to Grid infrastructures Dr. Harald Kornmayer (NEC Laboratories Europe) Dr. Mathias Stuempert (KIT-SCC, Karlsruhe) EGEE User Forum 2008 Clermont-Ferrand,
Automatic Resource & Usage Monitoring Steve Traylen/Flavia Donno CERN/IT.
DataTAG is a project funded by the European Union DataTAG WP4 meeting, Bologna 29/07/2003 – n o 1 GLUE Schema - Status Report DataTAG WP4 meeting Bologna,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks APEL CPU Accounting in the EGEE/WLCG infrastructure.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Practical: The Information Systems.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Communication tools between Grid Virtual.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The LCG interface Stefano BAGNASCO INFN Torino.
Report on Installed Resource Capacity Flavia Donno CERN/IT-GS WLCG Management Board, CERN 25 November 2008.
EGEE is a project funded by the European Union under contract IST Information and Monitoring Services within a Grid R-GMA (Relational Grid.
Enabling Grids for E-sciencE INFSO-RI Enabling Grids for E-sciencE Gavin McCance GDB – 6 June 2007 FTS 2.0 deployment and testing.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks GLUE Schema Configuration for SRM 2.2 Stephen.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Configuration Data or “What should be.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Operations: Evolution of the Role of.
Status of gLite-3.0 deployment and uptake Ian Bird CERN IT LCG-LHCC Referees Meeting 29 th January 2007.
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI GLUE 2: Deployment and Validation Stephen Burke egi.eu EGI OMB March 26 th.
Enabling Grids for E-sciencE EGEE-II INFSO-RI Status of SRB/SRM interface development Fu-Ming Tsai Academia Sinica Grid Computing.
Implementation of GLUE 2.0 support in the EMI Data Area Elisabetta Ronchieri on behalf of JRA1’s GLUE 2.0 Working Group INFN-CNAF 13 April 2011, EGI User.
EMI is partially funded by the European Commission under Grant Agreement RI EMI Status And Plans Laurence Field, CERN Towards an Integrated Information.
Towards GLUE Schema 2.0 Sergio Andreozzi INFN-CNAF Bologna, Italy
SRM2 Migration Strategy
A conceptual model of grid resources and services
Report on GLUE activities 5th EU-DataGRID Conference
Stephen Burke egi.eu EGI TF Prague September 20th 2012
Information System (BDII)
Sergio Andreozzi Laurence Field Balazs Konya
Information Services Claudio Cherubino INFN Catania Bologna
Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Experiences with the GLUE information schema in the LCG/EGEE production Grid Stephen Burke, Sergio Andreozzi and Laurence Field CHEP07, Victoria, Canada

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 2 Overview Why we need a schema The GLUE project How GLUE is used in the LCG/EGEE Grid Experience, successes and problems Outlook This talk focuses on the use of the GLUE schema in the LCG/EGEE Grid – other Grids are available!

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 3 Why do we need a schema? A Grid consists of many sites with a wide variety of resources Users, applications and middleware need to know what resources are available and what their properties are –What Resource Brokers are available to CMS? –Find a Computing Element running SL4 with > 2 Gb memory –Find a Storage Element with 20 Tb of free space Grid management and operations staff need an overview of the state of the Grid –How many jobs are running in the UK? –How much disk space has ATLAS used? The schema allows the resource properties to be published and queried in a uniform way The information is transported via an information system, but the schema is logically independent of it

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 4 Desirable schema properties Not too big – don’t describe every detail, only the things which are really needed Not too small – need to capture all the important features Flexible – must be able to cope with a very wide range of resource configurations Precise – the semantics should be clear and unambiguous Simple – easy to understand what the attributes mean Calculable – it should be possible to determine the values of attributes in a short time (typically < 1 second) Extensible – it must be possible to evolve the schema without breaking existing software

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 5 Why standardise the schema? Many different Grids, interoperability is a major activity –For WLCG this means EGEE, OSG and NDGF Must publish the same information, with the same names and semantics –At least for a core set of attributes –Even trivial differences, e.g. units or case of text, can cause problems –Can sometimes write translators between different schemas, but this is complex and error-prone Different Grids can learn from each other’s experience –And avoid duplicating work But need to focus on what is really needed –Standardisation activities can be slow –Design by committee not always optimal

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 6 GLUE history The European DataGrid project (predecessor of EGEE) initially had its own schema (2001) The GLUE (Grid Laboratory for a Uniform Environment) project was a collaboration between EDG, EU DataTAG, iVDGL (predecessor of OSG) and Globus to promote interoperability –The GLUE schema 1.0 was defined in September 2002 after several months of discussion –Version 1.1 was released with some minor improvements in April 2003, and deployed by EDG and then LCG and EGEE –Version 1.2 was agreed in February 2005, finalised during 2005 and deployed (fairly gradually) by LCG/EGEE in 2006 –Version 1.3 was agreed in October 2006 and is starting to be deployed now

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 7 Evolution process Evolution has been fairly slow – two upgrades in four years –Collect problems and ideas, discuss by /phone – several months –One face-to-face meeting to agree changes – intensive but productive –Write documentation, update schema implementations and deploy them – also a few months –Update information providers – timescale varies, can be 1-2 years –Update clients – timescale varies, can be infinite! Backward compatibility maintained through the whole process – significant constraint –Some sites take a very long time to upgrade –Many legacy objects/attributes

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 8 Evolution issues The schema touches everything (middleware, users, sysadmins, …) so lots of dependencies –But most people are not schema experts –Small group involved directly in defining the schema, most with other jobs GLUE must meet the needs of many different Grids –Current contributors include EGEE, OMII-Europe, KnowArc, TERAGRID, NAREGI, UNICORE, NGS, OSG, APACGrid, … Different implementation technologies –Currently LDAP, relational (R-GMA), classad, XML –Places constraints on structure  No tables or primary keys in LDAP  No multivalued attributes in a relational schema  classads are flat lists

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 9 Schema structure The schema is defined in an abstract UML format –Objects with attributes and relations –Attributes have types, can be single- or multi-valued GlueSite – describes a Grid site –Location, contacts, affiliation, … GlueService – describes attributes of a generic service – Type, endpoint, status, ACL, … GlueCE – a complex set of objects and attributes describing a computing resource –Queue, policy, cluster, … GlueSE – a complex set of objects and attributes describing a storage resource –Storage area, control and access protocols, policies, … Also many subsidiary objects

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 10 GLUE schema in LCG/EGEE The Resource Broker allows job submission to be directed according to requirements on the GLUE schema attributes, and a ranking expression defines the order of preference –JDL (Job Definition Language) uses classads, which need to be mapped to the schema The data management clients query the information system for the attributes of Storage Elements Other tools present information directly to the user Monitoring tools collect summary information for the whole Grid The storage schema is currently used for a prototype storage accounting system –Although this is not in general a target use for the schema

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 11 Service discovery Glue 1.2 introduced the GlueService concept to publish generic service information –gLite has a Service Discovery API to query it –Can publish some service-specific information (key/value pairs) Slow takeup –Mainly data management so far GlueService is not explicitly linked to GlueCE/SE –Clients may make ad-hoc assumptions to link them, e.g. matching hostnames No generic information provider –Just static configuration –Some services have custom publishers –New publisher now being certified

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 12 Mapping the schema Real systems are very varied and complex –The schema is uniform and simple Not always obvious how to relate the two –Usually have to make assumptions and simplifications –May be wrong, or generate mismatches between sites CE: now have a framework with plugins for each batch system –Uniform assumptions, common code where possible SE: main target is SRM, but still under development First priority is to err on the safe side –Beware of black holes! If an attribute isn’t used people may not be careful to get it right, so then it can’t be used! Need schema validation tools to check for sanity

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 13 Publishing the information Need a precise definition of attributes, even where it seems trivial –Long discussion about OS names (not defined in schema) Sysadmins ideally should not need to define things by hand –Typos, misunderstanding of semantics Are attributes optional? –Technically yes in most cases, but not always obvious what it means –Usually no special value to mean N/A Units must be clear –Gb vs Gib etc Information providers sometimes slow, can load the system –Introduce caching, but then the information is older Some things are hard to calculate –EstimatedResponseTime –Used/Free space for storage Mistakes/hacks can be hard to fix –Incorrect assumptions get embedded in client code

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 14 CE issues Basic structure is CE – Cluster – SubCluster –CE is a batch queue, cluster is the hardware –SubClusters are homogeneous groups of nodes –Original schema design had many detailed host-level attributes – largely unused Resource Broker can’t choose SubClusters, so the concept isn’t usable –In practice we have one heterogeneous subcluster per cluster Queue concept for CE too simple –Some batch systems have no queues –Usually have fairshares within a queue –Glue 1.2 introduced the VOView to represent a share Some ambiguities in mapping to real systems –CPUs vs job slots –Memory per node, or per job? No information about disk space for scratch files

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 15 SE issues Original schema was defined for “classic SE” – simple disk server + gridftp –Plus other access protocols, e.g. rfio, file Now using Storage Resource Manager (SRM) –In transition from SRM v1 to v2 GLUE schema v 1.3 has several enhancements for SRM –But not much experience yet – just starting to deploy it Lots of debate about free/used/available/reserved space –Schema defines lots of attributes, we will see what can be published

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 16 Interaction with users Users need to understand the meaning and limitations of the schema attributes Production users now often have complex Requirement/Rank expressions built up from years of experience –Ordinary users may be less sophisticated Users often ignore attributes which are “usually” not relevant –Max queued jobs, OS type, CPU/wallclock time limits, … Need frameworks to shield users from the details

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 17 Summary The GLUE schema has developed over 6 years of practical use by EDG/LCG/EGEE –And other Grids It has proved to be sufficient to allow many users to submit large numbers of jobs, manage data and monitor the Grid –No show stoppers However, many rough edges –But now we know where most of the problems lie GLUE is now an OGF working group –Aiming for a major redesign – GLUE 2.0 –Includes experience from many more Grids –See poster session –… watch this space!

Enabling Grids for E-sciencE EGEE-II INFSO-RI GLUE schema experience - CHEP07 18 References GLUE 1: –GLUE 1.3 spec: GLUE 2: –Mailing list: