
The evolution of the information system in EGI/WLCG


1 The evolution of the information system in EGI/WLCG
Stephen Burke, egi.eu, GridKa School, August 30th 2013

2 Information system evolution - GridKa school
Overview
- What is a Grid information system?
- Historical development
- GLUE information model
- BDII technology
- Current situation
- Validation
- Future developments

3 What is an information system?
- A Grid consists of a large number of services and resources, distributed over many sites around the world
- Users, VO managers and Grid operations managers need to understand what services exist, and relevant details about their properties and current state
- An information system collects information from across the Grid and allows it to be queried and displayed in a uniform way

4 Information system components
- Information model
  - The heart of the system: a uniform way to represent the properties of a diverse set of services and resources
- Information providers
  - Software components which collect and publish information according to the information model
- Transport
  - A technology which allows information to be aggregated from all the services in the Grid, and queried in a standard way

5 The GLUE information model

6 What is the information model used for?
- Users, applications and middleware need to know what resources are available and what their properties are
  - What workload managers are available to the biomed VO?
  - Find a computing system running SL6 with > 3 GB memory
  - Find a storage system with 20 TB of free space which I can write to
- Grid management and operations staff need an overview of the state of the Grid
  - How many jobs are running in the UK?
  - How much disk space has biomed used?
  - What is the total installed CPU power available to EGI?
  - Which sites are running EMI 2 services?
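To make the second user query concrete, here is a minimal Python sketch that filters a few hand-written records. The attribute names are GLUE 1.3 host attributes (GlueHostMainMemoryRAMSize is published in MB), but the records, the "ScientificSL" operating-system name and the helper `find_clusters` are hypothetical; a real client would obtain the records from a BDII.

```python
# Hypothetical, simplified subcluster records; a real client would fetch these
# from a BDII. GLUE 1 publishes GlueHostMainMemoryRAMSize in MB.
SUBCLUSTERS = [
    {"GlueSubClusterUniqueID": "sc1.example.org",
     "GlueHostOperatingSystemName": "ScientificSL",
     "GlueHostOperatingSystemRelease": "6.4",
     "GlueHostMainMemoryRAMSize": "4096"},
    {"GlueSubClusterUniqueID": "sc2.example.org",
     "GlueHostOperatingSystemName": "ScientificSL",
     "GlueHostOperatingSystemRelease": "5.9",
     "GlueHostMainMemoryRAMSize": "8192"},
    {"GlueSubClusterUniqueID": "sc3.example.org",
     "GlueHostOperatingSystemName": "ScientificSL",
     "GlueHostOperatingSystemRelease": "6.2",
     "GlueHostMainMemoryRAMSize": "2048"},
]


def find_clusters(records, os_name="ScientificSL", os_major="6", min_ram_mb=3 * 1024):
    """Return IDs of subclusters with the given OS major release and more than min_ram_mb."""
    matches = []
    for rec in records:
        if rec.get("GlueHostOperatingSystemName") != os_name:
            continue
        release = rec.get("GlueHostOperatingSystemRelease", "")
        try:
            ram_mb = int(rec.get("GlueHostMainMemoryRAMSize", ""))
        except ValueError:
            continue  # LDAP values are strings; skip anything non-numeric
        if release.split(".")[0] == os_major and ram_mb > min_ram_mb:
            matches.append(rec["GlueSubClusterUniqueID"])
    return matches


print(find_clusters(SUBCLUSTERS))  # ['sc1.example.org']
```

Note that the comparison on memory is done on the client side: the published value is a string and has to be converted before it can be compared.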

7 GLUE history
- The European DataGrid project (EDG, the predecessor of EGEE and EGI) initially had its own information model (2001)
- The GLUE (Grid Laboratory for a Uniform Environment) project was a collaboration between EDG, EU DataTAG, iVDGL (predecessor of OSG) and Globus to promote interoperability
- GLUE 1.0 was defined in September 2002
- Version 1.1 was released with some minor improvements in April 2003, and deployed by EDG and then LCG and EGEE in 2003/4
- Version 1.2 was agreed in February 2005, implemented in May 2005 and deployed (fairly gradually) by WLCG/EGEE in 2006
- Version 1.3 was agreed in October 2006, implemented in December 2006 and deployed from 2007

8 GLUE 1 concepts
- Independent structures for computing and storage
- Computing part split into Cluster (hardware and software on batch workers) and CE (Grid interface and batch system)
- Storage part pre-dated the SRM protocol
  - Storage area: a chunk of space accessible by one or more VOs
  - Access protocol: GridFTP, http, file, …
  - Some SRM support added in the 1.3 revision
- Generic Service object added in the 1.2 revision
  - Basically a network endpoint
- Limited scope for adding new information without a schema revision

9 Problems with GLUE 1
- GLUE 1 has worked, but we had many accumulated issues
- The schema definitions were based on limited experience
  - Initially only for CE and SE, with completely different structures
  - Service object different again
  - Embedded assumptions which turned out to be too restrictive/specific
    - Only two CPU benchmarks
- Definitions not always clear, documentation somewhat limited
  - Many things effectively defined by WLCG/EGEE practice
- Evolution was fairly slow: two upgrades in four years
  - The GLUE project finished, leaving just an ad-hoc group of interested people
  - Slow to deploy upgraded information providers and clients
- Backward compatibility maintained, which was a significant constraint
  - Changes often “shoe-horned” into the available structure
  - Many legacy objects/attributes

10 GLUE 2
- We always intended to defer conceptual and structural changes to a major revision called GLUE 2
  - Complete redesign, no backward compatibility
  - Incorporating lessons from many years of experience
- Decision made to define GLUE 2 within the Open Grid Forum (OGF)
  - Many (~14) Grid projects participating
  - Kickoff in January 2007; specification approved in March 2009
- Positive outcomes of the OGF process
  - GLUE 2 is a Grid standard, widely accepted within OGF
  - Interacts with other OGF standards (BES, SAGA, JSDL, …)
  - Increased participation from, and hence acceptance by, other projects, which helps interoperability
  - Raised visibility/commitment within WLCG/EGEE/EGI/EMI
  - EMI adopted GLUE 2 as a unified standard

11 Key concepts
[Figure: the main GLUE 2 entities and their relations: User Domain, Admin Domain, Service, Manager, Endpoint, Share, Resource, Activity, Access Policy and Mapping Policy. Relation labels include "negotiates Share with" (User Domain and Admin Domain), "provides" (Admin Domain provides Service), "contacts", "has", "maps User to" (Share), "defined on" (Resource) and "runs".]

12 Major improvements
- Universal concept of a Service as a coherent grouping of Endpoints, Managers and Resources
  - ComputingService and StorageService are specialisations, sharing a common structure as far as possible
  - ComputingService has a better separation of Grid endpoint, LRMS and queue/fairshare
  - StorageService designed for SRM v2
- Generic concepts for Manager (software) and Resource (hardware)
  - Clients can have a standard structure
- All objects are extensible
  - We always find new cases we didn't anticipate!
  - So far no need for a GLUE 2.1
- Some concepts made more generic/flexible by making them separate objects rather than attributes
  - Location, Contact, Policy, Benchmark, Capacity
- More complete/rigorous definitions
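A rough Python rendering of the "Service as a grouping" idea, using dataclasses, may help to show why clients can share one structure. The class and field names are simplified and hypothetical rather than the normative GLUE 2 attribute names, and the `Extension` key/value pair is only modelled loosely on the GLUE 2 extension mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Extension:
    """Free-form key/value pair: new information without a schema revision."""
    key: str
    value: str


@dataclass
class Endpoint:
    url: str
    interface_name: str
    extensions: list = field(default_factory=list)


@dataclass
class Service:
    """Generic Service: a coherent grouping of Endpoints, Managers and Resources."""
    service_id: str
    service_type: str
    endpoints: list = field(default_factory=list)
    managers: list = field(default_factory=list)
    resources: list = field(default_factory=list)
    extensions: list = field(default_factory=list)

    def interfaces(self):
        """Interface names offered by this service's endpoints, sorted and de-duplicated."""
        return sorted({ep.interface_name for ep in self.endpoints})


@dataclass
class ComputingService(Service):
    """Specialisation that reuses the common Service structure and adds its own fields."""
    total_jobs: int = 0
```

Because `ComputingService` is a `Service`, a client written against the generic structure (for example one that only calls `interfaces()`) works unchanged for computing and storage specialisations.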

13 The BDII

14 History
- A suite of Grid middleware was developed as part of the EU-funded EDG (2001-4) and EGEE projects
- EDG started with the tools available in version 2 of the Globus Toolkit; the information system component was the Monitoring and Discovery System (MDS), based on Lightweight Directory Access Protocol (LDAP) technology
- The intention was to develop a new technology to support the information system, but in practice the LDAP-based system was evolved incrementally into the current Berkeley Database Information Index (BDII)

15 LDAP
- LDAP dates from 1995 (RFC 1777). It provides a network-accessible interface to a simple (non-relational) tree-structured database.
- It is supported as standard in Linux distributions; common applications are user databases and address books.
  - Standard query tool: ldapsearch
  - APIs for several languages
  - Standard text format for data: LDAP Data Interchange Format (LDIF)
- Various backends are supported to store the underlying information, one of which is the Berkeley Database (Berkeley DB), an embedded key-value store rather than a relational database.
- LDAP supports SASL authentication, but this is not used by the BDII.
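LDIF is simple enough that a small reader covers most BDII output. The following is a minimal sketch, assuming plain content records as produced by `ldapsearch -LLL`; it handles line folding, comments and base64 values as described in RFC 2849, but not change records, URL values or attribute options. The function name `parse_ldif` is hypothetical.

```python
import base64


def parse_ldif(text):
    """Parse simple LDIF content records into a list of (dn, {attr: [values]})."""
    # Unfold: a line starting with a single space continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    entries, dn, attrs = [], None, {}
    for line in lines + [""]:          # trailing "" flushes the last entry
        if not line.strip():           # blank line ends a record
            if dn is not None:
                entries.append((dn, attrs))
            dn, attrs = None, {}
            continue
        if line.startswith("#"):       # comment line
            continue
        name, _, value = line.partition(":")
        if dn is None and name.lower() == "version":
            continue                   # "version: 1" header, not an attribute
        if value.startswith(":"):      # "attr:: <base64>"
            value = base64.b64decode(value[1:].strip()).decode("utf-8")
        else:
            value = value.strip()
        if name.lower() == "dn":
            dn = value
        else:
            attrs.setdefault(name, []).append(value)
    return entries
```

Attributes are kept as lists because LDAP attributes are multi-valued, and every value stays a string, as it is on the wire.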

16 The BDII
- The BDII uses LDAP in a somewhat unusual way.
  - The information is a snapshot of the current state, updated every few minutes.
  - There is therefore no need for long-term persistency or backup, but the underlying database must be updated at a fairly high rate.
  - It is therefore useful to hold the database in memory.
- The database is updated either by running local information providers (scripts which output LDIF-formatted information) or by querying another BDII
- Information in the database can be cached to allow for the possibility of information being temporarily unavailable
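An information provider is just a program that writes LDIF to standard output. The sketch below shows the shape of one in Python. It assumes a GLUE 2 rendering in which resource-level entries sit under `GLUE2GroupID=resource,o=glue` and uses a small, illustrative subset of GLUE 2 Service attributes; the function names and example values are hypothetical, and real providers publish many more attributes and object classes.

```python
import base64
import sys


def _needs_base64(value):
    """Simplified RFC 2849 SAFE-STRING test; unsafe values must be base64-encoded."""
    return (value[:1] in (" ", ":", "<") or value.endswith(" ")
            or any(ord(ch) > 127 or ch in "\0\n\r" for ch in value))


def ldif_entry(dn, attributes):
    """Render one LDIF entry; attributes is a list of (name, value) pairs."""
    lines = [f"dn: {dn}"]
    for name, value in attributes:
        if _needs_base64(value):
            encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
            lines.append(f"{name}:: {encoded}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"   # entries are separated by blank lines


def service_entry(service_id, service_type, site_id):
    """Build a minimal GLUE 2 Service entry (illustrative attribute subset)."""
    dn = f"GLUE2ServiceID={service_id},GLUE2GroupID=resource,o=glue"
    return ldif_entry(dn, [
        ("objectClass", "GLUE2Service"),
        ("GLUE2ServiceID", service_id),
        ("GLUE2ServiceType", service_type),
        ("GLUE2ServiceQualityLevel", "production"),
        ("GLUE2ServiceAdminDomainForeignKey", site_id),
    ])


if __name__ == "__main__":
    # The resource BDII runs the provider and loads whatever LDIF it prints.
    sys.stdout.write(service_entry("urn:example:svc1", "org.example.service", "EXAMPLE-SITE"))
```

Long lines are not folded here; folding is optional in LDIF, so a reader such as ldapadd accepts the output either way.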

17 Rendering GLUE in LDAP
- The GLUE information model is an abstract structure, formed from objects, attributes and relations
- It needs to be mapped into the concepts of a specific technology
- For LDAP this is mostly straightforward
  - GLUE was defined with LDAP in mind
- Limited data types, basically just string and integer
  - Type checking needs to be external
- No built-in relations
  - Use attributes containing the ID of a related object
- LDAP structures objects into a tree, but that has no direct mapping to GLUE
  - GLUE 1 and GLUE 2 in separate trees with different roots
  - Subtree per Grid site
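Since LDAP has no built-in relations, a client has to follow ID attributes itself and do its own type checking. A minimal sketch, assuming entries have already been fetched and flattened to one value per attribute; `GLUE2EndpointServiceForeignKey` is the GLUE 2 LDAP attribute linking an Endpoint to its Service, while the entries and helper names are hypothetical.

```python
from collections import defaultdict

# Simplified entries as returned from LDAP: every value is a string.
ENTRIES = [
    {"objectClass": "GLUE2Service", "GLUE2ServiceID": "svc1"},
    {"objectClass": "GLUE2Endpoint", "GLUE2EndpointID": "ep1",
     "GLUE2EndpointServiceForeignKey": "svc1"},
    {"objectClass": "GLUE2Endpoint", "GLUE2EndpointID": "ep2",
     "GLUE2EndpointServiceForeignKey": "svc1"},
    {"objectClass": "GLUE2Endpoint", "GLUE2EndpointID": "ep3",
     "GLUE2EndpointServiceForeignKey": "svc-missing"},
]


def endpoints_by_service(entries):
    """Group endpoint IDs under their service by following the foreign-key attribute.

    Returns (groups, dangling), where dangling lists endpoints whose key
    matches no published service (a typical consistency error).
    """
    services = {e["GLUE2ServiceID"] for e in entries if e["objectClass"] == "GLUE2Service"}
    groups, dangling = defaultdict(list), []
    for e in entries:
        if e["objectClass"] != "GLUE2Endpoint":
            continue
        key = e.get("GLUE2EndpointServiceForeignKey")
        if key in services:
            groups[key].append(e["GLUE2EndpointID"])
        else:
            dangling.append(e["GLUE2EndpointID"])
    return dict(groups), dangling


def as_int(value):
    """External type check: integers arrive as strings, so the client validates them."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
```

The dangling list is worth reporting rather than silently dropping: nothing in LDAP itself prevents a foreign key from pointing at an object that was never published.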

18 Information system structure
- The information system uses a 3-level hierarchy of BDIIs
- A resource BDII runs on an individual service node, and publishes information about it by running local information providers
- A site BDII pulls information from all resource BDIIs at a site
  - It also adds some information about the site as a whole
- A top-level BDII pulls information from all site BDIIs
  - Bootstrapped from a central database (GOC DB)
  - There are many top BDIIs in the Grid
- Site and top BDIIs are Grid services and hence have their own resource BDIIs!

19 Distributed LDAP Hierarchy
[Figure: a top-level BDII queries the site BDIIs of Sites A, B and C over LDAP; each site BDII in turn queries the resource BDIIs at its site over LDAP.]
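The pull-based hierarchy can be imitated in a few lines of Python. This is a toy model under stated assumptions: each BDII is represented by a plain function that returns a list of dictionaries instead of answering LDAP queries, and the names `make_resource_bdii`, `site_bdii`, `top_bdii` and the `BOOTSTRAP` table (standing in for GOC DB) are hypothetical.

```python
def make_resource_bdii(service_id, service_type):
    """Stand-in for one resource BDII: returns a callable that 'queries' it."""
    return lambda: [{"kind": "service", "id": service_id, "type": service_type}]


def site_bdii(site, resource_bdiis):
    """Pull from every resource BDII at the site and add site-level information."""
    entries = [{"kind": "site", "site": site}]
    for pull in resource_bdiis:
        entries.extend(dict(entry, site=site) for entry in pull())
    return entries


def top_bdii(site_bdiis):
    """Pull from every site BDII; the list of sites comes from the bootstrap source."""
    entries = []
    for pull in site_bdiis:
        entries.extend(pull())
    return entries


# Hypothetical bootstrap data, playing the role that GOC DB plays for real top BDIIs.
BOOTSTRAP = {
    "SITE-A": [make_resource_bdii("ce.site-a", "compute"),
               make_resource_bdii("se.site-a", "storage")],
    "SITE-B": [make_resource_bdii("ce.site-b", "compute")],
}

# Default arguments bind each site's values now, not when the lambda is called.
site_pulls = [lambda s=site, r=res: site_bdii(s, r) for site, res in BOOTSTRAP.items()]
top = top_bdii(site_pulls)
```

Each level only knows about the level directly below it, which is why adding a service requires no change at the top: it appears after the next site and top refresh cycles.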

20 LDAP queries
- LDAP is suited to efficient query performance
  - Common queries can be optimised by indexing important attributes in the underlying DB
- Example: extract the version of every site BDII (and count them):

    time ldapsearch -x -h lcg-bdii.cern.ch -p 2170 -b o=grid '(&(objectclass=GlueService)(GlueServiceType=bdii_site))' GlueServiceVersion | grep Version: | wc -l
    328
    real 0m0.196s

- LDAP is unfamiliar to most users, and queries have a rather idiosyncratic syntax
  - Need to package queries inside other tools for common use
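Packaging queries inside other tools mostly means building the filter string safely and assembling the command line. A minimal Python sketch follows; the escaping follows RFC 4515, the argument list assumes OpenLDAP's `ldapsearch` (where `-H ldap://host:port` is the URI form of `-h`/`-p`, and `-LLL` suppresses comments and the version line) and the usual BDII port 2170. The helper names are hypothetical, and the command is only assembled, not executed.

```python
def escape_filter_value(value):
    """Escape a value for an LDAP search filter (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def and_filter(**conditions):
    """Build an AND of equality matches, e.g. (&(a=1)(b=2)); a single term is returned bare."""
    terms = "".join(f"({attr}={escape_filter_value(val)})" for attr, val in conditions.items())
    return f"(&{terms})" if len(conditions) > 1 else terms


def ldapsearch_argv(host, base, search_filter, *attrs, port=2170):
    """Argument list for an anonymous (-x) ldapsearch query."""
    return ["ldapsearch", "-x", "-LLL", "-H", f"ldap://{host}:{port}", "-b", base,
            search_filter, *attrs]


site_bdii_filter = and_filter(objectclass="GlueService", GlueServiceType="bdii_site")
argv = ldapsearch_argv("lcg-bdii.cern.ch", "o=grid", site_bdii_filter, "GlueServiceVersion")
```

Passing `argv` as a list (for example to `subprocess.run(argv, capture_output=True, text=True)`) avoids the shell quoting that the one-line command on this slide needs for the parentheses and `&`.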

21 Information providers
- GLUE 1 publication has been in place for many years
  - Many services use a standard provider
- GLUE 2 required new information providers
  - Progressively rolled out from late 2010
  - Simple services came first; computing and storage are more complex
- The EMI project made GLUE 2 publication a major objective
  - ARC, Unicore, gLite and dCache middleware all included
  - Globus services now also published in GLUE 2
- GLUE 2 infrastructure is now largely complete

22 Some statistics
- 364 sites
- 3788 service endpoints
- GLUE 1: 68k objects, 1.2 million attributes, 90 MB
- GLUE 2: 273k objects, 3.2 million attributes, 343 MB
- Just a snapshot, your mileage may vary

23 Ensuring quality

24 Resilience
- Important to make the system resilient to failure and overload
- Site and top BDIIs can have multiple load-balanced nodes behind a DNS alias
  - BDIIs are stateless
- Top BDIIs cache information for up to 24 hours
  - Service status attributes are set to “unknown” to indicate that dynamic state information is stale
- Standard top BDIIs for each region are defined and monitored by EGI
  - Target is 99% availability
- Client tools can use a list of top BDIIs for failover
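Two of these mechanisms are easy to sketch in Python: client-side failover over a list of top BDIIs, and the rule that cached entries survive for up to 24 hours but report a stale status as "unknown". This is an illustration under assumptions, not BDII code: the `query` callable stands in for an LDAP search, the `status` key is a hypothetical stand-in for the real status attributes, and the 10-minute freshness threshold `fresh_for` is invented for the example.

```python
CACHE_LIMIT = 24 * 3600  # top BDIIs keep cached information for up to 24 hours


def query_with_failover(endpoints, query):
    """Try each top BDII in turn; return (endpoint, result) from the first that answers."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, query(endpoint)
        except OSError as exc:       # connection refused, timeout, DNS failure, ...
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all top BDIIs failed: " + "; ".join(errors))


def cached_view(entry, last_refreshed, now, fresh_for=600):
    """Entries older than CACHE_LIMIT are dropped; entries that are cached but not
    recently refreshed keep their static data but report their status as "unknown"."""
    age = now - last_refreshed
    if age > CACHE_LIMIT:
        return None
    if age > fresh_for:
        return dict(entry, status="unknown")
    return entry
```

Failover at the client complements the DNS load balancing: an alias protects against a single dead node, while a list of independent top BDIIs also covers the loss of a whole regional service.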

25 Validation
- The information in the information system comes from many different sources
  - Hardwired values
  - Configured by the sysadmin
  - Standard Unix commands
  - Dynamic queries to services
- Many ways to go wrong: information is not always reliable
- Traditionally the validation of the information was rather ad hoc
  - Diagnose and fix after a problem is observed
- For GLUE 2 we wanted to do better
  - Proactive detection of errors

26 EGI GLUE 2 profile
- GLUE 2 is intentionally very flexible
  - Many ways to use it, not necessarily interoperable
- EGI needed a profile to specify how it should be used in our context
  - Detailed semantics of each attribute
  - What should and should not be published
  - Importance and expected use
  - Rules to define what values are allowed
  - Heuristics to detect anomalies
- Profile document written in 2012
  - With wide consultation of interested parties
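The difference between a rule and a heuristic can be shown with a small Python sketch. The enumerations below are the GLUE 2 HealthState and ServingState values as recalled from the GLUE 2.0 specification (worth checking against the document itself); the "running jobs must not exceed total jobs" check is a hypothetical heuristic for illustration, not a quotation of the EGI profile, and the function names are invented.

```python
import re

# Allowed values (rules); check against the GLUE 2.0 specification.
HEALTH_STATES = {"ok", "warning", "critical", "unknown", "other"}
SERVING_STATES = {"production", "draining", "queueing", "closed"}
JOB_COUNTS = ("GLUE2ComputingShareRunningJobs", "GLUE2ComputingShareWaitingJobs",
              "GLUE2ComputingShareTotalJobs")


def check_endpoint(attrs):
    """Rule checks for one endpoint: returns a list of (severity, message)."""
    findings = []
    health = attrs.get("GLUE2EndpointHealthState")
    if health is None:
        findings.append(("ERROR", "GLUE2EndpointHealthState is missing"))
    elif health not in HEALTH_STATES:
        findings.append(("ERROR", f"invalid GLUE2EndpointHealthState {health!r}"))
    serving = attrs.get("GLUE2EndpointServingState")
    if serving is not None and serving not in SERVING_STATES:
        findings.append(("ERROR", f"invalid GLUE2EndpointServingState {serving!r}"))
    return findings


def check_share(attrs):
    """Type rules plus one consistency heuristic for a computing share."""
    findings, counts = [], {}
    for name in JOB_COUNTS:
        value = attrs.get(name)
        if value is None:
            continue
        if re.fullmatch(r"[0-9]+", value):
            counts[name] = int(value)
        else:
            findings.append(("ERROR", f"{name} is not a non-negative integer: {value!r}"))
    running = counts.get("GLUE2ComputingShareRunningJobs")
    total = counts.get("GLUE2ComputingShareTotalJobs")
    if running is not None and total is not None and running > total:
        findings.append(("WARNING", "more running jobs than total jobs"))
    return findings
```

Rules produce errors because the value is definitely wrong; heuristics produce warnings because the value is only suspicious.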

27 glue-validator
- A command-line tool (Python) which systematically checks published information against the GLUE specification and (optionally) the EGI profile
  - Many options to restrict/filter/format the output
  - Primarily for GLUE 2, but many problems will be common with GLUE 1
- Bugs/issues in middleware fed back to developers
  - Now part of the EGI middleware acceptance criteria
  - Known issues can be excluded from the validation output
- Can be run as a Nagios probe
  - Integrated into the standard EGI monitoring system
  - Automated checks for Grid sites, with tickets raised by shifters
  - Still experimental, but will be moved to production soon
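Running a validator as a Nagios probe comes down to mapping its findings onto the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message. The sketch below shows that mapping, including the exclusion of known issues; it is a hypothetical illustration and does not reproduce glue-validator's actual options or output format.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_result(findings, known_issues=()):
    """Turn validator findings into a Nagios (exit_code, status_line) pair.

    findings: list of (severity, message) with severity "ERROR" or "WARNING",
              or None if the validation itself could not be run.
    known_issues: messages to ignore, e.g. bugs already reported to developers.
    """
    if findings is None:
        return UNKNOWN, "UNKNOWN - validation could not be run"
    active = [(sev, msg) for sev, msg in findings if msg not in known_issues]
    errors = [msg for sev, msg in active if sev == "ERROR"]
    warnings = [msg for sev, msg in active if sev == "WARNING"]
    if errors:
        return CRITICAL, f"CRITICAL - {len(errors)} error(s): {errors[0]}"
    if warnings:
        return WARNING, f"WARNING - {len(warnings)} warning(s): {warnings[0]}"
    return OK, "OK - no validation problems found"
```

A probe wrapper would print the status line and exit with the returned code; Nagios uses only those two things to decide the service state shown to shifters.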

28 The future

29 GLUE 2 migration
- GLUE 1 and GLUE 2 published in parallel
- Most use is still GLUE 1, but:
  - Nearly all services now publish in GLUE 2
  - Validation means that the quality of the information is good and constantly monitored
  - Workload and data management clients support GLUE 2
  - Future developments will be in GLUE 2
- Hope to progressively move to using GLUE 2 over the next ~2 years
- GLUE 1 will remain as a backup for the foreseeable future
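While both schemas are published, a client can prefer GLUE 2 and fall back to GLUE 1. A minimal Python sketch of that pattern for endpoint lookup: `GLUE2EndpointURL`/`GLUE2EndpointServiceForeignKey` and `GlueServiceEndpoint`/`GlueServiceUniqueID` are the GLUE 2 and GLUE 1.3 attribute names, while the function is hypothetical. It assumes the caller already knows the service's identifier in each schema, since the two schemas do not in general use the same IDs.

```python
def endpoint_urls(glue2_entries, glue2_service_id, glue1_entries, glue1_service_id):
    """Return (schema, urls) for a service, preferring GLUE 2 and falling back to GLUE 1."""
    urls = [e["GLUE2EndpointURL"] for e in glue2_entries
            if e.get("GLUE2EndpointServiceForeignKey") == glue2_service_id
            and "GLUE2EndpointURL" in e]
    if urls:
        return "glue2", urls
    urls = [e["GlueServiceEndpoint"] for e in glue1_entries
            if e.get("GlueServiceUniqueID") == glue1_service_id
            and "GlueServiceEndpoint" in e]
    return ("glue1", urls) if urls else (None, [])
```

Returning which schema answered makes it easy to monitor how often the fallback is still needed during the migration.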

30 Related systems
- GOC DB: a static central database of sites and services
  - Used as a bootstrap for monitoring and to record downtimes
  - Now moving to GLUE 2 as an information model, with an XML data format
  - Some overlap with the information system: we need a better definition of the relationship
- EMIR: a new service which stores a subset of slowly-changing GLUE 2 information which can be used for service discovery
  - May be more efficient than the BDII?

31 Reacting to new developments
- Clouds are somewhat similar to Grids, but have significant differences
  - Services may have different properties
  - Services and sites may appear and vanish dynamically
- EGI is developing a GLUE 2 extension for cloud services
  - Still early days, very experimental
- The information model may need to be extended as computing and storage systems evolve
  - Virtual machines, GPUs and many-core systems
  - Federated storage

32 Longer-term issues
- The BDII technology is stable and reliable, but:
  - All attributes are updated at the same frequency (minutes)
    - The intrinsic rate of change varies from seconds to months
  - One size fits all
    - Top-level BDIIs all contain the full information for the whole Grid
    - Different applications may have different requirements
  - The BDII structure is configured by hand
    - May be less well suited to dynamic, cloud-like systems?
  - LDAP is non-standard in the Grid world
  - No authentication or authorisation to read
- Plenty of room for new ideas!
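Purely to illustrate the first point, and not as a feature of the BDII, here is a Python sketch of refreshing attributes at different rates with a priority queue. The attribute names are GLUE 2 attributes, but the intervals and the function `refresh_schedule` are hypothetical.

```python
import heapq

# Hypothetical refresh intervals (seconds) for attributes that change at very different rates.
REFRESH_INTERVALS = {
    "GLUE2ComputingShareRunningJobs": 60,     # changes within minutes
    "GLUE2EndpointHealthState": 300,
    "GLUE2ServiceType": 86400,                # essentially static
}


def refresh_schedule(intervals, start, until):
    """Yield (time, attribute) refresh events in time order up to and including `until`."""
    queue = [(start + period, name) for name, period in intervals.items()]
    heapq.heapify(queue)
    while queue and queue[0][0] <= until:
        when, name = heapq.heappop(queue)
        yield when, name
        heapq.heappush(queue, (when + intervals[name], name))
```

Over a ten-minute window this refreshes the job count ten times and the health state twice, and never touches the service type, whereas a single global cycle would refresh all three at the same rate.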


