Grid Projects: EU DataGrid and LHC Computing Grid
Oxana Smirnova, Lund University
October 29, 2003, Košice
Outline
- Precursors: attempts to meet the tasks of HEP computing
- EDG: the first global Grid development project
- LCG: deploying the computing environment for the LHC experiments
Characteristics of HEP computing
- Event independence: data from each collision are processed independently (trivial parallelism), a mass of independent problems with no information exchange (see the sketch below)
- Massive data storage: modest event size (1 – 10 MB, ALICE excepted), but the total is very large, Petabytes for each experiment
- Mostly read-only: data are never changed after recording to tertiary storage, but are read often (a tape is mounted at CERN every second!)
- Resilience rather than ultimate reliability: individual components should not bring down the whole system; jobs are rescheduled away from failed equipment
- Modest floating-point needs: HEP computations involve decision making rather than calculation
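The "trivial parallelism" above is what later makes Grid scheduling tractable: events can be farmed out to independent workers with no communication between them. A minimal illustrative sketch (not EDG or LCG code; process_event and the fake event list are invented placeholders):

```python
# Illustrative only: HEP event processing is "embarrassingly parallel" --
# every event can be handled by an independent worker with no communication.
# process_event and the event list are hypothetical placeholders.
from multiprocessing import Pool

def process_event(event):
    # Per-event decision making (selection, reconstruction), little floating point.
    return {"id": event["id"], "accepted": event["energy"] > 100.0}

if __name__ == "__main__":
    events = [{"id": i, "energy": 50.0 + i} for i in range(1000)]  # fake events
    with Pool() as pool:                    # one worker per CPU core
        results = pool.map(process_event, events)
    print(sum(r["accepted"] for r in results), "events accepted")
```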
MONARC: hierarchical regional centres model (diagram). CERN is Tier 0; Tier 1 regional centres (FNAL, RAL, IN2P3); Tier 2 centres (labs and universities); department clusters and desktops at the bottom. Tier-to-tier links range from 155 Mbps to 2.5 Gbps (622 Mbps typical).
EU DataGrid project
- In certain aspects initiated as a MONARC follow-up, introducing Grid technologies
- Started on January 1, 2001, to deliver by the end of 2003
- Aim: to develop Grid middleware suitable for High Energy Physics, Earth Observation and biomedical applications, and for live demonstrations
- 9.8 MEuro of EU funding over 3 years
- Development based on existing tools, e.g., Globus, LCFG, GDMP
- Maintains development and applications testbeds, which include several sites across Europe
EDG overview: Main partners
- CERN – International (Switzerland/France)
- CNRS – France
- ESA/ESRIN – International (Italy)
- INFN – Italy
- NIKHEF – The Netherlands
- PPARC – UK
Slide by EU DataGrid
EDG overview: Assistant partners
Research and Academic Institutes
- CESNET (Czech Republic)
- Commissariat à l'énergie atomique (CEA) – France
- Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
- Consiglio Nazionale delle Ricerche (Italy)
- Helsinki Institute of Physics – Finland
- Institut de Fisica d'Altes Energies (IFAE) – Spain
- Istituto Trentino di Cultura (IRST) – Italy
- Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
- Royal Netherlands Meteorological Institute (KNMI)
- Ruprecht-Karls-Universität Heidelberg – Germany
- Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
- Swedish Research Council – Sweden
Industrial Partners
- Datamat (Italy)
- IBM-UK (UK)
- CS-SI (France)
Slide by EU DataGrid
EDG work packages
- WP1: Workload Management System
- WP2: Data Management
- WP3: Grid Monitoring / Grid Information Systems
- WP4: Fabric Management
- WP5: Storage Element, MSS support
- WP6: Testbed and demonstrators
- WP7: Network Monitoring
- WP8: High Energy Physics Applications
- WP9: Earth Observation
- WP10: Biology
- WP11: Dissemination
- WP12: Management
Simplified Grid deployment approach
- Homogeneous structure: all sites must run the same OS and kernel (Linux, RedHat 7.3)
- Central installation via the LCFG service is recommended (installs the entire machine from scratch on each reboot)
- Exceptions are possible, but not supported
- Invasive installation: requires massive re-configuration of existing clusters and needs to be installed on every compute node
Basic EDG services
- Workload management: Resource Broker (RB) and Job Submission Service (JSS), Logging and Bookkeeping Service (L&B), Information Index (II), User Interface (UI)
- Data management: Replica Location Service (RLS), Replica Metadata Catalog (RMC), Replica Optimization Service (ROS)
- Information and monitoring service: Relational Grid Monitoring Architecture (R-GMA)
- Fabric management
- Mass storage management
- Virtual Organization management
Typical EDG site composition
Site-specific:
- User Interface (UI)
- Computing Element or Service (CE): Gatekeeper (GK) and Worker Nodes (WN); the WNs have client APIs for accessing EDG services and information
- Storage Element (SE)
- Monitoring Node (MON): R-GMA servlets for the site, ROS
Common:
- Resource Broker (RB)
- RLS: Local Replica Catalog (LRC), RMC, Information Catalog (IC)
Organization of user access
- Users must have valid personal Globus-style certificates; group or anonymous certificates are not allowed
- The issuing Certificate Authority (CA) must be endorsed by the EDG Security Group; if there is no approved CA in your country/region, the French CA acts as a catch-all
- Users must belong to one of the accepted Virtual Organizations (VO): LHC experiments, biomedical and Earth Observation applications, and some EDG teams
- VO lists are managed by experiment/team representatives
- Users can belong to several VOs, but users with identical names, or a single user with several certificates, cannot be registered in the same VO
- Local system administrators still retain full control
- To "log into the Grid", users use their certificate and its private key to issue a short-lived proxy credential
- Grid sites accept requests only from users whose certificates are signed by CAs that the site accepts (see the sketch below)
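To make the trust model concrete, here is a minimal illustrative sketch (not Globus or EDG code) of what a site-side check amounts to: the presented credential must be issued by a CA the site accepts and must not have expired. The file name and the accepted-issuer list are invented, and a real implementation would also verify the signature chain and the proxy delegation.

```python
# Illustrative sketch of the site-side trust check (not actual Globus/EDG code).
# Assumes the 'cryptography' package; file names and CA names are invented.
from datetime import datetime, timezone
from cryptography import x509

ACCEPTED_CA_ISSUERS = {   # CAs endorsed by the EDG Security Group (hypothetical names)
    "CN=CERN CA,O=CERN,C=CH",
    "CN=GridPP CA,O=PPARC,C=UK",
}

def site_accepts(cert_pem: bytes) -> bool:
    cert = x509.load_pem_x509_certificate(cert_pem)
    issuer = cert.issuer.rfc4514_string()
    not_expired = cert.not_valid_after.replace(tzinfo=timezone.utc) > datetime.now(timezone.utc)
    return issuer in ACCEPTED_CA_ISSUERS and not_expired

if __name__ == "__main__":
    with open("userproxy.pem", "rb") as f:   # a short-lived proxy presented by the user
        print("accepted" if site_accepts(f.read()) else "rejected")
```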
EDG applications testbed
- EDG is committed to creating a stable testbed to be used by applications for real tasks
- This started to materialize in August 2002, coinciding with the ATLAS Data Challenge 1 (DC1); CMS joined in December; ALICE and LHCb ran smaller-scale tests
- At the moment (October 2003) it consists of ca. 15 sites in 8 countries
- Most sites are installed from scratch using the EDG tools (which require/install RedHat 7.3); some have installations on top of existing resources; a lightweight EDG installation is available
- Central element: the Resource Broker (RB), which distributes jobs between the resources (see the sketch below)
- Most often a single RB is used; some tests used RBs "attached" to User Interfaces; in the future there may be an RB per Virtual Organization (VO) and/or per user
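The brokering idea can be pictured with a toy matchmaker: filter resources that satisfy a job's requirements, then rank the survivors. This is purely conceptual; it is not the EDG Resource Broker algorithm, and the attribute names below are invented.

```python
# Toy resource broker: match a job's requirements against advertised resources
# and rank the matches. Purely illustrative; attribute names are invented and
# this is not the EDG Resource Broker implementation.
resources = [
    {"site": "cern-ce", "os": "RedHat7.3", "free_cpus": 12, "has_input_replica": True},
    {"site": "lund-ce", "os": "RedHat7.3", "free_cpus": 40, "has_input_replica": False},
    {"site": "ral-ce",  "os": "RedHat6.2", "free_cpus": 99, "has_input_replica": True},
]

job = {"needs_os": "RedHat7.3", "min_cpus": 4}

def matches(res, job):
    return res["os"] == job["needs_os"] and res["free_cpus"] >= job["min_cpus"]

def rank(res):
    # Prefer sites that already hold a replica of the input data, then free CPUs.
    return (res["has_input_replica"], res["free_cpus"])

candidates = [r for r in resources if matches(r, job)]
best = max(candidates, key=rank)
print("submit to", best["site"])   # -> submit to cern-ce
```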
EDG Applications Testbed snapshot
Basic EDG functionality as of today (diagram of a typical job flow): a job described in JDL is submitted from the UI to the RB, which consults R-GMA and the RLS and passes the job to a CE as RSL; the job stages its input from CASTOR (rfcp) or NFS, and output is stored and replicated via the Replica Manager (RM) before being returned to the user (illustrated below).
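The data-management side of this flow (the RM copying files to a storage element and registering replicas in a catalogue such as the RLS/LRC) can be pictured with a toy example. The in-memory dictionary stands in for the catalogue; this is not the EDG Replica Manager API, and all names are invented.

```python
# Toy replica manager: copy a file to a "storage element" directory and record
# the logical-to-physical mapping in a catalogue. The dict stands in for the
# RLS/LRC; this is not the EDG Replica Manager API.
import shutil
from pathlib import Path

catalogue = {}   # logical file name -> list of physical replicas

def copy_and_register(local_path: str, lfn: str, storage_dir: str) -> str:
    Path(storage_dir).mkdir(parents=True, exist_ok=True)
    pfn = str(Path(storage_dir) / Path(local_path).name)
    shutil.copy(local_path, pfn)                    # "upload" to the storage element
    catalogue.setdefault(lfn, []).append(pfn)       # register the replica
    return pfn

if __name__ == "__main__":
    Path("hits.root").write_bytes(b"fake event data")
    copy_and_register("hits.root", "lfn:atlas/dc1/hits-0001", "se1/atlas")
    copy_and_register("hits.root", "lfn:atlas/dc1/hits-0001", "se2/atlas")  # replicate
    print(catalogue["lfn:atlas/dc1/hits-0001"])
```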
EDG status
- EDG1 was not a very satisfactory prototype: highly unstable behaviour, somewhat late deployment, many missing features and functionalities
- EDG2 was released and deployed for applications on October 20, 2003
- Many services have been re-written since EDG1; some functionality has been added, but some has been lost
- Stability is still an issue, especially the Information System performance
- Little has been done to streamline the deployment of the applications' environment
- No production-scale tasks have been shown to perform reliably yet
- No development will be done beyond this point; bug fixing will continue for a while
- Some "re-engineering" is expected to be done by the next EU-sponsored project, EGEE
The future: LCG
- LCG: the LHC Computing Grid
- Goal: to deploy an adequate information and computational infrastructure for the LHC experiments
- Means of achieving it: modern distributed computing and data analysis tools and utilities, i.e. the Grid
- Resources: large computing centres around the world as the basic elements; research institutes, laboratories and universities are also members of the data analysis chain
- There is no need to concentrate the computing power at CERN
LCG timeline
- September 2001: the project is approved by the CERN Council; duration 2002 to 2008
- Phase 1: prototyping, testing; Phase 2: deployment of the LHC computing infrastructure
- November 2003: a functioning LCG-1 prototype (one criterion: 30 consecutive days of non-stop operation); includes 10 regional centres
- May 2004: research labs and institutes join with their resources
- December 2004: LCG-3, 50% of the performance expected by 2007
LCG organization
Financing:
- CERN and other states participating in LHC projects
- Business partners
- LHC experiments
- National research foundations and computing centres
- Projects financed by the EU and other international funds
Structure:
- Applications
- CERN fabric
- Grid technology
- Grid deployment
First priority: LCG-1
Major components and levels (with their providers):
- High level services: user interfaces, applications (LCG, the experiments)
- Active services: global scheduler, data management, information system (EU DataGrid)
- Passive services: user access, security, data transfer, information schema (VDT: Globus, GLUE)
- System software: operating system (RedHat Linux), local scheduler (PBS, Condor, LSF, …), file system (NFS, …)
- Hardware: computing cluster, network resources, data storage (HPSS, CASTOR, …)
A closed system (?)
LHC Grid: what became of the MONARC hierarchy (diagram). The LHC Computing Centre at CERN is Tier 0; Tier 1 centres in Germany, the USA, the UK, France, Italy, Taiwan and Japan, plus CERN's own Tier 1; Tier 2 centres serve regional groups of labs and universities; Tier 3 physics department clusters and desktops sit below, with grids for physics study groups and regional groups cutting across the hierarchy.
LCG status
- Grid component: almost entirely the EDG solution; the major difference is that LCG-1 still uses the "old" MDS for the information system
- Deployed on the LCG testbed, which in general does not overlap with the EDG one and includes non-EU countries such as the US, Russia and Taiwan
- More stable so far than EDG (because of MDS?)
- Little or no Grid development; in the future alternative Grid solutions, e.g. AliEn, may be considered (though this is unlikely)
- The Grid Technology area is on the verge of being disbanded, as LCG will not be doing Grid development
- LHC Applications component: a lot of very serious development; many areas are covered, from generators to Geant4 to data management; unfortunately, there is little interaction and co-operation with the Grid developers
LCG-1 Testbed
Summary
- Initiated by CERN, EDG was the first global Grid R&D project aiming at deploying working services
- Sailing in uncharted waters, EDG ultimately provided a set of services from which a Grid infrastructure can be constructed
- Perhaps the most notable EDG achievement is the introduction of authentication and authorization standards, now recognized worldwide
- LCG took a bold decision to deploy EDG as the Grid component of the LCG-1 release
- Grid development does not stop with EDG: LCG is open to new solutions, with a strong preference towards OGSA