The EU DataGrid The European DataGrid Project Team
The EDG Intro– Tutorial - n° 2
Tutorial Roadmap
- Project Introduction
- Security Architecture
- The EDG Testbed
- Coffee Break
- Specific Middleware Issues
  - Job Management
  - Data Management
  - Monitoring & Fabric Management
- Application Examples

Glossary
RB: Resource Broker
VO: Virtual Organisation
CE: Computing Element
SE: Storage Element
GDMP: GRID Data Mirroring Package
LDAP: Lightweight Directory Access Protocol
LCFG: Local Configuration System
LRMS: Local Resource Management System (batch systems such as PBS, LSF)
WMS: Workload Management System
LFN: Logical File Name (like MyMu.dat)
SFN: Site File Name (like storageEl1.cern.ch:/home/data/MyMu.dat)

The EU DataGrid Project Introduction The European DataGrid Project Team
Contents
- The EDG Project scope
- Achievements
- EDG structure
- Middleware Workpackages: Goals, Achievements
- DataGrid in Numbers
- Relation to Sister Projects

The Grid Vision
"Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
- From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of…
- central location,
- central control,
- omniscience,
- existing trust relationships.

Grids: Elements of the Problem
Resource sharing
- Computers, storage, sensors, networks, …
- Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual orgs
- Community overlays on classic org structures
- Large or small, static or dynamic

Goals
DataGrid is a project funded by the European Union whose objective is to build and exploit the next-generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.
Enable data-intensive sciences by providing worldwide Grid testbeds to large distributed scientific organizations (“Virtual Organizations”, VOs).
Start (kick-off): Jan 1, 2001. End: Dec 31, 2003.
Applications/End User Communities: HEP, Earth Observation, Biology
Specific Project Objectives:
- Middleware for fabric & grid management
- Large-scale testbed
- Production-quality demonstrations
- Collaborate and coordinate with other projects (Globus, Condor, CrossGrid, DataTAG, etc.)
- Contribute to Open Standards and international bodies (GGF, Industry & Research forum)

DataGrid Main Partners
- CERN – International (Switzerland/France)
- CNRS – France
- ESA/ESRIN – International (Italy)
- INFN – Italy
- NIKHEF – The Netherlands
- PPARC – UK

Assistant Partners
Research and Academic Institutes
- CESNET (Czech Republic)
- Commissariat à l'énergie atomique (CEA) – France
- Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
- Consiglio Nazionale delle Ricerche (Italy)
- Helsinki Institute of Physics – Finland
- Institut de Fisica d'Altes Energies (IFAE) – Spain
- Istituto Trentino di Cultura (IRST) – Italy
- Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
- Royal Netherlands Meteorological Institute (KNMI)
- Ruprecht-Karls-Universität Heidelberg – Germany
- Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
- Swedish Research Council – Sweden
Industrial Partners
- Datamat (Italy)
- IBM-UK (UK)
- CS-SI (France)

Project Schedule
Project started on 1/Jan/2001
Testbed 0 (early 2001)
- International testbed 0 infrastructure deployed
  - Globus 1 only, no EDG middleware
Testbed 1 (2002)
- First release of EU DataGrid software to defined users within the project:
  - HEP experiments (WP 8), Earth Observation (WP 9), Biomedical applications (WP 10)
Testbed 2 (End 2002)
- Builds on Testbed 1 to extend facilities of DataGrid
- Focus on production quality
Testbed 3 (2003)
- Advanced functionality; currently being deployed.
Project stops on 31/Dec/2003

DataGrid Work Packages
The EDG collaboration is structured in 12 Work Packages
- WP1: Work Load Management System
- WP2: Data Management
- WP3: Grid Monitoring / Grid Information Systems
- WP4: Fabric Management
- WP5: Storage Element
- WP6: Testbed and demonstrators – Production quality International Infrastructure
- WP7: Network Monitoring
- WP8: High Energy Physics Applications
- WP9: Earth Observation
- WP10: Biology
- WP11: Dissemination
- WP12: Management

DataGrid Architecture
[Figure: layered architecture diagram. Recoverable labels, top to bottom:]
Local Computing
- Local Application
- Local Database
Grid
- Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
- Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
- Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, SQL Database Services, Logging & Bookkeeping
Grid Fabric
- Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management

EDG Interfaces
[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services) surrounded by the external entities it interfaces with:]
- Scientists
- Application Developers
- System Managers
- Certificate Authorities
- User Accounts
- Computing Elements
- Batch Systems (PBS, LSF)
- Operating Systems
- File Systems
- Storage Elements
- Mass Storage Systems (HPSS, Castor)

WP1: Work Load Management
Goals
- Maximize use of resources by efficient scheduling of user jobs
Achievements
- Definition of architecture for scheduling & resource management and accounting & reservation
- Development of "super scheduling" component using application data and computing elements requirements
- Support for MPI jobs
- Logical job checkpointing
- Interactive jobs

EDG middleware architecture: The Workload Management System (WP1)
WP1 is responsible for the Workload Management System (WMS). The WMS is currently composed of the following parts:
- User Interface (UI): access point for the user to the GRID (using JDL)
- Resource Broker (RB): the broker of GRID resources, matchmaking
- Job Submission System (JSS): Condor-G; interfacing batch systems
- Information Index (II): an LDAP server used as a filter to select resources
- Logging and Bookkeeping services (LB): MySQL databases to store job info

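The matchmaking role of the Resource Broker can be illustrated with a small Python sketch. This is not the EDG implementation: the real broker evaluates requirements written in JDL against the Information Index, whereas here plain Python predicates stand in for both. The class names, fields, host names and the ranking policy (most free CPUs first) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingElement:
    """Hypothetical snapshot of what an information index might hold for one CE."""
    name: str
    free_cpus: int
    max_wall_minutes: int
    lrms: str  # local batch system, e.g. "PBS" or "LSF"


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical stand-in for the requirements a user would express in JDL."""
    min_cpus: int
    wall_minutes: int
    allowed_lrms: tuple = ("PBS", "LSF")


def matchmake(job, elements):
    """Return the names of CEs that satisfy the job, best candidate first."""
    suitable = [
        ce for ce in elements
        if ce.free_cpus >= job.min_cpus
        and ce.max_wall_minutes >= job.wall_minutes
        and ce.lrms in job.allowed_lrms
    ]
    # Assumed ranking policy: prefer the CE with the most free CPUs.
    suitable.sort(key=lambda ce: (-ce.free_cpus, ce.name))
    return [ce.name for ce in suitable]


index = [
    ComputingElement("ce-a.example.org", free_cpus=4, max_wall_minutes=600, lrms="PBS"),
    ComputingElement("ce-b.example.org", free_cpus=32, max_wall_minutes=120, lrms="LSF"),
    ComputingElement("ce-c.example.org", free_cpus=16, max_wall_minutes=1440, lrms="PBS"),
]
ranked = matchmake(JobRequest(min_cpus=2, wall_minutes=300), index)
```

Filtering first and ranking second mirrors the two-step idea of selecting resources and then choosing among them; a job whose requirements no CE can meet simply yields an empty list.
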
WP1: Work Load Management
Components
- Job Description Language
- Resource Broker
- Job Submission Service
- Information Index
- User Interface
- Logging & Bookkeeping Service
Implementation:
- UI: Python (LB client: C++)
- RB: C++
- JSS: C++, Python
- II: LDAP server
- LB: MySQL, C++
- Input/Output Sandboxes: GridFTP
WMS main interfaces:
- Globus Gatekeeper
- WP2 Replica Catalog APIs
- WP3 Information Systems
- WP7 network monitoring info providers
- End User (using JDL files, on the UI)

WP2: Data Management
Goals
- Coherently manage and share petabyte-scale information volumes in high-throughput production-quality grid environments
Achievements
- Survey of existing tools and technologies for data access and mass storage systems
- Definition of architecture for data management
- Deployment of Grid Data Mirroring Package (GDMP) in Testbed 1
- Deployment of EDG Replica Manager in Testbed 2
- Close collaboration with Globus, PPDG/GriPhyN & Condor
  - Common design of RLS
- Working with GGF on standards

EDG middleware architecture: WP2 (Data Management)
WP2 is responsible for Data Management, which includes file and replica management, metadata access and data security.
WP2 components:
- Replica Manager: the main manager for triggering replica operations across the GRID, including replica optimization and interfacing with the replica catalog service
- Replica Catalog: a GRID service used to resolve Logical File Names into a set of corresponding Physical File Names (Globus Replica Catalog and Replica Location Service, RLS)
- GDMP: the GRID Data Mirroring Package, used to create replicas of any file type across the GRID Storage Elements in a synchronized way, automatically updating the replica catalog
- Spitfire: provides a Grid-enabled middleware service for access to relational databases; it consists of the Spitfire Server module and the Spitfire Client libraries and command-line executables.

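The relationship between logical and physical file names can be sketched in a few lines of Python. This is only an in-memory illustration of the idea that a catalog resolves one Logical File Name into a set of physical replicas and that creating a replica also updates the catalog. The `ReplicaCatalog` class, the `replicate` helper and the host `se2.example.org` are hypothetical, and no file is actually transferred.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog mapping a logical file name to its known physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that `pfn` holds a copy of the logical file `lfn`."""
        self._replicas[lfn].add(pfn)

    def resolve(self, lfn):
        """Return all known physical names for `lfn` (empty list if unknown)."""
        return sorted(self._replicas.get(lfn, ()))


def replicate(catalog, lfn, target_se, path):
    """Pretend to copy a file to another Storage Element, then record the replica."""
    if not catalog.resolve(lfn):
        raise KeyError(f"no replica registered for {lfn}")
    # A real replica manager would transfer the file here (for example over GridFTP).
    pfn = f"{target_se}:{path}"
    catalog.register(lfn, pfn)
    return pfn


catalog = ReplicaCatalog()
catalog.register("MyMu.dat", "storageEl1.cern.ch:/home/data/MyMu.dat")
new_copy = replicate(catalog, "MyMu.dat", "se2.example.org", "/grid/data/MyMu.dat")
```

Keeping the catalog update inside `replicate` reflects the point made above that replication and catalog registration should stay in step.
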
WP2: Data Management
Deployed Components
- GridFTP
- Replica Manager: edg-replica-manager and Reptor
- Replica Catalog: globus-replica-catalog
- GDMP
- Spitfire
Implementation:
- RM: C++
- Reptor: Java-based Web Services
- RC: Globus Replica Catalog wrapper
- GDMP: C++
- Spitfire: Java, Web Services
WP2 main interfaces:
- The GRID Storage Element
- WP1 Resource Broker APIs
- WP3 GRID Info services
- WP7 network monitoring info providers
- End User (using GDMP)

WP3: Grid Monitoring Services
Goals
- Provide information system for discovering resources and monitoring status
Achievements
- Survey of current technologies
- Coordination of schemas in testbed 1
- Development of Ftree caching backend based on OpenLDAP (Lightweight Directory Access Protocol) to address shortcomings in MDS v1
- Relational Grid Monitoring Architecture (R-GMA)
- GRM and PROVE adapted to grid environments to support end-user application monitoring

WP3: GRID Monitoring and Info Providers
WP3’s task is to provide information about:
The Grid itself
- This includes information about resources (Computing Elements, Storage Elements and the Network), for which the Globus MDS is a common solution; and job status information (as implemented by WP1's Logging and Bookkeeping).
Grid applications
- This is information published by user jobs. It is used for performance monitoring.
R-GMA
- relational implementation of the GGF GMA
- interoperable with MDS

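The relational view of monitoring data can be made concrete with a short Python sketch in which a producer publishes rows and a consumer asks questions in SQL. It is a rough illustration only: a single in-memory SQLite database stands in for the distributed system, and the table `ce_status`, its columns and both helper functions are assumptions, not part of R-GMA.

```python
import sqlite3

# One in-memory database stands in for the distributed monitoring system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ce_status (ce_name TEXT, free_cpus INTEGER, measured_at INTEGER)"
)


def publish(ce_name, free_cpus, measured_at):
    """Producer side: add one measurement as a row."""
    conn.execute(
        "INSERT INTO ce_status VALUES (?, ?, ?)", (ce_name, free_cpus, measured_at)
    )


def latest_with_free_cpus(min_free):
    """Consumer side: latest measurement per CE, kept only if enough CPUs are free."""
    rows = conn.execute(
        "SELECT ce_name, free_cpus FROM ce_status AS s "
        "WHERE measured_at = (SELECT MAX(measured_at) FROM ce_status "
        "                     WHERE ce_name = s.ce_name) "
        "AND free_cpus >= ? ORDER BY ce_name",
        (min_free,),
    )
    return rows.fetchall()


publish("ce-a.example.org", 4, 1)
publish("ce-a.example.org", 0, 2)  # a newer measurement supersedes the older one
publish("ce-b.example.org", 8, 1)
available = latest_with_free_cpus(1)
```

The appeal of the relational model is visible even at this scale: "latest value per resource" is a query over published rows, not a special-purpose API call.
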
WP3: GRID Monitoring
Components
- MDS / FTree
- R-GMA
- GRM/PROVE
Implementation:
- MDS: LDAP, Globus GRIS, GIIS
- FTree: OpenLDAP, caching
- R-GMA: Java, C++, MySQL, Tomcat
- GRM / PROVE: P-GRADE
WP3 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 RM optimizer
- all GRID services producing info (SE, CE, ...)
- WP7 network monitoring

WP4: Fabric Management
Goals
- Manage clusters (~thousands) of nodes
Achievements
- Survey of existing tools, techniques and protocols
- Defined an agreed architecture for fabric management
- Initial implementations deployed at several sites in testbed 1 & 2

EDG middleware architecture: WP4: Fabric Management Components
WP4 is responsible for delivering a computing fabric comprising all the necessary tools to manage a center providing grid services on clusters of thousands of nodes. The computing fabric is called the Computing Element in EDG.
User Job Control and Management (Grid and local jobs) on fabric batch and/or interactive CPU services
- Gridification: Grid interface to fabric resources
- Resource Management: manage underlying batch services
Automated System Administration for Computing Fabric Elements. These subsystems are reserved for system administrators and operators performing system maintenance
- Configuration Management
- Installation Management
- Fabric Monitoring

WP4: Fabric Management
Components
- LCFG
- Fabric Monitoring
- PBS & LSF info providers
- Image installation
- Config. Cache Mgr
Implementation:
- LCFG: C++, XML, HTTP
WP4 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 Data management
- WP5 Storage Element
- WP3 GRID Info Services

WP5: Mass Storage Management
Goals
- Provide common user and data export/import interfaces to existing local mass storage systems
Achievements
- Review of Grid data systems, tape and disk storage systems and local file systems
- Definition of Architecture and Design for DataGrid Storage Element
- Collaboration with Globus on GridFTP/RFIO
- Collaboration with PPDG on control API
- First attempt at exchanging Hierarchical Storage Manager (HSM) tapes
- SRM-compliant interface to MSS

WP5: Mass Storage Management
WP5 delivers the Grid interface to storage. Its service, the Storage Element (SE), interfaces to underlying Mass Storage Systems or simple storage services.
Main interfaces:
- Data: GridFTP will be used to transfer files over the WAN, and the files will optionally be available to local nodes via NFS.
- Information: existing MDS information providers will be extended to provide the extra information in the GLUE storage schema.
- Control: functions such as reservation, pinning, deletion, and transfer time estimation. Will provide an SRM 2 interface.

WP5: Mass Storage Management
Achievements
- Definition of Architecture and Design for DataGrid Storage Element
- Collaboration with Globus on GridFTP/RFIO
- Collaboration with PPDG on control API
- Staging from/to CASTOR at CERN successfully implemented and tested
- Successfully interfaced to GDMP
Supported Storage Systems:
- UNIX disk systems
- HPSS (High Performance Storage System)
- CASTOR (through RFIO)
- GridFTP servers
- DMF
- Enstore
WP5 (SE) main interfaces:
- WP1 Resource Broker & JSS
- WP2 RM, RC
- WP7 for GridFTP monitoring
- WP3 GRID Info Services

WP6: TestBed Integration
Goals
- Deploy testbeds for the end-to-end application experiments & demos
- Integrate successive releases of the software components
Achievements
- Integration of EDG sw and deployment
- Working implementation of multiple Virtual Organizations (VOs) & basic security infrastructure
- Definition of acceptable usage contracts and creation of Certification Authorities group
- Definition of test plan
- User’s, administrator’s, and developer’s guides
Components
- Globus packaging & EDG config
- Build tools
- End-user documents
[Figure labels: Globus; WP6 additions to Globus; EDG release]

Tasks for the WP6 integration team
- Testing and integration of the Globus package
- Exact definition of RPM lists (components) for the various testbed machine profiles (CE service, RB, UI, SE service, NE, WN); check dependencies
- Perform preliminary, centrally managed (CERN) tests on EDG middleware before giving the green light for deployment across the EDG testbed sites
- Provide and update documentation for installers/site managers, developers and end users
- Define EDG release policies and coordinate the integration team staff with the various Work Package managers, keeping coordination tight
- Assign reported bugs to the corresponding developers/site managers (BugZilla)
- Complete support for the iTeam testing VO

WP6: TestBed Integration and demonstrators
WP6 goals: the EDG testbed
- Integration of EDG sw releases and deployment across the EDG testbed: the integration team
- Working implementation of multiple VOs & basic security infrastructure
- Definition of acceptable usage contracts and creation of Certification Authorities group
- Set up of the Authorization Working Group to manage authorization policies on the testbed
- 2 Testbeds:
  - Dev. TB for integration
  - Application TB for application usage
  - Certification TB planned
Components
- Support for test-VO, mkgridmap tools
- Globus packaging & EDG config
- Build tools, CVS central s/w repository
- End-user documents

WP7: Network Services
Goals
- Review the network service requirements for DataGrid
- Establish and manage the DataGrid network facilities
- Monitor the traffic and performance of the network
- Deal with the distributed security aspects
Achievements
- Analysis of network requirements for testbed 1 & study of available network physical infrastructure
- Use of European backbone GEANT since Dec
- Initial network monitoring architecture defined and first tools deployed
- Collaboration with Dante & DataTAG
- Working with GGF (Grid High Performance Networks) & Globus (monitoring/MDS)
- Network cost estimation for workload and data management
Components
- Network monitoring tools: PingER, Udpmon, Iperf

Applications (WP8-10)
- High Energy Physics
- Biomedical Applications
- Earth Observation Science Applications

Grid aspects covered by EDG
- VO servers: LDAP directory for mapping users (with certificates) to correct VO
- Storage Element: Grid-aware storage area, situated close to a CE
- User Interface: Submit & monitor jobs, retrieve output
- Replica Manager: Replicates data to one or more CEs
- Job Submission Service: Manages submission of jobs to Res. Broker
- Replica Catalog: Keeps track of multiple data files “replicated” on different CEs
- Information index: Provides info about grid resources via GIIS/GRIS hierarchy
- Information & Monitoring: Provides info on resource utilization & performance
- Resource Broker: Uses Info Index to discover & select resources based on job requirements
- Grid Fabric Mgmt: Configures, installs & maintains grid sw packages and environ.
- Logging and Bookkeeping: Collects resource usage & job status
- Network performance, security and monitoring: Provides efficient network transport, security & bandwidth monitoring
- Computing Element: Gatekeeper to a grid computing resource
- Testbed admin.: Certificate auth., user reg., usage policy etc.

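The mapping from certificate holders to Virtual Organisations is what ultimately lets a site decide which local account a grid user runs under. The Python sketch below illustrates that step by producing lines in the Globus grid-mapfile style (a quoted certificate subject followed by a local account). It is not the EDG tooling: a plain dictionary replaces the VO LDAP directory, and the function name, subject names, VO names and account names are all hypothetical.

```python
def build_gridmap(vo_members, vo_accounts):
    """Return grid-mapfile style lines for every member of a supported VO.

    vo_members:  VO name -> iterable of certificate subject names (DNs)
    vo_accounts: VO name -> local account used for that VO at this site
    """
    lines = []
    seen = set()
    for vo in sorted(vo_members):
        account = vo_accounts.get(vo)
        if account is None:
            continue  # this site does not support the VO
        for dn in sorted(vo_members[vo]):
            if dn in seen:
                continue  # assumed policy: the first VO in sorted order wins
            seen.add(dn)
            lines.append(f'"{dn}" {account}')
    return lines


# Hypothetical membership data standing in for what a VO server would return.
members = {
    "atlas": ["/O=Grid/O=Example/CN=Alice Smith", "/O=Grid/O=Example/CN=Bob Jones"],
    "biomed": ["/O=Grid/O=Example/CN=Bob Jones", "/O=Grid/O=Example/CN=Carol White"],
    "earthobs": ["/O=Grid/O=Example/CN=Dave Black"],
}
accounts = {"atlas": "atlas001", "biomed": "biomed001"}
gridmap = build_gridmap(members, accounts)
```

A user who belongs to several VOs appears only once, because a map file of this kind associates each subject with one account; which VO should win in that case is a site policy decision, and the rule used here is only a placeholder.
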
DataGrid in Numbers
People
- >350 registered users
- 12 Virtual Organisations
- 16 Certificate Authorities
- >200 people trained
- 278 man-years of effort
- 100 years funded
Testbeds
- >15 regular sites
- >10’000s jobs submitted
- >1000 CPUs
- >5 TeraBytes disk
- 3 Mass Storage Systems
Software
- 50 use cases
- 18 software releases
- >300K lines of code
Scientific applications
- 5 Earth Obs institutes
- 9 bio-informatics apps
- 6 HEP experiments

Related Grid Projects
Through links with sister projects, there is the potential for a truly global scientific applications grid
Demonstrated at IST2002 and SC2002 in November