The EU DataGrid Architecture The European DataGrid Project Team



The EDG Architecture Tutorial - n° 2
Contents
- Middleware architecture overview
- EDG structure
  - Job scheduling
  - Fabric management
  - Data Management
  - Monitoring
  - Storage
  - Networking
- Summary

EDG middleware architecture
Globus hourglass
- Current EDG architectural functional blocks:
  - Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on Globus 2.0 (GSI, GRIS/GIIS, GRAM, MDS)
[Figure: hourglass layer diagram. Layers shown: OS & Net services; Basic Services (GLOBUS 2.0); High level GRID middleware (GRID middleware); LHC VO common application layer, other apps; Specific application layer (ALICE, ATLAS, CMS, LHCb, other apps)]

DataGrid Architecture
[Figure: layered architecture diagram. Labels shown:]
- Local Computing: Local Application, Local Database
- Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
- Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
- Underlying Grid Services: Computing Element Services, Authorization Authentication & Accounting, Replica Catalog, Storage Element Services, Database Services
- Fabric services (Grid Fabric): Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management
- Logging & Book-keeping

EDG middleware architecture: EDG interfaces
[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services including SQL Database Services, Fabric services, Logging & Book-keeping) together with the external entities it interfaces with:]
- Computing Elements
- System Managers
- Scientists
- Operating System
- File Systems
- Storage Elements
- Mass Storage Systems (HPSS, Castor)
- User Accounts
- Certificate Authorities
- Application Developers
- Batch Systems

EDG middleware architecture: The Workload Management System (WP1)
- WP1 is responsible for the Workload Management System (WMS). The WMS is currently composed of the following parts:
  - User Interface (UI): access point for the user to the GRID (using JDL)
  - Resource Broker (RB): the broker of GRID resources, matchmaking
  - Job Submission System (JSS): Condor-G; interfacing batch systems
  - Information Index (II): an LDAP server used as a filter to select resources
  - Logging and Bookkeeping services (LB): MySQL databases to store job information
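The matchmaking role of the Resource Broker can be pictured with a small sketch. It is not the EDG implementation: the resource attributes (`free_cpus`, `os`, `close_storage`), the job fields and the ranking rule are all assumptions made for illustration, and a plain dictionary stands in for a JDL job description.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    # All attribute names are hypothetical; a real broker would read
    # such values from the Information Index.
    name: str
    free_cpus: int
    os: str
    close_storage: tuple  # names of Storage Elements considered "close"


def match(job, resources):
    """Return the Computing Elements that satisfy the job, best first."""
    candidates = [
        ce for ce in resources
        if ce.os == job["os"] and ce.free_cpus >= job["min_cpus"]
    ]
    # Assumed ranking: prefer a CE close to the SE holding the input data,
    # then the CE with the most free CPUs.
    return sorted(
        candidates,
        key=lambda ce: (job["input_se"] in ce.close_storage, ce.free_cpus),
        reverse=True,
    )


resources = [
    ComputingElement("ce-a", 10, "linux", ("se-a",)),
    ComputingElement("ce-b", 50, "linux", ("se-b",)),
    ComputingElement("ce-c", 99, "solaris", ("se-a",)),
]
job = {"os": "linux", "min_cpus": 4, "input_se": "se-a"}
ranked = match(job, resources)
print([ce.name for ce in ranked])
```

Filtering first and ranking second mirrors the split between requirements and preferences; the actual ranking criteria used by the broker are not given on the slide.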

WP1: Work Load Management
- Components: Job Description Language, Resource Broker, Job Submission Service, Information Index, User Interface, Logging & Bookkeeping Service
- Implementation:
  - UI: python (LB client: C++)
  - RB: C++
  - JSS: C++, python
  - II: LDAP server
  - LB: MySQL, C++
  - Input/Output Sandboxes: GridFTP
- WMS main interfaces:
  - Globus Gatekeeper
  - WP2 Replica Catalog APIs
  - WP3 Information Systems
  - WP7 network monitoring info providers
  - End User (using JDL files, on the UI)
[Figure: DataGrid architecture diagram]

EDG middleware architecture: WP1 (WMS)

EDG middleware architecture: WP2 (Data Management)
- WP2 is responsible for Data Management, which includes file and replica management, metadata access and data security. WP2 components:
- Replica Manager: the main manager for triggering replication all over the GRID, including replica optimization and interfacing with the replica catalog service
- Replica Catalog: a GRID service used to resolve Logical File Names into a set of corresponding Physical File Names (Globus Replica Catalog)
- GDMP: the GRID Data Mirroring Package, used to create replicas of any file type across the GRID Storage Elements in a synchronized way, automatically updating the replica catalog
- Spitfire: provides a Grid-enabled middleware service for access to relational databases. It consists of the Spitfire Server module and the Spitfire Client libraries and command-line executables.

File Management
[Figure, built up over nine slides: Site A with Storage Element A (File A, File B, File X, File Y) and Site B with Storage Element B (File A, File B, File C, File D), linked by File Transfer. Functions added step by step:]
- File Transfer
- Replica Catalog: map Logical to Site files
- Replica Selection: get 'best' file
- Pre-/Post-processing:
  - Prepare files for transfer
  - Validate files after transfer
- Replication Automation: Data Source subscription
- Load balancing: replicate based on usage
- Replica Manager:
  - 'atomic' replication operation
  - single client interface
  - orchestrator
- Metadata:
  - LFN metadata
  - Transaction information
  - Access patterns
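As a rough illustration of the Replica Catalog and Replica Selection functions, the sketch below maps a logical file name to its physical replicas and picks the cheapest one. The host names, the `lfn:` prefix and the cost table are invented for the example; how the real optimiser estimates cost is not stated on the slides.

```python
from urllib.parse import urlparse

# Replica catalog: logical file name -> physical replicas (hypothetical hosts).
catalog = {
    "lfn:fileA": [
        "gsiftp://se-a.example.org/data/fileA",
        "gsiftp://se-b.example.org/data/fileA",
    ],
    "lfn:fileX": ["gsiftp://se-a.example.org/data/fileX"],
}

# Assumed cost of reaching each Storage Element from the client (lower is better).
transfer_cost = {"se-a.example.org": 5.0, "se-b.example.org": 1.5}


def best_replica(lfn):
    """Resolve a logical name and return the replica with the lowest cost."""
    replicas = catalog.get(lfn, [])
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    return min(replicas, key=lambda pfn: transfer_cost[urlparse(pfn).hostname])


print(best_replica("lfn:fileA"))
```

Keeping the catalog lookup separate from the selection step matches the figure, where the two appear as distinct functions.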

Current State
- File Transfer: use GridFTP (deployed)
  - Close collaboration with Globus
  - NetLogger (Brian Tierney and John Bresnahan)
- Replication: GDMP (deployed)
  - Wrapper around Globus ReplicaCatalog
  - All functionality in one integrated package
  - Using Globus 2
  - Uses GridFTP for transferring files
- Replication: edg-replica-manager (deployed)
- Replication: Replica Location Service Giggle (in testing)
  - Distributed Replica Catalog
- Replication: Replica Manager Reptor (in testing)
- Optimization: Replica Selection OptorSim (in simulation)
- Metadata Storage: SQL Database Service Spitfire (deployed)
  - Servlets on HTTP(S) with XML (XSQL)
  - GSI enabled access + extensions
- GSI interface to CASTOR (delivered)

WP2: Data Management
- Deployed components:
  - GridFTP
  - Replica Manager: edg-replica-manager
  - Replica Catalog: globus-replica-catalog
  - GDMP
  - Spitfire
- Implementation:
  - RM: C++ classes (under development)
  - RC: Globus Replica Catalog wrapper
  - GDMP: C++
  - Spitfire: Java, Web Services
- WP2 main interfaces:
  - The GRID Storage Element
  - WP1 Resource Broker APIs
  - WP3 GRID Info services
  - WP7 network monitoring info providers
  - End User (using GDMP)
[Figure: DataGrid architecture diagram]

Example of Data Management by LHCb
- Copy data file to storage element:
    globus-url-copy file:///${chemin}/L69999 gsiftp://lxshare0219.cern.ch/flatfiles/SE1/lhcb/L69999
- Register stored data in the catalog:
    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf;/opt/edg/bin/gdmp_register_local_file -d /flatfiles/SE1/lhcb"
- Publish catalog:
    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_publish_catalogue - n"
- Copy output to MSS:
    rfcp L /castor/cern.ch/lhcb/mc/L

The Replica Manager APIs
[Figure: a Client, two Replica Managers (each with a Replica Optimiser, an SE and a CE) and a Replica Catalogue. The legend distinguishes physical file transfer from communication.]

The Replica Manager APIs
- RM.copy(PhysicalFileName source, PhysicalFileName destination, String protocol):Status
  - allows for third-party transfer
  - transfer between:
    - two StorageElements or
    - ComputingElement and Storage Element
  - Space management policies under development

The Replica Manager APIs
- RM.add/deletePhysicalFileName(LogicalFileName lfn, PhysicalFileName pfn)
  - Replica Catalogue operations only, no file transfer
- RM.copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol):Status
  - third-party transfer, but files can only be registered in the Replica Catalogue if the destination PFN contains a valid SE (i.e. it needs to be registered in the RC)!
- RM.deletePhysicalFile(LogicalFileName lfn, PhysicalFileName pfn)
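The difference between the catalogue-only calls and the combined copy-and-register call can be sketched with an in-memory stand-in. This is not the EDG Replica Manager: the class name, the Python method names, the `host/path` PFN format and the injected `transfer` callable are all assumptions made so the example runs without a grid.

```python
class ReplicaManagerSketch:
    """In-memory stand-in for the RM calls listed above (not the EDG code)."""

    def __init__(self, valid_ses, transfer):
        self.valid_ses = set(valid_ses)  # SE hosts known to the catalogue
        self.catalogue = {}              # lfn -> set of pfns
        self.transfer = transfer         # callable(source, destination, protocol)

    def add_physical_file_name(self, lfn, pfn):
        # Catalogue operation only: no file is moved.
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def delete_physical_file_name(self, lfn, pfn):
        # Catalogue operation only: the physical file is left in place.
        self.catalogue.get(lfn, set()).discard(pfn)

    def copy_and_add_physical_file(self, source, destination, lfn, protocol="gsiftp"):
        # Assumed PFN layout "host/path": the host must be a registered SE,
        # otherwise nothing is copied and nothing is registered.
        se_host = destination.split("/", 1)[0]
        if se_host not in self.valid_ses:
            raise ValueError(f"{se_host} is not a registered Storage Element")
        self.transfer(source, destination, protocol)
        self.add_physical_file_name(lfn, destination)
        return "OK"


transfers = []
rm = ReplicaManagerSketch(
    valid_ses={"se-a.example.org", "se-b.example.org"},
    transfer=lambda src, dst, proto: transfers.append((src, dst, proto)),
)
rm.add_physical_file_name("lfn:run1", "se-a.example.org/data/run1")
status = rm.copy_and_add_physical_file(
    "se-a.example.org/data/run1", "se-b.example.org/data/run1", "lfn:run1"
)
print(status, sorted(rm.catalogue["lfn:run1"]))
```

Checking the destination before starting the transfer is a design choice of this sketch; the slide only states that registration requires a valid SE.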

WP2 next generation Replication Services
[Figure: a Client and the Replica Manager with its services: Replica Metadata, Replica Location, File Transfer, Optimization, Transaction, Consistency, Preprocessing, Postprocessing, Subscription. Component names shown: Reptor, Giggle, RepMeC, Optor, GDMP.]

Replication Services Architecture
[Figure: two sites, each with a Site Replica Manager, Storage Element, Computing Element, Optimiser, Pre-/Post-processing and Local Replica Catalog. Also shown: Replica Location Index (three instances), Replica Metadata Catalog, Resource Broker and User Interface. Interfaces are labelled Core API, Optimisation API and Processing API.]

Metadata Management and Security
Project Spitfire
- 'Simple' Grid Persistency
  - Grid Metadata
  - Application Metadata
  - Unified Grid enabled front end to relational databases
- Metadata Replication and Consistency
- Publish information on the metadata service
Secure Grid Services
- Grid authentication, authorization and access control mechanisms enabled in Spitfire
- Modular design, reusable by other Grid Services

Spitfire Architecture
- Atomic RDBMS is always consistent
- No local replication of data
- Role-based authorization
- XSQL Servlet as one access mode for 'simple' web access
- Web/Grid Services Paradigm
  - SOAP interfaces
  - JDBC interface to RDBMS
- Pluggability and extensibility
[Figure: layer diagram. Labels shown: Global Spitfire Layer, SOAP, Local Spitfire Layer, Connecting Layer, the database layers OracleLayer, DB2Layer, PGLayer and MyLayer, and the databases Oracle, DB2, PostGres and MySQL.]

WP3: GRID monitoring and Info Providers
- WP3's task is to provide information about:
  - The Grid itself. This includes information about resources (ComputingElements, StorageElements and the Network), for which the Globus MDS is a common solution, and job status information (as implemented by WP1's Logging and Bookkeeping).
  - Grid applications. This is information published by user jobs. It is used for performance monitoring.

WP3: GRID monitoring and Info Providers
- Main WP3 components:
  - MDS v 2.1: the Globus Monitoring and Discovery Services, based on Soft State Registration protocols and LDAP aggregate directory services
  - Ftree: EDG-developed directory service based on OpenLDAP plus caching, to address shortcomings in MDS v1 and optimize data access performance
  - R-GMA: Relational GMA (Grid Monitoring Architecture [Consumers, Producers and Directory Services, GGF]) implementation which makes information from producers available to consumers as relations (tables). It also uses relations to handle the registration of producers. R-GMA is consistent with GMA principles.
  - GRM / PROVE: application monitoring and visualization tools of the P-GRADE graphical parallel programming environment, modified for application monitoring in the DataGrid. The instrumentation library of GRM is generalized for flexible trace event specification. The components of GRM will be connected to R-GMA using its Producer and Consumer APIs.

R-GMA
- Use the GMA from GGF
- A relational implementation
- Applied to both information and monitoring
- Creates impression that you have one RDBMS per VO
[Figure: Producer, Consumer and Registry, with arrows labelled subscribe and lookup]

Relational Approach
- Producers
  - announce: SQL "CREATE TABLE"
  - publish: SQL "INSERT"
- Consumers
  - collect: SQL "SELECT"
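The three verbs map directly onto SQL statements, which a local database can demonstrate. In the sketch below an in-memory SQLite database stands in for the virtual per-VO RDBMS; this is an assumption for illustration and not how R-GMA stores data. The column names follow the CPULoad schema shown later in the tutorial, and the load and timestamp values are made up.

```python
import sqlite3

# In-memory SQLite as a stand-in for the "one RDBMS per VO" impression.
db = sqlite3.connect(":memory:")

# Producer announces: SQL "CREATE TABLE".
db.execute(
    "CREATE TABLE cpuLoad ("
    "country TEXT, site TEXT, facility TEXT, load REAL, timestamp INTEGER)"
)

# Producer publishes: SQL "INSERT" (values invented for the example).
db.executemany(
    "INSERT INTO cpuLoad VALUES (?, ?, ?, ?, ?)",
    [
        ("UK", "RAL", "CDF", 0.3, 1),
        ("UK", "RAL", "ATLAS", 1.6, 2),
        ("CH", "CERN", "ALICE", 0.5, 3),
    ],
)

# Consumer collects: SQL "SELECT".
rows = db.execute(
    "SELECT site, facility, load FROM cpuLoad "
    "WHERE country = 'UK' ORDER BY site, facility"
).fetchall()
print(rows)
```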

R-GMA
- API-Servlet communication
  - http(s) in
  - XML back
[Figure: components shown: Sensor Code, Producer API, Producer Servlet, Application Code, Consumer API, Consumer Servlet, Registry API, Registry Servlet, Schema API, Schema Servlet]

Schema & Contributions
CPULoad (Global Schema)
  Country  Site  Facility  Load  Timestamp
  UK       RAL   CDF
  UK       RAL   ATLAS
  UK       GLA   CDF
  UK       GLA   ALICE
  CH       CERN  ALICE
  CH       CERN  CDF
CPULoad (Producer 1)
  UK       RAL   CDF
  UK       RAL   ATLAS
CPULoad (Producer 2)
  UK       GLA   CDF
  UK       GLA   ALICE
CPULoad (Producer 3)
  CH       CERN  ATLAS
  CH       CERN  CDF

Contributions are Views
- CPULoad (Producer 1): rows UK RAL CDF and UK RAL ATLAS
    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'
- CPULoad (Producer 2): rows UK GLA CDF and UK GLA ALICE
    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
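Because each producer's contribution is a view with fixed column values, a registry can decide which producers are relevant to a consumer's query by comparing predicates. The sketch below shows that idea only; the registry layout and the `lookup` function are hypothetical, and the entry for Producer 3 assumes the CH/CERN rows of the schema slide.

```python
# Hypothetical registry: each producer registers the fixed values of its view.
registry = [
    {"producer": "Producer 1", "table": "cpuLoad",
     "fixed": {"country": "UK", "site": "RAL"}},
    {"producer": "Producer 2", "table": "cpuLoad",
     "fixed": {"country": "UK", "site": "GLA"}},
    {"producer": "Producer 3", "table": "cpuLoad",
     "fixed": {"country": "CH", "site": "CERN"}},
]


def lookup(table, where):
    """Return producers whose view may hold rows matching the equality
    predicates in `where` (column -> required value)."""
    return [
        entry["producer"]
        for entry in registry
        if entry["table"] == table
        # A column the producer has not fixed can take any value, so it
        # never rules the producer out.
        and all(entry["fixed"].get(col, val) == val for col, val in where.items())
    ]


print(lookup("cpuLoad", {"country": "UK"}))
print(lookup("cpuLoad", {"site": "CERN"}))
```

A query on a column no producer has fixed, such as `facility`, has to be sent to every producer of the table.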

WP3: GRID Monitoring
- Components: MDS / FTree, R-GMA, GRM/PROVE
- Implementation:
  - MDS: LDAP, Globus GRIS, GIIS
  - FTree: OpenLDAP, caching
  - R-GMA: Java, C++, MySQL, Tomcat
  - GRM / PROVE: P-GRADE
- WP3 main interfaces:
  - WP1 Resource Broker (InfoIndex)
  - WP2 RM optimizer
  - all GRID services producing info (SE, CE, ...)
  - WP7 network monitoring
[Figure: DataGrid architecture diagram]

EDG middleware architecture: WP4: Fabric Management Components
- WP4 is responsible for delivering a computing fabric comprising all the tools necessary to manage a center providing grid services on clusters of thousands of nodes. The computing fabric is called the Computing Element in EDG.
- User Job Control and Management (Grid and local jobs) on fabric batch and/or interactive CPU services
  - Gridification: Grid interface to fabric resources
  - Resource Management: manage underlying batch services
- Automated System Administration for Computing Fabric Elements. These subsystems are reserved for system administrators and operators for performing system maintenance
  - Configuration Management
  - Installation Management
  - Fabric Monitoring

WP4 Architecture logical overview
[Figure, built up over six slides. Labels shown: Grid User, Local User, Farm A (LSF), Farm B (PBS), (Mass storage, Disk pools), the WP4 subsystems, and the other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5).]
WP4 subsystems:
- Fabric Gridification
  - Interface between Grid-wide services and local fabric
  - Provides local authentication, authorization and mapping of grid credentials
- Resource Management
  - Provides transparent access (both job and admin) to different cluster batch systems
  - Enhanced capabilities (extended scheduling policies, advanced reservation, local accounting)
- Installation & Node Mgmt
  - Provides the tools to install and manage all software running on the fabric nodes
  - Agent to install, upgrade, remove and configure software packages on the nodes
  - Bootstrap services and software repositories
- Configuration Management
  - Provides central storage and management of all fabric configuration information
  - Compiles HLD templates to LLD node profiles
  - Central DB and set of protocols and APIs to store and retrieve information
- Monitoring & Fault Tolerance
  - Provides the tools for gathering monitoring information on fabric nodes
  - Central measurement repository stores all monitoring information
  - Fault tolerance correlation engines detect failures and trigger recovery actions
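The step of compiling HLD templates into LLD node profiles can be pictured as flattening a chain of included templates into one set of settings per node. The sketch below is an assumption about the general idea, not the WP4 template language: the template names, the `include` and `settings` keys and the rule that later settings override earlier ones are all invented for illustration.

```python
# Hypothetical high-level templates: each may include others and add settings.
templates = {
    "base": {"include": [], "settings": {"os": "linux", "ntp": "on"}},
    "batch_worker": {"include": ["base"], "settings": {"batch": "pbs"}},
    "node042": {"include": ["batch_worker"], "settings": {"ntp": "off"}},
}


def compile_profile(name, seen=()):
    """Flatten a template and everything it includes into one node profile."""
    if name in seen:
        raise ValueError(f"cyclic include involving {name}")
    template = templates[name]
    profile = {}
    for parent in template["include"]:
        profile.update(compile_profile(parent, seen + (name,)))
    # Assumed rule: a template's own settings override included ones.
    profile.update(template["settings"])
    return profile


print(compile_profile("node042"))
```

The flattened result is what an installation agent on the node would consume, so the node never needs to resolve includes itself.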

User job management (Grid and local)
[Figure, built up over six slides. Labels shown: Grid User, Local User, Farm A (LSF), Farm B (PBS), (Mass storage, Disk pools), the WP4 subsystems Monitoring, Fabric Gridification and Resource Management, and the other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5).]
Steps annotated on the figure:
- Submit job
- Publish resource and accounting information
- Optimized selection of site
- Authorize; map grid to local credentials
- Select an optimal batch queue and submit
- Return job status and output
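The last three steps (authorize and map credentials, pick a queue, return status) can be sketched as follows. Everything here is hypothetical: the certificate subject, the local account name, the queue names and the rule of choosing the queue with the fewest pending jobs are stand-ins, since the slides do not say how the optimal queue is chosen.

```python
# Hypothetical mapping from grid certificate subjects to local accounts.
grid_map = {"/O=Grid/O=Example/CN=Jane Doe": "lhcb001"}

# Hypothetical batch queues with their number of pending jobs.
queues = {"short": 12, "long": 3}


def authorize(subject_dn):
    """Map a grid identity to a local account, or refuse the job."""
    try:
        return grid_map[subject_dn]
    except KeyError:
        raise PermissionError(f"{subject_dn} is not authorized on this fabric")


def pick_queue(pending):
    # Assumed policy: the queue with the fewest pending jobs is "optimal".
    return min(pending, key=pending.get)


def submit(subject_dn, job):
    local_user = authorize(subject_dn)
    queue = pick_queue(queues)
    queues[queue] += 1
    return {"user": local_user, "queue": queue, "job": job, "status": "QUEUED"}


result = submit("/O=Grid/O=Example/CN=Jane Doe", "simulate.sh")
print(result)
```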

The EDG Architecture Tutorial - n° 50 Automated management of large clusters
[Diagram] Farm A (LSF), Farm B (PBS). Legend: WP4 subsystems, Other WPs, Information, Invocation.
- WP4 subsystems: Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance, Resource Management

The EDG Architecture Tutorial - n° 51 Automated management of large clusters
- Node malfunction detected

The EDG Architecture Tutorial - n° 52 Automated management of large clusters
- Remove node from queue
- Wait for running jobs(?)

The EDG Architecture Tutorial - n° 53 Automated management of large clusters
- Update configuration templates

The EDG Architecture Tutorial - n° 54 Automated management of large clusters
- Trigger repair

The EDG Architecture Tutorial - n° 55 Automated management of large clusters
- Repair (e.g. restart, reboot, reconfigure, …)

The EDG Architecture Tutorial - n° 56 Automated management of large clusters
- Node OK detected

The EDG Architecture Tutorial - n° 57 Automated management of large clusters
- Put node back in queue

The EDG Architecture Tutorial - n° 58 Automated management of large clusters
- Automation
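
The recovery cycle shown on slides 51 to 57 can be written down as a short sequence of state changes. The sketch below is a toy model, not the WP4 implementation: the Node class, the recover function and the repair callback are hypothetical names, and running jobs are assumed to finish instantly once the node is drained.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True
    in_queue: bool = True
    running_jobs: int = 0


def recover(node, repair, drain=True):
    """Run one recovery cycle on a node and return the list of actions taken.

    `repair` is a caller-supplied function (hypothetical) that attempts the
    repair, e.g. restart, reboot or reconfigure, and returns True on success.
    """
    actions = []
    if node.healthy:
        return actions  # nothing to do

    actions.append("node malfunction detected")

    node.in_queue = False
    actions.append("node removed from queue")

    if drain and node.running_jobs > 0:
        actions.append(f"waited for {node.running_jobs} running job(s)")
        node.running_jobs = 0  # toy model: jobs finish immediately

    actions.append("configuration templates updated")
    actions.append("repair triggered")

    node.healthy = bool(repair(node))
    if node.healthy:
        actions.append("node OK detected")
        node.in_queue = True
        actions.append("node put back in queue")
    return actions


if __name__ == "__main__":
    broken = Node("wn001", healthy=False, running_jobs=2)
    for step in recover(broken, repair=lambda n: True):
        print(step)
```

If the repair callback reports failure, the node simply stays out of the queue, which is one plausible reading of the slides rather than documented behaviour.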

The EDG Architecture Tutorial - n° 59 LCFG (Local ConFiGuration system)
- Widely used fabric tool whose purpose is to handle automated installation and configuration in a very diverse and evolving environment
- Mechanism:
  - Abstract configuration parameters are stored in a central repository located on the LCFG server.
  - Scripts on the host machine (LCFG client) read these configuration parameters and either generate traditional configuration files or directly manipulate various services.
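
A minimal sketch of the client side of this mechanism: abstract parameters are read from a profile and turned into a traditional configuration file. Slide 60 lists XML and HTTP among the LCFG implementation technologies, but the XML layout below is invented for illustration and is not the real LCFG profile schema; the function names and the example host are likewise hypothetical, and the profile is embedded as a string instead of being fetched from a server.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile, standing in for what a client would obtain
# from the central repository on the server.
PROFILE = """<profile node="wn001">
  <component name="ntp">
    <param name="server">ntp.example.org</param>
    <param name="driftfile">/var/lib/ntp/drift</param>
  </component>
</profile>"""


def read_params(profile_xml, component):
    """Return the abstract parameters of one component as a dict."""
    root = ET.fromstring(profile_xml)
    for comp in root.findall("component"):
        if comp.get("name") == component:
            return {p.get("name"): (p.text or "").strip()
                    for p in comp.findall("param")}
    raise KeyError(component)


def render_key_value_conf(params):
    """Generate a traditional 'key value' style configuration file."""
    return "".join(f"{key} {value}\n" for key, value in params.items())


if __name__ == "__main__":
    print(render_key_value_conf(read_params(PROFILE, "ntp")), end="")
```

The alternative named on the slide, manipulating a service directly instead of writing a file, would replace the rendering function with calls to that service.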

The EDG Architecture Tutorial - n° 60 WP4: Fabric Management
- Components: LCFG, Fabric Monitoring, PBS & LSF info providers, Image installation, Config. Cache Mgr
- Implementation: LCFG: C++, XML, HTTP
- WP4 main interfaces:
  - WP1 Resource Broker (InfoIndex)
  - WP2 Data management
  - WP5 Storage Element
  - WP3 Grid Info Services
[Diagram: EDG layered architecture]
- Local Application, Local Database
- Grid Application Layer: Job Management, Data Management, Metadata Management, Object to File Mapping
- Collective Services: Info & Monitor, Replica Manager, Grid Scheduler
- Underlying Grid Services: SQL Database Services, Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication Accounting, Logging & Book-keeping
- Fabric services: Resource Management, Config Management, Monitoring Fault Tolerance, Node Installation Management, Fabric Storage Management

The EDG Architecture Tutorial - n° 61 WP5: Mass Storage Management
- WP5 delivers the Grid interface to storage.
- Its service, the Storage Element (SE), interfaces to underlying Mass Storage Systems or simple storage services.

The EDG Architecture Tutorial - n° 62 The SE architecture
[Diagram: Storage Element]
- Clients: RB, JSS, RM, GDMP, InfoServices (WP3), user applications running on CEs, CLIs
- Top layer: Interface 1, Interface 2, Interface 3
- Core: Message Queue, Session Manager, System Log, House Keeping, MetaData
- Bottom layer: MSS Interface to MSS1, MSS Interface to MSS2

The EDG Architecture Tutorial - n° 63 SE Interactions
[Diagram] Client, SE, Replica Manager/Catalog, Storage
1. The client asks a catalog to provide the location of a file.
2. The catalog responds with the name of an SE.
3. The client asks the SE for the file.
4. The SE asks the storage system to provide the file.
5. The storage system sends the file to the client through the SE, or
6. directly.
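
The interaction sequence above can be sketched with three small in-memory classes. All class and method names here are hypothetical and chosen only to mirror the numbered steps; they are not the WP2 or WP5 APIs. Only the path through the SE (step 5) is modelled, not the direct transfer of step 6, and the logical file name is made up.

```python
class StorageSystem:
    """Stands in for a mass storage system or a simple disk store."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


class StorageElement:
    """Grid-facing front end to one storage system."""

    def __init__(self, name, storage):
        self.name = name
        self._storage = storage

    def get(self, path):
        # Step 4: the SE asks the storage system to provide the file.
        data = self._storage.read(path)
        # Step 5: the file is passed back to the client through the SE.
        return data


class ReplicaCatalog:
    """Maps a logical file name to (SE name, path on that SE)."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, se_name, path):
        self._locations[logical_name] = (se_name, path)

    def locate(self, logical_name):
        # Steps 1 and 2: the client asks, the catalog answers with an SE.
        return self._locations[logical_name]


def fetch(logical_name, catalog, storage_elements):
    """Client side of the exchange; storage_elements maps SE name to SE."""
    se_name, path = catalog.locate(logical_name)
    # Step 3: the client asks the SE for the file.
    return storage_elements[se_name].get(path)


if __name__ == "__main__":
    se = StorageElement("se01", StorageSystem({"/data/run1.dat": b"event data"}))
    catalog = ReplicaCatalog()
    catalog.register("lfn:run1", "se01", "/data/run1.dat")
    print(fetch("lfn:run1", catalog, {"se01": se}))
```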

The EDG Architecture Tutorial - n° 64 WP5: Mass Storage Management
- Achievements:
  - Definition of architecture and design for the DataGrid Storage Element
  - Collaboration with Globus on GridFTP/RFIO
  - Collaboration with PPDG on control API
  - Staging from/to CASTOR at CERN successfully implemented and tested
  - Successfully interfaced to GDMP
- Supported storage systems:
  - UNIX disk systems
  - HPSS (High Performance Storage System)
  - CASTOR (through RFIO)
  - GridFTP servers
  - DMF
  - Enstore
- WP5 (SE) main interfaces:
  - WP1 Resource Broker & JSS
  - WP2 RM, RC
  - WP7 for GridFTP monitoring
  - WP3 Grid Info Services

The EDG Architecture Tutorial - n° 65 WP6: TestBed Integration and demonstrators
- WP6 goals: the EDG testbed
  - Integration of EDG software releases (currently 1.2) and their deployment across the EDG testbed, carried out by the integration team
  - Working implementation of multiple VOs and basic security infrastructure
  - Definition of acceptable usage contracts and creation of the Certification Authorities group
  - Setup of the Authorization Working Group to manage authorization policies on the testbed
- Components:
  - Support for test-VO, mkgridmap tools
  - Globus packaging & EDG config
  - Build tools, CVS central s/w repository
  - End-user documents

The EDG Architecture Tutorial - n° 66 Further Information
- DataGrid Dx.2 Deliverables (x = 1..5)
- DataGrid D12.4 Deliverable