1
The EU DataGrid Architecture
The European DataGrid Project Team
http://www.eu-datagrid.org
Peter.Kunszt@cern.ch
2
The EDG Architecture Tutorial
Contents
- Middleware architecture overview
- EDG structure
  - Job scheduling
  - Fabric management
  - Data management
  - Monitoring
  - Storage
  - Networking
- Summary
3
EDG middleware architecture: the Globus hourglass
Current EDG architectural functional blocks:
- Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on Globus 2.0 (GSI, GRIS/GIIS, GRAM, MDS)
Layers, top to bottom:
- Specific application layer: ALICE, ATLAS, CMS, LHCb, other apps
- LHC VO common application layer, other apps
- High level GRID middleware
- Basic Services (GLOBUS 2.0)
- OS & Net services
4
DataGrid Architecture
Local Computing
- Local Application
- Local Database
Grid: Grid Application Layer
- Data Management
- Job Management
- Metadata Management
- Object to File Mapping
Grid: Collective Services
- Information & Monitoring
- Replica Manager
- Grid Scheduler
Grid: Underlying Grid Services
- Computing Element Services
- Authorization, Authentication & Accounting
- Replica Catalog
- Storage Element Services
- Database Services
Grid Fabric: Fabric services
- Configuration Management
- Node Installation & Management
- Monitoring and Fault Tolerance
- Resource Management
- Fabric Storage Management
Also shown: Logging & Bookkeeping
5
EDG middleware architecture: EDG interfaces
The DataGrid architecture diagram, shown together with the parties and systems it interfaces to:
- Scientists, Application Developers, System Managers
- Certificate Authorities, User Accounts
- Computing Elements, Batch Systems, Operating System
- Storage Elements, File Systems, Mass Storage Systems (HPSS, Castor)
6
EDG middleware architecture: The Workload Management System (WP1)
WP1 is responsible for the Workload Management System (WMS). The WMS is currently composed of the following parts:
- User Interface (UI): access point to the GRID for the user (using JDL)
- Resource Broker (RB): the broker of GRID resources, responsible for matchmaking
- Job Submission System (JSS): Condor-G, interfacing to batch systems
- Information Index (II): an LDAP server used as a filter to select resources
- Logging and Bookkeeping services (LB): MySQL databases that store job information
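The matchmaking role of the Resource Broker can be pictured with a small sketch. It is only an illustration under stated assumptions: the `ComputingElement` fields, the requirement keys and the host names are hypothetical, and the code reproduces neither JDL syntax nor the actual RB implementation.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Hypothetical resource record, as an information index might publish it."""
    name: str
    os: str
    free_cpus: int


def matchmake(requirements, resources):
    """Return names of resources meeting the requirements, most free CPUs first."""
    candidates = [
        ce for ce in resources
        if ce.os == requirements["os"] and ce.free_cpus >= requirements["min_cpus"]
    ]
    candidates.sort(key=lambda ce: ce.free_cpus, reverse=True)
    return [ce.name for ce in candidates]


# Hypothetical resources and job requirements (not taken from a real testbed).
RESOURCES = [
    ComputingElement("ce-a.example.org", "linux", 4),
    ComputingElement("ce-b.example.org", "linux", 16),
    ComputingElement("ce-c.example.org", "solaris", 32),
]
JOB = {"os": "linux", "min_cpus": 8}
```

Calling `matchmake(JOB, RESOURCES)` filters the resource list against the job's requirements and ranks what is left, which is the essence of what the slide calls matchmaking.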
7
WP1: Workload Management
Components:
- Job Description Language
- Resource Broker
- Job Submission Service
- Information Index
- User Interface
- Logging & Bookkeeping Service
Implementation:
- UI: Python (LB client: C++)
- RB: C++
- JSS: C++, Python
- II: LDAP server
- LB: MySQL, C++
- Input/Output Sandboxes: GridFTP
WMS main interfaces:
- Globus Gatekeeper
- WP2 Replica Catalog APIs
- WP3 Information Systems
- WP7 network monitoring info providers
- End User (using JDL files, on the UI)
8
EDG middleware architecture: WP1 (WMS)
9
EDG middleware architecture: WP2 (Data Management)
WP2 is responsible for Data Management, which includes file and replica management, metadata access and data security.
WP2 components:
- Replica Manager: the main manager for triggering replication across the GRID, including replica optimization and interfacing to the replica catalog service
- Replica Catalog: a GRID service that resolves Logical File Names into a set of corresponding Physical File Names (Globus Replica Catalog)
- GDMP: the GRID Data Mirroring Package, used to create replicas of files of any type on GRID Storage Elements in a synchronized way, automatically updating the replica catalog
- Spitfire: a Grid-enabled middleware service for access to relational databases. It consists of the Spitfire Server module, the Spitfire Client libraries and command line executables.
10
File Management
Two sites, each with a Storage Element:
- Site A, Storage Element A: File A, File B, File X, File Y
- Site B, Storage Element B: File A, File B, File C, File D
Services introduced step by step around the transfer of files between the sites:
- File Transfer
- Replica Catalog: map logical to site files
- Replica Selection: get the 'best' file
- Pre-/Post-processing: prepare files for transfer, validate files after transfer
- Replication Automation: data source subscription
- Load balancing: replicate based on usage
- Replica Manager: 'atomic' replication operation, single client interface, orchestrator
- Metadata: LFN metadata, transaction information, access patterns
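To make the Replica Catalog and Replica Selection roles concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the catalog contents, the host names and the cost table are hypothetical, and the "lowest estimated cost" rule is only one possible meaning of 'best'.

```python
from urllib.parse import urlparse

# Hypothetical catalog: one logical file name maps to several physical replicas.
REPLICA_CATALOG = {
    "lfn:fileA": [
        "gsiftp://se-a.example.org/data/fileA",
        "gsiftp://se-b.example.org/data/fileA",
    ],
}

# Hypothetical transfer cost estimate per Storage Element (lower is better).
TRANSFER_COST = {"se-a.example.org": 5.0, "se-b.example.org": 2.0}


def list_replicas(lfn):
    """Replica Catalog lookup: logical file name -> physical file names."""
    return list(REPLICA_CATALOG.get(lfn, []))


def best_replica(lfn):
    """Replica Selection: pick the replica hosted on the cheapest Storage Element."""
    replicas = list_replicas(lfn)
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    return min(replicas, key=lambda pfn: TRANSFER_COST[urlparse(pfn).hostname])
```

Keeping lookup and selection as separate functions mirrors the slide, where the catalog only maps names and a distinct selection step decides which copy to use.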
19
Current State
File Transfer: GridFTP (deployed)
- Close collaboration with Globus
- NetLogger (Brian Tierney and John Bresnahan)
Replication: GDMP (deployed)
- Wrapper around Globus ReplicaCatalog
- All functionality in one integrated package
- Using Globus 2
- Uses GridFTP for transferring files
Replication: edg-replica-manager (deployed)
Replication: Replica Location Service, Giggle (in testing)
- Distributed Replica Catalog
Replication: Replica Manager, Reptor (in testing)
Optimization: Replica Selection, OptorSim (in simulation)
Metadata Storage: SQL Database Service, Spitfire (deployed)
- Servlets on HTTP(S) with XML (XSQL)
- GSI enabled access + extensions
GSI interface to CASTOR (delivered)
20
WP2: Data Management
Deployed components:
- GridFTP
- Replica Manager: edg-replica-manager
- Replica Catalog: globus-replica-catalog
- GDMP
- Spitfire
Implementation:
- RM: C++ classes (under development)
- RC: Globus Replica Catalog wrapper
- GDMP: C++
- Spitfire: Java, Web Services
WP2 main interfaces:
- The GRID Storage Element
- WP1 Resource Broker APIs
- WP3 GRID Info services
- WP7 network monitoring info providers
- End User (using GDMP)
21
Example of Data Management by LHCb
Copy data file to storage element:
    globus-url-copy file:///${chemin}/L69999 gsiftp://lxshare0219.cern.ch/flatfiles/SE1/lhcb/L69999
Register stored data in the catalog:
    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_register_local_file -d /flatfiles/SE1/lhcb"
Publish catalog:
    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_publish_catalogue -n"
Copy output to MSS:
    rfcp L1600061 /castor/cern.ch/lhcb/mc/L1600061
22
The Replica Manager APIs
Diagram components: Client; Replica Catalogue; two groups, each consisting of a Replica Optimiser, a Replica Manager, an SE and a CE.
Legend: physical file transfer, communication.
23
The Replica Manager APIs
RM.copy(PhysicalFileName source, PhysicalFileName destination, String protocol):Status
- allows for third-party transfer
- transfer between:
  - two StorageElements, or
  - a ComputingElement and a StorageElement
  - Space management policies under development
24
The Replica Manager APIs
RM.add/deletePhysicalFileName(LogicalFileName lfn, PhysicalFileName pfn)
- Replica Catalogue operations only, no file transfer
RM.copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol):Status
- third-party transfer, but files can only be registered in the Replica Catalogue if the destination PFN contains a valid SE (i.e. one that is registered in the RC)
RM.deletePhysicalFile(LogicalFileName lfn, PhysicalFileName pfn)
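The difference between catalogue-only operations and the combined copy-and-register call can be sketched with an in-memory model. This is not the EDG Replica Manager API: the class, its snake_case method names, the dictionary standing in for files on Storage Elements and the host names are all assumptions made for illustration, and the `protocol` argument of the slide signatures is left out.

```python
from urllib.parse import urlparse


class ReplicaManagerSketch:
    """Toy model: a catalogue (lfn -> pfns) plus a dict that stands in for stored files."""

    def __init__(self, registered_ses):
        self.registered_ses = set(registered_ses)  # SEs known to the catalogue
        self.catalogue = {}                        # lfn -> set of pfns
        self.files = {}                            # pfn -> content (stand-in for real storage)

    def copy(self, source, destination):
        """File transfer only; the catalogue is not touched."""
        self.files[destination] = self.files[source]

    def add_physical_file_name(self, lfn, pfn):
        """Catalogue operation only; no file transfer."""
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def delete_physical_file_name(self, lfn, pfn):
        """Catalogue operation only; the file itself is left in place."""
        self.catalogue.get(lfn, set()).discard(pfn)

    def copy_and_add_physical_file(self, source, destination, lfn):
        """Transfer and register, refusing destinations on unregistered SEs."""
        host = urlparse(destination).hostname
        if host not in self.registered_ses:
            raise ValueError(f"{host} is not a registered Storage Element")
        self.copy(source, destination)
        self.add_physical_file_name(lfn, destination)
```

Checking the destination before copying means a rejected request leaves neither a stray file nor a catalogue entry behind, which is one simple way to read the 'atomic' replication idea.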
25
WP2 next generation Replication Services
Services: Replica Manager, Replica Metadata, Replica Location, File Transfer, Optimization, Transaction, Consistency, Preprocessing, Postprocessing, Subscription
Components: Reptor, Giggle, RepMeC, Optor, GDMP
Also shown: Client
26
Replication Services Architecture
Diagram components: User Interface; Resource Broker; Replica Metadata Catalog; Replica Location Indices; and, at each site, a Replica Manager, Optimiser, Pre-/Post-processing, Local Replica Catalog, Storage Element and Computing Element.
API labels: Core API, Optimisation API, Processing API.
27
Metadata Management and Security
Project Spitfire
- 'Simple' Grid Persistency
  - Grid Metadata
  - Application Metadata
  - Unified Grid enabled front end to relational databases
- Metadata Replication and Consistency
- Publish information on the metadata service
Secure Grid Services
- Grid authentication, authorization and access control mechanisms enabled in Spitfire
- Modular design, reusable by other Grid Services
28
Spitfire Architecture
- Atomic RDBMS is always consistent
- No local replication of data
- Role-based authorization
- XSQL Servlet as one access mode for 'simple' web access
- Web/Grid Services Paradigm
  - SOAP interfaces
  - JDBC interface to RDBMS
- Pluggability and extensibility
Layers (SOAP between them): Global Spitfire Layer, Connecting Layer, Local Spitfire Layer, then one layer per database (Oracle Layer, DB2 Layer, PG Layer, My Layer) in front of Oracle, DB2, PostgreSQL and MySQL.
29
WP3: GRID monitoring and Info Providers
WP3's task is to provide information about:
- The Grid itself. This includes information about resources (ComputingElements, StorageElements and the Network), for which the Globus MDS is a common solution, and job status information (as implemented by WP1's Logging and Bookkeeping).
- Grid applications. This is information published by user jobs, and it is used for performance monitoring.
30
WP3: GRID monitoring and Info Providers
Main WP3 components:
- MDS v2.1: the Globus Monitoring and Discovery Services, based on soft-state registration protocols and LDAP aggregate directory services
- Ftree: EDG-developed directory service based on OpenLDAP plus caching, which addresses shortcomings in MDS v1 and optimizes data access performance
- R-GMA: Relational GMA, an implementation of the GGF Grid Monitoring Architecture (Consumers, Producers and Directory Services) that makes information from producers available to consumers as relations (tables). It also uses relations to handle the registration of producers. R-GMA is consistent with GMA principles.
- GRM / PROVE: application monitoring and visualization tools from the P-GRADE graphical parallel programming environment, modified for application monitoring in the DataGrid. The instrumentation library of GRM is generalized for a flexible trace event specification. The components of GRM will be connected to R-GMA using its Producer and Consumer APIs.
31
R-GMA
- Uses the GMA from GGF
- A relational implementation
- Applied to both information and monitoring
- Creates the impression that you have one RDBMS per VO
Diagram: Producer, Consumer and Registry, with "subscribe" and "lookup" interactions.
32
Relational Approach
Producers
- announce: SQL "CREATE TABLE"
- publish: SQL "INSERT"
Consumers
- collect: SQL "SELECT"
33
R-GMA
API to Servlet communication:
- http(s) in
- XML back
Diagram components: Sensor Code with Producer API; Application Code with Consumer API; Producer Servlet; Consumer Servlet; Registry API and Registry Servlet; Schema API and Schema Servlet.
34
Schema & Contributions
CPULoad (Global Schema)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002
CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CPULoad (Producer 3)
CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002
35
Contributions are Views
CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'
CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
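The announce, publish and collect vocabulary, and the idea that each producer's contribution is a view on the global table, can be tried out with an ordinary relational database. The sketch below uses Python's built-in `sqlite3` module as a stand-in; it is not R-GMA code. The view names `producer1` and `producer2` and the helper functions are assumptions, and in R-GMA the global table is virtual, whereas here it is a real table for simplicity.

```python
import sqlite3

# Rows copied from the UK part of the CPULoad example.
ROWS = [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
    ("UK", "GLA", "CDF", 0.4, "19055811022002"),
    ("UK", "GLA", "ALICE", 0.5, "19055611022002"),
]

KNOWN_RELATIONS = {"cpuLoad", "producer1", "producer2"}


def build_database():
    """Announce the table, publish rows, and define each contribution as a view."""
    conn = sqlite3.connect(":memory:")
    # Announce: CREATE TABLE
    conn.execute(
        "CREATE TABLE cpuLoad (country TEXT, site TEXT, facility TEXT, "
        "load REAL, timestamp TEXT)"
    )
    # Publish: INSERT
    conn.executemany("INSERT INTO cpuLoad VALUES (?, ?, ?, ?, ?)", ROWS)
    # Contributions expressed as views with the predicates from the slide.
    conn.execute(
        "CREATE VIEW producer1 AS "
        "SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'"
    )
    conn.execute(
        "CREATE VIEW producer2 AS "
        "SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'"
    )
    return conn


def collect(conn, relation):
    """Collect: SELECT from the global table or from one contribution."""
    if relation not in KNOWN_RELATIONS:  # relation names cannot be bound as parameters
        raise ValueError(f"unknown relation: {relation}")
    query = f"SELECT facility, load FROM {relation} ORDER BY facility"
    return conn.execute(query).fetchall()
```

A consumer querying `cpuLoad` sees all rows, while `producer1` and `producer2` each expose only the rows matching their predicate.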
36
WP3: GRID Monitoring
Components:
- MDS / FTree
- R-GMA
- GRM / PROVE
Implementation:
- MDS: LDAP, Globus GRIS, GIIS
- FTree: OpenLDAP, caching
- R-GMA: Java, C++, MySQL, Tomcat
- GRM / PROVE: P-GRADE
WP3 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 RM optimizer
- all GRID services producing info (SE, CE, ...)
- WP7 network monitoring
37
EDG middleware architecture: WP4 (Fabric Management Components)
WP4 is responsible for delivering a computing fabric comprising all the tools necessary to manage a center providing grid services on clusters of thousands of nodes. The computing fabric is called the Computing Element in EDG.
User Job Control and Management (Grid and local jobs) on fabric batch and/or interactive CPU services:
- Gridification: Grid interface to fabric resources
- Resource Management: manage underlying batch services
Automated System Administration for Computing Fabric Elements. These subsystems are reserved for system administrators and operators performing system maintenance:
- Configuration Management
- Installation Management
- Fabric Monitoring
38
WP4 Architecture logical overview
WP4 subsystems: Fabric Gridification, Resource Management, Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance
Fabric: Farm A (LSF), Farm B (PBS); mass storage, disk pools
Users: Grid User, Local User
Other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5)
Fabric Gridification
- Interface between Grid-wide services and local fabric
- Provides local authentication, authorization and mapping of grid credentials
Resource Management
- Provides transparent access (both job and admin) to different cluster batch systems
- Enhanced capabilities (extended scheduling policies, advanced reservation, local accounting)
Installation & Node Mgmt
- Provides the tools to install and manage all software running on the fabric nodes
- Agent to install, upgrade, remove and configure software packages on the nodes
- Bootstrap services and software repositories
Configuration Management
- Provides central storage and management of all fabric configuration information
- Compiles HLD templates to LLD node profiles
- Central DB and set of protocols and APIs to store and retrieve information
Monitoring & Fault Tolerance
- Provides the tools for gathering monitoring information on fabric nodes
- Central measurement repository stores all monitoring information
- Fault tolerance correlation engines detect failures and trigger recovery actions
44
User job management (Grid and local)
Participants: Grid User, Local User, Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5), and the WP4 subsystems Fabric Gridification, Resource Management and Monitoring, in front of Farm A (LSF) and Farm B (PBS).
Steps shown in sequence:
- Submit job
- Publish resource and accounting information
- Optimized selection of site
- Authorize; map grid credentials to local credentials
- Select an optimal batch queue and submit; return job status and output
50
Automated management of large clusters
WP4 subsystems involved: Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance, Resource Management, acting on Farm A (LSF) and Farm B (PBS). Arrows distinguish information from invocation.
Steps shown in sequence:
- Node malfunction detected
- Remove node from queue; wait for running jobs(?)
- Update configuration templates
- Trigger repair
- Repair (e.g. restart, reboot, reconfigure, …)
- Node OK detected
- Put node back in queue
Result: automation
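The detect, drain, repair and restore cycle for a malfunctioning node can be written down as a few lines of control flow. This is a toy sketch, not WP4 code: the function name, the `healthy` dictionary, the `queue` set and the `repair` callback are all hypothetical stand-ins for the monitoring, resource management and installation subsystems, and the configuration template update is omitted.

```python
def recovery_cycle(node, healthy, queue, repair):
    """Run one recovery cycle for a node and return the actions taken, in order.

    healthy: dict mapping node name -> bool (stand-in for fabric monitoring)
    queue:   set of node names currently accepting jobs (stand-in for the batch system)
    repair:  callable taking the node name (e.g. restart, reboot, reconfigure)
    """
    actions = []
    if healthy[node]:
        return actions  # nothing to do for a healthy node

    actions.append("node malfunction detected")
    queue.discard(node)
    actions.append("node removed from queue")

    actions.append("repair triggered")
    repair(node)

    # The node only rejoins the queue once monitoring reports it healthy again.
    if healthy[node]:
        actions.append("node OK detected")
        queue.add(node)
        actions.append("node put back in queue")
    return actions
```

If the repair does not restore the node, it simply stays out of the queue, so no new jobs land on a machine that is still broken.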
59
LCFG (Local ConFiGuration system)
A widely used fabric tool whose purpose is to handle automated installation and configuration in a very diverse and evolving environment.
Mechanism:
- Abstract configuration parameters are stored in a central repository located on the LCFG server.
- Scripts on the host machine (LCFG client) read these configuration parameters and either generate traditional configuration files or directly manipulate various services.
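The mechanism above, with abstract parameters held centrally and turned into ordinary configuration files on each client, can be illustrated with a short sketch. It does not use LCFG's real formats or tools: the repository dictionary, the host name, the parameter keys and the `key = value` output format are all hypothetical.

```python
# Hypothetical central repository: host name -> abstract configuration parameters.
CENTRAL_REPOSITORY = {
    "node042": {"ntp.server": "time.example.org", "sshd.port": 22},
}


def render_config(parameters):
    """Turn abstract parameters into the text of a traditional 'key = value' file."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(parameters.items()))


def configure(host, repository=CENTRAL_REPOSITORY):
    """Client-side step: read this host's parameters and generate its config text."""
    return render_config(repository[host])
```

Sorting the keys makes the generated file deterministic, so regenerating it from unchanged parameters yields identical output.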
60
WP4: Fabric Management
Components:
- LCFG
- Fabric Monitoring
- PBS & LSF info providers
- Image installation
- Config. Cache Mgr
Implementation:
- LCFG: C++, XML, HTTP
WP4 main interfaces:
- WP1 Resource Broker (InfoIndex)
- WP2 Data management
- WP5 Storage Element
- WP3 GRID Info Services
61
WP5: Mass Storage Management
WP5 delivers the Grid interface to storage. Its service, the Storage Element (SE), interfaces to underlying Mass Storage Systems or simple storage services.
62
The SE architecture
Clients: RB, JSS, RM, GDMP, InfoServices (WP3), user applications running on CEs, CLIs
Storage Element layers:
- Top layer: Interface 1, Interface 2, Interface 3
- Core: Message Queue, Session Manager, System Log, House Keeping, MetaData
- Bottom layer: MSS Interfaces to MSS1 and MSS2
63
SE Interactions
Participants: Client, Replica Manager/Catalog, SE, Storage
1. The client asks a catalog to provide the location of a file.
2. The catalog responds with the name of an SE.
3. The client asks the SE for the file.
4. The SE asks the storage system to provide the file.
5. The storage system sends the file to the client through the SE, or
6. directly.
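The numbered interaction can be followed in code with a minimal sketch. The class and function names, the dictionary-based catalog and the in-memory backing store are assumptions made for illustration; only the path where the file travels back through the SE (step 5) is modelled, not the direct path (step 6).

```python
class StorageElementSketch:
    """Toy SE that fronts a backing store (stand-in for an MSS or disk)."""

    def __init__(self, name, backing_store):
        self.name = name
        self._backing_store = backing_store  # file name -> content

    def get(self, file_name):
        # Step 4: the SE asks the storage system to provide the file.
        content = self._backing_store[file_name]
        # Step 5: the file goes back to the client through the SE.
        return content


def fetch(file_name, catalog, storage_elements):
    """Client side of the interaction.

    catalog:          dict mapping file name -> SE name (steps 1 and 2)
    storage_elements: dict mapping SE name -> StorageElementSketch
    """
    se_name = catalog[file_name]      # steps 1-2: ask the catalog, get an SE name
    se = storage_elements[se_name]
    return se.get(file_name)          # steps 3-5: ask the SE, receive the file
```

The client never touches the backing store itself, which is the point of placing the SE in front of heterogeneous storage systems.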
64
WP5: Mass Storage Management
Achievements:
- Definition of architecture and design for the DataGrid Storage Element
- Collaboration with Globus on GridFTP/RFIO
- Collaboration with PPDG on control API
- Staging from/to CASTOR at CERN successfully implemented and tested
- Successfully interfaced to GDMP
Supported storage systems:
- UNIX disk systems
- HPSS (High Performance Storage System)
- CASTOR (through RFIO)
- GridFTP servers
- DMF
- Enstore
WP5 (SE) main interfaces:
- WP1 Resource Broker & JSS
- WP2 RM, RC
- WP7 for GridFTP monitoring
- WP3 GRID Info Services
65
WP6: TestBed Integration and demonstrators
WP6 goals: the EDG testbed
- Integration of EDG software releases (currently 1.2) and deployment across the EDG testbed: the integration team
- Working implementation of multiple VOs and basic security infrastructure
- Definition of acceptable usage contracts and creation of the Certification Authorities group
- Setup of the Authorization Working Group to manage authorization policies on the testbed
Components:
- Support for test-VO, mkgridmap tools
- Globus packaging & EDG config
- Build tools, CVS central s/w repository
- End-user documents
66
Further Information
- DataGrid Dx.2 Deliverables (x = 1..5)
- DataGrid D12.4 Deliverable