1
gLite for ATLAS Production
Simone Campana, CERN/INFN
ATLAS production meeting, May 2, 2005
Enabling Grids for E-sciencE (EGEE), www.eu-egee.org
2
Outline
Contents of gLite release 1.0
– Components, architecture, and service interplay
– Major differences to LCG-2
– Major open issues
– Future plans
Deployment plan
– LCG-2 vs gLite
– gLite/LCG-2 coexistence
– Current status of certification
– Pre-production service
3
WMS
Re-engineering of the LCG-2 Workload Management System:
– Support for partitioned jobs and jobs with dependencies
  - Might be considered as the setup for the production machinery
  - Would help solve problems of pre-staging, etc.
  - Side effects should be investigated and considered as well
– Task Queue: persistent queue for submitted jobs
– Information Supermarket: read-only information system cache
  - Updated by the information system (CE in push mode), by CEMon (CE in pull mode), or by a combination of both
  - Allows the WMS to work in both push and pull mode (see the conceptual sketch below)
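To make the Task Queue / Information Supermarket interplay concrete, here is a minimal, purely conceptual Python sketch. It is not gLite code: the class and attribute names (InformationSupermarket, TaskQueue, CEInfo) and the trivial ranking are illustrative assumptions only.

```python
# Conceptual sketch only: a toy Task Queue plus Information Supermarket,
# NOT the real gLite WMS. All names are illustrative.
from dataclasses import dataclass
from collections import deque

@dataclass
class CEInfo:
    ce_id: str
    free_slots: int
    vo_list: list

class InformationSupermarket:
    """Read-only cache of CE state, refreshed by push (BDII-like) or pull (CEMon-like) updates."""
    def __init__(self):
        self._cache = {}

    def update(self, ce: CEInfo):
        # Both push and pull paths end up here: the matchmaker only ever reads the cache.
        self._cache[ce.ce_id] = ce

    def matching_ces(self, vo: str):
        return [ce for ce in self._cache.values() if vo in ce.vo_list and ce.free_slots > 0]

class TaskQueue:
    """Persistent queue of submitted jobs; jobs wait here until a matching CE appears."""
    def __init__(self):
        self._jobs = deque()

    def submit(self, job):
        self._jobs.append(job)

    def match_and_dispatch(self, ism: InformationSupermarket):
        still_waiting = deque()
        while self._jobs:
            job = self._jobs.popleft()
            candidates = ism.matching_ces(job["vo"])
            if candidates:
                ce = max(candidates, key=lambda c: c.free_slots)  # trivial ranking
                print(f"dispatch {job['id']} -> {ce.ce_id}")
                ce.free_slots -= 1
            else:
                still_waiting.append(job)   # stays in the queue: pull-mode behaviour
        self._jobs = still_waiting

# Usage: a CE publishes its state (push), a queued job waits until a slot shows up (pull).
ism, tq = InformationSupermarket(), TaskQueue()
tq.submit({"id": "job-1", "vo": "atlas"})
tq.match_and_dispatch(ism)                       # nothing matches yet, job stays queued
ism.update(CEInfo("ce01.example.org", 2, ["atlas"]))
tq.match_and_dispatch(ism)                       # now dispatched
```

The point of the sketch is the pull-mode behaviour: a job that finds no match simply stays in the persistent queue and is re-evaluated when new CE information arrives in the cache.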
4
WMS (cont'd)
– Interfaces to Data Management: EDG-RLS, StorageIndex, DLI
– Condor-C
  - Job submission mechanism between the WM and the CE
  - Improves reliability with respect to Globus GRAM
– CE moving towards a VO-based scheduler
Future:
– Web services
– Bulk submission
5
WMS (cont'd)
Major problems:
– Failure rate ~12% with RetryCount = 0, otherwise 100% success
  - Several causes are being investigated (e.g. race conditions)
  - Shallow re-submission (i.e. retry of the submission, not of the execution) might help; see the hedged example below
– Matchmaking sometimes gets blocked
  - Fix provided for Release 1.1 (end of April)
– Condor as a backend is not yet working
– The CE architecture is not yet final: one schedd per local user id; needs setuid services and head-node monitoring (Globus + JRA3)
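As a hedged illustration of the retry settings mentioned above, the snippet below builds a JDL-style job description as a plain string. RetryCount is the standard LCG-2/gLite attribute for deep resubmission; ShallowRetryCount is the attribute used by later gLite WMS releases for shallow re-submission and is included here only as an assumption, since at the time of this talk the feature was still a "might help". The executable and sandbox file names are invented.

```python
# Hedged sketch: a JDL-like job description illustrating the retry settings discussed above.
# ShallowRetryCount is an assumption (it appears in later gLite WMS releases); the file
# names are purely illustrative.
job_description = """
[
  Executable        = "run_atlas_job.sh";     // illustrative script name
  StdOutput         = "stdout.log";
  StdError          = "stderr.log";
  InputSandbox      = {"run_atlas_job.sh"};
  OutputSandbox     = {"stdout.log", "stderr.log"};
  RetryCount        = 0;                      // deep resubmission off: the ~12% failure case
  ShallowRetryCount = 3;                      // retry only the submission step, not the execution
]
"""
print(job_description)
```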
6
Data Management Services
Mostly new developments, based on (and re-using some) AliEn services.
Storage Element
– Storage Resource Manager (SRM): relies on existing implementations
– POSIX-I/O: gLite-I/O (a conceptual sketch follows below)
– Access protocols: gsiftp, gsidcap, rfio, …
Catalogs
– File Catalog, Replica Catalog, File Authorization Service: gLite FiReMan Catalog (MySQL and Oracle)
– Metadata Catalog: gLite Metadata Catalog
File Transfer
– Data Scheduler: planned for Release 2
– File Transfer Service: gLite FTS and glite-url-copy
– File Placement Service: gLite FPS
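A conceptual Python sketch of what a POSIX-like grid I/O layer (in the spirit of gLite-I/O) has to do: resolve a logical file name through a catalogue, pick a replica, and hand back a file-like handle. This is not the gLite-I/O API; ToyCatalog, grid_open and the example LFN/SURL are invented for illustration.

```python
# Conceptual sketch only; function and class names are invented, not the gLite-I/O API.
class ToyCatalog:
    """Maps logical file names (LFNs) to storage URLs, as a file/replica catalogue would."""
    def __init__(self):
        self._replicas = {}

    def register(self, lfn, surl):
        self._replicas.setdefault(lfn, []).append(surl)

    def lookup(self, lfn):
        return self._replicas.get(lfn, [])

def grid_open(catalog, lfn):
    """Resolve an LFN via the catalogue, pick a replica, and return something file-like.
    A real implementation would check the File Authorization Service and speak
    gsiftp/gsidcap/rfio; here we only print what would happen."""
    replicas = catalog.lookup(lfn)
    if not replicas:
        raise FileNotFoundError(lfn)
    surl = replicas[0]                 # trivial replica selection
    print(f"would contact the SE for {surl} and return a POSIX-like handle")
    return surl

catalog = ToyCatalog()
catalog.register("/grid/atlas/prod/evgen.pool.root", "srm://se.example.org/atlas/evgen.pool.root")
grid_open(catalog, "/grid/atlas/prod/evgen.pool.root")
```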
7
Data Management (cont'd)
Addressing shortcomings of LCG-2 data management:
– RLS performance
– Lack of consistent grid-storage interfaces
– Unreliable data transfer layer
Fireman Catalog
– Hierarchical name space
– Bulk operations
– ACLs (see the toy sketch below)
– Web Services interface
– POOL interface
– Performance/scalability
gLite I/O
– Support for ACLs
– Support for the Fireman catalog in addition to RLS
File Transfer Service
– Did not exist in LCG-2; to be evaluated in SC3
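A toy Python catalogue illustrating three FiReMan features called out above: a hierarchical name space, bulk operations, and ACL-controlled reads. It is not the FiReMan API; ToyFireman and its methods are illustrative assumptions.

```python
# Conceptual sketch only, not the FiReMan API. All names are illustrative.
class ToyFireman:
    def __init__(self):
        self._entries = {}           # lfn -> {"acl": {...}, "replicas": [...]}

    def bulk_register(self, records, user):
        """Register many entries in one call: one round trip instead of one per file."""
        for lfn, surl in records:
            entry = self._entries.setdefault(lfn, {"acl": {user: "rw"}, "replicas": []})
            entry["replicas"].append(surl)

    def list_dir(self, path, user):
        """Hierarchical listing: everything under 'path' that the user may read."""
        prefix = path.rstrip("/") + "/"
        return [lfn for lfn, entry in self._entries.items()
                if lfn.startswith(prefix) and "r" in entry["acl"].get(user, "")]

cat = ToyFireman()
cat.bulk_register(
    [(f"/grid/atlas/prod/dc2/file{i}.root", f"srm://se.example.org/dc2/file{i}.root")
     for i in range(1000)],
    user="prodmgr",
)
print(len(cat.list_dir("/grid/atlas/prod/dc2", user="prodmgr")))      # 1000
print(len(cat.list_dir("/grid/atlas/prod/dc2", user="someoneelse")))  # 0: ACL denies read
```

The bulk call registers a thousand entries in one round trip, which is exactly where a bulk interface pays off compared with one catalogue call per file.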
8
Catalogs
Currently: a global catalog.
Future: one can think about local file catalogs
– A central location index (Storage Index) becomes necessary (a conceptual sketch follows below)
– Lightweight and therefore more scalable
– Yet to be demonstrated
Comparison Fireman vs. LFC
– Single files: LFC is faster
– Fireman offers bulk capabilities; LFC does not
"Within the LHC experiments there is yet no decision - they really want to test both LFC and Fireman and select based on their needs. It is likely that different applications will choose a different catalogue. Since these are application dependent we will probably be in a situation where we have different VOs using different catalogues."
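A conceptual sketch of the local-catalogue-plus-Storage-Index model discussed above: the central index only records which sites hold a logical file, while the detailed LFN-to-SURL mapping stays in the per-site catalogue. Class names and the example site are illustrative; this is not gLite code.

```python
# Conceptual sketch only: a lightweight central Storage Index over per-site catalogues.
class LocalCatalog:
    """Per-site catalogue: knows only about files stored at its own SE."""
    def __init__(self, site):
        self.site = site
        self._files = {}

    def add(self, lfn, surl):
        self._files[lfn] = surl

    def resolve(self, lfn):
        return self._files.get(lfn)

class StorageIndex:
    """Central index: records WHICH sites hold an LFN; the LFN -> SURL detail stays local."""
    def __init__(self):
        self._sites_for = {}

    def register(self, lfn, site):
        self._sites_for.setdefault(lfn, set()).add(site)

    def sites(self, lfn):
        return self._sites_for.get(lfn, set())

# Usage: the WMS asks the Storage Index where the data is, then the chosen site's local
# catalogue is asked for the actual storage URL.
cnaf = LocalCatalog("CNAF")
cnaf.add("/grid/atlas/dc2/hits.root", "srm://se.cnaf.infn.it/atlas/hits.root")
index = StorageIndex()
index.register("/grid/atlas/dc2/hits.root", "CNAF")

for site in index.sites("/grid/atlas/dc2/hits.root"):
    print(site, "->", cnaf.resolve("/grid/atlas/dc2/hits.root"))
```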
9
Information and Monitoring Services
R-GMA (Relational Grid Monitoring Architecture)
– Implements the GGF GMA standard
– Development started in EDG; deployed on the production infrastructure for accounting
[Architecture diagram: a Producer application publishes tuples (SQL "CREATE TABLE" / "INSERT") to the Producer Service; a Consumer application sends queries (SQL "SELECT") to the Consumer Service and receives tuples back; the Registry, Schema and Mediator services register and locate producers. A toy end-to-end example follows below.]
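The producer/consumer pattern in the diagram can be illustrated with ordinary SQL, since R-GMA presents a relational view of the published data. The sketch below uses an in-memory SQLite database to stand in for the producer, registry/mediator and consumer services; it is not the R-GMA API, and the JobStatus table is invented.

```python
# Toy illustration of the R-GMA usage pattern (publish with CREATE TABLE / INSERT,
# consume with SELECT). SQLite stands in for the R-GMA services themselves.
import sqlite3

# "Producer" side: declare the table and publish tuples.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE JobStatus (jobid TEXT, site TEXT, status TEXT)")
db.execute("INSERT INTO JobStatus VALUES ('job-001', 'CERN', 'Running')")
db.execute("INSERT INTO JobStatus VALUES ('job-002', 'CNAF', 'Done')")

# "Consumer" side: a monitoring tool issues a SELECT and receives tuples.
for row in db.execute("SELECT jobid, site, status FROM JobStatus WHERE status = 'Running'"):
    print(row)          # ('job-001', 'CERN', 'Running')
```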
10
Job Monitoring
Currently ATLAS relies on the ProdDB and the GridICE server.
R-GMA and GridICE can coexist:
– At the moment, information from the GridICE sensors can be published through R-GMA
– Monitoring tools (Magoo) can therefore interface to R-GMA directly
– The GridICE server is still quite useful to quickly visualize the status of resources
R-GMA can be used for application monitoring.
R-GMA has already been used for several months for accounting and operations.
Don't forget Laurence Field's mini-tutorial on R-GMA architecture and usage
– End of May (final date not yet fixed).
11
VOMS
Derived from DataTAG and EDG; used for VO management.
– VOMS certificates understood by the WMS and Data Management (as of release 1.1)
– RFC compliance
Major problems:
– Incompatibility with previous VOMS versions, due to RFC compliance
12
Future Plans - WMS
Move the CE to its final architecture
– One scheduler per VO (can be provided by the VO)
– Head-node monitor and fork/set-uid service
WS interface to the WMS
– With better support for bulk job submission
Support for pilot jobs
Integration of network information (JRA4)
Closer integration with Data Management
– Common job and data-transfer DAGs
– Data matchmaking for ranking (a toy ranking sketch follows below)
"Shallow" job re-submission
CE history used for ranking
Use information from R-GMA in the Information Supermarket
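A toy sketch of what combining "data matchmaking for ranking" with "CE history used for ranking" could look like. The weights, field names and formula are invented for illustration; a real matchmaker would evaluate a Rank expression rather than this function.

```python
# Conceptual sketch only: prefer CEs close to the required data and with a good
# recent success rate. All numbers and field names are illustrative.
def rank_ce(ce, required_lfns):
    """Higher is better."""
    close_files = set(ce["close_se_files"])          # LFNs already at the CE's close SE
    data_score = len(close_files & set(required_lfns)) / max(len(required_lfns), 1)
    history_score = ce["recent_success_rate"]        # e.g. from accounting/monitoring records
    return 0.6 * data_score + 0.4 * history_score

ces = [
    {"name": "ce.cern.ch",      "close_se_files": ["/grid/atlas/a", "/grid/atlas/b"], "recent_success_rate": 0.88},
    {"name": "ce.cnaf.infn.it", "close_se_files": ["/grid/atlas/a"],                  "recent_success_rate": 0.97},
]
needed = ["/grid/atlas/a", "/grid/atlas/b"]
best = max(ces, key=lambda ce: rank_ce(ce, needed))
print(best["name"], rank_ce(best, needed))
```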
13
Future Plans - DM
Security in the DM chain
– Delegation
– Work out the security model (local vs. Grid)
– Support SRMs with native ACL support
– VOMS roles for ACLs
Distributed/partitioned catalogs
– Need to define the model
Integration of network information (JRA4)
Explore XROOTD
Harmonize the metadata interface (ARDA, PTF)
Data Scheduler (equivalent of the WMS for data-transfer requests)
14
Outline
Contents of gLite release 1.0
– Components, architecture, and service interplay
– Major differences to LCG-2
– Major open issues
– Future plans
Deployment plan
– LCG-2 vs gLite
– gLite/LCG-2 coexistence
– Current status of certification
– Pre-production service
15
LCG-2 Services
[Deployment diagram, summarized:]
– Client libs: LCG UI, LCG WN
– Site services: CE, R-GMA MonBox, SEs (classic SE, CASTOR, dCache, DPM, other SRM)
– Services per VO: VOMS, VO (LDAP), RLS, LFC
– CIC services (replicated): MyProxy, BDII, RB
– Global services: R-GMA registries, monitoring (SFT, FCR), APEL
16
gLite 1 Services
[Deployment diagram, summarized:]
– Client libs: gLite UI, gLite WN
– Site services: CE (push/pull), R-GMA MonBox, SEs (CASTOR, dCache, DPM, other SRM) with gLite I/O
– Services per VO: VOMS, FireMan
– CIC services (replicated): MyProxy, WLM
– Global services: R-GMA registries, DGAS(??)
Notes:
– Services are generic, but file ownership limits access to the gLite I/O service
– Link between CEs and SEs via CEMon (later via R-GMA; temporarily via BDII)
17
gLite - LCG-2 Coexistence
[Deployment diagram, summarized: both stacks run side by side.]
gLite side
– Services per VO: VOMS, FireMan
– Site services: CE (push/pull), R-GMA MonBox, SEs (CASTOR, dCache, DPM, other SRM) with gLite I/O
– CIC services (replicated): MyProxy, WLM
– Global services: R-GMA registries, DGAS
– Client libs: gLite UI
LCG-2 side
– Services per VO: VOMS, VO (LDAP), RLS, LFC
– Site services: CE, R-GMA MonBox, SEs (CASTOR, dCache, DPM, other SRM)
– CIC services (replicated): MyProxy, BDII, RB
– Global services: R-GMA registries, monitoring (SFT, FCR), APEL
– Client libs: LCG UI
Shared: gLite WN & LCG-2 software on the same worker node; BDII
Notes:
– Services are generic, but file ownership limits access to the gLite I/O service; this could be worked around
– gLite and LCG-2 services can be hosted on the same node
– The WLM can get information on CEs and SEs through CEMon or the BDII
– The WLM can interface via DLI to the LFC, but gLite I/O cannot
18
Model
Pros:
– The real gLite experience
– No mixing, except for the WNs and UIs
– No intense interoperability testing needed
– New versions can be released more quickly after they become available
– LCG-2 can be evolved independently
Cons:
– Many additional services (more to come); not a valid model for small sites
– Complex trickery needed to allow access to new/old data from the two systems
– LCG-2 can be evolved independently
19
gLite - LCG-2: step 1
[Deployment diagram, summarized:]
– Client libs: gLite UI and LCG UI; shared worker node (gLite WN & LCG-2)
– Site services: gLite CE (push/pull) and LCG-2 CE, R-GMA MonBox, SEs (classic SE, CASTOR, dCache, DPM, other SRM)
– Services per VO: VOMS, VO (LDAP), RLS, LFC
– CIC services (replicated): MyProxy, BDII, RB (LCG-2) and MyProxy, WLM (gLite)
– Global services: R-GMA registries, monitoring (SFT, FCR), APEL
20
gLite - LCG-2: step 2
[Deployment diagram, summarized:]
– Client libs: gLite UI; shared worker node (gLite WN & LCG-2)
– Site services: CE (push/pull), R-GMA MonBox, SEs (CASTOR, dCache, DPM, other SRM)
– Services per VO: VOMS, LFC
– CIC services (replicated): MyProxy, BDII, WLM
– Global services: R-GMA registries, monitoring (SFT, FCR), APEL
21
gLite - LCG-2: step 2
[Deployment diagram, summarized:]
– Client libs: gLite UI; gLite WN with LCG-utils, gfal, …
– Site services: CE (push/pull), R-GMA MonBox, SEs (CASTOR, dCache, DPM, other SRM) accessed via gLite I/O, with ownership adjusted for access via gLite I/O
– Services per VO: VOMS, LFC, FireMan (contains the LFC data)
– CIC services (replicated): MyProxy, BDII(??), WLM
– Global services: R-GMA registries, monitoring (SFT, FCR??), DGAS
Notes:
– Users can decide to use different (multiple) catalogues, as long as they provide a DLI interface
– Users can opt for direct access to the storage via the LCG tools; however, there will be limitations concerning the accessibility of the data
22
Finer Points
Interoperation with other grids
– Mechanism to select between different client libs on the WN (done); see the hedged sketch below
– Need to keep an information system that can be used by LCG-3 and OSG (Grid3)
Operations
– We have to adapt our operations services to gLite production: a substantial effort, needed for the pre-production service
– Monitoring has to be ported (partially done already)
Time?
– Should be driven by the VOs and the experience gained
– Fixed schedules tend not to be workable
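The slide only states that a mechanism to select between client libraries on the WN exists; one plausible shape for it is sketched below. The GRID_FLAVOUR variable and the installation paths are invented, so treat this purely as an assumption about how such a switch could be wired, not as the actual mechanism.

```python
# Hedged illustration only: one plausible per-job switch between LCG-2 and gLite client
# libraries on a worker node. Variable name and paths are invented.
import os

def select_client_env(flavour: str) -> dict:
    """Return environment overrides for the requested middleware flavour."""
    roots = {
        "lcg2":  "/opt/lcg",     # illustrative install locations
        "glite": "/opt/glite",
    }
    root = roots[flavour]
    return {
        "GRID_FLAVOUR": flavour,
        "PATH": f"{root}/bin:" + os.environ.get("PATH", ""),
        "LD_LIBRARY_PATH": f"{root}/lib:" + os.environ.get("LD_LIBRARY_PATH", ""),
    }

env = dict(os.environ, **select_client_env("glite"))
print(env["GRID_FLAVOUR"], env["PATH"].split(":")[0])
```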
23
To be clear on what is being certified
Release 1.0 of gLite: http://glite.web.cern.ch/glite/packages/R1.0/R20050331/default.asp
The information system is a combination of BDII and CEMon
– Locations of SEs can only be stored in the BDII; this data is entered manually into the BDII and is static
– In this release no gLite service uses R-GMA as the information system, although the R-GMA services are in the release and will be tested
No File Placement / File Transfer service
– gridFTP is used on the certification test bed
No accounting
24
When will gLite 1.0 be certified?
Criteria for certification still need to be agreed, but will probably include items such as:
– Meeting SA1's requirements on "deployability"
– Successful completion of the gLite certification test suite
– Job failure rate ≤ LCG-2 job failure rate
– Acceptable stability of the system
– Acceptable number of outstanding critical/major bugs
This is the first attempt at certifying gLite, so it is impossible to predict when certification will be finished.
25
Pre-production Service
Pre-production is planned in two phases:
– Phase I: started when the deployment of the certification testbed was successfully completed (middle of last week)
  - Sites: CNAF, NIKHEF, PIC(?), CESGA and CERN
  - These sites will install all gLite site components + some gLite core service(s) (for resilience) + LCG-2 site components (to investigate migration)
  - The CESGA site is already being used by ARDA CMS!
– Phase II: will start when the Phase I sites are fully operational
  - More than 12 sites in Phase II
  - All sites will install gLite; some will also install LCG-2
26
INFN-ECGI activity
ECGI = Experiment Computing Grid Integration
Working group created to:
– Help the LHC experiments understand and experiment with the functionalities and features offered by the new gLite middleware
– Provide adequate user documentation
It will work in close collaboration and coordination with:
– the Experiment Integration Support (EIS) team at CERN
– the developers of the EGEE/LCG Grid middleware
Whenever possible and needed, the group will also:
– create system-administration documentation
– use dedicated resources for testing purposes
http://infn-ecgi.pi.infn.it/index.html