C. Loomis – Status EDG – Dec. 12, 2002

Status of the European DataGrid Project
Charles Loomis (LAL/CNRS)
LAL, December 12, 2002


1. Status of the European DataGrid Project
Charles Loomis (LAL/CNRS)
LAL, December 12, 2002

Outline
- Introduction & Goals
- EDG Architecture
- EDG Deployment & Use
- External Software
- Typical Failure Modes
- Future Developments

2. European DataGrid (EDG)

European DataGrid
- EU-funded, 3-year project (2001-3)
- Goals:
  - develop grid middleware
  - deploy onto working testbed
  - demonstrate grid technology with working applications
- Strong application component unique!

EDG Organization
WP1   Workload Mgt.
WP2   Data Mgt.
WP3   Info. & Monitoring Sys.
WP4   Fabric Mgt.
WP5   Storage Mgt.
WP6   Testbed
WP7   Networking
WP8   HEP Apps.
WP9   Biomedical Apps.
WP10  Earth Ob. Apps.
WP11  Dissemination
WP12  Project Mgt.

6 Partners; 21 Associates

3. EDG Goals

Actors
- End Users
- Virtual Organization
- Site Administrators

Transparent Access
- Allow users transparent access to authorized resources with single authentication.
- Allow users to delegate authorization to services.
- High-level selection of resources, including datasets.

Virtual Organizations
- Allow groups of people to acquire resources from sites.
- Allow organization to manage resource use among members.

Optimization
- Allow optimal use of resources at site and grid levels.

4. EDG Architecture

Global Batch System:
- Centralized Architecture.
- Heavy infrastructure.

[Diagram: User Interface, Resource Broker, Information Systems (MDS, Replica Catalogs), and Site X (Computing Element, Storage Element). Arrow labels: submit, query, retrieve, publish state. Annotation: broker chooses optimal site for job.]

5. Comments

Optimization of Resources
- Centralized Architecture
- Resource Broker
  - must know state of grid and schedule effectively
  - requires knowledge of site policies and user/job details
- Information System (MDS & RC)
  - must respond quickly to high-volume and high-rate queries

Central Points-of-Failure
- Resource Broker (redundancy at VO-level)
- MDS (unique hierarchy; some redundancy possible)

With high-rate submissions:
- RB requires lots of memory, CPU, disk space.
- MDS requires lots of file descriptors, CPU.
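The broker's job, choosing a site from the state published to the information system, can be sketched roughly as below. This is an illustration only, not the EDG Resource Broker: the `SiteState` and `Job` fields and the ranking rule (data locality first, then shortest queue, then most free CPUs) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SiteState:
    """State a site might publish to the information system (hypothetical fields)."""
    name: str
    free_cpus: int
    queued_jobs: int
    allowed_vos: set = field(default_factory=set)
    datasets: set = field(default_factory=set)


@dataclass
class Job:
    """Minimal job description: the submitting VO and an optional input dataset."""
    vo: str
    dataset: Optional[str] = None


def choose_site(job, sites):
    """Return the best-ranked eligible site, or None if no site qualifies."""
    # Site policy filter: the site must accept the job's VO and have a free CPU.
    eligible = [s for s in sites if job.vo in s.allowed_vos and s.free_cpus > 0]
    if not eligible:
        return None

    def rank(site):
        # Assumed ranking: prefer sites holding the dataset, then short queues,
        # then more free CPUs. Tuples compare element by element.
        has_data = job.dataset is None or job.dataset in site.datasets
        return (has_data, -site.queued_jobs, site.free_cpus)

    return max(eligible, key=rank)
```

Even this toy version shows why the broker depends on fresh, fast answers from the information system: every decision reads the published state of every candidate site.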

6. Authentication & Authorization

[Diagram: User, Certification Authorities, Virtual Organizations, and Site X (Computing Element, Storage Element). Arrow labels: request certificate, accept/reject request, register, proxy sent for authentication, retrieve membership lists, Update CRL.]

- ~10 Different VOs: ATLAS, CMS, …
- ~15 National CAs: France, INFN, …
- Example certificate subject: /C=FR/O=CNRS/OU=LAL/CN=Charles Loomis/Email=loomis@lal.in2p3.fr

7. Comments

Infrastructure
- ~15 National CAs as production service
- 10 Virtual Organizations
  - High-Energy Physics: ALICE, BaBar, ATLAS, CMS, DZero, LHCb
  - Earth Observation
  - Biomedical Applications
  - Misc.: WP6, ITeam, Guidelines

Limited Central Points-of-Failure
- VO Membership Server (for VO members)
- Certification Authority (for CA members)

Caching, infrequent updates minimize problems; compromise security.
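The caching trade-off can be made concrete with a small sketch. It assumes a site keeps local copies of the CRL and the VO membership list and refreshes them only occasionally. The `CachedList` type, the `authorize` function, and the string results are hypothetical, and certificate-chain verification (which a real site would perform with a cryptographic library) is left out.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedList:
    """A locally cached list (CRL serials or VO member DNs) with its fetch time."""
    entries: set
    fetched_at: float  # seconds since the epoch
    max_age: float     # seconds before the cache counts as stale

    def is_stale(self, now):
        return now - self.fetched_at > self.max_age


def authorize(subject_dn, serial, crl, vo_members, now=None):
    """Decide locally from cached data; returns (accepted, reason)."""
    now = time.time() if now is None else now
    if serial in crl.entries:
        return (False, "certificate revoked")
    if subject_dn not in vo_members.entries:
        return (False, "not a VO member")
    # The site keeps working when the CA or VO server is unreachable, but a
    # revocation issued after the last fetch would go unnoticed until refresh.
    stale = crl.is_stale(now) or vo_members.is_stale(now)
    return (True, "accepted using stale cache" if stale else "accepted")
```

Shortening `max_age` narrows the window in which a revoked certificate is still accepted, at the cost of depending more often on the central servers.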

8. Deployment & Use

Development Testbed (1.4.0)
- To facilitate testing and integration of new middleware.
- 3 sites (3 countries)

Site       Location          CPUs
CC-IN2P3   Lyon (F)          4005
CERN       Geneva (CH)       1646
CNAF       Bologna (I)       40
Legnaro    Legnaro (I)       50
NIKHEF     Amsterdam (NL)    22
Padova     Padova (I)        12
RAL        Rutherford (GB)   162

Production Testbed (1.4.0)
- For applications to use & stress software in "semi-production" environment.
- 8 sites (5 countries)

Application Use
- CMS Event Simulation
- ATLAS Event Simulation
- Regular Tutorial Use

Stability
- Filled Grid this week!

9. Globus Experience

GSI Security (OK)
- Some limitations with size of proxies.

GridFTP (OK)
- Recent protocol change because of security fix.

Replica Catalog (OK, limited)
- Unannounced, unnecessary schema change.

GateKeeper/JobManager (Poor)
- Race conditions under load leading to failures.
- High resource use; poor response to errors.

Information System-MDS (Poor)
- Serious problems with stability.
- Query times increase dramatically under load.

10. Globus Experience (cont.)

Interaction
- Generally responsive to identified problems.
- Little advance warning of major changes.
  - Schema changes.
  - Rewrite of JobManager/Batch System interface.

Testing
- Essentially non-existent by Globus.
- Major delays in EDG because of MDS and Gatekeeper.
- Finding/testing/fixing of major problems done outside Globus.

Globus "high-level" services inappropriate for production environment.

11. Condor Experience

CondorG
- Used for reliable job submission from Resource Broker.
- Responsive to problems and provide quick fixes.
- Encountered few problems in our testing.

Condor
- Supported "batch" system for EDG.
- Largely untested, but expect to use with next major release.

12. Typical Failure Modes

Operations:
- CRL generation (CA); CRL update (sites)
- Network accessibility (VO LDAP servers)
- Misconfiguration of services (typically SE)

Poor implementation (BUGS)
- Most catastrophic ones eliminated.

Resource Exhaustion
- File descriptors, ports, disk space.

Design Limitations
- Central points-of-failure (RB, MDS).
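File-descriptor exhaustion is one failure mode a service can watch for itself. The sketch below is not part of EDG; it uses Python's standard `resource` module (Unix only) and, as an assumption, counts open descriptors through the Linux-specific `/proc/self/fd` directory. The function names are hypothetical.

```python
import os
import resource


def fd_usage_fraction(open_fds, soft_limit):
    """Fraction of the soft file-descriptor limit currently in use."""
    if soft_limit == resource.RLIM_INFINITY:
        return 0.0  # no limit configured, so no exhaustion risk from the limit
    if soft_limit == 0:
        return 1.0  # nothing may be opened at all
    return open_fds / soft_limit


def current_fd_usage():
    """Measure this process's descriptor usage (Linux only, via /proc)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Each entry in /proc/self/fd is one open descriptor of this process.
    open_fds = len(os.listdir("/proc/self/fd"))
    return fd_usage_fraction(open_fds, soft)
```

A long-running daemon could log a warning when the fraction passes a chosen threshold, for example 0.8, so that exhaustion shows up as an alert rather than as failed connections.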

13. Future Developments

EDG Plans:
- Advanced data management
  - Real "Storage Element".
  - Replica Location Service (distributed Replica Catalog)
  - Replica Manager (higher-level user interface)
- Job Management
  - job splitting, checkpointing
  - interactive jobs
- Replace MDS with R-GMA.
- More robust, consistent security model.
- Local resources better tied to grid credentials.

OGSA (Open Grid Services Architecture)
- New services written as web services.
- Probably no complete conversion within EDG lifetime.

14. SlashGrid

Grid File System:
- Uses grid credentials for access to local files.
- Frees grid user from local Unix account.
  - Simplifies mapping of users to accounts.
  - Allows true account recycling.

More Uses:
- Could hide remote access to data.
- Provide compatibility to Globus security model.
- …

Implementation:
- User-space daemon on top of CODA kernel module.
- Plug-in interface allows easy extension.

15. Authentication & Authorization (VOMS)

[Diagram: User, Certification Authorities, VOMS, and Site X (Computing Element, Storage Element). Arrow labels: request certificate, accept/reject request, request "ticket", proxy sent for authentication and authorization, Update CRL.]

- ~15 National CAs: France, INFN, …
- Local Authorization Decision!

16. Conclusions

Software & Testbed:
- Production-quality security infrastructure in place.
- Production and development testbeds:
  - Deployed.
  - Starting to see heavy use by end-users.
  - Reasonable stability for the first time.
- Failure modes:
  - Moving from bugs and operations problems to design and resource limitations.

Unanswered Questions:
- Can optimization be achieved? At what level?
- How can resources be limited, reserved, and shared?
- Can efficient scheduling be done with inhomogeneous site policies?

