1 P.Kunszt Openlab 17.3.2003 Lessons learned from Data Management in the EU DataGrid Peter Kunszt CERN IT/DB EU DataGrid Data Management


1 Lessons learned from Data Management in the EU DataGrid
Peter Kunszt, CERN IT/DB
EU DataGrid Data Management
Peter.Kunszt@cern.ch

2 Outline
The EU DataGrid
Data Management Architecture
Mechanisms used, conclusions and requests

3 EDG overview: goals
DataGrid is a project funded by the European Union whose objective is to build and exploit the next generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.
Enable data-intensive sciences by providing worldwide Grid testbeds to large distributed scientific organisations ("Virtual Organisations", VOs).
Start (kick-off): Jan 1, 2001
End: Dec 31, 2003
Applications/end-user communities: HEP, Earth Observation, Biology
Specific project objectives:
– Middleware for fabric & grid management
– Large-scale testbed
– Production-quality demonstrations
– Contribute to open standards and international bodies (GGF, Industry & Research Forum)

4 EDG overview: Main Partners
CERN – International (Switzerland/France)
CNRS – France
ESA/ESRIN – International (Italy)
INFN – Italy
NIKHEF – The Netherlands
PPARC – UK

5 EDG overview: Assistant Partners
Research and Academic Institutes
– CESNET (Czech Republic)
– Commissariat à l'énergie atomique (CEA) – France
– Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
– Consiglio Nazionale delle Ricerche (Italy)
– Helsinki Institute of Physics – Finland
– Institut de Fisica d'Altes Energies (IFAE) – Spain
– Istituto Trentino di Cultura (IRST) – Italy
– Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
– Royal Netherlands Meteorological Institute (KNMI)
– Ruprecht-Karls-Universität Heidelberg – Germany
– Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
– Swedish Research Council – Sweden
Industrial Partners
– Datamat (Italy)
– IBM-UK (UK)
– CS-SI (France)

6 EDG overview: structure, work packages
The EDG collaboration is structured in 12 Work Packages:
– WP1: Work Load Management System
– WP2: Data Management
– WP3: Grid Monitoring / Grid Information Systems
– WP4: Fabric Management
– WP5: Storage Element
– WP6: Testbed and demonstrators
– WP7: Network Monitoring
– WP8: High Energy Physics Applications
– WP9: Earth Observation
– WP10: Biology
– WP11: Dissemination
– WP12: Management
Applications: WP8, WP9, WP10

7 Grid Data Management Dependencies
[Diagram; labels in source order: Performance Reliability Availability Usability Media Hardware Operating System Local File System Network Software Protocols Storage System]

8 EDG Architecture v1.x
[Diagram; labels in source order: RFIO NFS GridFTP server Castor Staging daemon Storage Element Computing Element WN Replica Catalog GDMP User Interface edg-replica-manager]

9 I/O and Storage
Modes of I/O on the EDG testbed
– NFS mounted
– RFIO (Castor)
– GridFTP
Mass Storage
– Castor

10 I/O and Storage
Conclusions / shortcomings
– NFSv2 on Linux not really suitable
    Does not scale to large fabrics
    Cannot access remote files
    No proper security mapping
– RFIO needs work
    Security and user control are not suitable for Grid users
    Remote I/O also has security issues
    Not standard, i.e. needs to be specially deployed at Grid sites
– GridFTP
    Buggy protocol
    Compatibility issues between versions
    Only one implementation

11 I/O and Storage
Requests at the I/O level
– fine-grained access control lists
– a wide variety of protocols
– a wide variety of authentication, authorization and policy layers
Requests at the storage management level
– data pinning and lifetime management
– space reservation capabilities
– transparent mass storage bindings
– inter-storage copy and communication
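
To make the storage management requests concrete, here is a minimal sketch of space reservation and data pinning with a lifetime. The class and method names (`StorageSketch`, `reserve`, `pin`, `evictable`) are hypothetical and are not taken from any EDG or Castor interface; time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSketch:
    """Toy disk cache in front of mass storage (hypothetical, not an EDG API)."""

    capacity: int                                  # total bytes available on disk
    reserved: dict = field(default_factory=dict)   # token -> reserved bytes
    pins: dict = field(default_factory=dict)       # file name -> pin expiry time

    def reserve(self, token: str, size: int) -> bool:
        """Reserve space up front so a later transfer cannot run out of disk."""
        if size <= 0 or token in self.reserved:
            return False
        if sum(self.reserved.values()) + size > self.capacity:
            return False
        self.reserved[token] = size
        return True

    def pin(self, name: str, now: float, lifetime: float) -> None:
        """Keep a file on disk until now + lifetime; a pin never shortens an existing one."""
        self.pins[name] = max(self.pins.get(name, 0.0), now + lifetime)

    def evictable(self, name: str, now: float) -> bool:
        """A file may be purged from disk only once its pin has expired."""
        return self.pins.get(name, 0.0) <= now
```

A job would typically reserve space before a transfer and pin its input files for the expected run time, so that the storage system does not purge them to tape mid-job.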

12 Catalogs
Replica Catalog: storing logical-to-physical name mappings
EDG used the Globus Replica Catalog:
– LDAP-based
– Single point of access
– Logical Name scheme is bound to Physical Name
Conclusions
– Such a solution does not scale for file catalogs, i.e. LDAP-based solutions are not suitable
– Users did not like the Logical Name being restricted
Request for
– Fine-grained access control of catalog data
– Consistency checking in catalog
– Pre-registration
RLS, RMC to address most issues
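
The idea of a replica catalog, including two of the requested features (pre-registration and consistency checking), can be sketched as follows. This is an illustration under stated assumptions, not the Globus Replica Catalog or RLS interface: all names are hypothetical, the store is a plain in-memory dictionary, and the logical name is a free-form string that is deliberately not derived from any physical name.

```python
class ReplicaCatalogSketch:
    """Maps a logical file name (LFN) to the physical file names (PFNs) of its replicas."""

    def __init__(self):
        self._replicas = {}  # LFN -> set of PFNs

    def preregister(self, lfn):
        """Claim a logical name before any replica exists."""
        if lfn in self._replicas:
            raise ValueError(f"logical name already registered: {lfn}")
        self._replicas[lfn] = set()

    def add_replica(self, lfn, pfn):
        """Record one more physical copy of a logical file."""
        self._replicas.setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn):
        """Return all known physical locations, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))

    def inconsistent(self, exists):
        """Return (lfn, pfn) pairs whose physical file is missing.

        `exists` is a caller-supplied function that checks a PFN on storage.
        """
        return sorted(
            (lfn, pfn)
            for lfn, pfns in self._replicas.items()
            for pfn in pfns
            if not exists(pfn)
        )
```

The consistency check takes the storage probe as a parameter because the catalog itself cannot know whether a file still exists on a remote storage system.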

13 Catalogs
Information Services: storing service status information
EDG used Globus-MDS (Meta-computing Directory Service)
– distributed LDAP with a given schema
– local information services and global indices
Conclusions
– too many synchronization problems
– not scalable enough
– insufficient caching mechanisms
Request for
– Robust up-to-date information service in general
– Management layer (schema evolution, ACL)
– Different capabilities for different kinds of information (location info, archived info, statistics, tickers)
R-GMA to address most issues

14 More Lessons Learned: Manageability
Virtual Organization management:
– The user base of a VO was managed in EDG through a single LDAP catalog.
– VO membership needs to be properly exposed to and interpreted by all services, applying VO and site policies.
– The administration of the VO catalog needs to be simplified and made easier to use.
VOMS = Virtual Organization Membership Service
– Intended to address most of these issues; first version to be deployed this year

15 More Lessons Learned: Security
Security infrastructure is a hard problem
– Was not properly tackled by EDG.
– GSI is a means to authenticate but not to authorize
Issues
– Delegation of rights to services
– Service certificates
– Automatic renewal of certificates
– Kerberos tickets from certificates
– User’s private keys
VOMS to address some issues, extended capabilities of services – new EDG security design
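
The distinction between authenticating and authorizing can be illustrated with a short sketch. It is not GSI or VOMS code: the function names, the subject strings and the dictionary-based policy are all assumptions made for illustration, and the `authenticate` stand-in only looks a name up in a set, where a real system would validate an X.509 certificate chain.

```python
def authenticate(presented_subject, trusted_subjects):
    """Authentication answers only 'who is this?'.

    Hypothetical stand-in for certificate verification: returns the subject
    name if it is known, otherwise None.
    """
    return presented_subject if presented_subject in trusted_subjects else None


def authorize(subject, action, vo_members, site_policy):
    """Authorization answers 'may this identity do that here?'.

    vo_members:  VO name -> set of member subjects (the VO's view)
    site_policy: VO name -> set of actions the site grants to that VO
    """
    for vo, members in vo_members.items():
        if subject in members and action in site_policy.get(vo, set()):
            return True
    return False
```

In this sketch a user with a valid identity but no VO membership passes `authenticate` and still fails `authorize`, which is the gap the slide describes.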

16 More Lessons Learned: Generic issues
No clear concept in the Grid community of how to deal with data access and storage.
Grid database bindings are not well tested yet.
There is a clear drive for common interfaces between components: the Open Grid Services Architecture effort.
Service discovery and service monitoring architecture are not well defined yet.

17 And let’s not forget: Organization and Sociology
The Grid is an inherently distributed environment, also in this sense.
Reaching agreements is hard work.
– Common Timeline
– Common Interfaces
– Common Procedures
– Common Policies
– Definition of supported hardware
– Common User support structure
Forcing solutions down people’s throats does not work.
Diversity of local policies is too large, but needs to be accommodated.

18 Summary and Outlook
Our first experimental phase of Grids is almost over.
We now have some experience with trying to run a production Grid and know its biggest deficiencies.
Components causing the most trouble are being replaced now.
New experience will be gained with LCG-1 in the second half of this year.
There are a lot of opportunities for openlab!

