Slide 1: Virtual Observatory as a Data Grid – WP2 Data Management
EU DataGrid Data Management Services
Peter Kunszt, CERN (Peter.Kunszt@cern.ch), EU DataGrid WP2 Manager
DataGrid is a project funded by the European Union.
Slide 2: Talk Outline
- Introduction to EU DataGrid Work Package 2 (WP2)
- WP2 Service Design and Interactions
- Replication Services
- Spitfire
- Security
- Conclusions and Outlook

WP2 Members:
- Diana Bosio, James Casey, Akos Frohner, Leanne Guy, Peter Kunszt, Erwin Laure, Levi Lucio, Heinz Stockinger, Kurt Stockinger – CERN
- Giuseppe Andronico, Federico DiCarlo, Andrea Domenici, Flavia Donno, Livio Salconi – INFN
- William Bell, David Cameron, Gavin McCance, Paul Millar, Caitriona Nicholson – PPARC, University of Glasgow
- Joni Hahkala, Niklas Karlsson, Ville Nenonen, Mika Silander, Marko Niinimäki – Helsinki Institute of Physics
- Olle Mulmo, Gian Luca Volpato – Swedish Research Council
Slide 3: The EU DataGrid Project
- 9.8 M euros of EU funding over three years, with roughly twice as much from the partners
- 90% for middleware and applications (HEP, Earth Observation and Biomedicine)
- Three-year phased development and demos (2001–2003)
- Second annual project review successfully passed in February 2003!
- 21 partners in total: research and academic institutes as well as industrial companies
- Related projects and activities: DataTAG (2002–2003), CrossGrid (2002–2004), GRIDSTART (2002–2004), GRACE (2002–2004)
Slide 4: EU DataGrid Project Objectives
DataGrid is a project funded by the European Union whose objective is to build and exploit the next-generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.
- Enable data-intensive sciences by providing worldwide Grid test beds to large distributed scientific organisations ("Virtual Organisations", VOs)
- Kick-off: 1 January 2001; end: 31 December 2003
- Application/end-user communities: HEP, Earth Observation, Biology
- Specific project objectives:
  - Middleware for fabric and Grid management
  - Large-scale testbed
  - Production-quality demonstrations
  - Collaborate with and complement other European and US projects
  - Contribute to open standards and international bodies (GGF, Industry & Research Forum)
Slide 5: DataGrid Main Partners
- CERN – International (Switzerland/France)
- CNRS – France
- ESA/ESRIN – International (Italy)
- INFN – Italy
- NIKHEF – The Netherlands
- PPARC – UK
Slide 6: Assistant Partners

Research and Academic Institutes:
- CESNET (Czech Republic)
- Commissariat à l'énergie atomique (CEA) – France
- Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
- Consiglio Nazionale delle Ricerche (Italy)
- Helsinki Institute of Physics – Finland
- Institut de Fisica d'Altes Energies (IFAE) – Spain
- Istituto Trentino di Cultura (IRST) – Italy
- Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
- Royal Netherlands Meteorological Institute (KNMI)
- Ruprecht-Karls-Universität Heidelberg – Germany
- Stichting Academisch Rekencentrum Amsterdam (SARA) – The Netherlands
- Swedish Research Council – Sweden

Industrial Partners:
- Datamat (Italy)
- IBM-UK (UK)
- CS-SI (France)
Slide 7: Project Schedule
- Project started on 1 January 2001
- TestBed 0 (early 2001): international test bed 0 infrastructure deployed; Globus 1 only, no EDG middleware
- Successful project review by the EU: March 2002
- TestBed 1 (2002)
- Successful second project review by the EU: February 2003
- TestBed 2 (now): some complete rewrites of components, building on TestBed 1 experience
- TestBed 3 (October 2003)
- Project stops on 31 December 2003, with perhaps a couple of months' extension to wrap up and document results (no additional funding)
Slide 8: EDG Highlights
- All EU deliverables (40, more than 2000 pages) submitted in time for the review, according to the contract's technical annex
- First test bed delivered with real production demos
- All deliverables (code and documents) available via www.edg.org and http://cern.ch/eu-datagrid/Deliverables/default.htm: requirements, surveys, architecture, design, procedures, testbed analysis, etc.
- Project re-orientation last August: from R&D testbed to 'production Grid'
Slide 9: Working Areas
The DataGrid project is divided into 12 work packages distributed over four working areas: Applications, Middleware, Infrastructure (Testbed) and Management.
Slide 10: Work Packages
- WP1: Workload Management System
- WP2: Data Management
- WP3: Grid Monitoring / Grid Information Systems
- WP4: Fabric Management
- WP5: Storage Element
- WP6: Testbed and Demonstrators
- WP7: Network Monitoring
- WP8: High Energy Physics Applications
- WP9: Earth Observation
- WP10: Biology
- WP11: Dissemination
- WP12: Management
Slide 11: Trying Hard to Have a Real Grid
- Testbed 0: Grid technology was not mature enough
  - Configuration and deployment issues
  - Stability problems
  - Obscure errors
- Project reorientation: stability, stability, stability – TB1
- TB1 revealed a set of design bugs in Globus:
  - GASS cache issue – fixed by Condor (rewritten)
  - MyProxy issues – could never be used
  - MDS did not scale – had to set up a fake local information system
- Re-engineering of essential components – TB2:
  - New Resource Broker
  - R-GMA instead of MDS as the information system
  - Concrete support channels (VDT)
  - New configuration tool, LCFG-ng (from the University of Edinburgh!)
Slide 12: Grid Middleware Architecture ("Hourglass")
(Diagram.) Current Grid architectural functional blocks, from bottom to top:
- OS, storage and network services
- Basic Grid services: GLOBUS 2.2
- High-level Grid services: EU DataGrid middleware
- Common application layer: HEP application services (LCG); Earth Observation and Biomed
- Specific application layer: the LHC experiments (ATLAS, CMS, LHCb)
Slide 13: EU DataGrid WP2 Data Management Work Package
Responsible for:
- Transparent data location and secure access
- Wide-area replication
- Data access optimization
- Metadata access
NOT responsible for (but it has to be done):
- Data storage (WP5)
- Proper relational database bindings (Spitfire)
- Remote I/O (GFAL)
- Security infrastructure (VOMS)
Slide 14: WP2 Service Paradigms
Choice of technology:
- Java-based servers using Web Services (Tomcat, Oracle 9iAS, soon WebSphere)
- Interface definitions in WSDL
- Client stubs for many languages (Java, C, C++) via Axis and gSOAP (a client sketch follows below)
- Persistent service data in relational databases (MySQL, Oracle, soon DB2)
Modularity:
- Modular service design for pluggability and extensibility
- No vendor-specific lock-in
Evolvable:
- Easy adaptation to evolving standards (OGSA, WSDL 1.2)
- Largely independent of the underlying OS and RDBMS – works on Windows too!
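To make the stub-based access concrete, here is a minimal sketch of a dynamic Apache Axis 1.x client call of the kind these services support. The endpoint URL, namespace and operation name ("listReplicas") are placeholders for illustration, not the actual WP2 interface.

```java
import javax.xml.namespace.QName;
import javax.xml.rpc.ParameterMode;
import org.apache.axis.Constants;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class CatalogClientSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; a real WP2 service publishes its own WSDL.
        String endpoint = "https://grid.example.org:8443/services/ReplicaCatalog";

        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new java.net.URL(endpoint));
        call.setOperationName(new QName("urn:example-catalog", "listReplicas"));
        call.addParameter("guid", Constants.XSD_STRING, ParameterMode.IN);
        call.setReturnType(Constants.SOAP_ARRAY);

        // Invoke the remote operation with a GUID and print the returned locations.
        Object[] pfns = (Object[]) call.invoke(new Object[] { "guid-0000-example" });
        for (Object pfn : pfns) {
            System.out.println(pfn);
        }
    }
}
```

The same WSDL interface can be consumed from C or C++ via gSOAP-generated stubs, which is how the multi-language client support mentioned above is achieved.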
Slide 15: Replication Services – Basic Functionality
(Diagram: Replica Manager, Replica Location Service, Replica Metadata Catalog, Storage Elements.)
- Files have replicas stored at many Grid sites on Storage Elements.
- Each file has a unique Grid ID (GUID). The locations corresponding to a GUID are kept in the Replica Location Service.
- Users may assign aliases to GUIDs; these are kept in the Replica Metadata Catalog (a sketch of the two mappings follows below).
- The Replica Manager provides atomicity for file operations, assuring consistency of Storage Element and catalog contents.
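A minimal in-memory sketch of the two catalog mappings just described: alias (LFN) to GUID in the Replica Metadata Catalog, and GUID to physical replica locations in the Replica Location Service. Class and method names are illustrative, not the actual WP2 API; the real services persist these tables in a relational database.

```java
import java.util.*;

public class CatalogModelSketch {
    // Replica Metadata Catalog: user-defined alias (LFN) -> GUID
    private final Map<String, String> lfnToGuid = new HashMap<>();
    // Replica Location Service: GUID -> physical file names (replicas)
    private final Map<String, Set<String>> guidToPfns = new HashMap<>();

    public void registerReplica(String guid, String pfn) {
        guidToPfns.computeIfAbsent(guid, g -> new HashSet<>()).add(pfn);
    }

    public void addAlias(String lfn, String guid) {
        lfnToGuid.put(lfn, guid);
    }

    // Resolve an alias to all replica locations: LFN -> GUID -> {PFN}
    public Set<String> listReplicas(String lfn) {
        String guid = lfnToGuid.get(lfn);
        return guid == null ? Collections.emptySet()
                            : guidToPfns.getOrDefault(guid, Collections.emptySet());
    }

    public static void main(String[] args) {
        CatalogModelSketch c = new CatalogModelSketch();
        c.registerReplica("guid-1234", "srm://se.cern.ch/data/file1");
        c.registerReplica("guid-1234", "srm://se.ral.ac.uk/data/file1");
        c.addAlias("lfn:analysis-input.root", "guid-1234");
        System.out.println(c.listReplicas("lfn:analysis-input.root"));
    }
}
```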
Slide 16: Higher-Level Replication Services
(Diagram adds the Replica Optimization Service, Replica Subscription Service, SE Monitor and Network Monitor to the basic components.)
- The Replica Manager may call on the Replica Optimization Service to find the best replica among many, based on network and Storage Element monitoring (a cost-model sketch follows below).
- The Replica Subscription Service issues replication commands automatically, based on a set of subscription rules defined by the user.
- Hooks for user-defined pre- and post-processing of replication operations are available.
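A sketch of the kind of cost estimate a replica optimizer can make from monitoring data: pick the replica with the lowest estimated transfer time, here modelled as latency plus size over bandwidth. The cost model and field names are assumptions for illustration, not the actual Replica Optimization Service algorithm.

```java
import java.util.List;

public class ReplicaOptimizerSketch {
    // Monitoring data for one candidate replica (illustrative fields).
    record Candidate(String pfn, double bandwidthMBps, double latencySeconds) {}

    // Pick the replica with the smallest estimated transfer time for a
    // file of the given size: latency + size / bandwidth.
    static Candidate bestReplica(List<Candidate> candidates, double fileSizeMB) {
        Candidate best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Candidate c : candidates) {
            double cost = c.latencySeconds() + fileSizeMB / c.bandwidthMBps();
            if (cost < bestCost) {
                bestCost = cost;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> replicas = List.of(
            new Candidate("srm://se.cern.ch/data/f", 80.0, 0.02),
            new Candidate("srm://se.ral.ac.uk/data/f", 30.0, 0.15));
        System.out.println(bestReplica(replicas, 500.0).pfn());
    }
}
```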
Slide 17: Interactions with Other Grid Components
(Diagram adds the Information Service, Resource Broker, User Interface / Worker Node and the Virtual Organization Membership Service to the WP2 services.)
- Applications and users interface to data through the Replica Manager, either directly or through the Resource Broker.
- Management calls should never go directly to the Storage Element.
Slide 18: Replication Services Status
Current status:
- All components are deployed right now, except for the Replica Subscription Service (RSS)
- Initial tests show that the expected performance can be met
- Proper testing in a 'real user environment' is still needed – EDG 2, LCG-1
Features for the next release:
- Currently Worker Nodes need outbound connectivity; a Replica Manager Service is needed, which in turn needs a proper security delegation mechanism
- Logical collections support
- Service-level authorization
- GUI
Slide 19: Spitfire – Grid-Enabling RDBMS
Capabilities:
- Simple Grid-enabled front end to any type of local or remote RDBMS through secure SOAP-RPC (a sketch of the idea follows below)
- Sample generic RDBMS methods may easily be customized with little additional development, providing WSDL interfaces
- Browser integration
- GSI authentication
- Local authorization mechanism
Status:
- Current version: 2.1
- Used by the EU DataGrid Earth Observation and Biomedical applications
Next step: OGSA-DAI interface
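To illustrate the "generic RDBMS method" idea, here is a sketch of the kind of server-side method such a front end can expose over secure SOAP-RPC: a parameterized SELECT against whatever database the service is configured for, via plain JDBC. This is not the Spitfire code itself; all names, the connection details and the row representation are assumptions.

```java
import java.sql.*;
import java.util.*;

public class RdbmsFrontEndSketch {
    private final String jdbcUrl, user, password;

    public RdbmsFrontEndSketch(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl; this.user = user; this.password = password;
    }

    // Run a parameterized query and return each row as a list of strings,
    // a representation that serializes naturally over SOAP.
    public List<List<String>> select(String sql, List<String> params) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (int i = 0; i < params.size(); i++) {
                stmt.setString(i + 1, params.get(i));
            }
            try (ResultSet rs = stmt.executeQuery()) {
                int cols = rs.getMetaData().getColumnCount();
                List<List<String>> rows = new ArrayList<>();
                while (rs.next()) {
                    List<String> row = new ArrayList<>();
                    for (int c = 1; c <= cols; c++) row.add(rs.getString(c));
                    rows.add(row);
                }
                return rows;
            }
        }
    }
}
```

Because the method is generic over the SQL and the backend, the same front end works against any RDBMS with a JDBC driver, which is the vendor-neutrality point made above.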
Slide 20: Spitfire Added Value: Security
Grid security:
- TrustManager deals with GSI proxy certificates
- Support for VOMS certificate extensions
- Secure Java, C/C++ and Perl clients
Local authorization:
- Mapping through a gridmap file is supported
- Fine-grained authorization hooks: a mapping service maps VOMS extensions (group, role, capability) to database roles which, depending on the database, may use row-level authorization mechanisms (GRANT/DENY); a sketch follows below
Installation kit:
- Easy installation and configuration of all security options
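A sketch of the VOMS-to-database-role mapping idea: look up the database role configured for the caller's VOMS attribute and activate it on the connection, so the database's own GRANTs decide what the query may touch. The mapping table, attribute strings and role names are illustrative assumptions; the real mapping service is configurable rather than hard-coded.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

public class VomsRoleMappingSketch {
    // Hypothetical mapping from VOMS group/role attributes to DB roles.
    private static final Map<String, String> VOMS_TO_DB_ROLE = Map.of(
        "/biomed/Role=admin", "biomed_admin",
        "/biomed/Role=user",  "biomed_reader");

    // Activate the database role corresponding to the caller's VOMS
    // attribute before running any of the caller's statements.
    public static void activateRole(Connection conn, String vomsAttribute)
            throws SQLException {
        String dbRole = VOMS_TO_DB_ROLE.get(vomsAttribute);
        if (dbRole == null) {
            throw new SQLException("No database role mapped for " + vomsAttribute);
        }
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("SET ROLE " + dbRole); // role activation, e.g. in Oracle
        }
    }
}
```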
Slide 21: Spitfire Customization
- Spitfire started as a 'proof of technology' for Web Services and Java
- Customizable into specific services dealing with persistent data:
  - All WP2 services are in this sense 'Spitfire' services (see later)
  - Test platform for the latest available codebase
  - Gained experience with WSDL, JNDI, Tomcat, Axis, gSOAP
  - Next thing to try: JBoss (for JMS, JMX)
- Experimental add-ons:
  - Secure browser access using JSP (proxy certificates for Mozilla, Netscape, IE, ...)
  - Distributed query agent drop-in
- To do: OGSA-DAI interface, as far as possible
Slide 22: RLS Architecture (Evolved!)
(Diagram.) A hierarchical RLS topology: LRCs update RLIs, and RLIs may forward information to other RLIs (a sketch of the index follows below).
- RLIs indexing over the full namespace (all LRCs are indexed) receive updates directly
- An RLI may instead receive its updates from other RLIs
- Each LRC sends updates to all Tier-1 RLIs
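A minimal sketch of what a Replica Location Index maintains: per GUID, the set of LRCs that reported holding a mapping for it, refreshed by periodic soft-state updates from each LRC. All names are illustrative; a production RLI can also accept compressed summaries rather than full GUID sets, which this sketch does not model.

```java
import java.util.*;

public class RliSketch {
    // GUID -> endpoints of the LRCs that reported holding it.
    private final Map<String, Set<String>> guidToLrcs = new HashMap<>();

    // Soft-state update: an LRC reports the full set of GUIDs it holds;
    // its previous entries are dropped and replaced by the fresh set.
    public void receiveUpdate(String lrcEndpoint, Set<String> guids) {
        guidToLrcs.values().forEach(lrcs -> lrcs.remove(lrcEndpoint));
        for (String guid : guids) {
            guidToLrcs.computeIfAbsent(guid, g -> new HashSet<>()).add(lrcEndpoint);
        }
    }

    // Query: which LRCs should be contacted for this GUID's PFN mappings?
    public Set<String> lookup(String guid) {
        return guidToLrcs.getOrDefault(guid, Collections.emptySet());
    }

    public static void main(String[] args) {
        RliSketch rli = new RliSketch();
        rli.receiveUpdate("lrc://se.cern.ch", Set.of("guid-1", "guid-2"));
        rli.receiveUpdate("lrc://se.ral.ac.uk", Set.of("guid-2"));
        System.out.println(rli.lookup("guid-2")); // both LRCs
    }
}
```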
Slide 23: EDG Grid Catalogs (1/2)
Replica Location Service (RLS):
- Local Replica Catalog (LRC):
  - Stores GUID to Physical File Name (PFN) mappings
  - Stores attributes on PFNs
  - LRCs in the Grid: one per Storage Element (per VO)
  - Tested to 1.5M entries
- Replica Location Index (RLI):
  - Allows fast lookup of which sites store GUID -> PFN mappings for a given GUID
  - RLIs in the Grid: normally one per site (per VO), indexing all LRCs in the Grid
  - Being deployed as part of EDG 2.1 in July; in the process of integration into other components
  - Tested to 10M entries in an RLI
Slide 24: EDG Grid Catalogs (2/2)
Replica Metadata Catalog (RMC):
- Stores Logical File Name (LFN) to GUID mappings – user-defined aliases
- Stores attributes on LFNs and GUIDs
- One logical RMC in the Grid (per VO):
  - Single point of synchronization – the current assumption in the EDG model
  - A bottleneck? If so, move to a replicated, distributed database
- No application metadata catalog is provided – see Spitfire – but the RMC supports a small amount of application metadata, O(10) attributes
- RMC usage is not as well understood as the Replica Location Service: architectural changes are likely, and use cases are required
Slide 25: Typical Location of Services in LCG-1
(Diagram.) Each site – CERN, CNAF, RAL, IN2P3 – runs a Replica Location Index, a Local Replica Catalog and a Storage Element; a single Replica Metadata Catalog serves the whole Grid.
Slide 26: Catalog Implementation Details
- Catalogs implemented in Java as Web Services, hosted in a J2EE application server
  - Tomcat 4 or Oracle 9iAS as the application server
  - Jakarta Axis as the Web Services container
- Java and C++ client APIs currently provided, using Jakarta Axis (Java) and gSOAP (C++)
- Catalog data stored in a relational database: runs with either Oracle 9i or MySQL (see the sketch below)
- Catalog APIs exposed as Web Services using WSDL – easy to write a new client if we don't support your language right now
- A vendor-neutral approach is taken to allow different deployment options
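The vendor-neutral database layer can be illustrated with plain JDBC: the catalog code talks JDBC throughout, and only configuration decides whether MySQL or Oracle sits underneath. The URLs, schema name and credentials below are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class VendorNeutralDbSketch {
    // Open a catalog connection against the configured backend; only
    // the driver class and JDBC URL differ between the two databases.
    public static Connection connect(String backend) throws Exception {
        String url;
        if ("mysql".equals(backend)) {
            Class.forName("com.mysql.jdbc.Driver");               // MySQL driver
            url = "jdbc:mysql://dbhost.example.org/catalog";      // placeholder URL
        } else {
            Class.forName("oracle.jdbc.OracleDriver");            // Oracle driver
            url = "jdbc:oracle:thin:@dbhost.example.org:1521:cat"; // placeholder URL
        }
        return DriverManager.getConnection(url, "catalog", "secret");
    }
}
```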
Slide 27: Quality of Service
- Quality of service depends both on the server software and architecture used and on the software components deployed on it
- Features required for a high quality of service:
  - High availability
  - Manageability
  - Monitoring
  - Backup and recovery with defined Service Level Agreements
- Approach:
  - Use vendor solutions for availability and manageability where available
  - Use common IT-DB solutions for monitoring and recovery
  - Components architected to allow easy deployment in a high-availability environment
- A variety of solutions with different characteristics are possible
Slide 28: Tradeoffs in Different Solutions
(Chart: the four deployment options – single-instance MySQL/Tomcat, single-instance Oracle 9i/9iAS, clustered Oracle 9i/Tomcat, and clustered Oracle 9i/9iAS – positioned along two axes, manageability and availability.)
Slide 29: Current Deployment Plans
EDG:
- All sites use the single-instance MySQL/Tomcat solution
LCG-1:
- CERN deploys the LRC/RLI/RMC on single-instance Oracle 9iAS/Oracle 9i
- Tier-1 sites are invited to use either Oracle 9iAS/Oracle or Tomcat 4/MySQL single instances for their LRC/RLIs
- CERN IT-DB is working on an "easy-install" packaging of Oracle
  - Oracle sees ease of installation as a high priority for Oracle 10i (release date November 2003)
  - Allows deployment of an Oracle-based solution without requiring a lot of Oracle expertise
- Testing of components for a high-availability solution is in progress, based on Oracle 9i; planned to be available by year-end 2003
Slide 30: System Architecture – High Availability
Standard n-tier architecture:
- Front-end application-layer load balancer (Oracle 9iAS Web Cache)
- Cluster of stateless application servers (Oracle 9iAS J2EE container)
- Clustered database nodes (Oracle 9i/RAC)
- Shared SAN storage (Fibre Channel)
Slide 31: Security: Infrastructure for Java-Based Web Services
TrustManager:
- Mutual client-server authentication using GSI (i.e. PKI X.509 certificates) for all WP2 services (a generic sketch of mutual SSL authentication follows below)
- Supports everything transported over SSL
Authorization Manager:
- Supports coarse-grained authorization: mapping user -> role -> attribute
- Fine-grained authorization through policies, role and attribute maps
- Web-based admin interface for managing the authorization policies and tables
Status:
- Fully implemented; authentication is enabled at the service level
- The delegation implementation needs to be finished
- Authorization needs more integration; waiting for deployment of VOMS
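Mutual authentication of the kind TrustManager performs can be sketched with the standard Java SSL API: the server requires a client certificate during the handshake, so both sides prove their identity. This is a generic JSSE illustration under the assumption that key material is configured via the standard system properties; it is not the TrustManager code, which additionally handles GSI proxy certificates.

```java
import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.SSLServerSocketFactory;
import javax.net.ssl.SSLSocket;

public class MutualAuthSketch {
    public static void main(String[] args) throws Exception {
        // Assumes javax.net.ssl.keyStore / trustStore properties are set
        // before startup so the default factory has key material.
        SSLServerSocketFactory factory =
            (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
        try (SSLServerSocket server =
                 (SSLServerSocket) factory.createServerSocket(8443)) {
            // Require the client to present a certificate: mutual authentication.
            server.setNeedClientAuth(true);
            try (SSLSocket client = (SSLSocket) server.accept()) {
                // The handshake has verified both certificate chains; the
                // authenticated peer identity is available from the session.
                System.out.println(client.getSession().getPeerPrincipal());
            }
        }
    }
}
```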
Slide 32: Conclusions and Outlook
- The re-focus on production has been a good but painful choice:
  - From hype to understanding the implications of wanting to run a production Grid
  - Re-engineering of several components was necessary
  - However, the project was not well prepared for this change – the timelines had to be constantly revised over the last year
- The second-generation data management services have been designed and implemented based on the Web Services paradigm:
  - Flexible, extensible service framework
  - Deployment choices: robust, highly available commercial products (e.g. Oracle) as well as open source (MySQL, Tomcat) are supported
- First experiences with these services show that their performance meets expectations
- Real-life usage on the LCG-1 and EDG 2.0 testbeds during the rest of this year will show their strengths and weaknesses
- Proceed with standardization efforts: DAI, RLS
- Carry the experience over into the next project: EGEE