1 EGEE Middleware
Robin Middleton, with much (most) material from Bob Jones, Frederic Hemmer and Erwin Laure
GridPP EB-TB Meeting, 13th May 2004
EGEE is a project funded by the European Union under contract IST-2003-508833
www.eu-egee.org

2 Contents
Introduction: EGEE structure, activities, …
Middleware = JRA1
 Organisation
 Design Team
 Service Oriented Architecture
 Initial Components
 EGEE Middleware Prototype
 Integration, Testing & SCM
 JRA1 & External Links

3 Introduction

4 Overview
70 partners (funded) + many unfunded contributions, in ~11 federations
~€32M EU funds, ~€60M in total
2 years (initially), started 1st April 2004
The EGEE Vision:
 To deliver production-level Grid services, the essential elements of which are manageability, robustness, resilience to failure, and a consistent security model, as well as the scalability needed to rapidly absorb new resources as these become available, while ensuring the long-term viability of the infrastructure.
 To carry out a professional Grid middleware re-engineering activity in support of the production services. This will support and continuously upgrade a suite of software tools capable of providing production-level Grid services to a base of users which is anticipated to rapidly grow and diversify.
 To ensure an outreach and training effort which can proactively market Grid services to new research communities in academia and industry, capture new e-Science requirements for the middleware and service activities, and provide the necessary education to enable new users to benefit from the Grid infrastructure.

5 EGEE Implementation
From day 1 (1st April 2004):
 Production grid service based on the LCG infrastructure running LCG-2 grid middleware (SA)
 LCG-2 will be maintained until the new generation has proven itself (fallback solution)
In parallel, develop a "next generation" grid facility (JRA):
 Produce a new set of grid services according to evolving standards (Web Services)
 Run a development service providing early access for evaluation purposes
 Will replace LCG-2 on the production facility in 2005
[Timeline diagram: EDG, VDT, AliEn, … feed into LCG-1 and LCG-2 (Globus 2 based), leading to EGEE-1 and EGEE-2 (Web services based)]

6 EGEE Activities

7 Orientation
EGEE includes 11 activities:
Services
 SA1: Grid Operations, Support and Management
 SA2: Network Resource Provision
Joint Research
 JRA1: Middleware Engineering and Integration
 JRA2: Quality Assurance
 JRA3: Security
 JRA4: Network Services Development
Networking
 NA1: Management
 NA2: Dissemination and Outreach
 NA3: User Training and Education
 NA4: Application Identification and Support
 NA5: Policy and International Cooperation
[Table column: equivalent EDG Work Packages / Groups: WP6, WP7, WP1-5 & 6, QAG, Security Group, WP7, WP12, WP11, WP8-10, ?]

8 Services Activities
SA1: Grid Operations & Support
 Objectives: create & operate a production-quality infrastructure
 48 partners, approx 45% of total project budget; regional structure
 Builds on the existing LCG infrastructure to provide an expanded grid facility for many application domains
SA2: Network Resource Provision
 Objectives: ensure EGEE access to network services provided by GEANT and the NRENs to link users, resources and operational management
 3 partners, approx 1.5% of total project budget
 Most work will be associated with defining SLR/S/As

9 Joint Research Activities
JRA1: Middleware Engineering and Integration
 Objectives:
  Provide robust, supportable middleware components
  Integrate grid services to provide a consistent functional basis for the EGEE grid infrastructure
  Verify the middleware forms a dependable and scalable infrastructure that meets the needs of a large, diverse eScience user community
 5 partners, approx 16% of total project budget
 Middleware design team active
 Core software team has been working quickly to produce the design of an initial prototype, taking input from the HEP ARDA project as well as final requirements/assessments from the EDG project
 Initial prototype foreseen at end of April (not all services implemented, not for general distribution)
 EDG testbed infrastructure being reused for JRA1 clusters

10 Joint Research Activities (II)
JRA4: Network Services Development
 Objectives: network-oriented joint research to provide end-to-end services
  Network reservation, performance monitoring and diagnostics tools
  Explore links to how Grid resources are organised/allocated
  Investigation of the potential impact of IPv6 on grids
 5 partners, approx 2.5% of total project budget
 Tight collaboration with DANTE and the NRENs, especially through the future GN2 project and potential network-oriented FP6 projects
JRA3: Security
 Objectives: enable secure European Grid infrastructure operation
  Overall security architecture and framework
  Policies to be adopted by other EGEE activities (middleware, operations, etc.)
 5 partners, approx 3% of total project budget
JRA2: Quality Assurance
 Objectives: foster production & delivery of quality Grid software & operations
 2 partners, approx 2% of total project budget
 Many procedures and guidelines already defined

11 Networking Activities
NA4: Application Identification and Support
 Objectives: identify and support a broad range of applications from diverse domains, starting with the pilot domains HEP and Biomedical
 20 partners, approx 12.5% of total project budget
 ARDA project is the interface with HEP applications
 Initial BMI applications identified
 Industrial forum set up in a self-financing mode
NA3: User Training and Induction
 Objectives: develop a training programme addressing beginners and advanced users; internal EGEE induction courses
 22 partners, approx 4% of total project budget
 Plans for initial training courses well advanced; will be able to offer training in the summer on dedicated infrastructure
NA2: Dissemination and Outreach
 Objectives: disseminate the benefits of the EGEE infrastructure to new user communities
 20 partners, approx 5% of total project budget

12 Who’s who
[Organisation chart: G. Zaquine, A. Aimar, I. Bird, C. Vistoli, F. Hemmer, E. Laure, M. Atkinson, J. Dyer, V. Breton, F. Harris, J-P. Gautier, J. Orellana, A. Edlund, ?, F. Gagliardi, B. Jones, A. Blatecky, E. Jessen, T. Priol, D. Snelling]

13 JRA1 Middleware (Re-)Engineering & Integration

14 JRA1: Organisation

15 Software Clusters
Tools, Testing & Integration (CERN) clusters
Development clusters:
 UK
 CERN
 IT/CZ
 Nordic
Clusters have a reasonably sized (distributed) development testbed
 Taken over from EDG
 Nordic cluster to be finalized
Link with Integration & Tools clusters established
Clusters up and running!
Nordic (security) cluster  JRA3

16 Design Team
Formed in December 2003
Current members:
 UK: Steve Fisher
 IT/CZ: Francesco Prelz
 Nordic: David Groep
 VDT: Miron Livny
 CERN: Predrag Buncic, Peter Kunszt, Frederic Hemmer, Erwin Laure
Started service design based on the component breakdown defined by the LCG ARDA RTAG
Leverage experiences and existing components from AliEn, VDT, and EDG
A working document covers the overall design & APIs: https://edms.cern.ch/document/458972
Basis for the architecture (DJRA1.1) and design (DJRA1.2) documents

17 Guiding Principles
Lightweight (existing) services
 Easily and quickly deployable
Interoperability
 Allow for multiple implementations
Resilience and fault tolerance
Co-existence with deployed infrastructure
 Run as an application
Service-oriented approach
 Follow WSRF standardization
 No mature WSRF implementations exist to date, hence: start with plain WS – WSRF compliance is not an immediate goal
 Review situation end 2004

18 High Level Service Decomposition
Taken from the ARDA blueprint
[Diagram: services attributed to the Nordic, CERN, UK and IT/CZ clusters]
Some services have no clear attribution to a cluster (according to the TA)
Some services involve collaboration of multiple clusters

19 Initial Focus
Data management
 Storage Element: SRM based; allow POSIX-like access
Workload management
 Computing Element: allow pull and push mode
Information and monitoring
Security
 Need to integrate components with quite different security models
 Start with a minimalist approach based on VOMS and myProxy

20 Storage Element
‘Strategic’ SE
 High QoS: reliable, safe, …
 Usually has an MSS
 Place to keep important data
 Needs people to keep it running
 Heavyweight
‘Tactical’ SE
 Volatile, ‘lightweight’ space
 Enables sites to participate in an opportunistic manner
 Lower QoS
[Chart: QoS vs. portability, with strategic and tactical SEs at opposite ends]

21 Storage Element Interfaces
SRM interface
 Management and control
 SRM (with possible evolution)
POSIX-like File I/O
 File access
 Open, read, write
 Not real POSIX (like rfio)
[Diagram: the user reaches management through the SRM interface and file I/O through a POSIX-like API, layered over rfio, dcap, chirp and aio, backed by Castor, dCache, NeST and plain disk]
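As a rough illustration of the "POSIX-like but not real POSIX" I/O layer above, the sketch below models open/read/write against an SE, with an in-memory dict standing in for remote storage. All class and method names are hypothetical; this is not the rfio or GFAL API.

```python
# Sketch of a POSIX-like grid file I/O facade (illustrative names only).
# A real EGEE I/O layer talks to remote storage; here the "disk" is an
# in-memory dict so the example is self-contained.

class GridFile:
    """File handle offering open/read/write, not the full POSIX API."""
    def __init__(self, store, surl, mode):
        self.store, self.surl, self.mode = store, surl, mode
        self.pos = 0
        if mode == "w":
            store[surl] = b""

    def write(self, data: bytes) -> int:
        self.store[self.surl] += data
        return len(data)

    def read(self, n: int = -1) -> bytes:
        data = self.store[self.surl][self.pos:]
        out = data if n < 0 else data[:n]
        self.pos += len(out)
        return out

class StorageElement:
    """Disk-backed SE; management/control (space, pinning) would go via SRM."""
    def __init__(self):
        self.disk = {}

    def open(self, surl: str, mode: str = "r") -> GridFile:
        if mode == "r" and surl not in self.disk:
            raise FileNotFoundError(surl)
        return GridFile(self.disk, surl, mode)

se = StorageElement()
f = se.open("srm://se.example.org/data/run42.dat", "w")
f.write(b"event data")
g = se.open("srm://se.example.org/data/run42.dat")
print(g.read())  # b'event data'
```

The point of the sketch is the split the slide describes: a narrow open/read/write surface for applications, with management operations kept on a separate (SRM) path.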

22 Catalogs
File Catalog
 Filesystem-like view on logical file names
Replica Catalog
 Keeps track of replicas of the same file
(Metadata Catalog)
 Attributes of files on the logical level
 Boundary between generic middleware and application layer

23 Files and Catalogs Scenario
[Diagram: an LFN resolves through the File Catalog to a GUID; the Replica Catalog maps the GUID to a master SURL and further replica SURLs; the Metadata Catalog holds metadata keyed by the GUID]
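The catalog chain in this scenario can be sketched with plain dictionaries; all LFNs, GUIDs and SURLs below are invented for illustration.

```python
# Sketch of the catalog resolution chain from the slide:
# LFN -> GUID (File Catalog), GUID -> replica SURLs (Replica Catalog),
# GUID -> attributes (Metadata Catalog). All names/data are illustrative.

file_catalog = {"/grid/dteam/run42.dat": "guid-0042"}          # LFN -> GUID
replica_catalog = {                                            # GUID -> SURLs
    "guid-0042": ["srm://cern.ch/castor/run42.dat",            # master copy
                  "srm://ral.ac.uk/dcache/run42.dat"],
}
metadata_catalog = {"guid-0042": {"run": 42, "type": "raw"}}   # GUID -> attrs

def resolve(lfn: str):
    """Return all replica SURLs for a logical file name."""
    guid = file_catalog[lfn]
    return replica_catalog[guid]

print(resolve("/grid/dteam/run42.dat")[0])
```

Keeping the metadata catalog separate mirrors the boundary the previous slide draws between generic middleware and the application layer.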

24 Computing Element
Layered service interfacing
 various batch systems (LSF, PBS, Condor)
 Grid systems like GT2, GT3, and Unicore
CondorG as queuing system on the CE
 Allows the CE to be used in push and pull mode
Call-out module to change job ownership (security)
Lightweight service
 should be possible to install dynamically, e.g. within an existing Globus gatekeeper
[Diagram: the EDG Broker and the AliEn CE/Task Queue feed CondorG, which submits through GT2, GT3 or Unicore to a Globus gatekeeper and local batch queue; a ‘Change UID’ call-out switches job ownership]
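The push and pull dispatch modes above can be sketched as follows; the classes are illustrative stand-ins, not the CondorG or AliEn interfaces.

```python
# Sketch of the two CE dispatch modes (illustrative only):
# "push" -- a broker submits a job straight to the CE's queue;
# "pull" -- the CE fetches work from a central task queue when it
# has capacity, as in the AliEn model.
from collections import deque

class TaskQueue:
    """Central queue of pending jobs (AliEn-task-queue-like)."""
    def __init__(self):
        self.jobs = deque()
    def submit(self, job):
        self.jobs.append(job)
    def fetch(self):
        return self.jobs.popleft() if self.jobs else None

class ComputingElement:
    def __init__(self, task_queue):
        self.task_queue = task_queue
        self.batch_queue = []            # stands in for LSF/PBS/Condor

    def push(self, job):                 # push mode: broker -> CE
        self.batch_queue.append(job)

    def pull(self):                      # pull mode: CE -> task queue
        job = self.task_queue.fetch()
        if job is not None:
            self.batch_queue.append(job)
        return job

tq = TaskQueue()
tq.submit("sim-001")
ce = ComputingElement(tq)
ce.push("reco-007")                      # pushed by a broker
ce.pull()                                # pulled from the task queue
print(ce.batch_queue)                    # ['reco-007', 'sim-001']
```

Routing both modes into the same local batch queue is what lets one CE front-end serve push-style brokers and pull-style schedulers at once.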

25 Information Service
Adopt a common approach to the information and monitoring infrastructure
There may be a need for specialised information services
 e.g. accounting, package management, grid information, monitoring, provenance, logging
 these may be built on an underlying information service
A range of visualisation tools may be used
Using R-GMA
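R-GMA presents grid information as a virtual relational database: producers insert tuples, consumers run SQL-like queries. The toy model below illustrates that idea only; it is not the R-GMA API, and the table, columns and data are invented.

```python
# Toy model of the relational publish/query pattern R-GMA provides
# (not the R-GMA API; names and data are illustrative).

class VirtualTable:
    def __init__(self, columns):
        self.columns = columns
        self.tuples = []

    def insert(self, **row):             # producer side
        self.tuples.append({c: row.get(c) for c in self.columns})

    def select(self, predicate):         # consumer side (SQL-like filter)
        return [t for t in self.tuples if predicate(t)]

services = VirtualTable(["site", "service", "status"])
services.insert(site="CERN", service="CE", status="up")
services.insert(site="RAL", service="SE", status="down")

down = services.select(lambda t: t["status"] == "down")
print(down)  # [{'site': 'RAL', 'service': 'SE', 'status': 'down'}]
```

Specialised services (accounting, logging, provenance, …) can then be modelled as further tables over the same underlying infrastructure, which is the layering the slide suggests.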

26 Authentication/Authorization
Different models and mechanisms
Authentication based on Globus/GSI, AFS, SSH, X509, tokens
Authorization
 AliEn: exploits mechanism of RDBMS backend
 EDG: gridmap file; VOMS credentials and LCAS/LCMAPS
 VDT: gridmap file; CAS, VOMS (client)
Security and protection at a level acceptable to fabric managers and end users needs to be discussed and “blessed” in advance

27 A minimalist approach to security
Need to integrate components with quite different security models
Start with a minimalist approach
 Based on VOMS (proxy issuing) and myProxy (proxy store)
 User stores a proxy in myProxy, from where it can be retrieved by access services and sent to other services
 Credential chain needs to be preserved to allow services to authenticate the client
 Local authorization could be done via LCAS if required
 User is mapped to group accounts, or components like LCMAPS are used to assign a local user identity
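A sketch of this credential flow, with simple functions standing in for the VOMS/myProxy and LCMAPS machinery; all names, DNs and mappings are illustrative, not real APIs.

```python
# Sketch of the minimalist credential flow (illustrative only):
# the user deposits a proxy in a store, an access service retrieves it
# (extending the chain), and local authorization maps the user's VO
# group to a group account, LCMAPS-style.

proxy_store = {}                          # stands in for myProxy

def store_proxy(user_dn, voms_groups):
    proxy_store[user_dn] = {"dn": user_dn, "groups": voms_groups,
                            "chain": [user_dn]}

def retrieve_proxy(user_dn, service):
    proxy = dict(proxy_store[user_dn])
    proxy["chain"] = proxy["chain"] + [service]   # preserve the chain
    return proxy

def map_local_account(proxy, group_accounts):
    """Map the first matching VO group to a local group account."""
    for group in proxy["groups"]:
        if group in group_accounts:
            return group_accounts[group]
    raise PermissionError("no mapping for " + proxy["dn"])

store_proxy("/C=CH/O=CERN/CN=Jane Doe", ["/dteam"])
p = retrieve_proxy("/C=CH/O=CERN/CN=Jane Doe", "workload-manager")
print(map_local_account(p, {"/dteam": "dteam001"}))  # dteam001
print(p["chain"])
```

The preserved chain is what lets a downstream service see both the original user and every intermediary that handled the credential.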

28 Towards a prototype
Focus on the key services discussed; exploit existing components
Initially an ad-hoc installation at CERN and Wisconsin
Aim to have the first instance ready by end of April
 Open only to a small user community
 Expect frequent changes (also API changes) based on user feedback and integration of further services
Enter a rapid feedback cycle
 Continue with the design of remaining services
 Enrich/harden existing services based on early user feedback
Initial prototype components for April ’04:
 Access service: AliEn shell, APIs
 Information & Monitoring: R-GMA
 CE: AliEn CE, Globus gatekeeper, CondorG
 Security: VOMS, myProxy
 Workload mgmt: AliEn task queue
 SE: SRM (Castor), GridFTP, GFAL, aiod
 File Transfer Service: AliEn FTD
 File and Replica Catalog: AliEn File Catalog, RLS
To be extended/changed (e.g. WMS)
This is not a release! It’s purely an ad-hoc installation

29 Planning
Evolution of the prototype
 Envisaged status at end of 2004: key services need to fulfil all requirements (application, operation, quality, security, …) and form a deployable release; remaining services available as prototype
 Need to develop a roadmap
  Incremental changes to the prototype (where possible)
  Early user feedback through ARDA and early deployment on the SA1 pre-production service
  Detailed release plan being drawn up
 Converge prototype work with integration & testing activities
  Need to get rolling now! First components will start using SCM in May

30 Integration
A master Software Configuration Plan is being finalized now
 It contains basic principles and rules about the various areas of SCM and integration (version control, release management, build systems, bug tracking, etc.)
 Compliant with internationally agreed standards (ISO 10007-2003 E, IEEE SCM Guidelines series)
Most EGEE stakeholders have already been involved in the process to make sure everybody is aware of, contributes to and uses the plan
An EGEE JRA1 Developer's Guide will follow shortly, in collaboration with JRA2 (Quality Assurance), based on the SCM Plan
It is of paramount importance to deliver the plan and guide as early as possible in the project lifetime

31 Testing
The 3 initial testing sites are CERN, NIKHEF and RAL
 More sites can join the testing activity at a later stage!
 Must fulfil site requirements
Testing activities will be driven by the test plan document
Test plan being developed based on user requirements documents:
 Application requirements from NA4: HEPCAL I & II, AWG documents, bioinformatics requirements documents from EDG
 Deployment requirements being discussed with SA1
 ARDA working document for core Grid services
 Security: work with JRA3 to design and plan security testing
The test plan is a living document: it will evolve to remain consistent with the evolution of the software
Coordination with NA4 testing and external groups (e.g. Globus) established
Solid steps towards MJRA1.3 (PM5)

32 Convergence with Integr & Tstg
Development clusters need to get used to SCM
During May, initial components of the prototype need to follow SCM
 Proposed components:
  R-GMA
  VOMS
  RLS
  GFAL (is this 3rd party?)
  SRM (will there be an EGEE implementation or just 3rd party?)
New developments need to follow SCM from the beginning
ISSUE: perl modules seem not to fit well

33 Convergence with Integr & Tstg II
IT/CZ
 Put EDG code under SCM for training purposes and be prepared to move components to EGEE when needed
 VOMS in May
UK
 Full R-GMA under SCM in May
CERN/DM
 RLS in May
 GFAL?

34 Development Roadmap
Prototype work as starting point
Priorities need to be adjusted based on user feedback
Incremental, frequent releases
All discussions and decisions take place in the design team
 Project-wide body being formed to oversee this activity: the PTF (Project Technical Forum)
Boundary conditions:
 Architecture document due end of Month 3 (June)
 Design document due end of Month 5 (August)

35 JRA1/SA1 - Process description
No official delivery of requirements from SA1 to JRA1 is stated in the TA
The definition, discussion and agreement of the requirements has already started, done through dedicated meetings
This is an ongoing process:
 Not all the requirements are defined yet
 A set of requirements has been agreed; a basic agreement is needed to start working!
 But it can be reviewed at any time there is a valid reason for it

36 JRA1/SA1 - Requirements
1. Middleware delivery to SA1
2. Release management
3. Deployment scenarios
4. Middleware configuration
 JRA1 will provide a standard set of configuration files and documentation with examples that SA1 can use to design tools. Format to be agreed between SA1 and JRA1.
 It is the responsibility of SA1 to provide configuration tools to the sites.
5. Enforcement of the procedures
6. Platforms to support
 Primary platform: Red Hat Enterprise 3.0, gcc 3.2.3 and icc8 compilers (both 32- and 64-bit).
 Secondary platform: Windows (XP/2003), vc++ 7.1 compiler (both 32- and 64-bit).
7. Versions for compilers, libraries, third-party software
8. Programming languages
9. Packaging and software distribution
10. Others
 Sites must be allowed to organize the network as they wish: internal or external connectivity, NAT, firewall, etc. must all be possible, with no special constraints.
 WNs must not require outgoing IP connectivity, nor inbound connectivity.
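Purely as an illustration of requirement 4, the snippet below parses a hypothetical middleware configuration file in INI style. The format and every key shown are assumptions; the slide notes the actual format was still to be agreed between SA1 and JRA1.

```python
# Hypothetical example of a standard configuration file JRA1 might
# deliver to SA1 (format and keys are invented for illustration).
import configparser

example = """
[ce]
batch_system = pbs
queue = grid_long

[networking]
outbound_ip = false
inbound_ip = false
"""

cfg = configparser.ConfigParser()
cfg.read_string(example)
print(cfg["ce"]["batch_system"])                    # pbs
print(cfg.getboolean("networking", "outbound_ip"))  # False
```

A machine-readable format like this is what would let SA1 build site configuration tools on top of the files JRA1 delivers, per the division of responsibility in the slide.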

37 JRA1/JRA3
A lot of progress has been achieved here
 Security Group formed, JRA1 members identified
 First meeting scheduled on May 5-6, 2004; GAP analysis planned by then
 VOMS administration support clarified: handled by JRA3
 Issue: VOMS effort reporting

38 JRA1/JRA4
SCM plan presented and discussed
More discussion is needed on which components of JRA4 will be required in the overall architecture/design

39 American Involvement in JRA1
UWisc
 Miron Livny part of the design team
 Condor team actively involved in re-engineering resource access, in collaboration with the Italian cluster
ISI
 Identification of potential contributions started (e.g. RLS)
 Focused discussions being planned
Argonne
 Collaboration on testing started
 Support for key Globus component enhancements being discussed

40 JRA1 and other activities
NA4
 HEP: ARDA project started; ensures close relations between HEP and middleware
 Bio: activities with a similar spirit needed; a focused meeting is tentatively being planned for May
SA1
 Revision of requirements (platforms)
JRA2
 QAG started
 Monthly meeting established
JRA3
 Necessary structures established
 Focused meeting in May
JRA4
 Architectural components required need to be clarified
Other projects
 Potential drain of resources for dissemination activities

41 The End


43 UK Cluster
R-GMA
 Interface to various graphical tools
 Monitoring largely driven by application and infrastructure needs
 Information system: clarify role in job-submission/data-management cycles (e.g. role of GLUE)
 Interface to other monitoring systems (e.g. Grid3)
 Understand the R-GMA role in:
  Accounting
  Job provenance
  Logging & bookkeeping
  …

44 IT/CZ Cluster
Resource Access (aka ‘CE’)
 Interface to various batch systems
 Starting with the integration of CondorG
 JobFetcher (implementing the ‘pull’ model)
 Site policy management, enforcement, and advertisement
WMS
 High-level optimizer components at the TaskQueue
  Matchmaking
  Job adjustment
 VO policy management and enforcement
 TaskQueue interactions

45 IT/CZ Cluster II
Accounting
 LCG accounting system (usage records) has to be considered
 Role of DGAS needs to be understood
L&B
 Assessment of its role in accounting, job provenance, …
 Relationship to R-GMA
VOMS
 Relationship to JRA3
 Integration into the Access Service

46 CERN/DM Cluster
SE
 POSIX-like file I/O: GFAL/aio relationship
 SRM interface
  Will EGEE provide an implementation; will we ship an implementation; will we just make it a requirement?
  Space reservation not in v1.1; migration path to v2.1?
File Catalog
 Schema evolution/customization to different user groups
 Server implementation
 Metadata catalog interaction

47 CERN/DM Cluster II
Replica Catalog
 Deployment model (wrt File Catalog)
 Schema evolution
 Distributed catalog
Metadata Catalog
 Mostly in the application domain
File Transfer Service
 Overlaps with WMS/CE
 Local and global (VO) policy enforcement
 Error handling and recovery; transaction handling and boundaries; load balancing and fail-over modes
 Upgrade resilience
Data subscription service
 GDMP functionality
 How does it relate to FTS?
Integration into the Access Service

