Databases and the Grid OGSA-DAI Architecture & Requirements Malcolm Atkinson OGSA-DAI Chief Architect Director of National e-Science Centre 30th May 2002 OGSA Early Adopters Workshop Argonne National Laboratories
Overview UK e-Science Scale, Coordination, Structure, Projects Database Task Force & GGF DAI-WG OGSA-DAI Project Scope, Scale, Participants, Plans Architecture Relationship with OGSA Requirements
£80m Collaborative projects E-Science Steering Committee DG Research Councils Director Directors Management Role Directors Awareness and Co-ordination Role Generic Challenges EPSRC (£15m), DTI (£15m) Industrial Collaboration (£40m) Academic Application Support Programme Research Councils (£74m), DTI (£5m) PPARC (£26m) BBSRC (£8m) MRC (£8m) NERC (£7m) ESRC (£3m) EPSRC (£17m) CLRC (£5m) Grid TAG UK e-Science Programme Tony Hey
Cambridge Newcastle Edinburgh Oxford Glasgow Manchester Cardiff Southampton London Belfast Daresbury Lab RAL Hinxton UK Grid Network AccessGrid always-on video walls National e-Science Centre
NeSCs Roles TAG NeSC eSI GSC Application Pilots IRCs … e-Science Centres e-Scientists, Grid users, Grid services & Grid Developers UK Core Directorate Global Grid Forum … CS Research DBTF ATF GNT Coordination, Stimulation & Education ETF STF
UK Architectural Task Force (ATF) Malcolm Atkinson (NeSC), Geof. Coulson (Lancaster U.), Jon Crowcroft (Cambridge U.), David De Roure (Southampton U.), Vijay Dialani (Southampton U.), Andrew Herbert (Microsoft), Ian Leslie (Cambridge U.), Andrew Martin (Oxford U.), Ken Moody (Cambridge U.), Steven Newhouse (ICSTM & LeSC), Tony Storey (IBM), … Plus consultations UK Role in Open Grid Services Architecture, Version th March teams ATF Obtained Agreement: OGSA as Foundation for UK work, 18 April 2002
e-Science Institute
National e-Science Centre Edinburgh + Glasgow Universities Physics & Astronomy 2 Informatics, Computing Science EPCC £6M EPSRC/DTI + £2M SHEFC over 3 years e-Science Institute visitors, workshops, co-ordination, outreach middleware development 50 : 50 industry : academia last-mile networking
UK Pilot Projects Research Councils Autonomy > 30 Projects $5 million to $0.3 million Wide Range of Disciplines Industrial Involvement Integration and Access to Information e-Science Centre Projects > 50% Industrial Involvement
Equator: Technological innovation in physical and digital life AKT: Advanced Knowledge Technologies DIRC: Dependability of Computer-Based Systems MIAS: From Medical Images and Signals to Clinical Information IRC Grand Challenge Projects
Particle Physics and Astronomy e-Science Projects GridPP links to EU DataGrid, CERN LHC Computing Project, US GriPhyN and PPDataGrid Projects, and iVDGL Global Grid Project AstroGrid links to EU AVO and US NVO projects OGSA-DAI Early Adopter
Comb-e-Chem:Structure-Property Mapping Southampton, Bristol, Roche, Pfizer, IBM DAME: Distributed Aircraft Maintenance Environment York, Oxford, Sheffield, Leeds, Rolls Royce Reality Grid: A Tool for Investigating Condensed Matter and Materials QMW, Manchester, Edinburgh, IC, Loughborough, Oxford, Schlumberger, … EPSRC e-Science Projects (1)
EPSRC e-Science Projects (2) MyGrid: Personalised Extensible Environments for Data Intensive in silico Experiments in Biology Manchester, EBI, Southampton, Nottingham, Newcastle, Sheffield, GSK, Astra-Zeneca, IBM, Sun GEODISE: Grid Enabled Optimisation and Design Search for Engineering Southampton, Oxford, Manchester, BAE, Rolls Royce Discovery Net: High Throughput Sensing Applications Imperial College, Infosense, … OGSA-DAI Early Adopter
MyGrid e-Science Workbench Goal is to develop workbench to support: Experimental process of data accumulation Use of community information Scientific collaboration Provide facilities for resource selection, data management and process enactment Bioinformatics applications Functional genomics, pattern database annotation Manchester, EBI, Newcastle, Nottingham, Sheffield, Southampton GSK, AstraZeneca, Merck, IBM, Sun, ...
DBTF Web Pages
DBTF Membership Malcolm Atkinson (NESC) Vijay Dialani (Southampton University) Norman Paton (Manchester University) Dave Pearson (Oracle UK) Tony Storey (IBM Hursley) Paul Watson (Newcastle University)
DBTF: Aims & Actions Requirements Capture Pilot Project Meetings Report Dave Pearson Roadmap UK Coordination GGF Articulation Standards BoF GGF4 Papers GGF5 Implementation Projects OGSA-DAI Architecture Liaise with ATF Liaise with Globus team Education e-Science Institute Pilot Projects GSC Evolving GGF DAIS WG Broader community
Overview UK e-Science Scale, Coordination, Structure, Projects Database Task Force & GGF DAI-WG OGSA-DAI Project Scope, Scale, Participants, Plans Architecture Relationship with OGSA Requirements
OGSA-DAI Partners: EPCC & NeSC, IBM UK (Hursley), IBM USA, Manchester e-SC, Newcastle e-SC, Oracle. $5 million, 18 months, started 1st February 2002. [Map: partner sites shown on the UK Grid network alongside Cambridge, Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton]
OGSA-DAI Scope Definition and development of generic Grid data services which provide access to and integration of data held in databases, and the management of data within a distributed environment. Database A stored, structured collection of data Accessed using an API that takes account of the structure of the data stored Includes Relational and object databases XML repositories Adequately described collections of files
Databases in the Grid Computational Complexity Data Complexity
Scope of Database Services Discovery of Data by Content Query and Update Statements Metadata Management & Evolution Transactions (Flavours of) Distributed queries and updates Specialised types Encapsulated (safe) Function application Notification (driven by triggers, etc.)
OGSA-DAI Objectives
  Produce specifications for generic data services based on a common design framework consistent with Open Grid Service Architecture
  Design specifications as basis of standards recommendations via Database Access and Integration Services Working Group to the Global Grid Forum
  Deliver Grid data services software in future releases of the Globus Toolkit (GT3 December 2002)
  Refine identified requirements, evaluate design options, develop demonstrators, transfer skills to the Grid community
  Develop reference implementations of generic data services
  Ensure that the Grid model and OGSA standards address fully the needs of data access and integration
  Ensure Grid data services meet the levels of service required: performance, scalability, resilience, availability, and manageability; evolution and distribution; large user populations and large data volumes
OGSA-DAI Plan Two Phases Phase 1: Started Feb 02 ends GGF5 Detailed Plan – Requirements, Designs & Prototypes 6 Work Packages Project Management (Oracle, EPCC) Architecture (NeSC, DBTF) XML Data Management (NeSC & EPCC) Distributed Query Systems (Manchester & Newcastle) Metadata & Registries (NeSC & EPCC) Relational Databases (IBM UK) Phase 2: 12 months Structure and Objectives to be Refined in Major Review GGF5 DAIS WG meeting a major input
OGSA-DAI Time Line Feb 02, May 02, Jul 02, Sep 02, Dec 02, Feb 03, May 03, Sep 03 Ship for GT3 Integration RDB + GT2 / OGSA Prototypes Available XML + OGSA Prototype Available Design Documents & Demos for DAIS GGF5 RDB + GT2 / OGSA Prototypes for Early Adopters XML + OGSA Prototypes for Early Adopters WS + GSI UK support ( > 60 downloads) Phase 2 Starts Phase 1 Starts
Milestones & Deliverables
3rd Jul 2002: GGF 5 Deliverables
  1st Draft: OGSA-DAI Design Specification
  Working Grid data service prototype with workshop material
  Draft Phase 2 functional scope for each Work Package
30th Sept 2002: End Phase 1
  Phase 1 Review Report and recommendations including: revisions to Phase 2 streams of work, Work Package structure, content, and scope
  Completed, Tested, Work Package prototypes with evaluation report detailing functional scope and deficiencies, design options, measures for acceptance
  RDBMS/Globus-2 prototype implementation
  Phase 2 scope Agreed
  2nd Draft: OGSA-DAI design specification
  Dissemination programme for UK e-Science community
  Transition programme for UK Grid Support Team and Globus Development Team
31st Dec 2002: Globus Toolkit Release
  1st Grid data services reference implementation for Globus Toolkit 3
  1st Grid data services specification for Globus Toolkit 3
  Scope of functional content for 2nd Globus Toolkit release and specification
  1st release training and support courses
31st Mar 2003: Interim UK e-Science community release
  Interim Grid data services implementation for UK e-Science community
  Release training and support courses, with documentation
31st Jul 2003: Globus Toolkit Release
  2nd Grid data services reference implementation for Globus Toolkit 3
  2nd Grid data services specification for Globus Toolkit 3
  2nd release training and support courses
  Publications and papers to support reference implementations through WG discussions and GGF standards processes
  Final Project Report
OGSA-DAI: Key Components Grid Database Services (GDS) GXDS, GRDS, GSFDS, … Perform DB actions Extra Data Service Elements DB-action-Management Functions Notifications from Triggers Grid Database Service Factories (GDSF) Create the above Extra Data Service Elements Database Service Registries (DSR) Specialised Registries to find DBs, Services & Factories Grid Data Transfer Services (GDTS) Described at Requirement Level Flexible & mapped to grid-FTP, MQ Series, …
OGSA-DAI Architecture (one diagram, built up step by step over ten slides; components shown: client, DSR, GDSF, GDS 1, GDS 2, GDS 3)
  1 request for factory
  2 response with GDSFs GSHs
  3 script for 3 GDSs
  4 creation of 3 GDSs
  5 response with 3 GSHs
  6 scripts requesting DB actions
  7 transfer data batch to GDS 2, stream to GDS 3
  8 stream data to GDS 2
  9 transfer data batch to client
  10 stream data to specified destination
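The numbered interaction in the architecture diagram (registry lookup, factory creation of services, scripts sent to services, data moved between services and back to the client) can be illustrated with a toy, in-process model. This is only a sketch under stated assumptions: every class, method and handle string below (`DSR.find_factories`, `GDSF.create`, `GDS.perform`, `gsh://example/...`) is hypothetical and not part of any OGSA-DAI or Globus interface; Grid Service Handles are plain strings, "scripts" are Python predicates, and the sender of step 8 is assumed to be GDS 3, which the slides do not state.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class GDS:
    """Toy Grid Database Service: runs a script and accepts delivered data."""
    gsh: str                                   # handle, modelled as a string
    received: list = field(default_factory=list)

    def perform(self, script, rows):
        # Step 6: a script requests a DB action (here: a simple filter)
        return [r for r in rows if script(r)]

    def deliver(self, data):
        # Steps 7-8: accept a batch (list) or a stream (any iterable)
        self.received.extend(data)


class GDSF:
    """Toy Grid Database Service Factory: creates GDS instances."""
    def __init__(self, gsh):
        self.gsh = gsh
        self._ids = count(1)
        self.instances = {}

    def create(self, n):
        # Steps 3-5: create n services and return their handles
        handles = []
        for _ in range(n):
            gsh = f"{self.gsh}/gds/{next(self._ids)}"
            self.instances[gsh] = GDS(gsh)
            handles.append(gsh)
        return handles


class DSR:
    """Toy Database Service Registry: finds factories by handle."""
    def __init__(self):
        self._factories = {}

    def register(self, factory):
        self._factories[factory.gsh] = factory

    def find_factories(self):
        # Steps 1-2: request for factory, response with factory handles
        return list(self._factories)

    def resolve(self, gsh):
        # Stand-in for handle resolution; a real system would go over the network
        return self._factories[gsh]


def run_example():
    dsr = DSR()
    dsr.register(GDSF("gsh://example/gdsf"))
    factory = dsr.resolve(dsr.find_factories()[0])        # steps 1-2
    handles = factory.create(3)                           # steps 3-5
    gds1, gds2, gds3 = (factory.instances[h] for h in handles)
    evens = gds1.perform(lambda r: r % 2 == 0, [1, 2, 3, 4, 5, 6])  # step 6
    gds2.deliver(evens)                                   # step 7: batch to GDS 2
    gds3.deliver(iter(evens))                             # step 7: stream to GDS 3
    gds2.deliver(x * 10 for x in gds3.received)           # step 8: stream to GDS 2
    return list(gds2.received)                            # step 9: batch to client


if __name__ == "__main__":
    print(run_example())
```

Step 10 (streaming to a specified destination) is left out; in this model it would be one more `deliver` call on whatever object stands in for the destination.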
OGSA-DAI & OGSA <((-:} Description, e.g. portType Works Well Adding only one portType / GDS(F) | DSR Expect to make extensive use of Data Service Elements Special to DBs: Static & Dynamic Component Management Notification Grid-FTP Accounting Security: Authentication, Authorisation & Privacy Reliable invocation …
OGSA-DAI & OGSA <))-:} Lifetime Issues Conditions for termination Controlled clean-up opportunity Scope of State Evolution Notification Issues Registering & using same notification system For DBs, e.g. triggers do we have to construct a dummy Service Data Element? Type System Issues Standards needed for wide range of types Service Definition Issues How to create / obtain standard definitions for common services
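The lifetime issues listed above (conditions for termination and a controlled clean-up opportunity) can be made concrete with a small sketch. It assumes a soft-state model in which a service carries a termination time that clients may extend, and in which registered clean-up hooks run before the service is destroyed. The class `ManagedService` and its methods are hypothetical illustrations, not OGSA or OGSA-DAI operations, and time is an integer supplied by the caller.

```python
class ManagedService:
    """Hypothetical service with a termination time and clean-up hooks."""

    def __init__(self, termination_time):
        self.termination_time = termination_time
        self.destroyed = False
        self._cleanup_hooks = []

    def on_cleanup(self, hook):
        # Register work a database service may need before it disappears,
        # e.g. rolling back an open transaction or releasing a connection
        self._cleanup_hooks.append(hook)

    def extend(self, new_time):
        # A keep-alive: lifetime can only be pushed later, and only while alive
        if not self.destroyed:
            self.termination_time = max(self.termination_time, new_time)

    def tick(self, now):
        # Termination condition: the clock has reached the termination time
        if not self.destroyed and now >= self.termination_time:
            for hook in reversed(self._cleanup_hooks):   # last registered runs first
                hook()
            self.destroyed = True
        return self.destroyed


if __name__ == "__main__":
    log = []
    svc = ManagedService(termination_time=10)
    svc.on_cleanup(lambda: log.append("close connection"))
    svc.on_cleanup(lambda: log.append("roll back open transaction"))
    svc.extend(20)
    print(svc.tick(15), svc.tick(20), log)
```

Running hooks in reverse registration order is a design choice of this sketch, so that resources acquired last are released first; the slides do not prescribe an order.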
OGSA-DAI Summary On Schedule & Going Well Expect Contributions via GGF5 Expect Contributions to GT3 Releases Early Days Testing Architectural Design Using OGSA Working with Early Adopter Pilot Projects AstroGrid & MyGrid Planned release of prototypes Influence OGSA-DAI direction Via DAIS-WG